Getting Started With CDC Replication from MongoDB
SingleStore's native data replication lets you take a one-time snapshot and run continuous change data capture (CDC) from MongoDB® to SingleStoreDB. This provides a quick and easy way to replicate data and power analytics on MongoDB® data.
What you will learn in this notebook:
Set up replication of a collection to SingleStore and watch live updates to the MongoDB® collection replicate to SingleStore.
Install libraries and import modules
In [1]:
!pip3 install pymongo --quiet
import pymongo
import random
Replicate a collection to SingleStore
In [2]:
%%sql
DROP DATABASE IF EXISTS cdcdemo;
CREATE DATABASE cdcdemo;
In [3]:
source_mongo_url = "mongodb+srv://mongo_sample_reader:SingleStoreRocks27017@cluster1.tfutgo0.mongodb.net/?retryWrites=true&w=majority"
Create a link to the source MongoDB
In [4]:
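# connection_url_kai is not defined in this notebook; it is assumed to be
# provided by the SingleStore notebook environment.
s2client = pymongo.MongoClient(connection_url_kai)  # Initializing client for Kai
s2db = s2client["cdcdemo"]
res = s2db.command("createLink", "mongolink", uri=source_mongo_url)
print(res, res["ok"])
if res["ok"] != 1:
    # Include the server response so the failure is diagnosable
    raise Exception("Failed to create link: %s" % res)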
Specify the source database and collection and start replication
In [5]:
create_col_args = {
    "from": {"link": "mongolink", "database": "cdcdemo", "collection": "scores"}
}
res = s2db.create_collection("scores", **create_col_args)
The following command waits until the entire collection from MongoDB has been synced to SingleStore
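The options passed to `create_collection` above have a fixed shape, so replicating several collections over the same link could be sketched as a small helper. This is a minimal sketch, assuming the same "from" option layout shown in the cell above. `replication_options` and `replicate_collections` are hypothetical names introduced for illustration, and `db` is assumed to behave like the `s2db` handle.

```python
def replication_options(link, database, collection):
    """Build the keyword arguments used above for a replicated collection."""
    return {"from": {"link": link, "database": database, "collection": collection}}


def replicate_collections(db, link, source_database, collections):
    """Create one replicated collection per source collection name.

    `db` is assumed to expose create_collection(name, **kwargs), like the
    s2db handle in this notebook. Returns the names that were created.
    """
    created = []
    for name in collections:
        db.create_collection(name, **replication_options(link, source_database, name))
        created.append(name)
    return created


# Hypothetical usage with the handles from this notebook:
# replicate_collections(s2db, "mongolink", "cdcdemo", ["scores"])
```

Keeping the option dictionary in one function means the link name and source database are written once rather than repeated per collection.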
In [6]:
%%sql
USE cdcdemo;
SYNC PIPELINE scores;
Print some of the replicated documents
In [7]:
s2collection = s2db["scores"]
scores_cursor = s2collection.find().limit(5)
for scores in scores_cursor:
    print(scores)
Total document count
In [8]:
s2collection.count_documents({})
Insert a document in the source MongoDB collection
In [9]:
data = {
    "student_id": random.randint(0, 100),
    "class_id": random.randint(0, 500),
    "exam_score": random.uniform(0, 100),  # Generate random score between 0 and 100 as a double
}
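If the demo needs to be repeatable, the same document could be built from a seeded generator. The following is a minimal sketch, not part of the original notebook; `make_score_document` is a hypothetical helper that mirrors the fields and ranges used in the cell above.

```python
import random


def make_score_document(rng=random):
    """Return a document with the same fields and ranges as the cell above."""
    return {
        "student_id": rng.randint(0, 100),   # inclusive on both ends
        "class_id": rng.randint(0, 500),
        "exam_score": rng.uniform(0, 100),   # Python float, stored as a double
    }


# A seeded generator yields the same document on every run.
doc = make_score_document(random.Random(42))
```

Passing the generator in as an argument keeps the default behavior (the module-level `random`) while allowing a fixed seed when reproducibility matters.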
In [10]:
sourceclient = pymongo.MongoClient(source_mongo_url)
sourcecol = sourceclient["cdcdemo"]["scores"]
res = sourcecol.insert_one(data)
In [11]:
sourcecol.count_documents({})
The newly added document is now replicated to SingleStore, increasing the document count by 1 and demonstrating real-time sync.
In [12]:
s2collection.count_documents({})
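Because change data capture is typically asynchronous, the count on the SingleStore side may lag the source for a moment after an insert. One way to make the check robust is to poll until the counts agree. This is a minimal sketch under that assumption; `wait_for_count` is a hypothetical helper, not a SingleStore or PyMongo API, and the timeout and interval defaults are arbitrary.

```python
import time


def wait_for_count(get_count, expected, timeout=30.0, interval=0.5):
    """Poll get_count() until it returns at least `expected`.

    Returns the last observed count, or raises TimeoutError if the
    expected count is not reached before the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = get_count()
        if current >= expected:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"count stayed at {current}, expected {expected}")
        time.sleep(interval)


# Hypothetical usage with the handles defined in this notebook:
# target = sourcecol.count_documents({})
# wait_for_count(lambda: s2collection.count_documents({}), target)
```

The helper takes a callable rather than a collection so it can be used with any count source and exercised without a live database.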
This native replication capability from SingleStore makes it easy to set up and run continuous data replication from MongoDB, with no additional cost or infrastructure requirements.
Details
About this Template
Set up zero-ETL data replication from MongoDB to SingleStore.
This Notebook can be run in Standard and Enterprise deployments.
License
This Notebook has been released under the Apache 2.0 open source license.