When managing large quantities of data, many teams turn to MongoDB®, a commonly used NoSQL database that stores unstructured data in a document-oriented format.
While the data may be stored in MongoDB, fast search requires a separate software layer such as Elasticsearch. A search and analytics engine, Elasticsearch provides solutions for indexing and analyzing data in both structured and unstructured formats.
Many organizations rely on both systems, since Elasticsearch excels at search-related operations while MongoDB offers efficient scalability for large and complex data sets. However, the two are independent of one another and must be properly integrated, a process that has proven difficult and complex, and only gets worse with intricate data. In this blog, we’ll take a look at the most common ways companies integrate the two systems, as well as an alternative that can save you time and money: SingleStore Kai™.
Option 1: Integrating MongoDB and Elasticsearch
Method 1: MongoDB Connector
MongoDB Connector (mongo-connector) is a Python-based data-export tool developed by MongoDB. For small-to-medium workloads, it can handle continuous synchronization of your secondary dataset on its own.
Pros
- Fairly easy to set up
- Real-time data synchronization via replication of the MongoDB operations log (oplog)
Cons
- Can’t efficiently scale past small-to-medium workloads
- Pipeline complexity grows with the complexity of your MongoDB schema
- Not flexible out of the box; customization requires a high degree of product knowledge
Here’s a crash course on setting up your first connector:
- Install the MongoDB Connector tool with pip
pip install mongo-connector
- With Elasticsearch installed, download your version’s corresponding doc manager
pip install mongo-connector[elastic5]
# Match '5' with your version of Elasticsearch, and supplement the parameter with -aws for the Amazon version of Elastic. Example: mongo-connector[elastic5-aws]
- To start the connector, you need a config.json file. The following is a general outline; tune the settings to suit your needs.
{"mainAddress": "localhost:27017","oplogFile": "oplog.timestamp","noDump": false,"batchSize": -1,"verbosity": 2,"continueOnError": true,"logging": {"type": "file","filename": "mongo-connector.log"},"authentication": {"adminUsername": "yourAdminUsername","password": "yourPassword"},"fields": {"includeFields": ["field1","field2"],"excludeFields": ["field3"]},"namespaces": {"include": ["db1.collection1","db2.collection2"],"exclude": ["db3.collection3"]},"target": {"url": "http://localhost:9200","index": "myindex","docType": "mytype"}}
- Once your config.json file is complete, you can run this command to get the connection up and running!
mongo-connector -c config.json
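Once the connector is running, you can sanity-check the sync from Python. Here's a minimal sketch, assuming the elasticsearch Python client is installed and the myindex name from the config above:
from elasticsearch import Elasticsearch

# Point at the same Elasticsearch instance the connector writes to
es = Elasticsearch(["http://localhost:9200"])

# The document count should grow as mongo-connector replays the oplog
print(es.count(index="myindex"))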
Method 2: Logstash
Elasticsearch is part of the ELK stack, which houses Elasticsearch, Logstash, Kibana and Beats; together, these tools provide companies with search and analytics capabilities.
Pros
- Logstash and its pipelines are inherently flexible and support auto-indexing
- The ELK community has produced a wide range of documentation to aid in configuration
Cons
- Configuring Logstash has a steep learning curve
- Syncing data from MongoDB to Logstash introduces latency
Here’s how to get Logstash up and running:
- Download Logstash using the instructions found here.
- Install the MongoDB input plugin:
bin/logstash-plugin install logstash-input-mongodb
- Create a Logstash configuration file, logstash.conf, with the following contents:
input {
  mongodb {
    uri => 'mongodb://localhost:27017/mydb'
    placeholder_db_dir => '/path/to/logstash-mongodb/'
    placeholder_db_name => 'logstash_sqlite.db'
    collection => 'mycollection'
    batch_size => 5000
  }
}

output {
  elasticsearch {
    hosts => ['localhost:9200']
    index => 'myindex'
  }
}
- Run Logstash with the previously created config file:
bin/logstash -f logstash.conf
Method 3: Custom scripts
Many developers build their own solutions using the Elasticsearch client libraries, typically in Node.js or Python. The Python program below fetches data from your database and indexes it into Elasticsearch.
Pros
- A custom solution yields full control over what happens to your data
- Adding new features is as simple as creating a new function
Cons
- A custom solution requires expertise in both the Elasticsearch and MongoDB client APIs
- Intensive maintenance for the lifetime of the solution
- Very little reusability, as each solution is custom-made
from pymongo import MongoClient
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

# Connect to MongoDB
mongo_client = MongoClient("mongodb://localhost:27017/")
mongo_db = mongo_client["mydb"]
mongo_collection = mongo_db["mycollection"]

# Connect to Elasticsearch
es = Elasticsearch(["http://localhost:9200"])

# Fetch data from MongoDB and index it into Elasticsearch in one bulk request
def fetch_and_index():
    actions = []
    for doc in mongo_collection.find():
        # ObjectId is not JSON-serializable, so pop it from the body and use its string form as the document id
        doc_id = str(doc.pop("_id"))
        actions.append({"_index": "myindex", "_id": doc_id, "_source": doc})
    bulk(es, actions)

fetch_and_index()
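Note that this script performs a one-time full export. For continuous sync, here's a minimal sketch using MongoDB change streams; it assumes a replica set and a recent elasticsearch Python client, and reuses the clients and the myindex index from the script above:
from elasticsearch import NotFoundError

def watch_and_index():
    # full_document="updateLookup" makes update events carry the complete document
    with mongo_collection.watch(full_document="updateLookup") as stream:
        for change in stream:
            op = change["operationType"]
            if op in ("insert", "update", "replace"):
                doc = dict(change["fullDocument"])
                doc_id = str(doc.pop("_id"))
                es.index(index="myindex", id=doc_id, document=doc)
            elif op == "delete":
                try:
                    es.delete(index="myindex", id=str(change["documentKey"]["_id"]))
                except NotFoundError:
                    pass  # never indexed or already removed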
Option 2: Integrating MongoDB and SingleStore Kai™
If you’re looking for a more streamlined solution, SingleStore Kai is an API that targets and corrects the common drawbacks of apps built on MongoDB.
To use SingleStore Kai, all you have to do is spin up a SingleStore workspace and change your endpoint. All of your queries can then be executed in the already-familiar Mongo syntax: no new languages, no lengthy and stressful configuration. Both OLAP and OLTP workloads are supported, at speeds up to 900x faster than MongoDB, with full-text search over standard text as well as JSON data. It can be scaled up or down at will without sacrificing performance, and it runs in any cloud environment.
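To illustrate, here's a minimal sketch of pointing an existing pymongo client at a Kai endpoint; the hostname, credentials and connection-string options are placeholders for the values your SingleStore workspace provides:
from pymongo import MongoClient

# Placeholder connection string; copy the real one from your SingleStore workspace
client = MongoClient("mongodb://user:password@your-kai-endpoint.svc.singlestore.com:27017/?authMechanism=PLAIN&tls=true&loadBalanced=true")
db = client["mydb"]

# Existing Mongo queries run unchanged against SingleStore Kai
print(db["mycollection"].count_documents({}))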
Benefits of SingleStore Kai and MongoDB vs. an Elastic integration:
| | Complexity | Single database solution | Scalability |
| --- | --- | --- | --- |
| SingleStore Kai™ | Low: only the Mongo query language, and setup in three easy steps | Yes, with OLAP + OLTP support | SingleStore Kai can handle petabyte-scale data |
| Elasticsearch + MongoDB | High: several query languages that are hard to learn | No; data is copied via your MongoDB oplog into Elasticsearch for OLAP workloads | Elasticsearch loses performance as more nodes are added to a cluster |
The SingleStore Kai benchmarks are conveniently listed and explained here. Try it free today.