

Easy Integration of MongoDB® and Elasticsearch

Ryan Sarginson

Solutions Engineering Intern

When managing large quantities of data, MongoDB® is a commonly used NoSQL database for storing vast amounts of unstructured data in a document-oriented format.

While the data may live in MongoDB, implementing fast search functionality requires a separate software layer, like Elasticsearch. A search and analytics engine, Elasticsearch provides solutions for indexing and analyzing data across both structured and unstructured formats.

Many organizations rely on both databases since Elasticsearch excels in search-related operations, while MongoDB offers efficient scalability for handling large and complex data sets. However, they are independent of one another and must be properly introduced. This introduction has proven to be both difficult and complex, which only gets worse when attempting to handle intricate data. In this blog, we’ll take a look at the most common ways companies can integrate the two systems as well as an alternative that can save you time and money — SingleStore Kai™.

Method 1: MongoDB Connector

MongoDB Connector (mongo-connector) is a Python-based data-synchronization tool. For small to medium workloads, it can handle continuous synchronization of your secondary dataset on its own.

Pros

  • Fairly easy to set up
  • Real-time data synchronization via replications of the MongoDB operations log (oplog)

Cons

  • Can’t efficiently scale past a small-medium size workload
  • Complexity increases linearly with the complexity of your MongoDB schema
  • Not flexible out of the box, requires a high degree of product knowledge

Here’s a crash course to set up your first connector:

  • Install the MongoDB Connector tool with pip:

pip install mongo-connector
  • With Elasticsearch installed, install your version’s corresponding doc manager:

pip install 'mongo-connector[elastic5]'
# Match '5' with your version of Elasticsearch, and append -aws for the
# Amazon version of Elastic, e.g. mongo-connector[elastic5-aws]
  • To start the connector, you need a config.json file. The following is a general outline, but please tune the settings to suit your needs.
{
  "mainAddress": "localhost:27017",
  "oplogFile": "oplog.timestamp",
  "noDump": false,
  "batchSize": -1,
  "verbosity": 2,
  "continueOnError": true,
  "logging": {
    "type": "file",
    "filename": "mongo-connector.log"
  },
  "authentication": {
    "adminUsername": "yourAdminUsername",
    "password": "yourPassword"
  },
  "fields": {
    "includeFields": ["field1", "field2"],
    "excludeFields": ["field3"]
  },
  "namespaces": {
    "include": ["db1.collection1", "db2.collection2"],
    "exclude": ["db3.collection3"]
  },
  "target": {
    "url": "http://localhost:9200",
    "index": "myindex",
    "docType": "mytype"
  }
}
  • Once your config.json file is complete, you can run this command to get the connection up and running!
mongo-connector -c config.json
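Because the connector fails at startup on a malformed config file, it can help to sanity-check config.json before launching. A minimal stdlib-only sketch (`validate_config` and `REQUIRED_KEYS` are hypothetical helpers for illustration, not part of mongo-connector):

```python
import json

# Keys the connector invocation above relies on. This check is a
# hypothetical convenience, not something mongo-connector provides.
REQUIRED_KEYS = {"mainAddress", "oplogFile"}

def validate_config(text):
    """Parse a config.json payload and confirm the expected keys exist."""
    config = json.loads(text)  # raises ValueError on malformed JSON
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"config.json is missing keys: {sorted(missing)}")
    return config

sample = '{"mainAddress": "localhost:27017", "oplogFile": "oplog.timestamp"}'
config = validate_config(sample)
```

Running this against your real file (`validate_config(open("config.json").read())`) catches typos before the connector touches your oplog.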

Method 2: Logstash

Elasticsearch is part of the ELK stack, which houses Elasticsearch, Kibana, Beats and Logstash. These components work together to provide companies with search and analytics capabilities.

Pros

  • Logstash and its ingest pipelines are inherently flexible and support automatic indexing
  • The ELK community maintains extensive documentation to aid in configuration

Cons

  • Configuring Logstash has a steep learning curve
  • Syncing data from MongoDB through Logstash introduces latency


Logstash

  • Download Logstash with the instructions found here.
  • Install the MongoDB input plugin:

bin/logstash-plugin install logstash-input-mongodb
  • Create a Logstash configuration file ‘logstash.conf’ with the following contents:
input {
  mongodb {
    uri => 'mongodb://localhost:27017/mydb'
    placeholder_db_dir => '/path/to/logstash-mongodb/'
    placeholder_db_name => 'logstash_sqlite.db'
    collection => 'mycollection'
    batch_size => 5000
  }
}
output {
  elasticsearch {
    hosts => ['localhost:9200']
    index => 'myindex'
  }
}
  • Run Logstash with the previously created config file:
bin/logstash -f logstash.conf

Method 3: Custom scripts

Many developers have resorted to their own solutions using the Elasticsearch client libraries, most commonly in Node.js and Python. The following Python script fetches data from your database and bulk-indexes it into Elasticsearch.

Pros

  • Of course, a custom solution will yield full control over what happens to your data

  • Adding new features is as simple as creating a new function

Cons

  • A custom solution requires expertise in both the Elasticsearch and MongoDB query APIs
  • Significant ongoing maintenance for the lifetime of your solution
  • Very little reusability as each solution is custom-made

from pymongo import MongoClient
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

# Connect to MongoDB
mongo_client = MongoClient('mongodb://localhost:27017/')
mongo_db = mongo_client['mydb']
mongo_collection = mongo_db['mycollection']

# Connect to Elasticsearch
es = Elasticsearch(['http://localhost:9200'])

# Fetch data from MongoDB and index into Elasticsearch
def fetch_and_index():
    actions = [
        {
            "_index": "myindex",
            "_id": str(doc["_id"]),
            # Drop the BSON ObjectId from the body: it isn't JSON-serializable,
            # and Elasticsearch rejects a `_id` field inside `_source`
            "_source": {k: v for k, v in doc.items() if k != "_id"},
        }
        for doc in mongo_collection.find()
    ]
    bulk(es, actions)

fetch_and_index()
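One maintenance pitfall worth noting: the script above builds every bulk action in memory before sending anything, which won't scale to large collections. You might stream actions in fixed-size batches instead; a minimal sketch (`chunked` is a hypothetical helper, not part of either client library):

```python
from itertools import islice

# Hypothetical helper: yield fixed-size batches from any iterable so a
# large collection never has to be materialized as one giant list.
def chunked(iterable, size=500):
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Usage sketch: replace the single bulk(es, actions) call with
#   for batch in chunked(action_generator, 500):
#       bulk(es, batch)
batches = list(chunked(range(5), size=2))  # [[0, 1], [2, 3], [4]]
```

Since `elasticsearch.helpers.bulk` accepts any iterable, passing a generator expression instead of a list achieves a similar effect; explicit batching simply gives you a natural place to add retries or progress logging.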

An alternative: integrating MongoDB and SingleStore Kai™

If you’re looking for a more streamlined solution, SingleStore Kai is a MongoDB-compatible API that addresses the common drawbacks of applications built on MongoDB.

To use SingleStore Kai, all you have to do is spin up a SingleStore workspace and change your endpoint. Then all of your queries can be executed in the already-familiar Mongo syntax: no new languages, no lengthy and stressful configuration. Both OLAP and OLTP workloads are supported, at speeds up to 900x faster than MongoDB, with full-text search over standard text as well as JSON data. It can be scaled up or down at will without sacrificing performance, and can run in any cloud environment.

Benefits of SingleStore Kai and MongoDB vs. an Elastic integration:

|  | Complexity | Single database solution | Scalability |
| --- | --- | --- | --- |
| SingleStore Kai™ | Low: only the Mongo query language; setup in three easy steps | Yes, with OLAP + OLTP support | SingleStore Kai can handle petabyte-scale data |
| Elasticsearch + MongoDB | High: several query languages and hard to learn | No, data is copied via your MongoDB oplog into Elasticsearch for OLAP workloads | Elasticsearch loses performance as more nodes are added to a cluster |

The SingleStore Kai benchmarks are conveniently listed and explained here. Try it free today.

