The Ultimate Guide to Scaling Elasticsearch

23 min read

Jan 7, 2025

Elasticsearch is an extremely popular tool that many developers and applications use for distributed search and analytics.

This functionality is essential for many applications that need real-time data capabilities. Its scalability is a key strength, allowing it to handle growing data volumes and user demands. However, scaling Elasticsearch isn't quite as simple as you might expect. To scale it properly and effectively, developers need a good understanding of its architecture, plus some careful planning to pull it off. Part of this challenge revolves around the fact that Elasticsearch data is organized into shards. This helps the platform search and retrieve data efficiently, but adds complexity when it comes to managing large datasets and ensuring high availability through data replication.

This guide explores key considerations and best practices for scaling Elasticsearch, ensuring your Elasticsearch instance is optimally performing. We’ll cover topics like shard management, data ingestion strategies, query optimization and the role monitoring plays when it comes to running Elasticsearch at scale. Whether you’re just getting started with Elasticsearch or looking to fine-tune your existing deployment, this guide has the foundations for everything you need to know.

Introduction to scaling Elasticsearch

As with most pieces of tech infrastructure, as data and workload increase, platforms usually need some help to compensate and scale. When it comes to scaling Elasticsearch, staying ahead of the demand curve is crucial for ensuring the performance and reliability of your search engine. As your data grows, it’s essential to plan and design a scalable Elasticsearch cluster to handle the increased load.

Elasticsearch is designed to be highly scalable, and can handle large amounts of data and high query volumes. As proven by many companies that leverage Elasticsearch, using it at scale is definitely possible. What some companies don't see, however, is that scaling Elasticsearch requires careful planning and consideration of several factors, including data nodes, disk space, primary shards, query performance and replica shards. By understanding these factors and designing a scalable cluster, you can ensure your Elasticsearch deployment can handle the demands of your growing data and user base. Let's dig a little further into how these concepts are intertwined when it comes to scalability in the next section.

Understanding Elasticsearch scalability

As mentioned, to scale Elasticsearch effectively, it’s crucial to grasp how it handles data storage and retrieval. In this section, we will break down the core concepts within an Elasticsearch deployment. Elasticsearch's scalability stems from its ability to distribute and balance data and queries across multiple nodes in a cluster. This distribution is achieved through sharding, a common concept and technique within the database realm. With this technique, each index is divided into smaller units called shards. By using shards, an index can be replicated across multiple nodes, which helps with redundancy and increases the system's query capacity.

Data storage and sharding

At its core, Elasticsearch organizes data into indices. You can think of an index like a database table. To distribute data and manage growth, each Elasticsearch index is divided into one or more physical shards. These shards are self-contained units that hold a subset of the index's data, allowing documents to be spread across multiple nodes for efficient retrieval. Combined with replication, this setup also provides fault tolerance out of the box, a key feature for a distributed system.

Each shard can have multiple copies. These are known as replicas, and serve two primary purposes:

  • Redundancy. If a shard becomes unavailable due to a node failure, a replica takes over, ensuring data availability

  • Scalability. Replicas can handle search requests, distribute the query load, and improve performance

Shard configuration and query performance

When creating an index, you define the number of primary shards. It's crucial to plan this number carefully at index creation, as changing it later generally requires reindexing the data, which can be a resource-intensive process. While it's technically possible to increase the primary shard count with the split index API, this is often not the preferred approach. You do, however, have flexibility with replica shards, which can be adjusted dynamically as your needs evolve.
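To make the distinction concrete, here's a minimal sketch of the two request bodies involved. The index name and shard counts are illustrative; the point is that `number_of_shards` lives in the creation request, while `number_of_replicas` can be changed later with a settings update.

```python
import json

# Body for creating the index (e.g. PUT /my-index).
# The primary shard count here is fixed once the index exists.
create_body = {
    "settings": {
        "number_of_shards": 3,    # fixed at creation time
        "number_of_replicas": 1,  # can be changed at any time
    }
}

# A later dynamic update (e.g. PUT /my-index/_settings) only needs
# to touch the replica count:
update_body = {"index": {"number_of_replicas": 2}}

print(json.dumps(create_body))
print(json.dumps(update_body))
```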

Finding the right balance in shard configuration is key. There are many ways to configure the size and number of shards, and each comes with different advantages (and some downsides). With larger shards, you reduce overhead by minimizing the number of shards to manage. Smaller shards, on the other hand, can lead to faster query performance, since queries execute in parallel across multiple shards, allowing the load to be balanced across a greater number of shards.

Custom routing for query efficiency

Custom routing allows you to direct documents to specific shards based on a chosen key, like a customer ID or product category. This technique can significantly improve search efficiency by targeting queries to relevant shards, avoiding the need to search all shards in the index. For example, if your data is segmented by customer regions, you can ensure that related documents are stored on the same shard. This not only reduces query latency but also optimizes resource utilization by limiting the number of shards involved in search operations.
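The mechanism behind custom routing can be sketched in a few lines. Elasticsearch actually uses a murmur3 hash of the routing value modulo the primary shard count; the stand-in hash below just illustrates the behavior, and the shard count and keys are made up.

```python
import hashlib

NUM_PRIMARY_SHARDS = 5

def shard_for(routing_key):
    # Stand-in for Elasticsearch's murmur3 routing hash; any stable
    # hash illustrates the idea: the same routing key always maps to
    # the same shard via hash(key) % number_of_primary_shards.
    digest = hashlib.md5(routing_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PRIMARY_SHARDS

# Documents indexed with ?routing=customer-42 all land on one shard,
# so a search with the same routing value only queries that shard.
print(shard_for("customer-42"), shard_for("customer-7"))
```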

The optimal shard configuration and use of custom routing depend on several factors:

  • Data volume. Larger datasets might benefit from being spread across more shards

  • Query patterns. Complex queries might perform better with smaller shards

  • Hardware resources. The number of nodes and their capacity influence shard distribution, and the amount of hardware/infrastructure needed

Thorough testing with your specific data and query patterns is essential to determine the most efficient shard configuration. Experimenting with multiple configurations, along with monitoring their performance (something we will cover later), is the best way to decide which configuration is most efficient for your specific application and data. Even so, this decision is only one piece of the puzzle of scaling up your Elasticsearch instance.

Planning to scale your Elasticsearch cluster

Planning to scale your Elasticsearch cluster involves several key considerations beyond just planning out shard configuration. First, you need to determine the number of data nodes required to handle your data volume and query load. When it comes to this factor, a general rule of thumb is to have at least three data nodes for a production cluster, with each node having sufficient disk space to store the data. This will give you the minimum configuration needed to help with availability and performance.

Next, as we already touched on, you need to consider the number of primary and replica shards for each index. Primary shards determine how the data is split across the cluster, while replica shards provide redundancy and improve query performance. The ideal number of shards and replicas depends on factors like data volume, query patterns, and hardware resources. The best way to ensure your configuration is optimal is to test the setup with your specific data and queries, dialing things in until you find the optimal configuration.
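One rough way to turn this into numbers is to work backward from a target shard size, since shards in the commonly cited 10-50GB range tend to balance overhead against parallelism. The helper and figures below are illustrative starting points, not a substitute for testing with your own data.

```python
import math

def suggested_primary_shards(total_gb, target_shard_gb=30.0):
    # Starting point only: divide the expected index size by a target
    # shard size in the commonly cited 10-50 GB range, then validate
    # the result against real queries before committing.
    return max(1, math.ceil(total_gb / target_shard_gb))

print(suggested_primary_shards(300))  # 10 shards of ~30 GB each
print(suggested_primary_shards(12))   # a tiny index still gets one shard
```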

With those pieces figured out, you also need to consider query performance and how to balance the load across the cluster. This can be achieved by designing multi-node clusters with multiple shards and replicas. By distributing the data and query load across multiple nodes, you can improve query performance and ensure that your cluster can handle high query volumes. The extent to which you implement this will depend on how much data and traffic you expect to hit your clusters.

Scaling data ingestion

One of the most critical parts of using Elasticsearch — and probably the most basic — is getting your data into your Elasticsearch instance. As your application grows and handles more data, data ingestion becomes critical to Elasticsearch's performance, especially if you’re using it for real-time queries. Elasticsearch can handle high-volume indexing, but it’s important to understand the challenges and implement strategies to keep performance optimal.

Indexing challenges and solutions

When indexing documents into Elasticsearch, the following can impact performance:

  • Data volume. The amount of data being ingested

  • Shard distribution. Indexing documents across multiple shards and nodes

  • Refresh interval. How often Elasticsearch makes new data searchable

Taking these factors into consideration, some best practices for ingestion into Elasticsearch include:

  1. Distributed indexing. Break down large indexing tasks into smaller, parallel processes. This can be achieved by having multiple clients send data to different nodes in the cluster, ensuring balanced load distribution.

  2. Add data nodes. Adding more data nodes to your cluster increases the resources available to handle the indexing load. This also improves shard distribution and reduces contention, enabling better performance.

  3. Optimize refresh interval. Temporarily increasing or even disabling the refresh interval (setting it to -1) during bulk ingestion can significantly improve indexing speed. Once ingestion is complete, restore the refresh interval to balance performance with real-time data availability.

  4. Bulk API. Use the Elasticsearch Bulk API to send multiple documents in a single request. This reduces overhead and increases efficiency compared to individual indexing operations.

  5. Ingestion pipelines. Leverage ingestion pipelines to pre-process documents before indexing. Tasks like data transformation, enrichment, and filtering can improve data quality and reduce downstream processing overhead.

  6. Monitor and configure thread utilization. The number of threads per node is configurable and plays a crucial role in indexing performance. Monitor thread utilization and adjust settings as needed to avoid bottlenecks and maintain throughput.
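The Bulk API point above is easy to show concretely: the request body is newline-delimited JSON, with an action line preceding each document. Here's a small sketch that builds such a body; the index name and documents are made up.

```python
import json

def build_bulk_body(index, docs):
    # The Bulk API expects newline-delimited JSON: an action line,
    # then the document source, with a trailing newline at the end.
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

body = build_bulk_body("events", [{"user": "alice"}, {"user": "bob"}])
# POST this body to /_bulk with Content-Type: application/x-ndjson.
# For a large one-off load you might first disable refresh:
#   PUT /events/_settings  {"index": {"refresh_interval": "-1"}}
# and restore it (e.g. to "1s") once ingestion finishes.
print(body)
```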

Following these pointers allows you to handle high-volume data ingestion without sacrificing performance. Beyond these suggestions for ingestion, there are also some overall Elasticsearch scalability best practices we should cover more broadly.

Best practices for scaling Elasticsearch

Although many different paths exist to scaling Elasticsearch, a few steadfast practices will keep you on solid footing as you ramp up your instance. Scaling Elasticsearch successfully and properly involves more than just adding some nodes. It requires expertise, planning and careful consideration of the hardware, cluster configuration and operational practices you apply to the effort. Let's begin by looking at the hardware side of things.

Hardware provisioning and configuration

Elasticsearch employs a tightly coupled architecture where each node in the cluster typically handles both the storage and processing of data. This differs from architectures where storage and compute are separated, including in popular platforms like Snowflake and Databricks (which have built their products on this architecture). This means that when scaling Elasticsearch, you often need to increase both storage and compute resources simultaneously for the system to react as expected.

Consequently, scaling Elasticsearch goes hand-in-hand with provisioning new hardware to take on the increased load. Here are some key considerations when it comes to provisioning hardware for your Elasticsearch instance:

  • Resource balancing. Ensure a balance between CPU, memory and disk resources. Elasticsearch is resource intensive, and bottlenecks in any of these areas can hinder performance.

  • Disk I/O. Fast disk I/O is crucial for Elasticsearch, especially for indexing and searching. Consider using SSDs or NVMe drives to ensure these resources meet your performance needs. Although provisioning these resources is a bit more expensive, they are definitely worth the investment when it comes to the performance increase they deliver over traditional disk solutions.

  • Network bandwidth. As your cluster grows, network bandwidth becomes increasingly important for communication between nodes. Ensure sufficient bandwidth to avoid bottlenecks and continually monitor the network load to tweak the configuration if performance is suffering.

Managed services

If you want best practices for scaling out of the box, a managed Elasticsearch instance (or equivalent) might be a good route. A managed service takes on the complexity of running Elasticsearch at scale, which otherwise requires experience and specialized expertise from your team. Of course, the downside to managed services is that they tend to be more expensive than a self-managed solution; but this cost can be worth it for the ease and scale they can accommodate without user intervention or configuration.

A few options exist in this domain. If you're interested in trying out a managed version, consider leveraging managed service offerings like:

  • Elastic Cloud. Provides a fully managed Elasticsearch experience handling infrastructure, upgrades and security.

  • Amazon OpenSearch Service (formerly Amazon Elasticsearch Service). Offers a scalable and reliable service on AWS based on OpenSearch, a fork of Elasticsearch, with integration with other AWS services. While similar to Elasticsearch, OpenSearch has a distinct development path and feature set.

  • SingleStore Helios®. A great alternative for Elasticsearch capabilities in a fully managed environment that is optimized for scale. On top of the ability to replace Elasticsearch, SingleStore can also replace other components of the data stack and enable zero-ETL and real-time data use cases.

Managed services can significantly reduce operational burden, and are also easy to get started with versus bringing up your own infrastructure. This is good for developers since it allows you to focus on building your application and using Elasticsearch's capabilities rather than getting bogged down in infrastructure management.

Capacity planning

One of the biggest factors in data infrastructure issues at scale is a lack of capacity planning when building or expanding a component. Luckily, there are some tools and best practices that can help developers avoid the common pitfalls of simply reacting to capacity issues within your Elasticsearch deployment. Developers can implement proactive capacity planning by:

  • Monitoring. Continuously monitor key metrics like CPU usage, memory pressure, disk I/O and search latency

  • Forecasting. With reliable metrics from your monitoring solution to build upon, predict future resource needs based on historical data and growth projections. This will show you how your Elasticsearch instance will need to evolve as volumes grow

  • Scaling gradually. Scale your cluster incrementally to avoid overprovisioning and unnecessary costs. It's always tempting to overprovision to accommodate increases in data and queries, but it also leads to a sharp increase in unnecessary costs. You can scale more gradually and accurately using the insights from monitoring and forecasting activities
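Forecasting doesn't have to be sophisticated to be useful. As a minimal sketch, a linear projection from observed growth can tell you when usage will cross a capacity watermark; all of the figures and the 80% threshold below are illustrative assumptions.

```python
def project_disk_gb(current_gb, daily_growth_gb, days):
    # Simple linear projection; a real forecast would be driven by
    # the growth trend observed in your monitoring metrics.
    return current_gb + daily_growth_gb * days

capacity_gb = 700  # illustrative total disk across the cluster
projected = project_disk_gb(400, 2, 90)
print(projected)  # 580
if projected > capacity_gb * 0.8:
    print("Projected usage crosses 80% of capacity; plan to add nodes.")
```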

By following these best practices, you'll have a good foundation and plan to keep your Elasticsearch cluster performing and scalable as data and user demands grow. A crucial part of scaling Elasticsearch is the ability to add new nodes as demand ramps up. That will be what we will cover next at a high level.

Adding a new node to an existing Elasticsearch cluster

Expanding your Elasticsearch cluster is often necessary to accommodate data growth or increased query load. Adding a new node can be done in a few steps using Elasticsearch's native tools and APIs. At a high level, here are the steps involved in adding a new node to your existing Elasticsearch cluster:

  1. Provision the new node. Set up a new server that meets Elasticsearch’s hardware and operating system requirements. Ensure it has adequate resources (CPU, memory, disk) to handle its expected workload.

  2. Install Elasticsearch. Install the same version of Elasticsearch on the new node as your existing cluster. This ensures compatibility and smooth integration into the cluster.

  3. Configure the new node.

    1. Cluster name. In the elasticsearch.yml configuration file, set the cluster name to match the name of the existing cluster.

    2. Node role. Based on your needs, define the new node's role (e.g., data, master or coordinating node).

    3. Discovery configuration. In the discovery.seed_hosts setting, provide the IP addresses of your existing master-eligible nodes. This enables the new node to locate and join the cluster.

  4. Start the node. Start the Elasticsearch service on the new node. It will attempt to join the existing cluster using the settings provided in the configuration file.

  5. Monitor the node joining process.

    1. Use Elasticsearch’s APIs, like _cat/nodes and _cat/health, to verify the new node’s status and ensure it has joined the cluster correctly.

    2. The _cluster/stats API can also be used to inspect cluster-wide statistics and verify that the new node is contributing as expected.

  6. Verify node functionality. After the node has joined, confirm it is performing as intended. You can use Kibana or Elasticsearch’s REST APIs to monitor its resource usage (CPU, memory, disk) and role-specific metrics, ensuring it effectively handles its assigned load.
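The configuration in step 3 maps to a handful of lines in elasticsearch.yml on the new node. The cluster name, node name, roles and IP addresses below are illustrative, and the exact role syntax can vary between Elasticsearch versions:

```yaml
# elasticsearch.yml on the new node (all values illustrative)
cluster.name: my-production-cluster   # must match the existing cluster
node.name: data-node-4
node.roles: [ data ]                  # or master, ingest, etc.
discovery.seed_hosts: ["10.0.1.10", "10.0.1.11", "10.0.1.12"]  # existing master-eligible nodes
```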

By following these steps and utilizing Elasticsearch’s built-in tools, you can add new nodes to your Elasticsearch cluster and scale your deployment to support growing data and query demands. While you can manually configure a new node, Elasticsearch provides APIs and tools to streamline this process, and third-party cluster management tools further simplify node addition and cluster administration. Beyond adding new nodes, there are also quite a few other facets to maintain and monitor as you scale up your Elasticsearch usage. The next section looks at this in more depth.

Monitoring and maintaining a scalable Elasticsearch cluster

Making sure that an Elasticsearch cluster stays healthy and performant requires significant proactive monitoring and ongoing maintenance. Far from "set and forget," a self-managed instance of Elasticsearch demands a real commitment to keep the infrastructure working as it should. Fortunately, Elasticsearch provides a suite of tools that make this task easier for developers. Here's a breakdown of these components and tools:

Integrated monitoring and management

Built-in monitoring and management is one of the best features of platforms like Elasticsearch. Out of the box, Elasticsearch covers a few different types of monitoring and management, including:

  • Security management. Secure your cluster with authentication, authorization and encryption to protect your data

  • Performance and resource monitoring. Collect and analyze metrics on cluster health, performance and resource usage

  • Platform administration. Manage cluster settings, indices and nodes through APIs and tools

Kibana as a control center

Kibana, the visualization and management tool for Elasticsearch, serves as a central control panel for your cluster. It offers features like:

  • Real-time monitoring. Visualize key metrics and identify potential issues quickly

  • Downsampling. Reduce storage costs and improve performance by aggregating historical data

  • Index lifecycle management. Automate the management of indices throughout their lifecycle, from creation to archiving or deletion

Key monitoring aspects

Regardless of how you decide to monitor your Elasticsearch installation, you'll want to keep an eye on a few very important metrics as you scale. Monitoring these factors can help you to gauge the overall health and efficiency of your instance, including metrics like:

  • Cluster health. Track overall cluster status, shard allocation and node availability

  • Node performance. Monitor resource utilization (CPU, memory, disk I/O) on each node

  • Index size and growth. Track index sizes and growth trends to plan for capacity

  • Search performance. Analyze query latency, request rates and search efficiency

By actively monitoring your Elasticsearch cluster and leveraging the available tools, you can proactively identify and address potential issues before they impact users, optimize performance and resource utilization, and ensure the long-term health and scalability of the deployment.

Common pitfalls and troubleshooting

While Elasticsearch is designed for scalability, there are certain things that developers can do that will hinder performance and even lead to cluster instability. These pitfalls are relatively easy to avoid by following many of the best practices and topics we already covered. That being said, let's take a quick look at a few areas that generally tend to cause issues for Elasticsearch deployments at scale:

Incorrect shard sizing

As has been mentioned multiple times, shard size is one of the most critical aspects of Elasticsearch performance at scale. Too many small shards lead to increased overhead, which negatively affects indexing and search performance. Conversely, a small number of very large shards makes it challenging to scale horizontally and increases recovery time after failures.

The root solution starts at the planning stage. Developers should understand the pros and cons of different shard size strategies, because once a configuration is implemented, changing it later can be complex and resource intensive. Ideally, you'll aim for a balance between overhead and scalability, with shard sizes typically ranging between 10GB and 50GB. Whatever configuration you decide on, always test with your specific data and queries before rolling out to production.

Inadequate monitoring

As with any piece of infrastructure, proper monitoring is critical to identifying performance bottlenecks and anticipating capacity issues within your setup. Monitoring and alerts help you catch problems early, and they're also essential for benchmarking different configurations before you push them out into production. Solving the problem of inadequate monitoring is straightforward, though, since developers can use Elasticsearch's built-in tools and Kibana. Once implemented, developers can track key scalability metrics like CPU usage, memory pressure, disk I/O and search latency.

Ignoring caching

Caching is a fundamental technique for any data-intensive system, and Elasticsearch is no exception. Repeatedly hitting the same queries can put unnecessary load on your cluster and increase latency and resource consumption. That’s where caching comes in.

Elasticsearch has caching mechanisms like the query cache and field data cache, which store frequently accessed results in memory. Subsequent requests for the same data can then be served from the cache, reducing load and response times. However, over-relying on caching without understanding query patterns can lead to inefficiencies like excessive memory usage. Make sure to monitor and fine-tune cache settings, including eviction policies, to optimize your cluster.

Bad mapping design

Mappings in Elasticsearch are blueprints that define how your data is structured and indexed. Using the wrong data types or not defining mappings at all can lead to suboptimal indexing and search performance. Although the system might work, it won’t be efficient.

To optimize your Elasticsearch cluster, design your mappings for your indices carefully. Choose the right data types for your fields like text, keyword, date or numeric types. Use analyzers to process text fields for search, and don’t overuse dynamic mappings, which can lead to inconsistent data structures. By planning your mappings, you ensure efficient storage, better search performance and reduced resource usage (something you'd be able to see after implementing the monitoring recommendation above!).
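As a sketch of what careful mapping design looks like, here's an explicit mapping for a hypothetical products index. The field names and the strict dynamic setting are illustrative choices, not requirements:

```python
import json

# Illustrative explicit mapping: keyword for exact matches and
# aggregations, text for full-text search, concrete types for
# dates and numbers.
mapping = {
    "mappings": {
        "dynamic": "strict",  # reject unmapped fields instead of guessing types
        "properties": {
            "title":      {"type": "text"},
            "sku":        {"type": "keyword"},
            "created_at": {"type": "date"},
            "price":      {"type": "double"},
        },
    }
}
# Sent as the body of PUT /products when creating the index.
print(json.dumps(mapping, indent=2))
```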

Ignoring Index Lifecycle Management (ILM)

As your Elasticsearch cluster grows, old or unused data is inevitable. This data consumes resources and impacts performance, taking up space and making it harder to find the needed data.

Index Lifecycle Management (ILM) policies automate the management of indices throughout their lifecycle. You can define policies to move older indices to cheaper storage tiers, shrink them to save space or delete them when no longer needed. For cloud-hosted Elasticsearch clusters, ILM can help reduce costs by moving infrequently accessed data to slower storage. Review and update your ILM policies regularly to keep your cluster efficient, cost-effective and performant.
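A lifecycle like the one described above can be expressed as an ILM policy body along these lines. The phase thresholds and policy name are examples only; tune them to your retention requirements:

```python
import json

# Sketch of an ILM policy: roll over the hot index when it grows
# large or old, shrink it in the warm phase, and delete after 90
# days. All thresholds are illustrative.
policy = {
    "policy": {
        "phases": {
            "hot": {"actions": {"rollover": {"max_size": "50gb", "max_age": "7d"}}},
            "warm": {"min_age": "7d", "actions": {"shrink": {"number_of_shards": 1}}},
            "delete": {"min_age": "90d", "actions": {"delete": {}}},
        }
    }
}
# Registered with, e.g., PUT /_ilm/policy/logs-policy.
print(json.dumps(policy))
```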

Underestimating hardware

Elasticsearch is resource hungry, and you need to provide enough hardware for it to perform optimally. Under-provisioning CPU, memory or disk I/O will create bottlenecks that severely limit your cluster’s capabilities.

To avoid performance issues, make sure your hardware meets Elasticsearch’s requirements. Allocate sufficient CPU cores, provide at least 32GB to 64GB of RAM per node (with no more than 50% allocated to the JVM heap) and use fast storage like SSDs or NVMe drives. You'll also want to monitor resource utilization to identify and address bottlenecks and consider horizontal scaling by adding more nodes to the cluster when needed.

Delaying cluster updates

Keeping your Elasticsearch cluster up to date is critical for performance, stability and security. Older versions may contain unresolved bugs and security vulnerabilities, or lack the performance enhancements of newer releases.

To avoid risks, stay current with the latest stable releases of Elasticsearch and Kibana. Updates often include important bug fixes, performance improvements and new features. But always test updates in a staging environment to identify potential breaking changes before deploying to production. By updating your cluster regularly, you’ll have a secure, stable and high-performing Elasticsearch deployment.

With these common pitfalls and solutions in mind, you should have a good base for scaling your Elasticsearch implementation as your usage increases. Of course, for those looking for an alternative that is easier to maintain and scale, SingleStore might be precisely what you need!

Simplifying Elasticsearch scaling with SingleStore

While Elasticsearch offers powerful search and analytics capabilities, scaling it effectively can be complex and demanding. If you're finding that managing your Elasticsearch instance is becoming difficult, or if you're hitting limitations in scalability and performance, it might be time to consider an alternative that can overcome these shortcomings.

SingleStore is a unified database that excels at both transactional and analytical workloads, making it a perfect replacement for Elasticsearch in many use cases (as well as other components within your data stack). As the only database you'll ever need, here's how SingleStore can replace Elasticsearch and simplify scaling the functionalities exposed through the platform:

Unified architecture

SingleStore's unified architecture helps eliminate complexity within the data stack. SingleStore combines transactional and analytical capabilities in a single system, eliminating the need for separate databases and complex data pipelines. This simplifies architecture and reduces operational overhead.

Scalability and performance

SingleStore is designed for horizontal scalability, allowing you to easily add nodes to handle increased data volume and query load while avoiding the complexities of shard management and rebalancing often encountered in Elasticsearch. Its high performance comes from a combination of in-memory processing and disk storage, optimized for both transactional and analytical workloads.

Simplified operations

The platform reduces management overhead through automated features for data management, including automatic sharding, data compression and performance tuning. This frees you from manual tasks and reduces the risk of misconfiguration. Similar to Elasticsearch, SingleStore offers cloud-native options (as well as on-premises options for those who need them) and is available as a fully managed cloud service, further simplifying deployment and operations.

Addressing Elasticsearch pitfalls

Looking at some of the common pitfalls we mentioned earlier in the blog, here is how SingleStore alleviates these issues:

  • Shard management. SingleStore automatically handles sharding and data distribution, eliminating the need for manual configuration and the risk of incorrect shard sizing

  • Caching. SingleStore incorporates intelligent caching mechanisms to optimize query performance and reduce resource consumption

  • Index Lifecycle Management. SingleStore provides data retention and tiering tools (column and rowstore support), allowing you to manage data efficiently throughout its lifecycle

Migrating from Elasticsearch to SingleStore

SingleStore offers tools and resources to simplify migration from Elasticsearch. You can leverage connectors and data migration utilities to move your data seamlessly. Additionally, SingleStore supports SQL and JSON, making it compatible with many existing applications and tools.

If you're looking for a simpler, more scalable and performant alternative to Elasticsearch, SingleStore is the perfect addition to your data stack. It addresses many common challenges associated with scaling Elasticsearch and provides a unified platform for all your data needs.

Try SingleStore today

Scaling Elasticsearch for optimal performance is a multifaceted endeavor that requires careful planning, diligent monitoring and a deep understanding of its architecture. By mastering concepts like sharding, data ingestion, query optimization and hardware provisioning, you can ensure your Elasticsearch cluster handles growing demands.

However, as we've explored, scaling Elasticsearch can also introduce complexities and operational challenges. If you're seeking a more streamlined and scalable solution for your search and analytics needs, SingleStore could be a better fit. Its unified architecture, automated features and high performance can simplify operations and reduce the burden of managing a distributed search infrastructure.

Ready to experience the difference?

Try SingleStore free and explore its capabilities for yourself. Migrate your data from Elasticsearch and discover how SingleStore can simplify your architecture, enhance performance and unlock new possibilities for your applications.
