Elastic’s OLAP Weaknesses

5 min read

Oct 15, 2024

Elasticsearch is a distributed search and analytics engine built on the Apache Lucene libraries.

Its ability to handle large quantities of structured and unstructured data makes it a popular choice for use cases including:

  • Full-text search for websites and applications
  • Semantic search for AI/ML applications
  • Log and event data analysis
  • Application performance monitoring
  • General data analytics

Elasticsearch relies on the rest of the ELK stack (Elasticsearch, Logstash and Kibana) to provide its services. Logstash serves as the data ingestion layer, while Kibana provides the visualization tooling on top. The three complement each other to deliver fast answers to basic analytical (OLAP) workloads.

However, real-world data isn’t clean, and OLAP workloads are rarely basic. Data is constantly changing, and meaningful analytics are becoming vastly more complex. Today, we’ll walk through some of the struggles we’ve encountered with Elastic, so you don’t have to.

Where Elastic struggles

Elasticsearch, while powerful for search and analytics, faces inherent limitations as a NoSQL database when it comes to complex analytical workloads. Its document-oriented structure is designed for fast retrieval and scalability. However, much like MongoDB®, it lacks the relational capabilities found in traditional SQL databases.

This fundamental difference leads to several challenges when attempting to use Elasticsearch for thorough analytics. The absence of native JOIN support forces complex workarounds that often create additional data engineering headaches. Let’s zoom in and discuss these in depth.

Lack of JOIN support

Joining data across multiple tables is at the core of modern analytics. Without native JOIN support, Elastic users have been forced to build their own workarounds.

Application-side solutions

This popular approach keeps the related data in separate Elasticsearch indices and executes all the required joins in your application layer at query time. This can lead to heavy compute costs, especially under high concurrency; a sketch of the pattern follows the drawback list below.

The drawbacks here include:

  • Complex, hand-tuned queries in the application code
  • Time-intensive maintenance as data or queries change
  • Increased application complexity and processing overhead
  • Slower query response times, especially for large datasets
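
As a rough sketch of this pattern, here is what an application-side join could look like with the 8.x Python Elasticsearch client. The orders and customers indices, their fields and the connection details are all hypothetical:

```python
from elasticsearch import Elasticsearch

# Hypothetical cluster and index names -- adjust for your environment.
es = Elasticsearch("http://localhost:9200")

# Step 1: fetch the "left" side of the join.
orders = es.search(
    index="orders",
    query={"range": {"order_total": {"gte": 100}}},
    size=1000,
)["hits"]["hits"]

# Step 2: collect the foreign keys and fetch the "right" side.
customer_ids = {hit["_source"]["customer_id"] for hit in orders}
customers = es.search(
    index="customers",
    query={"terms": {"customer_id": list(customer_ids)}},
    size=len(customer_ids),
)["hits"]["hits"]

# Step 3: stitch the two result sets together in application code.
customers_by_id = {hit["_source"]["customer_id"]: hit["_source"] for hit in customers}
joined = [
    {**hit["_source"], "customer": customers_by_id.get(hit["_source"]["customer_id"])}
    for hit in orders
]
```

Every step of the merge runs in your own code, so the cost grows with the size of each result set and with the number of users running the query at the same time.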

Denormalization at ingest time

Elasticsearch stores data as documents rather than relational tables, so keeping it normalized means spreading related records across separate indices. Denormalizing at ingest time, flattening those relationships into wider, self-contained documents, can make analytical queries simpler and faster, but it comes at a cost; a sketch follows the drawback list below.

Drawbacks include:

  • Additional engineering overhead, especially for constantly changing data
  • Data duplication leads to increased storage requirements
  • Consistency challenges when updating duplicated data across multiple documents
  • Potential for data inconsistencies if updates are not properly managed
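
A minimal sketch of ingest-time denormalization, again with hypothetical orders and customers records and the Python client’s bulk helper:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

# Hypothetical source records -- in practice these come from your upstream
# database or an ingestion pipeline such as Logstash.
customers = {"c-42": {"name": "Acme Corp", "region": "EMEA"}}
orders = [
    {"order_id": "o-1", "customer_id": "c-42", "order_total": 250.0},
    {"order_id": "o-2", "customer_id": "c-42", "order_total": 99.5},
]

def denormalized(orders, customers):
    """Copy the customer fields onto every order so queries never need a join."""
    for order in orders:
        customer = customers[order["customer_id"]]
        yield {
            "_index": "orders_denormalized",
            "_id": order["order_id"],
            "_source": {
                **order,
                "customer_name": customer["name"],
                "customer_region": customer["region"],
            },
        }

helpers.bulk(es, denormalized(orders, customers))
```

The moment a customer’s name or region changes, every order document that copied those values has to be found and re-indexed, which is exactly where the consistency drawbacks above come from.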

Lookup runtime fields

Lookup runtime fields have been available since Elasticsearch 8.2 and let you attach fields from a separate index to your search results at query time. While this can approximate a simple join (a sketch follows the list below), there are still several issues:

  • Random data is returned if multiple documents match the given condition
  • Large performance costs, as fields are calculated at query time and aren’t indexed
  • Limited functionality — you can't query or aggregate on lookup runtime fields
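
A sketch of a lookup runtime field defined at search time with the Python client, based on the syntax introduced in 8.2; the index names and fields are placeholders:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="orders",
    runtime_mappings={
        "customer": {
            "type": "lookup",              # lookup runtime field (8.2+)
            "target_index": "customers",   # index to pull fields from
            "input_field": "customer_id",  # field on each orders document
            "target_field": "customer_id", # matched field in the customers index
            "fetch_fields": ["name", "region"],
        }
    },
    query={"range": {"order_total": {"gte": 100}}},
    fields=["order_id", "order_total", "customer"],
)

for hit in response["hits"]["hits"]:
    print(hit["fields"])
```

The joined values come back in the fields section of each hit; because they are computed per query and never indexed, you can’t filter, sort or aggregate on them.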

Each of these workarounds adds layers of complexity to your data management and query structure within Elasticsearch, and that complexity has real costs: slower queries, weaker data consistency and more engineering effort. As your data volume grows and relationships become more complex, those costs grow with them, pushing many teams toward alternative solutions such as SingleStore that better support their needs.

Data integrity issues

It’s hard enough to cultivate rich data; it shouldn’t be hard to analyze it. The items below are a few of the most common data integrity problems Elastic users have to dodge.

ACID transactions

For those not familiar with ACID (Atomicity, Consistency, Isolation, Durability) transactions, they are essential to relational data models: an ACID transaction guarantees that a group of operations either completes in full or not at all, leaving the data in a consistent state. They are a cornerstone of modern data systems, and Elasticsearch cannot provide them beyond a single document write. This can lead to stifling problems like inconsistencies across data and partial updates across a distributed system.
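
For contrast, here is what that all-or-nothing guarantee looks like in a relational database, sketched with Python’s built-in sqlite3 module and a hypothetical accounts table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500.0), (2, 500.0)])
conn.commit()

# Atomicity: either both updates apply, or neither does.
try:
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
except sqlite3.Error:
    pass  # the transfer never half-happens

print(conn.execute("SELECT id, balance FROM accounts").fetchall())
```

Individual document writes in Elasticsearch are atomic, but there is no equivalent way to treat changes spanning several documents or indices as one all-or-nothing unit.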

Eventual consistency model

In the absence of ACID transactions, Elasticsearch relies on an eventual consistency model. Writes are fast on the node that receives them, but changes take time to propagate across the cluster and only become searchable after the next refresh. When data is updated frequently, that lag adds up, and you’re left with users asking, “Is my data the same across my entire cluster?” I don’t know, but it will be! Eventually.
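
The refresh side of this is easy to see in practice: a freshly indexed document only becomes searchable after the next refresh (once per second by default), so a read issued immediately after a write can miss it. A minimal sketch with the Python client and a hypothetical events index:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.index(index="events", id="1", document={"status": "ok"})

# Searching immediately may report zero hits -- the document isn't visible
# until the shard refreshes.
print(es.search(index="events", query={"match_all": {}})["hits"]["total"])

# Forcing a refresh makes it visible, at the cost of extra indexing load.
es.indices.refresh(index="events")
print(es.search(index="events", query={"match_all": {}})["hits"]["total"])
```

Refreshing on every write keeps readers current, but it trades that freshness for additional load on the cluster.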

Even after jumping through all the hoops mentioned earlier, your analytics will be misleading without up-to-date information.

Why SingleStore?

Elasticsearch is fast and powerful, and it excels at simple analytics done in near real time. The only problem is that OLAP workloads are notoriously complex within a NoSQL framework, and modern analytics only amplify the difficulty. While it is an enticing way to get started, businesses should carefully consider any joins or complex queries, especially those spanning multiple nodes. These analytical elements tend to be the most telling, since they pull data from across a distributed system, and it would be a shame if they weren’t accurate.

SingleStore mitigates the challenges of Elasticsearch by executing OLAP workloads in standard SQL, eliminating the need for custom JOIN and aggregation workarounds. Its distributed architecture, with separation of storage and compute, ensures horizontal scalability without performance degradation, handling complex queries with low-latency response times. Unlike the multi-component ELK stack, SingleStore consolidates OLTP and OLAP capabilities into one platform, supporting real-time data freshness and full ANSI SQL, including advanced features like joins and window functions.
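
Because SingleStore is MySQL wire-compatible, the join logic from earlier collapses into a single SQL statement issued from any standard driver. A sketch using pymysql, with hypothetical connection details and the same imaginary orders and customers tables:

```python
import pymysql

# SingleStore speaks the MySQL wire protocol, so standard MySQL drivers work.
conn = pymysql.connect(
    host="svc-example.singlestore.com",  # placeholder endpoint
    user="admin",
    password="...",
    database="analytics",
)

query = """
    SELECT c.region,
           COUNT(*)           AS order_count,
           SUM(o.order_total) AS revenue
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    WHERE o.order_total >= 100
    GROUP BY c.region
    ORDER BY revenue DESC
"""

with conn.cursor() as cursor:
    cursor.execute(query)
    for region, order_count, revenue in cursor.fetchall():
        print(region, order_count, revenue)
```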

Its ACID compliance guarantees data consistency, while its ability to handle diverse data types (SQL, JSON, geospatial, vector) enables richer analytics. With lower total cost of ownership and robust performance, SingleStore provides a simpler, more efficient solution for complex OLAP workloads compared to Elasticsearch.

Building on these advantages, SingleStore's unified approach to vector search workloads offers significant benefits over the ELK stack. SingleStore provides a seamless environment where vector and full-text search can be implemented in the same SQL query as complex analytical queries. This integration eliminates the need for data movement between specialized systems, reducing latency and simplifying the overall data pipeline. For AI applications that require real-time analytics, SingleStore's ability to handle both transactional and analytical workloads in a single platform allows for immediate analysis of fresh data.
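
As an illustration of that hybrid capability, a single statement might combine full-text filtering, vector similarity and a regular aggregate. The products table, its packed embedding column and the full-text index on description are all assumptions here, and the exact vector functions available depend on your SingleStore version:

```python
import pymysql

conn = pymysql.connect(host="svc-example.singlestore.com", user="admin",
                       password="...", database="analytics")  # placeholders

# Hypothetical query embedding produced by your model, serialized as JSON.
query_embedding = "[0.12, -0.03, 0.88]"

# Full-text match, vector similarity and aggregation in one SQL statement,
# with no second system and no data movement between engines.
hybrid_query = """
    SELECT p.category,
           MAX(DOT_PRODUCT(p.embedding, JSON_ARRAY_PACK(%s))) AS best_match,
           COUNT(*) AS matching_products
    FROM products p
    WHERE MATCH(p.description) AGAINST ('wireless headphones')
    GROUP BY p.category
    ORDER BY best_match DESC
    LIMIT 5
"""

with conn.cursor() as cursor:
    cursor.execute(hybrid_query, (query_embedding,))
    for row in cursor.fetchall():
        print(row)
```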

Start free with SingleStore

