SingleStore’s Latest Performance Improvements

5 min read

Dec 5, 2024

SingleStore delivers a 20x speedup for JSON array analytics.

At SingleStore, our DNA is centered on developing the only database you’ll ever need to scale from one to millions of customers; we handle SQL, JSON, full-text and vector workloads — all in one unified platform that provides high-performance, scalable transactional and analytical processing.

We continuously refine and deliver enhancements to our core engine to meet all your workload needs with exceptional performance and efficiency. In this blog, we will explore some of the latest performance improvements made to the SingleStore platform.

Sub-segment elimination

Columnstore tables group data into logical segments using a sort key; each logical segment consists of roughly 1,024,000 records. One of the most important considerations in selecting sort keys for columnstores is increasing the amount of segment elimination that happens during query execution. Segment elimination uses column minimum/maximum value metadata during query execution to determine whether a segment can match a filter; if the segment cannot match a filter, the segment is skipped entirely and no data is examined, significantly improving performance.
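The min/max check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SingleStore's implementation: the `Segment` class, `build_segments` and `scan_equal` are hypothetical names, and the tiny segment size is chosen so the effect is easy to see.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """Hypothetical stand-in for a columnstore segment and its metadata."""
    values: List[int]   # column values, already ordered by the sort key
    min_value: int      # minimum kept in the segment metadata
    max_value: int      # maximum kept in the segment metadata


def build_segments(sorted_values: List[int], segment_size: int) -> List[Segment]:
    """Split sorted values into fixed-size segments and record min/max for each."""
    segments = []
    for start in range(0, len(sorted_values), segment_size):
        chunk = sorted_values[start:start + segment_size]
        segments.append(Segment(chunk, min(chunk), max(chunk)))
    return segments


def scan_equal(segments: List[Segment], target: int) -> Tuple[List[int], int]:
    """Return the matching values and the number of segments actually scanned."""
    matches: List[int] = []
    scanned = 0
    for seg in segments:
        # Metadata-only check: a segment whose [min, max] range excludes
        # the target cannot match, so its data is never read.
        if target < seg.min_value or target > seg.max_value:
            continue
        scanned += 1
        matches.extend(v for v in seg.values if v == target)
    return matches, scanned


segments = build_segments(list(range(100)), segment_size=10)
matches, scanned = scan_equal(segments, target=42)
print(matches, scanned)  # [42] 1
```

Because the values are ordered by the sort key, the ranges of the ten segments do not overlap, so only one of them has to be read for the filter `= 42`. This is why sort key choice matters so much for elimination.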

Now imagine applying segment elimination to much smaller blocks of data: we could eliminate more data and improve performance even further. Each segment can have up to 250 sub-segments, so each sub-segment holds about 4,096 records. As part of the new sub-segment elimination feature, we collect range statistics (minimum and maximum) per sub-segment and use them to determine whether a sub-segment can match a filter. If it cannot, the sub-segment is skipped entirely and no data is examined. Eliminating this additional data significantly improves performance for queries with highly selective filters.
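To make the arithmetic concrete, here is a rough Python sketch of range statistics at sub-segment granularity. The segment and sub-segment sizes come from the text; the function names and the synthetic data (one segment of consecutive integers) are assumptions made for illustration, not the engine's actual data structures.

```python
from typing import List, Tuple

SEGMENT_ROWS = 1_024_000   # approximate rows per segment (from the text)
SUB_SEGMENTS = 250         # maximum sub-segments per segment (from the text)
SUB_SEGMENT_ROWS = SEGMENT_ROWS // SUB_SEGMENTS  # 4,096 rows per sub-segment


def sub_segment_stats(values: List[int], block_rows: int) -> List[Tuple[int, int]]:
    """Collect (min, max) range stats for each block of block_rows values."""
    stats = []
    for start in range(0, len(values), block_rows):
        block = values[start:start + block_rows]
        stats.append((min(block), max(block)))
    return stats


def rows_examined(stats: List[Tuple[int, int]], block_rows: int,
                  total_rows: int, low: int, high: int) -> int:
    """Count rows that must be read for the filter low <= v <= high."""
    examined = 0
    for idx, (block_min, block_max) in enumerate(stats):
        # Skip any block whose range cannot overlap the filter range.
        if block_max < low or block_min > high:
            continue
        start = idx * block_rows
        examined += min(block_rows, total_rows - start)
    return examined


# Hypothetical segment: consecutive integers, so the segment-level min/max
# (0 and 1,023,999) cannot rule the segment out for this filter.
segment = list(range(SEGMENT_ROWS))
stats = sub_segment_stats(segment, SUB_SEGMENT_ROWS)
examined = rows_examined(stats, SUB_SEGMENT_ROWS, len(segment), 500_000, 500_100)
print(len(stats), examined)  # 250 4096
```

In this synthetic case the whole segment would have been scanned under segment-level elimination alone, while the sub-segment stats narrow the read to a single block of 4,096 rows.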

Here are performance results from a set of internal benchmark queries.

We tested 50 different queries across three filter types and varying block elimination rates. This feature improved query performance significantly, by up to 70x, especially for highly selective filters or high block elimination rates.

Improvements to queries on JSON columns

Improved load performance for JSON computed columns

In 8.9, we have optimized loading of columns computed from JSON fields. When using INSERT, LOAD DATA or Pipelines to load data with multiple columns derived from JSON fields, each JSON field is now parsed only once, and the parsed result is used to produce all the derived columns. Previously, each source JSON field was parsed once per derived column, so a table with many derived columns parsed the same field many times. This enhancement removes the extra parsing.
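The difference between the two strategies can be sketched with a small Python example that counts parse calls. This is a conceptual illustration only: `CountingParser`, `derive_per_column`, `derive_parse_once` and the sample keypaths are hypothetical and do not reflect how the engine is implemented internally.

```python
import json
from typing import Any, Dict, List


class CountingParser:
    """Wraps json.loads and counts how many times a document is parsed."""

    def __init__(self) -> None:
        self.calls = 0

    def parse(self, raw: str) -> Any:
        self.calls += 1
        return json.loads(raw)


def extract(doc: Any, keypath: List[str]) -> Any:
    """Walk a keypath such as ["user", "id"] through a parsed document."""
    for key in keypath:
        doc = doc[key]
    return doc


def derive_per_column(parser: CountingParser, raw: str,
                      keypaths: Dict[str, List[str]]) -> Dict[str, Any]:
    """Old behavior (sketch): parse the source field once per derived column."""
    return {name: extract(parser.parse(raw), path) for name, path in keypaths.items()}


def derive_parse_once(parser: CountingParser, raw: str,
                      keypaths: Dict[str, List[str]]) -> Dict[str, Any]:
    """New behavior (sketch): parse once and reuse the result for every column."""
    doc = parser.parse(raw)
    return {name: extract(doc, path) for name, path in keypaths.items()}


raw = '{"user": {"id": 7, "city": "Austin"}, "total": 19.5}'
keypaths = {"user_id": ["user", "id"], "city": ["user", "city"], "total": ["total"]}

old_parser, new_parser = CountingParser(), CountingParser()
old_row = derive_per_column(old_parser, raw, keypaths)
new_row = derive_parse_once(new_parser, raw, keypaths)
print(old_parser.calls, new_parser.calls)  # 3 1
```

Both approaches produce the same row; the saving is the number of parses, which grows with the number of derived columns in the per-column version and stays at one in the parse-once version.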

This optimization works best when there are many computed columns and shorter keypaths; the benefit grows as the number of computed columns increases. Here are performance measurements from an internal benchmark.

We tested with a range of computed columns and varying key lengths. This optimization improved load performance significantly, especially with many computed columns. To enable this optimization, set optimize_json_computed_column to TRUE. Details on configuration are in the documentation here.

Optimized JSON array operations

In 8.9, we further enhanced the analytics capabilities on JSON arrays. This capability enables combining relational and semi-structured data processing seamlessly and can eliminate the need for ETL or flattening the JSON data. This makes the SingleStore Kai™ apps even more powerful and faster, as it's a common pattern to have nested fields and arrays in MongoDB®.

As you can see in the examples listed here, queries that filter before aggregating with GROUP BY are now much faster.  
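For readers unfamiliar with the pattern, the following Python sketch shows the logical shape of a "filter array elements, then aggregate by group" query over documents with nested arrays. It only illustrates what such a query computes; the document layout, the field names (`items`, `sku`, `qty`) and the function are made up for this example and say nothing about how SingleStore executes the query.

```python
from collections import defaultdict
from typing import Any, Dict, List


def filter_then_group(docs: List[Dict[str, Any]], min_qty: int) -> Dict[str, int]:
    """Sum qty per sku over all array elements with qty >= min_qty."""
    totals: Dict[str, int] = defaultdict(int)
    for doc in docs:
        for item in doc["items"]:        # expand the nested JSON array
            if item["qty"] >= min_qty:   # filter each element first
                totals[item["sku"]] += item["qty"]  # then aggregate by group
    return dict(totals)


# Hypothetical order documents, each with a nested array of line items.
orders = [
    {"order_id": 1, "items": [{"sku": "a", "qty": 5}, {"sku": "b", "qty": 1}]},
    {"order_id": 2, "items": [{"sku": "a", "qty": 2}, {"sku": "c", "qty": 4}]},
]

result = filter_then_group(orders, min_qty=2)
print(result)  # {'a': 7, 'c': 4}
```

Running this kind of analysis directly on the array avoids a separate ETL step to flatten the items into their own table.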

Analytics on JSON is highly optimized in SingleStore, owing to columnar storage and vectorized query execution. This makes SingleStore a powerful multi-model database that can efficiently handle both structured and semi-structured data.

Optimization to node recovery time

In 8.9, we have optimized the recovery process to reuse previously computed autostats instead of rebuilding those stats from scratch, when possible. This improvement means the system can often avoid the expensive process of rebuilding autostats during recovery and avoid unnecessarily using space in the blob cache, improving performance. We have observed up to 270% improvement in recovery time with this new feature.

Full-text search V2 throughput improvement

Full-text search version 2, based on Java Lucene, was introduced in mid-2024. Since then, we've tuned our implementation to improve concurrency, memory management and internal caching, leading to approximately a 10x throughput improvement measured in queries per second. These improvements are visible under heavy, high-concurrency full-text search workloads. This has been validated with one of our customers, a Fortune 100 company that has worked closely with us as a design partner.

Try SingleStore 8.9 features today

In SingleStore 8.9, we’ve delivered amazing performance gains in the database engine, especially for highly selective queries and JSON. You may see stunning speed improvements of up to 70x, with no application changes required. And that's not all. We've delivered many other functional and usability enhancements in 8.9 too. For the full list of features in 8.9, see the release notes.

Try SingleStore today to experience the feel of riding a Ferrari of databases.

