Choosing Your Next Database

If selling a database is hard, buying a database is 100x harder.

In five years at SingleStore, I’ve seen upwards of 500 database POCs. In this post I’ll share some stories about what our customers found important when evaluating real-time analytics databases. For anyone looking for a Rockset alternative, this one’s for you!

No matter how many POCs we do here at SingleStore, the selection criteria always come down to five key components:

Data size
Data ingestion
Query execution
Query complexity
Concurrency

I always tell our customers and prospects: If your workload is demanding on at least three of these criteria, you’re looking in the right place. Fun fact: in the last year, we’ve won 82% of POCs, and the other 18% had two or fewer of the criteria!
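The rule of thumb above can be sketched as a small check. This is only an illustration: the function names and the way a workload is described (a mapping from criterion to a yes/no flag) are assumptions made for this example, not a SingleStore tool.

```python
# The five selection criteria named in this post.
CRITERIA = (
    "data size",
    "data ingestion",
    "query execution",
    "query complexity",
    "concurrency",
)


def demanding_criteria(workload: dict[str, bool]) -> list[str]:
    """Return the criteria that the workload flags as demanding."""
    return [name for name in CRITERIA if workload.get(name, False)]


def is_good_fit(workload: dict[str, bool], minimum: int = 3) -> bool:
    """Apply the rule of thumb: at least `minimum` demanding criteria."""
    return len(demanding_criteria(workload)) >= minimum


# Hypothetical workload that stresses four of the five criteria.
example = {
    "data ingestion": True,
    "query execution": True,
    "query complexity": True,
    "concurrency": True,
}
print(demanding_criteria(example))
print(is_good_fit(example))
```

Deciding whether a criterion counts as demanding is left to the evaluator here, since the right threshold depends on the workload.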

IEX Cloud

The CTO of IEX Cloud compared SingleStore against ClickHouse and said of ClickHouse, “It’s really really fast for writes, sped up our overnight ETL but we wanted that to be the same system that served our user traffic — and we quickly realized it’s not meant for a lot of parallel requests. They offered a clustering option but we quickly realized it didn’t work as advertised. It just didn’t meet our need.”

Data ingestion: 100 MB per second. ClickHouse and SingleStore both have great connectivity and performance for ingestion, so this criterion alone was not enough to make a decision.
Query execution: Sub-100ms on deeply nested JSON. SingleStore’s analytical performance on JSON is the best, bar none, thanks to the seekable JSON feature set. The competing technologies performed well on structured data, but SingleStore was differentiated here.
Concurrency: 1.2B requests per day. IEX Cloud required a database that could scale dynamically (and automatically). ClickHouse, PostgreSQL and Yugabyte all showed great characteristics for writes but were not able to meet the read need. SingleStore did.
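To put the daily request figure in per-second terms, here is a quick back-of-the-envelope conversion. It assumes traffic is spread evenly across the day, which real traffic rarely is, so treat the result as an average rather than a peak.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


# 1.2B requests per day, as quoted for IEX Cloud.
avg_rps = per_second(1.2e9)
print(f"{avg_rps:,.0f} requests/second on average")  # 13,889 requests/second on average
```

Peak load is typically a multiple of this average, so a POC would normally size for the peak, not the mean.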

Heap

The engineering team at Heap is probably the most talented one I’ve ever interacted with. Heap auto-captures clickstream data from its customers to provide complex analyses of their digital journeys. By their own admission, they tried every database under the sun, including PostgreSQL, ClickHouse, Snowflake and others.

Data size: 2PB+ of large, sparse JSON objects. Trust me, these were gnarly! Thanks to SingleStore’s seekable JSON and sparse compression, Heap was able to navigate these very quickly.
Data ingestion: 500,000 records/second. Heap scaled out their cluster horizontally to ensure maximum parallelism on ingest. They also had to make sure data was available to be queried in under five minutes.
Query execution: P95 of three seconds, and P95 of nine seconds at 10x scale. SingleStore achieved this, albeit with some rewrites from the original Postgres code by our talented solution engineers.
Query complexity. The preceding query speeds may not seem that crazy, but if you saw these queries you would get it! Tons of recursive CTEs, thousands of lines of code and complex joins.
Concurrency: 40 queries per second. Heap achieved this easily with SingleStore, and was even able to hit its intended targets at 10x the scale (400 QPS).
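For readers less familiar with the P95 targets above: P95 is the latency that 95% of queries come in at or under. The sketch below uses the nearest-rank definition, which is one common convention. How Heap actually measured its percentiles is not stated here, and the sample latencies are made up for illustration.

```python
def percentile_nearest_rank(samples: list[float], percent: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    `percent`% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of percent * n / 100, avoiding float rounding issues.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical query latencies in seconds (not Heap's data).
latencies = [0.4, 0.6, 0.7, 0.9, 1.1, 1.2, 1.4, 1.8, 2.5, 6.0]
p95 = percentile_nearest_rank(latencies, 95)
print(f"P95 latency: {p95} s")
```

A percentile target such as P95 is stricter than an average: a handful of very slow queries can leave the mean looking healthy while the P95 misses its goal.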

Adobe

Matt Newman and team at Adobe support 500,000 monthly active users on the Workfront platform, and were looking for an alternative after struggling for years with both PostgreSQL and Elastic. They evaluated SingleStore and Rockset.

Data size: 70TB of unstructured data. SingleStore’s multi-model database supported both the structured and unstructured requirements that Adobe had.
Data ingestion: Less than five seconds from ingest to query. SingleStore excelled here as data is ingested directly into memory for immediate availability, then committed to disk at 70%+ compression rates to ensure price-performance.
Query execution: Sub-second joins across 10B records. Elastic struggled with this requirement as it required significant flattening of data to get performance that was fast, but not exceptional.

GoGuardian

The developers at GoGuardian compared SingleStore against Druid and were most interested in data ingestion, query execution, query complexity and concurrency.

Data ingestion: 750,000 records per second. This was relatively easy to hit on SingleStore given our native Pipelines, which leverage the parallelism of leaf nodes. GoGuardian’s workload necessitated updates of individual rows, which Druid could not handle.
Query execution: 30ms reads. This was a no-brainer victory for SingleStore against BigQuery, Druid and Presto — specifically given the presence of point reads and large-scale aggregations. See: Universal Storage.
Query complexity. GoGuardian has thousands of dynamic dimensions in their queries, which (given the query plan cache) were super fast to execute on SingleStore.
Concurrency: Dynamic reads at 400/sec. While the concurrency of this workload wasn’t all that high, the dynamic nature was a challenge for Presto and Druid.

The great thing about leading a customer-facing team is that you’re solving new business problems every day, but the repeatability of a technical blueprint is what enables us to move faster for our customers. If you’re looking for a Rockset alternative (or any database), feel free to schedule time with my team here.