SingleStoreDB is a high-performance, general purpose database that allows users to transact, analyze and contextualize data in real time.
A key feature allowing our customers to do this is our patented default table type, Universal Storage. Universal Storage takes advantage of three separate storage tiers: in-memory rowstore, on-disk columnar store and object storage.
The on-disk columnar store is also referred to as the disk cache, storing warm data whilst cold data gets evicted from disk and lives solely in object storage. Having said that, there are certain situations where none of your data will be on the disk cache — significantly impacting query performance. This blog runs through when this can occur, how to “warm-up” your cluster in these scenarios and the latency impact of querying data which lives on disk vs. object store.
Impact of not having a warm disk
First, let’s take a look at the impact of running a query against cached vs. uncached data. To do this, we’ve tested an OLTP style query and an OLAP style query on a table called customers
with 15 million rows (Appendix A contains the table DDL and queries ran). The following table contains the results.
Uncached | Cached | Factor improvement | |
OLTP | 5s | 7ms | 714x |
OLAP | 1.5s | 29ms | 52x |
As you can see in the table, queries pulling down data from object storage will be significantly slower than queries reading data directly from disk. The size of the impact will depend on the total size of the blobs which need to be fetched from object storage to complete the query.
In general, hitting blob storage isn’t going to be a problem for users. This is because SingleStoreDB uses a Least Recently Used (LRU) algorithm to determine what data gets evicted from disk. So as long as you provision a cluster with enough disk space to hold the subset of data you need fast access to, the LRU algorithm will ensure it remains on disk.
Okay, so when will it be a problem?
There are two situations where you will find your disk cache layer is not storing any of the data in your database:
- On workspace resume. This is a common situation for customers/prospects who are using SingleStoreDB’s suspend/resume functionality to save on credits when the workspace is not being used.
- When attaching an existing database to your workspace.
In both these situations, all the Universal storage tables in the database will live solely in object storage.
How to warm the disk
To hydrate your disk in these situations, you’ll need to query your data. This results in blobs being read from object storage to be pulled onto and persisted to disk. If you know in advance what queries you are going to run and the parameters used in those queries, you can create a script to run through those queries to warm your disk cache. That way on subsequent runs, there is no latency impact resulting from reading data from object storage.
However, we have plenty of customers/prospects who don’t know ahead of time what parts of their database are going to be queried. In this scenario, you may want to cache your entire database. To do this, there are two types of files in blob storage you’ll need to pull down to disk — columnar and index.
To warm up the columnar files for a given table, you can run the following query.
SELECT AVG(LENGTH(<column_name_a>)),AVG(LENGTH(<column_name_b>)),…,AVG(LENGTH(<column_name_n>))FROM <table_name>;
If you don’t need to warm the whole table (for example, if it contains historical data and you only need the last three years), you can add filters to the query to only pull down the relevant columnar files to disk.
To warm the disk cache with the index blobs, the following command can be run on specific indexes/tables.
OPTIMIZE TABLE <table_name> WARM BLOB CACHE FOR INDEX <index_name>;
If you have more than a handful of tables, you’re likely already thinking this sounds like a lot of manual work. To take the burden off you, we’ve created a notebook and shared it in the gallery section of SingleStore Spaces. This notebook automates the process of creating and executing both these queries against the tables and indexes in a selected database. You can view the notebook here.
Conclusion
After resuming a workspace or attaching a database to an existing workspace, none of your Universal Storage tables will have their data cached on disk. In this scenario, it is particularly important for customers who are powering latency-sensitive apps, or prospects looking at testing the performance of SingleStoreDB to warm up their workspace using the steps outlined in this blog. This will ensure your queries don’t incur a latency penalty as a result of queries reading data from object storage.
Appendix
Appendix A
-- Table DDL --CREATE TABLE customer (c_custkey int(11) NOT NULL,c_name varchar(25) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,c_address varchar(40) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,c_nationkey int(11) NOT NULL,c_phone char(15) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,c_acctbal decimal(15,2) NOT NULL,c_mktsegment char(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,c_comment varchar(117) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,PRIMARY KEY (c_custkey),SHARD KEY (c_custkey),INDEX (c_nationkey));-- OLTP query --SELECT *FROM customerWHERE c_custkey = 121;-- OLAP query --SELECT c_nationkey,COUNT(*) AS total_customers,ROUND(AVG(c_acctbal),2) AS avg_acctballFROM customerGROUP BY c_nationkeyORDER BY total_customers DESC;