The funny thing about SingleStore is that it is simultaneously familiar and leading edge. On one hand, it is a relational database management system with a SQL interface – you probably already know how to use one of those! On the other hand, it stores data in memory and runs in a cluster – you may or may not have experience with a modern system that leverages these features.
This post addresses common questions that people ask the first time they encounter SingleStore. By reading these five questions and answers, you’ll understand most of the fundamental characteristics of SingleStore, what makes it special, and how you are already armed with most of what you need to get started.
1. What is SingleStore?
SingleStore is a relational database management system with a SQL interface. It has the following additional properties:
- Distributed architecture: SingleStore typically runs on a “cluster” of servers. However, users address a single interface to get data in or out of SingleStore. SingleStore handles details like which servers store certain volumes of data, and distributed query execution logic. Users benefit from the resources of many servers without needing distributed systems expertise.
- Memory-optimized: SingleStore gives users the option to read and write data directly to and from main memory. Accessing data in DRAM is orders of magnitude faster than disk or even flash/SSD, and this provides extreme performance benefits for high-throughput transaction processing and real-time analytics. However, SingleStore does not require that all data fit in memory, and includes the option to store data, in a compressed format, on flash/SSD/disk.
- Software only: SingleStore is distributed as software, and can run in variety of Linux environments including on physical hardware (“bare metal”), in virtual machines, or in software containers. It can be run on premises, in your data center, or in a public cloud. Unlike legacy database vendors that require the use of proprietary hardware, SingleStore can run on commodity servers.
2. Where does SingleStore store its data?
When creating a table in SingleStore, you specify whether the data for that table will reside in memory or on disk. SingleStore has two table types: rowstore tables and columnstore tables. Rowstore tables are primarily used for operational or transactional workloads, particularly workloads that require rapid updates. SingleStore rowstore tables store data entirely in memory, with logs and full database snapshots written to disk for durability. Columnstore tables, on the other hand, are typically used for analytical workloads and data warehousing, as the format naturally lends itself to compression, efficient scanning, and rapidly appending data. SingleStore columnstore tables store data primarily on disk, but cache data in memory when possible.
3. How does SingleStore ensure durability for data stored in memory?
SingleStore writes logs and full database snapshots to disk, which can be used to recover state in the event that a machine turns off. SingleStore writes logs as data changes, and periodically triggers a full backup of the data in memory (snapshot). Users can configure the frequency of full database snapshots.
SingleStore supports transaction processing, and exposes parameters for the user to tune performance. The most common question we get is how do we ensure durability when processing transactions in memory. In SingleStore, a transaction is “committed” when it has been written to the in-memory transaction buffer. SingleStore keeps a thread running in the background that is constantly writing blocks of data from the in-memory transaction buffer to disk as logs. The size of the transaction buffer is configurable. For instance, setting the size of the transaction buffer to 0 MB means that a transaction will not be committed until it has been logged to disk. In practice, it is not necessary to set the transaction buffer to 0 MB since using a buffer allows SingleStore to write large chunks of data to disk all at once, and because doing so requires only sequential I/O.
4. How is SingleStore architected?
A SingleStore cluster consists of two types of nodes: aggregator and leaf nodes. Client applications connect to an aggregator, which serves as the query router. Aggregators are aware of the entire cluster and know where specific data reside. Leaf nodes handle data storage and most of the computation during query execution. When the client sends a query, the aggregator splits it into several queries which are sent to each leaf node. The leaf computes its query and sends the result back to the aggregator. The aggregator consolidates the results from each leaf and sends the final result back to the client.
Every cluster has a special aggregator called the Master Aggregator (non-Master aggregators are called child aggregators). Any aggregator, master or child, can process data manipulation language (DML) queries including SELECT, INSERT, UPDATE, and DELETE. Data definition language (DDL) commands, such as CREATE, DROP, or ALTER TABLE, must run on the Master Aggregator. In the event that the Master Aggregator machine fails, an administrator can “promote” a child aggregator to Master.
Database administrators can add (and remove) nodes to the cluster at any time while keeping the cluster online, even while running a workload. Simply provision additional nodes, then add them to the SingleStore cluster. The easiest way to do this is through SingleStore Ops, but it can also be accomplished by sending commands to the Master Aggregator.
5. How does SingleStore licensing work? How do I know how many machines I will need in my cluster?
SingleStore is licensed based on the total RAM capacity of a cluster (the sum of the RAM in each server/VM/container in the cluster). Unlike some vendors in the database space, we don’t license based on number of CPU cores. This allows our customers to maximize CPU resources without licensing overhead.
When planning, note that the cluster must have enough RAM for the following:
- Rowstore data storage in memory
- Rowstore and columnstore query execution
- The operating system (kernel)
SingleStore enables several types of workloads ranging from stream processing and real-time analytics, to transaction processing, to data warehousing. Each of these workloads has different performance characteristics and latency requirements. If you need help designing your cluster, we are happy to assist! For general inquiries, you can reach us at info@singlestore.com.
Get Started with SingleStore
Now, you have the tools and fundamental concepts needed to start using SingleStore. Visit singlestore.com/cloud-trial to download SingleStore today.