We're taking you through frequently asked questions (FAQ) we hear from Mongo developers when migrating to SingleStore from a sharded MongoDB deployment.
Can I choose which cluster and tables (collections) to shard vs. not shard?
No, in SingleStore you do not have the option to deploy a sharded vs. non-sharded cluster.
This concept also applies at the table level, where tables are sharded by default. This automatic sharding is integral to SingleStore's architecture, designed to optimize performance and scalability across distributed servers. This is a key difference from MongoDB, where you choose whether to shard specific collections based on your application's needs.
SingleStore does however have a table type called a reference table which is not sharded. Instead, reference tables contain a full copy of the data on each leaf node. Reference tables are particularly useful for storing dimension tables which are infrequently updated.
How does the cluster architecture differ from a sharded Mongo cluster?
In SingleStore, the cluster architecture is composed of aggregator and leaf nodes. Aggregator nodes are responsible for query parsing, query routing and aggregating results from the leaf nodes. The leaf nodes store the actual data and perform most of the query processing. Additionally, a load balancer typically sits in front of the SingleStore cluster, directing queries to the appropriate aggregator node to ensure efficient load distribution and fault tolerance.
Aggregator nodes in SingleStore take on roles similar to both the query router (mongos) and config servers in MongoDB. While MongoDB's query routers direct queries to the appropriate shard, and config servers store metadata about the cluster, SingleStore's aggregator nodes handle both query routing and metadata management within the cluster. This means that SingleStore’s architecture is more streamlined, reducing the need for multiple components to manage and route data across the cluster.
What type of sharding is available in SingleStore?
In SingleStore, data is distributed across partitions using a hash function on the shard key. The shard key is a column or set of columns in a table — chosen at the create table level — used to control how the rows of that table are distributed. To determine the partition responsible for a given row, SingleStore computes a hash from all the columns in the shard key to the partition ID.
Therefore, rows with the same shard key will reside on the same partition. This is similar to MongoDB’s hashed sharding. However, unlike MongoDB, SingleStore does not support range-based sharding.
Is there a balancer job to move chunks and rebalance the shards/partitions?
Unlike MongoDB, SingleStore does not have a balancer job that moves chunks or rebalances shards. In MongoDB, the balancer job manages the distribution of chunks across shards and periodically rebalances data to ensure even distribution.
In SingleStore, the concept of chunks does not exist. Instead, data distribution is determined by hashing the shard key, which decides the partition where each row resides. Once hashed, the data remains in the assigned partition.
What features should I be aware of to help with performance on sharded tables?
To optimize performance on sharded tables in SingleStore, consider the following features:
- Sort key. Defining a sort key on sharded tables significantly improves query performance by reducing the amount of data to be scanned.
- Reference tables. Use reference tables for small, frequently accessed datasets. Reference tables are fully replicated across all leaf nodes, reducing data shuffling and improving query speed.
- Projections. SingleStore’s projections give users the ability to choose secondary shard and sort keys. This significantly improves query performance for queries that don’t benefit from the primary shard and sort key on the table.
- Rowstore tables. Although SingleStore defaults to columnstore tables, high-throughput OLTP workloads can benefit from SingleStore’s rowstore table type.
What are best practices for sharding in SingleStore?
The shard key selected can have a significant impact on performance. The following is a list of best practices when choosing a shard key in SingleStore:
- Ensure even distribution of data across leaf nodes by choosing a high cardinality shard key. This practice is similar to MongoDB's recommendation, and helps maintain balanced load and performance across the cluster.
- For tables frequently joined on, shard those tables on the columns used in the join. This ensures the joined rows will be collocated on the same leaf node, reducing data movement across the network when performing join operations.
- For high-concurrency OLTP workloads, select a shard key that ensures queries are confined to a single partition. This minimizes cross-partition communication and improves query response times.
- For heavy analytical workloads, it’s best to take advantage of the full cluster's resources by choosing a shard key which distributes data you will be scanning evenly across the cluster. This ensures your queries scanning and aggregating millions of rows will have multiple cores processing the data.
Note, sometimes these best practices may conflict with each other. For example, a subset of queries may benefit from a shard key on the id
column, while another set of queries may benefit from a different shard key. In this scenario, projections are very useful.
Are there any limitations when choosing a shard key?
If your table has a primary key, the shard key must be either the primary key itself or a subset of it. So, for example, if you have a primary key on the event_id
and account_id
columns, you have three options for your shard key:
event_id
andaccount_id
event_id
onlyaccount_id
only
Can I update my shard key after the table is created?
In SingleStore, once a table is created with a specified shard key, you cannot update or change the shard key. This limitation differs from MongoDB, where you have the option to update the shard key for a collection after it has been created.
If you need to change the shard key, you have to recreate the table with the new shard key configuration and migrate the data accordingly. This process involves creating a new table with the desired shard key, copying data from the old table to the new one and switching your applications to use the new table.
Do more with your Mongo-native applications with SingleStore Kai™
SingleStore Kai is an API for 100x faster operations on MongoDB — no query or code transformations required. From greater interoperability to vector search on JSON data and aggregations, SingleStore Kai solves common pitfalls and challenges developers experience with MongoDB.
Try SingleStore Kai — and start free — now.