Hi!
In my org we have two SingleStore clusters, primary and backup. Data is loaded to both in parallel by equivalent processes running in different data centers.
In our backup data center, I rebuilt some rowstore tables so that, instead of using an explicitly specified shard key, they are sharded on their primary keys by default. My assumption, perhaps misguided, was that this would reduce the memory footprint of a table. In practice it seems to have increased it.
For example, consider the following:
CREATE ROWSTORE TABLE foo (
  id int,
  some_value varchar,
  ...
  PRIMARY KEY (id),
  SHARD KEY id (id)
)
In this configuration a “SHARD KEY” is explicitly specified, but it covers the same column as the primary key. In SingleStore Studio, for this table I see “PRIMARY KEY” taking 2.2 GB of memory and “SHARD KEY” taking 1.4 GB.
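For reference, here is roughly how the overall table memory can be cross-checked outside of Studio (a sketch, assuming information_schema.TABLE_STATISTICS on 7.5 exposes a MEMORY_USE column in bytes per partition; 'mydb' is just a placeholder for the database name, and on an HA cluster you may need to filter PARTITION_TYPE so replica partitions aren't counted twice):

SELECT
  DATABASE_NAME,
  TABLE_NAME,
  SUM(MEMORY_USE) / 1024 / 1024 / 1024 AS memory_gb,  -- total resident memory for the table
  SUM(`ROWS`) AS total_rows
FROM information_schema.TABLE_STATISTICS
WHERE DATABASE_NAME = 'mydb'  -- placeholder database name
  AND TABLE_NAME = 'foo'
GROUP BY DATABASE_NAME, TABLE_NAME;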
I rebuilt the table in our backup cluster as:
CREATE ROWSTORE TABLE foo (
  id int,
  some_value varchar,
  ...
  PRIMARY KEY (id)
)
So that the primary key would also serve as the shard key. The primary key still takes 2.2 GB of memory, but the overall memory usage of the table in our backup cluster (where I rebuilt the table) is now 1.3 GB greater than in our primary cluster (which still has the shard key explicitly specified). That is, the entire table is 18 GB on backup and 16.7 GB on primary, with all other factors (data, indexes, etc.) being the same.
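To make sure the extra 1.3 GB isn't concentrated in just a few partitions, this is the kind of per-partition breakdown I'd run on both clusters and compare (same assumption about information_schema.TABLE_STATISTICS as above):

SELECT
  PARTITION_ID,
  `ROWS`,
  MEMORY_USE / 1024 / 1024 AS memory_mb  -- per-partition memory for the table
FROM information_schema.TABLE_STATISTICS
WHERE DATABASE_NAME = 'mydb'  -- placeholder database name
  AND TABLE_NAME = 'foo'
ORDER BY PARTITION_ID;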
Does this track: does using the primary key as the default shard key, rather than explicitly specifying one, actually use more memory? Or am I missing something?
We’re using version 7.5.8 on-premises.