Hi!
In my org we have two SingleStore clusters, primary and backup. Data is loaded to both in parallel by equivalent processes running in different data centers.
In our backup data center, I rebuilt some rowstore tables so that, instead of using an explicitly specified shard key, they are sharded on their primary keys by default. My assumption, perhaps misguided, was that this would reduce the memory footprint of a table. In practice it seems to have increased it.
For example, consider the following:
CREATE ROWSTORE TABLE foo (
  id int,
  some_value varchar,
  ...
  PRIMARY KEY (id),
  SHARD KEY id (id)
)
In this configuration a “SHARD KEY” is explicitly specified, but it covers the same column as the primary key. In SingleStore Studio, for this table I see “PRIMARY KEY” taking 2.2 GB of memory and “SHARD KEY” taking 1.4 GB.
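For reference, here is roughly how the overall table memory can be cross-checked outside of Studio (a sketch, assuming information_schema.TABLE_STATISTICS on 7.5 exposes a MEMORY_USE column in bytes per partition; 'mydb' is just a placeholder for the database name, and on an HA cluster you may need to filter PARTITION_TYPE so replica partitions aren't counted twice):

SELECT
  DATABASE_NAME,
  TABLE_NAME,
  SUM(MEMORY_USE) / 1024 / 1024 / 1024 AS memory_gb,  -- total resident memory for the table
  SUM(`ROWS`) AS total_rows
FROM information_schema.TABLE_STATISTICS
WHERE DATABASE_NAME = 'mydb'  -- placeholder database name
  AND TABLE_NAME = 'foo'
GROUP BY DATABASE_NAME, TABLE_NAME;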
I rebuilt the table in our backup cluster as:
CREATE ROWSTORE TABLE foo (
  id int,
  some_value varchar,
  ...
  PRIMARY KEY (id)
)
So that the primary key would also serve as the shard key. The primary key still takes 2.2 GB of memory, but the overall memory usage of the table in our backup cluster (where I rebuilt the table) is now 1.3 GB greater than in our primary cluster (which still has the shard key explicitly specified). That is, the entire table is 18 GB on backup and 16.7 GB on primary, with all other factors (data, indexes, etc.) being the same.
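To make sure the extra 1.3 GB isn't concentrated in just a few partitions, this is the kind of per-partition breakdown I'd run on both clusters and compare (same assumption about information_schema.TABLE_STATISTICS as above):

SELECT
  PARTITION_ID,
  `ROWS`,
  MEMORY_USE / 1024 / 1024 AS memory_mb  -- per-partition memory for the table
FROM information_schema.TABLE_STATISTICS
WHERE DATABASE_NAME = 'mydb'  -- placeholder database name
  AND TABLE_NAME = 'foo'
ORDER BY PARTITION_ID;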
Does this track: does using the primary key as the default shard key, rather than explicitly specifying one, actually use more memory? Or am I missing something?
We’re using version 7.5.8 on-premises.