Shard key having Nulls

Hi,

What are the rules regarding shard key and key for clustered columnStore ?

I need to make a column as Shard key and also for Clustered columnStore, but it may contain Nulls

What will be the impact of keeping a Nullable column as Shard key ?

CREATE TABLE test (
name varchar(25) DEFAULT NULL,
ID int(11) DEFAULT NULL,
update_date date DEFAULT NULL,
SHARD KEY (update_date) USING CLUSTERED COLUMNSTORE
)

You can do this, but the SHARD KEY should be a high-cardinality column with even data distribution. Otherwise, your partition sizes will be skewed. If a lot of values of update_date are NULL, they will all land in the same partition given the schema you provided.

Moreover, sharding on a date column is usually not a great idea if you have date range filters in many of your queries. That’s because then you won’t get much parallelism for range filters that only touch a narrow range of data–a few nodes might be doing all the work.

You could instead shard on ID (assuming it is unique) and have your columnstore sort key on date. That’s more typical. Like this:

CREATE TABLE test(
name varchar(25) DEFAULT NULL,
ID int(11) DEFAULT NULL,
update_date date DEFAULT NULL,
SHARD(ID),
KEY ( update_date ) USING CLUSTERED COLUMNSTORE
);

See also Optimizing Table Data Structures · SingleStore Documentation