By combining multiple columns using CONCAT & SHA2 ,we are creating a primary key column. Please guide us which is the best DATA TYPE to store SHA2 retrun values in a colum store table for better compression and perfomance.
If the column you create this way is going to have unique values, and it is unique in this case since it is a primary key, then you are better off converting it to binary and storing it that way if you want to save space. E.g. you can use UNHEX on a hex string to get binary.
Of course, you will have to pass in search arguments in binary in your queries if you want to filter on this column.
We are creating a primary key using this SHA2 function. Bascically it is a combination of 6 columns. We are planning to do sharding also based on this column . The table type which we are using is column-store table.
is it advisable to use SHA2 generated column in shard key ?
Is there any adverse effect if I store it in a varchar(64) column insted of binary?
re:
is it advisable to use SHA2 generated column in shard key ?
- it will spread the data evenly, and should be unique, but are you going to search on this? If not, then all your searches will have to look at all the partitions. Maybe that’s okay for you. Also, it’s not a great idea to join on a wide key like this because it won’t be that fast. You’re better off using an int or bigint id to join on.
re: Is there any adverse effect if I store it in a varchar(64) column insted of binary?
yes, it will take twice as much space as a binary version of the same thing, assuming the string is all hex digits.