Consider a table like:
CREATE TABLE `log` (
  `id` bigint(20) unsigned NOT NULL,
  `timestamp` datetime NOT NULL SERIES TIMESTAMP,
  `daytimestamp` AS date_trunc('day', `timestamp`) PERSISTED datetime,
  `some_id` bigint(20) unsigned NOT NULL,
  `some_other_id` bigint(20) unsigned DEFAULT NULL,
  `text` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  `customer_id` int(10) unsigned NOT NULL,
  SHARD KEY (`some_id`),
  KEY (`customer_id`, `daytimestamp`) USING CLUSTERED COLUMNSTORE,
  UNIQUE KEY (`customer_id`, `id`, `some_id`) USING HASH
);
Each customer stores millions (sometimes hundreds of millions) of logs. Every now and then a customer leaves and we need to delete their data.
Currently we’re doing the delete in batches of 50k to try to avoid too much lock contention on the table, but it’s painfully slow:
delete from log where customer_id = @customer_id limit 50000
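For reference, the batching loop is roughly equivalent to the sketch below. The procedure is hypothetical (we actually drive the loop from application code), and it assumes ROW_COUNT() reports affected rows inside a procedure the way it does for MySQL-compatible clients:

-- Hypothetical sketch of the batched purge; the procedure name and the
-- ROW_COUNT()-based exit condition are illustrative assumptions.
CREATE OR REPLACE PROCEDURE purge_customer(_customer_id INT) AS
DECLARE
  batch_deleted BIGINT = 1;
BEGIN
  WHILE batch_deleted > 0 LOOP
    -- Small batches keep each transaction short to limit lock contention.
    DELETE FROM log WHERE customer_id = _customer_id LIMIT 50000;
    batch_deleted = row_count();
  END LOOP;
END;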
This led us to wonder why it’s not possible to truncate by sort key. By defining the sort key, we tell SingleStore that the data should always be sorted by `customer_id` first, so we would expect the data to be physically stored in that order in some manner. Being able to say something like:
truncate table log where customer_id = @customer_id
would be really beneficial. The statement should of course require that the filter matches a leading prefix of the sort key columns.
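The closest workaround we can think of today is a rebuild-and-swap, sketched below. It assumes CREATE TABLE ... LIKE and ALTER TABLE ... RENAME are available on our version, and the table names are illustrative:

-- Hypothetical rebuild-and-swap sketch.
CREATE TABLE log_new LIKE log;

-- Copy everything except the departing customer's rows.
INSERT INTO log_new SELECT * FROM log WHERE customer_id != @customer_id;

-- Swap the tables; keep the old one around until verified, then drop it.
ALTER TABLE log RENAME TO log_old;
ALTER TABLE log_new RENAME TO log;
DROP TABLE log_old;

That avoids the per-row delete path entirely, but rewriting hundreds of millions of surviving rows is clearly worse than a targeted truncate would be, which is why truncating by sort key is so attractive to us.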
We’re also using SingleStore Cloud DB, so a lot of the data lives in cold storage. There it would be far more beneficial to simply drop the objects in cold storage directly, rather than first pulling them into hot storage, flagging the rows for deletion, and then persisting those changes back to cold storage. I’m guessing that’s how it works today, but I might be off.
If this is already doable, then I apologize. Any pointers on how best to do bulk deletes without causing too much lock contention are also greatly appreciated.