We’re running an analytics application where customers expect to see both historical and real-time data. Customers can define rules for how we should process their data, and sometimes these rules change so much that we need to reprocess all of their historical data. In that case we use S3 pipelines to ingest the historical data again after it has been processed with the new rules.
As mentioned, we also provide real-time data, which means we continuously append data to our cluster. Data of the same type is inserted into the same table for the customer, so the S3 pipelines and our continuous appends target the same table.
What we’re experiencing is that while the pipelines are running, the continuous appends are blocked. The data should not conflict, because our shard key includes the date of the data, so historical rows should not collide with the continuous real-time rows.
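For illustration, here’s a simplified sketch of the kind of setup we’re describing. All table, column, and bucket names are made up for this post and the pipeline is trimmed down, but it shows the date in the shard key and the S3 pipeline loading into the same table the real-time appends use:

```sql
-- Simplified example table: the shard key includes the event date, so
-- reprocessed historical rows and new real-time rows are distributed by date.
CREATE TABLE customer_events (
    customer_id BIGINT NOT NULL,
    event_date DATE NOT NULL,
    event_time DATETIME(6) NOT NULL,
    payload JSON,
    SHARD KEY (customer_id, event_date),
    SORT KEY (event_date, event_time)
);

-- Simplified S3 pipeline that re-ingests the reprocessed historical files
-- into the same table that the continuous appends target.
CREATE PIPELINE reprocess_customer_events AS
LOAD DATA S3 'our-bucket/reprocessed/customer_events/*'
CONFIG '{"region": "eu-west-1"}'
CREDENTIALS '{"aws_access_key_id": "...", "aws_secret_access_key": "..."}'
INTO TABLE customer_events
FIELDS TERMINATED BY ',';

START PIPELINE reprocess_customer_events;
```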
We can easily ingest millions of data points per minute continuously, but when the pipelines are running we drop to fewer than 100k per minute, causing a rather severe backlog of data waiting to be ingested, and our real-time data ends up nowhere close to real time. The files read from S3 vary in size, some small, some large (we cap them at 1GB).
Our assumption is that the table is being locked at the partition level, but since there are many files, each containing more than columnstore_table_lock_threshold rows, all table partitions end up locked, effectively causing a full table lock.
We would like to know whether our assumption is plausible and whether others have experienced something similar. More importantly, we’d like to know if anyone has prior knowledge of how best to overcome this problem. We have ideas of our own, but if there’s an easy fix or a lesser-known configuration setting we can change, we’d prefer that.
The first thing that comes to mind is to ingest the data in smaller batches. I’m not sure of the best way to do that. Maybe consider using Kafka pipelines and have each message contain a batch of a few thousand rows at a time (under the columnstore_row_value_table_lock_threshold).
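Roughly something like this, untested and with placeholder broker/topic names (using the example table from your post above), where the producer writes messages that each hold only a few thousand rows:

```sql
-- Sketch of a Kafka pipeline for the re-ingest, where each Kafka message
-- carries a small batch of rows so a single insert stays under the
-- columnstore_row_value_table_lock_threshold.
CREATE PIPELINE reprocess_customer_events_kafka AS
LOAD DATA KAFKA 'kafka-broker-1:9092/reprocessed-customer-events'
INTO TABLE customer_events
FIELDS TERMINATED BY ',';

START PIPELINE reprocess_customer_events_kafka;
```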
You could also increase that variable. I’m not sure of the ramifications; it may of course take more memory for locks.
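If you want to try that route, something like the following should work; the value is only an example, and I haven’t measured the memory impact of raising it:

```sql
-- Example only: raise the insert lock-escalation threshold cluster-wide.
-- Larger values mean more row-level locks per insert, which costs memory.
SET GLOBAL columnstore_row_value_table_lock_threshold = 2000000;
```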
We have some logic in place that makes it a lot easier for us to do all the processing and put the data in S3 prior to ingesting. It’s of course a valid option to have some other process read the data we put on S3 and push it to Kafka, so thanks for the suggestion.
I was not aware of the columnstore_row_value_table_lock_threshold variable. So the variable I mentioned, columnstore_table_lock_threshold, applies only when updating or deleting rows in a columnstore table, while columnstore_row_value_table_lock_threshold is specifically for inserts. This is interesting, especially as the default is 1M entries. In many cases we’re above 1M entries, but that is something we can look at.
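In case it helps anyone else reading along, a quick check like this should show both thresholds a cluster is currently running with (the pattern matches both variable names discussed above):

```sql
-- Show both columnstore lock-escalation thresholds mentioned in this thread.
SHOW GLOBAL VARIABLES LIKE 'columnstore%lock_threshold';
```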
We thought columnstore_table_lock_threshold would be the one causing the locks, and that value defaults to 5k (we’ve increased it to 50k in our cluster), but having a couple of hundred thousand files with 50k entries each is also not an efficient use of S3. However, if it’s files of 1M entries, then we can try to play along with that.
I’m also reading your answers as confirmation that when a pipeline goes above the columnstore_row_value_table_lock_threshold for all partitions, it effectively locks the entire table. When a pipeline runs with the default configuration, it reserves one batch per partition and executes them all simultaneously, but it waits for all batches to finish before reserving new ones. Do you know whether a partition that has been locked as part of a pipeline insert remains locked until all reserved batches are done, or is the lock released as soon as the insert finishes for that specific partition?
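I assume we could try to observe some of this ourselves through information_schema.PIPELINES_BATCHES (assuming that view, and the column names below, are available on our version), to see how long individual batches run and how they line up with the partitions they load into:

```sql
-- Inspect the most recent batches for our re-ingest pipeline to see how
-- batch start/finish times relate to one another across partitions.
SELECT *
FROM information_schema.PIPELINES_BATCHES
WHERE PIPELINE_NAME = 'reprocess_customer_events'
ORDER BY BATCH_ID DESC
LIMIT 20;
```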
Lastly, and in regard to the above, why do pipelines reserve batches and wait for all of them to finish before reserving new ones? Developers should of course strive to produce evenly sized payloads for the pipelines to ingest, but that’s sometimes out of their hands. Would it be possible to add an option telling the pipeline to reserve continuously instead of waiting for all reserved batches to finish first? I know this is somewhat off-topic, but I’m curious.
Thanks for taking the time to look into this and answer, @hanson. I really appreciate it.