We need to upsert around 100k records into a table. The table has around 200-300 columns. What we are currently doing is dividing those 100k records into batches of 1k and upserting each batch, but this is causing a high memory spike. The table is columnar.
What is the best way to do heavy upserts into a columnar table?
It’s possible that even 1k batches are too big, though that’s unlikely. Consider reducing the batch size to, say, 500 or 250.
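For reference, here is a rough, untested sketch of what that batching loop might look like. It is not your actual code: it assumes a MySQL-wire-protocol database (SingleStore, judging by the rowstore/columnstore setup), the pymysql driver, and a hypothetical table `events` with a primary key `id`, using INSERT ... ON DUPLICATE KEY UPDATE as the upsert.

```python
import pymysql

BATCH_SIZE = 500  # try 500 or 250 instead of 1000

# Hypothetical upsert statement: assumes `events` has a primary key `id`.
UPSERT_SQL = """
    INSERT INTO events (id, col_a, col_b)
    VALUES (%s, %s, %s)
    ON DUPLICATE KEY UPDATE col_a = VALUES(col_a), col_b = VALUES(col_b)
"""

def upsert_in_batches(conn, rows):
    """Upsert `rows` (a list of value tuples) in fixed-size batches."""
    with conn.cursor() as cur:
        for start in range(0, len(rows), BATCH_SIZE):
            batch = rows[start:start + BATCH_SIZE]
            cur.executemany(UPSERT_SQL, batch)
            conn.commit()  # commit each batch so nothing accumulates across batches

# Hypothetical connection details and sample rows.
conn = pymysql.connect(host="localhost", user="app", password="secret", database="db")
upsert_in_batches(conn, rows=[(1, "a", "b"), (2, "c", "d")])
conn.close()
```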
I’m not sure what’s causing you to run out of memory. What error message are you getting? It might be that the in-memory rowstore segment is getting too big. Normally it would flush automatically, though, so that’s probably not it. But you could try running OPTIMIZE TABLE FLUSH after every 10 batches or so; that flushes the contents of the in-memory rowstore segment to disk.
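Roughly, under the same assumptions as the sketch above (pymysql, hypothetical `events` table, with `conn` and `upsert_sql` as before), flushing every 10 batches could look like this:

```python
BATCH_SIZE = 500
FLUSH_EVERY = 10  # flush the in-memory rowstore segment every 10 batches

def upsert_with_periodic_flush(conn, rows, upsert_sql):
    """Upsert in fixed-size batches, flushing the rowstore segment periodically."""
    with conn.cursor() as cur:
        for i, start in enumerate(range(0, len(rows), BATCH_SIZE), start=1):
            cur.executemany(upsert_sql, rows[start:start + BATCH_SIZE])
            conn.commit()
            if i % FLUSH_EVERY == 0:
                # Flush the columnstore table's in-memory rowstore segment to disk.
                cur.execute("OPTIMIZE TABLE events FLUSH")
```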
If you run FLUSH on a table while queries against it are in flight, they will run concurrently without blocking each other; FLUSH is an online operation. They will, of course, be competing with each other for CPU time.
However, there is a brief blocking period at the start of the optimization process. The OPTIMIZE TABLE command will wait until all DML queries that were already running on the table finish before it begins optimizing. This allows in-progress queries to complete and ensures consistency of results (1, 2).
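Purely as an illustration of that behavior (same assumptions as above, hypothetical connection details), a read query and the FLUSH can be issued on separate connections at the same time and will proceed concurrently; the FLUSH only waits for DML statements that were already running on the table before it starts:

```python
import threading
import pymysql

def make_conn():
    # Hypothetical connection details.
    return pymysql.connect(host="localhost", user="app", password="secret", database="db")

def run_query():
    # A read query against the table; it can run while the FLUSH is in progress.
    conn = make_conn()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT COUNT(*) FROM events")
            print("row count:", cur.fetchone()[0])
    finally:
        conn.close()

def run_flush():
    # OPTIMIZE TABLE ... FLUSH is online, but it waits for DML statements
    # that were already running on the table before it starts.
    conn = make_conn()
    try:
        with conn.cursor() as cur:
            cur.execute("OPTIMIZE TABLE events FLUSH")
    finally:
        conn.close()

query_thread = threading.Thread(target=run_query)
query_thread.start()
run_flush()
query_thread.join()
```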