For transaction logs:
3) applies to version 7.0 and higher. For reference databases we create smaller log files by default (64 MB). Both sizes are controllable via the global variables log_file_size_partitions and log_file_size_ref_dbs.
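As a minimal sketch of working with those two variables (assuming they can be adjusted with SET GLOBAL on the master aggregator in your version; the 256 MB value is purely illustrative):

```sql
-- Inspect the current log file sizes (values are in bytes).
SHOW VARIABLES LIKE 'log_file_size%';

-- Example only: use 256 MB log files for partition databases and keep
-- the smaller 64 MB default for reference databases.
SET GLOBAL log_file_size_partitions = 268435456;
SET GLOBAL log_file_size_ref_dbs = 67108864;
```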
For columnstore files:
2) the rowstore segment is stored like a normal rowstore table: its data is logged to the transaction log and the rows are held in memory. Once enough rows have accumulated in the in-memory rowstore segment, the background flusher converts the data to columnstore format, writes it to disk, and removes it from the rowstore segment.
4) INSERT/LOAD/UPDATE queries will create the files in columnstore format directly themselves if they are writing enough data (on each partition). If they are writing small amounts of data, they go through the process described in 2): rows accumulate in memory until there are enough of them to write out to disk in columnstore format. The background merger doesn't play a role in the initial writes of new rows; it's mainly responsible for keeping the data sorted and for cleaning up deleted rows (a concrete sketch follows below). Some more details here: SingleStoreDB Cloud · SingleStore Documentation
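To make the flusher/merger behavior concrete, here is a minimal sketch (the table t is just an example; this assumes a 7.x cluster where OPTIMIZE TABLE is available) of forcing the in-memory rowstore segment to disk and asking the merger to do its consolidation pass:

```sql
-- Columnstore table used for illustration.
CREATE TABLE t (id BIGINT, payload TEXT, KEY (id) USING CLUSTERED COLUMNSTORE);

-- A small insert lands in the hidden in-memory rowstore segment first.
INSERT INTO t VALUES (1, 'small write');

-- Force the in-memory rowstore segment to be written to disk in
-- columnstore format instead of waiting for the background flusher.
OPTIMIZE TABLE t FLUSH;

-- Run a full optimization pass: the merger sorts/merges segments and
-- cleans up deleted rows.
OPTIMIZE TABLE t;
```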
Q1. If columnstore_disk_insert_threshold=0, all writes will skip the in-memory segment, so even a one-row write is converted to columnstore format and written to disk. This can cause issues with lots of small files on disk (the background merger will be running and building bigger files out of the smaller ones, but it may not keep up).
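A hedged sketch of that effect (the SET GLOBAL usage and the information_schema view/column names here are from memory and may differ by version):

```sql
-- Example only: make every write go straight to disk in columnstore
-- format, skipping the in-memory rowstore segment entirely.
SET GLOBAL columnstore_disk_insert_threshold = 0;

-- Each of these single-row inserts now produces its own small
-- on-disk segment instead of accumulating in memory.
INSERT INTO t VALUES (1, 'a');
INSERT INTO t VALUES (2, 'b');
INSERT INTO t VALUES (3, 'c');

-- Rough way to see how many segments the table now has.
SELECT COUNT(DISTINCT segment_id)
FROM information_schema.columnar_segments
WHERE table_name = 't';

-- Manually ask the merger to consolidate the small files.
OPTIMIZE TABLE t;
```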
Q2. It's pretty hard to get a row that won't fit into memory (you will run into the max_allowed_packet limit first and the query will be rejected before it runs). If you do create one, though (say via a bunch of string concats), the query will fail with an "out of memory" error. The row needs to fit in memory to be compressed.
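A small illustration of the limit involved (the 100 MB value is just an example, and REPEAT() is only standing in for the "bunch of string concats" mentioned above; whether a given oversized value is rejected at the packet check or fails later with out-of-memory depends on where it is built):

```sql
-- Check the current packet limit (in bytes). A client statement larger
-- than this is rejected before the query runs.
SELECT @@max_allowed_packet;

-- Example only: raise the limit to 100 MB if you genuinely need to send
-- very wide rows from the client.
SET GLOBAL max_allowed_packet = 104857600;
```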
Q3. Both affect it. All rowstore segments for all columnstore tables, summed up, can't exceed maximum_table_memory. Queries will start to throttle if memory use by in-memory segments gets too close to maximum_table_memory (to allow the flushers to catch up). columnstore_flush_bytes is how many bytes we want to accumulate before flushing to disk; it's the most direct knob for controlling the size of the in-memory segment.
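A minimal sketch of inspecting and tuning those two knobs (the 16 MB value is illustrative, and whether a particular variable is changed via SET GLOBAL or via the engine config can vary by version):

```sql
-- Memory ceiling shared by all in-memory table data, including the
-- hidden rowstore segments of every columnstore table.
SHOW VARIABLES LIKE 'maximum_table_memory';

-- Bytes an in-memory rowstore segment accumulates before being flushed
-- to disk in columnstore format.
SHOW VARIABLES LIKE 'columnstore_flush_bytes';

-- Example only: flush at 16 MB so in-memory segments stay smaller and
-- less data is pinned in memory before it reaches disk.
SET GLOBAL columnstore_flush_bytes = 16777216;
```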
Q4. "Blob" in MemSQL columnstore terminology means a compressed file on disk stored in columnstore format (not a BLOB datatype in a table).