Hi guys, I would like to understand more about the rowstore and columnstore techniques in MemSQL.
Let’s say I have a database that is already populated and I’m going to insert a new tuple.
With a rowstore table:
- Look up the table's distribution key (shard key or PK).
- If the table has a key, distribute the tuple by hashing the key.
- If the table has no key, distribute the tuple round-robin.
- On the target node, the tuple is stored in RAM.
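To make the distribution step concrete, here is a minimal Python sketch of how I picture it. This is only my mental model, not MemSQL's actual code: `NUM_PARTITIONS`, `choose_partition`, and the use of CRC32 as the hash are all assumptions I made up for illustration.

```python
import zlib
from itertools import count

NUM_PARTITIONS = 4  # hypothetical number of partitions in the cluster

# Counter used only for tables without a key (round-robin fallback).
_round_robin = count()


def choose_partition(row, shard_key=None):
    """Return the index of the partition that should receive `row`.

    `row` is a dict of column name -> value; `shard_key` is a list of
    column names, or None when the table has no key.
    """
    if shard_key is not None:
        # Hash the key values so that equal keys always map to the same partition.
        key_bytes = repr(tuple(row[col] for col in shard_key)).encode("utf-8")
        return zlib.crc32(key_bytes) % NUM_PARTITIONS
    # No key: cycle through the partitions one after another.
    return next(_round_robin) % NUM_PARTITIONS
```

With a key, `choose_partition({"id": 42}, ["id"])` returns the same partition every time it is called; without a key, consecutive calls walk through the partitions in turn.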
With a columnstore table:
- Look up the table's distribution key (shard key or PK).
- If the table has a key, distribute the tuple by hashing the key.
- If the table has no key, distribute the tuple round-robin.
- On the target node, find a row segment that can receive the tuple while respecting the value range of the key column, or create a new row segment.
- Place each value of the tuple in the matching column segment of that row segment.
- Update the in-RAM metadata of all the column segments involved, update the log, and write the compressed column segment data back to the HDD/SSD.
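The node-local part of the columnstore steps above could be sketched roughly like this. Again, this only mirrors my own understanding and is not how MemSQL is necessarily implemented: `RowSegment`, `insert_row`, and `MAX_ROWS_PER_SEGMENT` are hypothetical names, and compression, logging, and the write to disk are left out entirely.

```python
from dataclasses import dataclass, field

MAX_ROWS_PER_SEGMENT = 3  # hypothetical limit, kept tiny for the example


@dataclass
class RowSegment:
    # One list per column; each list stands in for a column segment.
    columns: dict = field(default_factory=dict)
    # Metadata kept in RAM: the key range covered by this segment.
    min_key: object = None
    max_key: object = None
    row_count: int = 0

    def accepts(self, key):
        """True if the segment has room and `key` falls inside its key range."""
        if self.row_count == 0 or self.row_count >= MAX_ROWS_PER_SEGMENT:
            return False
        return self.min_key <= key <= self.max_key

    def append(self, row, key_column):
        """Add the row's values to the column segments and refresh the metadata."""
        for name, value in row.items():
            self.columns.setdefault(name, []).append(value)
        key = row[key_column]
        self.min_key = key if self.min_key is None else min(self.min_key, key)
        self.max_key = key if self.max_key is None else max(self.max_key, key)
        self.row_count += 1


def insert_row(segments, row, key_column):
    """Put `row` in a segment whose key range covers it, else open a new segment."""
    key = row[key_column]
    for segment in segments:
        if segment.accepts(key):
            segment.append(row, key_column)
            return segment
    new_segment = RowSegment()
    new_segment.append(row, key_column)
    segments.append(new_segment)
    return new_segment
```

For example, if a segment already holds keys 1 and 5, inserting key 3 goes into that segment, while inserting key 7 opens a new one because no existing range covers it.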
The steps above are what I have been able to understand about how it works.
Could you correct or confirm this?
For columnstore, I also considered the possibility that the master node consults the aggregator’s metadata up front to know exactly where in the cluster the tuple should be inserted. That would be great for keeping the data sorted, but if it works that way, what is the point of defining a shard key for the table?
I have read a lot of the documentation, but this was not clear to me (I am asking about MemSQL specifically).