Feature Request: Rowsegment Length Tolerance

Greetings SingleStore Team,

I was wondering if the OPTIMIZE FULL operation could be substantially sped up if the rowsegment length requirement were relaxed?

I see the result of our OPTIMZE FULL resulting in equal length row segments.

If a merge inserts rows into the first segment, it seems like the evicted rows get pushed into the next segment and so on such that the entire columnstore get re-written?

If that is the case, it seems like if you merge in 500k rows news into the first row segment, you could instead just merge those rows plus the original 1mm rows (total 1.5mm rows) into two 750k rowsegments and leave all the other rowsegments untouched. I have to imagine the query performance profile of having rowsegments in the range of 70 to 130% of the “optimal” rowsegment length would be fine.

This could be specified in the OPTIMIZE TABLE FULL syntax where there is an option like

OPTIMIZE TABLE my_table FULL ROWSEGMENT_LENGTH_TOLERANCE N

where N is a parameter (default 0 =current behavior, maybe max 50) that permits re-built / merged-in rowsegments to be up to N% smaller or larger than the ideal size.

I think this feature would permit very fast full optimization of even the largest columnstore tables.

-Rich

OPTIMIZE TABLE FULL by design is supposed to resort all the data and generate even-sized segments. So it is an expensive operation.

Have you tried OPTIMIZE TABLE FLUSH?

What is your performance tuning goal, or what problem do you want solved? Any other approach besides a full sort would give less query performance benefit.

For our own reference, we created internal feature request PM-3019 to track this.

Our engineer for this area says the way to get close to the benefit of OPTIMIZE TABLE FULL but not have to do a full table rewrite for a columnstore table is to do “OPTIMIZE TABLE“ without the FULL option.

We don’t really understand the request about size tolerance. The slow part about optimize table full is due to it sorting all data, not because of the segment sizes.