Query plan not being re-used

george.kolampas · September 11, 2020, 5:11pm

Hi, i’m having issues with certain queries when it comes to plan reuse. It seems that queries that contain WITH(v = ‘some_number’) the some_number is not being parameterized. So basically I have multiple query plans in the plancache that basically only differ by the some_number value. When I run explain on the different queries everything is identical except for the ColumnStoreFilter Line. And even that line is the same except for the x in the following example. ColumnStoreFilter [RAND(CONV(SUBSTR(MD5(CONCAT((81609),’ ', PARTITION_ID())), 1, 15), 16, 10)) <= xe-03]. Is there any way to get around this. For more context the queries have OPTION(INTERPRETER_MODE=INTERPRET_FIRST) and the way I know they are not being reused is because there a many queries in the plancache with a single commit that again only differ in the way I explain above. Thanks

hanson · September 12, 2020, 12:38am

Hi George. Welcome to the Forums! Can you give us a full repro schema and query? I’m not sure I understand the full picture yet.

george.kolampas · September 14, 2020, 2:32pm

When running SHOW PLANCACHE;
There are multiple entries with commit 1 that look like the following:
SELECT ORDERED_SERIALIZE(hidden1), ORDERED_SERIALIZE(hidden2), ORDERED_SERIALIZE(hidden3), ORDERED_SERIALIZE(hidden4), ORDERED_SERIALIZE(hidden5), ORDERED_SERIALIZE(hidden6), ORDERED_SERIALIZE(hidden7) FROM hidden8 WITH (sample_ratio = 0.188707729469) OPTION(INTERPRETER_MODE=INTERPRET_FIRST).
The only difference between all the entries is the number after sample_ratio.
When I run EXPLAIN on all those queries the result is identical expect for the ColumnStoreFilter Line which for the most part is also identical but only differs by the number which is marked by x in the following ColumnStoreFilter [RAND(CONV(SUBSTR(MD5(CONCAT((81609),’ ', PARTITION_ID())), 1, 15), 16, 10)) <= xe-03]

jack · September 14, 2020, 7:47pm

This can happen because each time you run ANALYZE TABLE, MemSQL internally runs several queries to collect statistics, which run with a different sample ratio depending on the table size; and different sample ratios do require generating different plans.

As this only affects stats collection, and the queries involved are pretty cheap to compile, it shouldn’t be that detrimental to performance. If you run ANALYZE a lot, there could potentially be a large amount of memory used in the plancache for these plans, if that is an issue then it is safe to drop these plans (they will also expire to disk after they are unused for a period of time, like other plans).

However we will be making an improvement in a future release to make this more efficient and reduce the amount of different plans generated by only picking powers of two for the sample ratio. This means we would typically only have a small number of different sample ratios and plans.

george.kolampas · September 15, 2020, 6:17pm

We have other queries that aren’t like this that we use on a view that are not reused. Are plancaches supposed to work on views aswell. Here is an example of one that we run:
SELECT SUM(alias_1) AS alias_1, alias_2 AS alias_2 FROM (SELECT hidden_column AS primary_key,@ AS alias_1,hidden_column1 AS alias_2 FROM table1.hidden_column2 WHERE @=@ AND hidden_column3 in (@) AND ( hidden_column3 IN (^))) hidden_column4 LEFT JOIN (SELECT hidden_column AS hidden_column4 FROM hidden_table fact WHERE @=@ AND ymdh >= ^ AND ymdh < ^ AND ymdh < ^ AND fact.hidden_column IN (^) GROUP BY hidden_column ) fact_table on fact_table.hidden_column = dhidden_column.primary_key GROUP BY alias_2

hanson · September 24, 2020, 8:28pm

Yes, queries on views are compiled and the plans are stored in the plan cache and can be reused.