Helios and plancache - how does it work, can I set it to zero for performance tests?

rkoch · October 7, 2019, 3:14pm

Is there anyway to not have plancache used or can I set it to zero for my Helios cluster so I can run solid concurrent performance testing?

I found forum documentation and memSQL self-managed documentation about plancache (not sure if you can set values to zero even there). one of the links says that the doc is not for Helios but doesn’t explain how Helios works with plancache…

Here are the links I found:

Thanks,

Rich

rob · October 7, 2019, 7:45pm

What are you trying to accomplish with your performance testing? I ask this because the plancache in memSQL only caches the query plans, not the results of the query, so subsequent runs will still need to fully re-execute the query. As a result, if you just want to see the affect of many concurrent queries on the system, it would not be necessary to clear the plancache.

rkoch · October 9, 2019, 6:49pm

Thanks for the reply.

The concern was that the same exact query would run a lot faster after the 1st time it was executed. That is the behavior we were seeing with other systems to which we were comparing memSQL/Helios. I was able to turn off ‘query caching’ so far in the other systems and wanted to do the same in memSQL to make sure we were comparing apples to apples. I was running from 1 to 10 threads, executing 100 queries total, using the same query pool of 25 different queries so there would be some repeats.

Having run some small performance testing now against Helios - it does not seem to matter that plancache parameters are set–every repeated query timing seems similar right now.

jack · October 16, 2019, 9:52pm

Plan caching is different than query caching in a lot of other systems. Many other systems have query caching where if an identical query is run a second time, and the data hasn’t changed, the results cached from the first execution are sent back and the query doesn’t need to run again. Therefore the runtime of the second query is basically 0.

On the other hand, in MemSQL we only cache query plans, not query results, like Rob said above. Second runs of the same plan are faster because we can use the previously compiled plan, but still fully re-execute the query. This is much more general than query caching - any query with the same shape, but with different values of parameters, will be able to reuse the same plan, and of course the same plan can be used as the data changes. See SingleStoreDB Cloud · SingleStore Documentation and SingleStoreDB Cloud · SingleStore Documentation for more information.

What makes the most sense to performance test depends on what your application workload looks like. If you mostly run a fixed set of query shapes/templates with different parameters subbed in, then those will be able to reuse cached plans. While if you mostly run unique new query shapes with e.g. different filter expressions, joins, groupings being added, then you would care about the first-run performance, in which case you would want to include the time it takes to compile the query plans in your performance testing. It is tricky to test that accurately, especially in a concurrent setting. But a reasonable place to start is to run DROP ALL FROM PLANCACHE on all nodes immediately after each query (note that plan_expiration_minutes must not be set to 0 for this to work).

Also, setting plan_expiration_minutes = 0 will only expire plans from memory, but they will remain on disk, so it won’t do what you want.