We have a simple query that joins three tables and returns the result via an API. The first query is slow (e.g. 7 seconds), but the second one (the exact same request) is much faster, around 3 seconds. When we leave the system for a while, the same thing happens again. Why is that?
I know MySQL has a cache that speeds up the 2nd and 3rd runs of a query. But shouldn't MemSQL's first query response be fast too?
Any ideas on what we can do to make this faster?
Hey! The reason for this is almost certainly that MemSQL JIT-compiles queries, so that the first time you run a particular query shape, it takes a while, but then subsequent queries are fast.
Thanks Alex. For our use case, users come in for about 2 minutes and then return the next day, so if that first query takes that long, it is unacceptable for our purposes.
Any idea what we can do to improve this? I am not on the technical development side here, but is it possible for MemSQL to store the compiled query behind the API so there is no compilation step? (Just brainstorming here.)
I looked at the link you gave (the 10-15 second precompile one); that was helpful. I searched around the blogs but didn't find an answer, and I don't think that thread reached a resolution. Any ideas?
Hi, could you share the profile of a cold run and a warm run of your test query to help us pinpoint where the time is spent? Run profile select ... followed by show profile json and post the output.
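For example, something along these lines (the tables and join below are just placeholders, swap in your actual three-table query):

```sql
-- Cold run: execute this after the system has been idle long enough
-- to reproduce the slow case (placeholder tables/columns shown here)
PROFILE SELECT o.id, c.name, p.title
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN products p ON p.id = o.product_id;

-- Dump the profile of the statement that was just profiled
SHOW PROFILE JSON;

-- Warm run: immediately repeat the same two statements and post both JSON outputs
```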
users come in for about 2 minutes and then return the next day, so if that first query takes that long, it is unacceptable for our purposes.
That makes sense! So, this is an important part of your statement:
When we leave the system for a while, the same thing happens again.
When you run a query, SingleStore checks whether the query has already been compiled into a fast version. The fast versions are cached in something called the plancache. If the query isn't in the plancache, it can take a few seconds to compile, and then it'll be faster thereafter. However, it sounds like your plancache is getting cleared out earlier than you want. If you check out Managing Plancache Memory and Disk Usage · SingleStore Documentation, you can e.g. run SET GLOBAL plan_expiration_minutes = 2880 to make sure compiled queries don't expire for 48 hours. If your users run their queries more often than every 48 hours, you should get the performance you want (3 seconds instead of 7).
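In other words, something like this (run on the aggregator you connect to; 2880 is just 48 hours expressed in minutes):

```sql
-- See what the expiration is currently set to (value is in minutes)
SHOW VARIABLES LIKE 'plan_expiration_minutes';

-- Keep compiled plans around for 48 hours (2880 minutes), so users who come
-- back once a day still hit the already-compiled version of the query
SET GLOBAL plan_expiration_minutes = 2880;
```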
Disclaimer: I don’t work for SingleStore; this is just based on my understanding and reading of the docs.
Thanks Alex, this is very helpful. We’ll try that.
Is there something I ‘lose’ by keeping the plancache for a longer duration? Why not set it to 10 days? There must be a trade-off where I lose something in return, right?
I also want to make sure I understand what counts as ‘a query’. Let’s say I use the same API to return results, but I change the userid. If 100 users use the same API, but each one customizes the response by sending in their unique userid, is that considered 100 queries, or 1 query since the same API is being used? I am presuming it is 100 queries, since the userid is embedded into the ‘precompiled’ query and each user ends up with their own unique precompiled query.
Glad to be of help! The only thing you lose, to my knowledge, is disk space and memory. For reference, the system my company has created generates pretty complex queries on the fly, and we’ve ended up using something like 10 GB of plancache in memory and about the same on disk, which has brought our system to a complete halt several times (we hadn’t provisioned much disk space).
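If you want to keep an eye on that yourself, I believe you can list what's currently compiled and watch how it grows as you extend the expiration (I'm going from memory here, so double-check the command against your version's docs):

```sql
-- List the compiled plans currently cached on this node
-- (if I remember the command right; verify against the SingleStore docs)
SHOW PLANCACHE;
```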
This brings me to “what counts as a query”. When we were reaching these crazy plancache sizes, we were generating queries that weren’t “structurally identical”. By “structurally identical”, I mean that SingleStore appears to do what e.g. JDBC does with query parameterization and compare the parameterized versions of the queries. For instance, select * from whatever_table where x = 1 is in this sense “structurally identical” to select * from whatever_table where x = 2; here 1 and 2 are treated as parameters to the query. We ended up sorting some clauses in our generated queries to get as much structural identity as possible (e.g. we would generate the clauses x = ? and z = ? and y = ?, z = ? and y = ? and x = ?, etc., and we sorted that part so it was always x = ? and y = ? and z = ?). You probably aren’t generating queries on the fly, so that part may not apply to you, but I wanted to throw it out there.
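To make that concrete, here's a toy illustration of how I understand the matching to work, reusing the whatever_table example from above:

```sql
-- Structurally identical: only the literal differs, so (as I understand it)
-- both statements share a single compiled plan in the plancache
SELECT * FROM whatever_table WHERE x = 1;
SELECT * FROM whatever_table WHERE x = 2;

-- Not structurally identical: the predicates are in a different order,
-- so each ordering gets compiled and cached as its own plan
SELECT * FROM whatever_table WHERE x = 1 AND y = 2 AND z = 3;
SELECT * FROM whatever_table WHERE z = 3 AND y = 2 AND x = 1;
```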
Anyway, to directly answer your second question: to my understanding, those would all count as one (structurally identical) query for plancache/precompilation purposes, since the userid is treated as a parameter.