Our backup command is failing because oom.
Leaf Error (memsql-leaf-01:3307): Memory used by MemSQL (28703.00 Mb) has reached the ‘maximum_memory’ setting (28665 Mb) on this node. Possible causes include (1) available query execution memory has been used up for table memory (in use table memory: 18855.88 Mb) and (2) the query is large and complex and requires more query execution memory than is available (in use query execution memory 0.00 Mb
Not only this is failing, the queries, instead of just taking more time are failing due to OOM. A simple count distinct or count_approx_distinct fails because of that. Other databases just run the queries until they timeout (I would prefer that instead).
I already analyzed the available memory (without any query running) and it’s even more confusing:
IP_ADDR | TYPE | MAX_MEMORY_MB | memory_used_mb | max_table_memory_mb | table_memory_used_mb |
---|---|---|---|---|---|
memsql-cluster-aggl | CA | 14285 | 4579 | 12856 | 3410 |
memsql-cluster-leaf-04l | LEAF | 28665 | 26186 | 25798 | 18411 |
memsql-cluster-leaf-03 | LEAF | 28665 | 26099 | 25798 | 18413 |
memsql-cluster-leaf-02 | LEAF | 28665 | 25954 | 25798 | 18351 |
memsql-cluster-leaf-01 | LEAF | 28665 | 27227 | 25798 | 18501 |
memsql-cluster-master | MA | 14276 | 8327 | 12848 | 3443 |
we see 26GB of memory use which is a lot a probably the cause of the failure. It doesn’t decrease even if I run analyze table XXX flush;
Analyzing what’s is consuming that memory I ran:
select sum(memory_use)/(1024*1024)
from information_schema.table_statistics
select sum(memory_use)/(1024*1024)
from information_schema.INDEX_STATISTICS is2
Total: 9979 + 1652 = 11GB.
Any idea how can we resolve this issue? We have almost all the queries failing by memory issues.
MEMSQL VERSION: 7.3.12
Thanks in advance for your help.