The operating system failed to allocate memory

Hi,

When I tried to execute join clause on a two tables which holds billion records, received “Memory allocation issue”.

QUERY:
SELECT
COUNT(1)
FROM
tableA a
JOIN
tableB b
ON
a.id = b.id
WHERE
a.category = ‘Sample’;

ERROR:
Leaf Error (172.31.17.174:3306): The operating system failed to allocate memory (MemSQL memory use 26845.88 Mb).

CHECKS:
As a preliminary check, verified skew of the tables and they are looking fine. Verified total_cluster_memory, table memory. I’m using 32GB and 4 cores machine conf with two leaf nodes.

select @@global.maximum_memory;
±------------------------+
| @@global.maximum_memory |
±------------------------+
| 28233 |
±------------------------+
1 row in set (0.06 sec)

memsql> select @@global.maximum_table_memory;
±------------------------------+
| @@global.maximum_table_memory |
±------------------------------+
| 25409 |
±------------------------------+
1 row in set (0.06 sec)

SHOW VARIABLES LIKE “maximum_%”;
±---------------------±------+
| Variable_name | Value |
±---------------------±------+
| maximum_memory | 28233 |
| maximum_table_memory | 25409 |
±---------------------±------+
2 rows in set (0.00 sec)

memsql> SHOW STATUS EXTENDED LIKE “Alloc_table_memory”;
±-------------------±--------------------+
| Variable_name | Value |
±-------------------±--------------------+
| Alloc_table_memory | 60.833 (+60.833) MB |
±-------------------±--------------------+
1 row in set (0.00 sec)

±--------------±-----------------±----------±----------±----------±---------±----------------±----------------±----------------±-----------±------------+
| DATABASE_NAME | TABLE_NAME | MIN(ROWS) | MAX(ROWS) | avg_rows | row_skew | MIN(MEMORY_USE) | MAX(MEMORY_USE) | total_memory_mb | avg_memory | memory_skew |
±--------------±-----------------±----------±----------±----------±---------±----------------±----------------±----------------±-----------±------------+
| test | A | 128819526 | 152154057 | 138114683 | 0.052 | 0 | 0 | 0.0000 | 0 | NULL |
| test | B | 134280452 | 142694918 | 138114683 | 0.017 | 0 | 0 | 0.0000 | 0 | NULL |

Do we need to add new leaf node and rebalance the entire cluster? or Can we maximize the cluster capacity.

Thanks
Arun

Can you share the explain for the query?

Any resolution on this error?

The queries individually in my case are running good, it’s when executed via the API… multiple queries is the problem.

Also, mine is a self hosted cluster (AWS t2.xlarge)