Memory Allocation Error in SingleStore for SQL Query with Aggregation

Hello There Community,

We are encountering a recurring issue with running a specific query on our SingleStore cluster, even after restarting services and increasing the memory capacity. The query in question is as follows:

DROP TABLE DB.TABLE_NAME;
CREATE TABLE DB.TABLE_NAME AS
SELECT 
    A.TIME_PERIOD_ID AS time_period_id,
    A.SALES_NUMBER AS sales_number,
    SUM(COALESCE(T1.DATA_FIELD_1, 0)) AS data_field_1,
    SUM(COALESCE(T1.DATA_FIELD_2, 0)) AS data_field_2,
    SUM(COALESCE(T1.DATA_FIELD_3, 0)) AS data_field_3,
    SUM(COALESCE(T1.DATA_FIELD_4, 0)) AS data_field_4,
    SUM(COALESCE(T1.DATA_FIELD_5, 0)) AS data_field_5,
    SUM(COALESCE(T1.DATA_FIELD_6, 0)) AS data_field_6,
    SUM(COALESCE(T1.DATA_FIELD_7, 0)) AS data_field_7,
    SUM(COALESCE(T1.DATA_FIELD_8, 0)) AS data_field_8,
    SUM(COALESCE(T1.DATA_FIELD_9, 0)) AS data_field_9,
    SUM(COALESCE(T1.DATA_FIELD_10, 0)) AS data_field_10,
    SUM(COALESCE(T1.DATA_FIELD_11, 0)) AS data_field_11,
    SUM(COALESCE(T1.DATA_FIELD_12, 0)) AS data_field_12,
    SUM(COALESCE(T1.DATA_FIELD_13, 0)) AS data_field_13,
    SUM(COALESCE(T1.DATA_FIELD_14, 0)) AS data_field_14,
    SUM(COALESCE(T1.DATA_FIELD_15, 0)) AS data_field_15,
    SUM(COALESCE(T1.DATA_FIELD_16, 0)) AS data_field_16,
    SUM(COALESCE(T1.DATA_FIELD_17, 0)) AS data_field_17,
    SUM(COALESCE(T1.DATA_FIELD_18, 0)) AS data_field_18,
    SUM(COALESCE(T1.DATA_FIELD_19, 0)) AS data_field_19,
    SUM(COALESCE(T1.DATA_FIELD_20, 0)) AS data_field_20,
    SUM(COALESCE(T1.DATA_FIELD_21, 0)) AS data_field_21,
    SUM(COALESCE(T1.DATA_FIELD_22, 0)) AS data_field_22,
    SUM(COALESCE(T1.DATA_FIELD_23, 0)) AS data_field_23,
    SUM(COALESCE(T1.DATA_FIELD_24, 0)) AS data_field_24,
    SUM(COALESCE(T1.DATA_FIELD_25, 0)) AS data_field_25,
    SUM(COALESCE(T1.DATA_FIELD_26, 0)) AS data_field_26,
    SUM(COALESCE(T1.DATA_FIELD_27, 0)) AS data_field_27,
    SUM(COALESCE(T1.DATA_FIELD_28, 0)) AS data_field_28,
    SUM(COALESCE(T1.DATA_FIELD_29, 0)) AS data_field_29,
    SUM(COALESCE(T1.DATA_FIELD_30, 0)) AS data_field_30,
    SUM(COALESCE(T1.DATA_FIELD_31, 0)) AS data_field_31,
    SUM(COALESCE(T1.DATA_FIELD_32, 0)) AS data_field_32,
    SUM(COALESCE(T1.DATA_FIELD_33, 0)) AS data_field_33,
    SUM(COALESCE(T1.DATA_FIELD_34, 0)) AS data_field_34,
    SUM(COALESCE(T1.DATA_FIELD_35, 0)) AS data_field_35,
    SUM(COALESCE(T1.DATA_FIELD_36, 0)) AS data_field_36,
    SUM(COALESCE(T1.DATA_FIELD_37, 0)) AS data_field_37,
    SUM(COALESCE(T1.DATA_FIELD_38, 0)) AS data_field_38,
    SUM(COALESCE(T1.DATA_FIELD_39, 0)) AS data_field_39,
    SUM(COALESCE(T1.DATA_FIELD_40, 0)) AS data_field_40,
    SUM(COALESCE(T1.DATA_FIELD_41, 0)) AS data_field_41,
    SUM(COALESCE(T1.DATA_FIELD_42, 0)) AS data_field_42,
    SUM(COALESCE(T1.DATA_FIELD_43, 0)) AS data_field_43,
    SUM(COALESCE(T1.DATA_FIELD_44, 0)) AS data_field_44,
    SUM(COALESCE(T1.DATA_FIELD_45, 0)) AS data_field_45
FROM DB.SOURCE_TABLE A
LEFT JOIN DB.JOIN_TABLE T1
    ON A.SALES_NUMBER = T1.SALES_NUMBER
    AND A.TIME_PERIOD_ID = T1.time_period_id
GROUP BY A.TIME_PERIOD_ID, A.SALES_NUMBER;

QUIT;

The error we are getting is:

The operating system failed to allocate memory (MemSQL memory use 38552.75 Mb). The request was not processed.

Additional Table Information:

  1. Source Table (DB.SOURCE_TABLE):
    o Contains approximately 90+ million rows.
    o Total size is 300 MB with 80% compression.
  2. Join Table (DB.JOIN_TABLE):
    o Contains approximately 90+ million rows.
    o Total size is 20 GB with 60% compression.

Steps Taken:

  1. Increased leaf node capacity: We added more memory and scaled the cluster horizontally with additional leaf nodes.
  2. Restarted nodes and services: Cleared memory and restarted nodes temporarily to get past the issue.
  3. Rebalanced partitions on schemas and memsqlctl update-config --set-global --key maximum_memory --value 65536 executed successfully.
    However, despite these measures, the problem persists. Our cluster still encounters memory issues when processing this query.

Questions:

  1. Are there any other optimizations we can make to this query to prevent memory overload?
  2. Could this issue be related to SingleStore’s setup configurations? Is there any specific YAML configuration or related config file we can adjust to avoid this problem
    Any advice or insights would be highly appreciated!

Can you make a debug json profile and upload it, if you can get a successful run? Consider filtering on the time period to make it small enough to run if it won’t finish, so you can get a debug profile. Or, take out all the SUMs and just put a count(*) and get a debug profile of that. That should save a lot of RAM and probably use a similar query plan.

What’s version are you running? The latest versions of SingleStore have hash join spilling, which might help.

Also, make sure you are sharding both tables on both the join columns if you can, or at least sales_number, so you can get a collocated join.

One thing you could do if all else fails would be to logically divide the grouping key range in two or more parts and run a query for each part of the key range.

1 Like