Hey,
I have deployed a cluster using 1 master node and 7 leaf nodes. Here is the configuration.
I have created a database using the command CREATE DATABASE IF NOT EXISTS tpch;
and then a table using the command
CREATE TABLE lineitem(
l_orderkey INT,
l_partkey INT,
l_suppkey INT,
l_linenumber INT,
l_quantity INT,
l_extendedprice INT,
l_discount INT,
l_tax INT,
l_returnflag INT,
l_linestatus INT,
l_shipdate INT,
l_commitdate INT,
l_receiptdate INT,
l_shipinstruct INT,
l_shipmode INT,
l_comment VARCHAR(44))
Finally I have imported the data by creating and executing the following pipeline (the total size of the data is 108GB).
CREATE PIPELINE lineitem
AS LOAD DATA FS "/mnt/scratch/dkoutsou/dbgen/modified-dbgen/lineitem.*.parquet"
INTO TABLE lineitem
(
l_orderkey <- l_orderkey,
l_partkey <- l_partkey,
l_suppkey <- l_suppkey,
l_linenumber <- l_linenumber,
l_quantity <- l_quantity,
l_extendedprice <- l_extendedprice,
l_discount <- l_discount,
l_tax <- l_tax,
l_returnflag <- l_returnflag,
l_linestatus <- l_linestatus,
l_shipdate <- l_shipdate,
l_commitdate <- l_commitdate,
l_receiptdate <- l_receiptdate,
l_shipinstruct <- l_shipinstruct,
l_shipmode <- l_shipmode,
l_comment <- l_comment
)
FORMAT PARQUET
START PIPELINE lineitem FOREGROUND;
As you can see in the first screenshot, my leaf nodes are using too much memory (~100GB) and therefore I can’t import any of the other database tables. I suspect that this is because of replication. If I see what happens in a leaf node, I have the following picture: !
Is there a way to disable replication or find the cause of this problem? I would expect every node to have around 50 GB of data and definitely not 100GB.
The problem is also visible here:
Based on memSQL statistics I should have had around 330GB of memory but I have almost double of that. Therefore there should be some way to disable the replication.
I would appreciate any answer!
Thank you very much for your time!