Question about memsql7.0.beta

Hi there,

Testing the beta version of memsql7.1.
This phenomenon(offline > recovering ) of the leaf node is repeated without any errors while loading data by applying a unique key to the column store table.

Is it unstable because it is a beta version?
When will the official version be released?

thanks in advance.

Hi,

Can you take a cluster report and send it to bug-report@memsql.com as well maybe some more detail around which table you added a unique key to.

The Beta will have some bugs and its not release candidate quality as of yet.

-Adam

Hi,

When the unique key was tested in the column store, the situation I inquired about was repeated.
The problem I’ve seen is that when load 3GB of data, the amount of memory that Index key takes up will increase to 20GB. After loading, memory space occupied over time is lost, and only compressed data is stored on disk(1.8GB).

How much memory space should be considered when applying unique key? What should be considered?

Do you have any suggestions for using less memory space?

Please refer to the following.
The number of data I’m trying to load is 92M rows.
However, during loading, you can see that rows is over 100M. I wonder what mechanism this phenomenon is caused by.
Thanks.

CREATE TABLE salaries (
    emp_no      INT             NOT NULL,
    salary      INT             NOT NULL,
    from_date   DATE            NOT NULL,
    to_date     DATE            NOT NULL,
    A as concat_ws(",",emp_no,from_date) PERSISTED CHAR(16),
    SHARD KEY (A),
    UNIQUE KEY (A) USING HASH,
    KEY (from_date) USING CLUSTERED COLUMNSTORE
);

Thanks for the details. We’ll investigate on our end.

-Adam

Columnstore tables with unique keys insert into the in-memory segment. I’d be interested what numbers you get if you change salaries to be a rowstore table. My simulation suggests these numbers are in line with what we see for rowstore tables.

Unique key inserting straight to disk is slated for 7.5.

2020-05-13 19:23:24.202790 create database db;
2020-05-13 19:23:26.367455 use db;
2020-05-13 19:23:26.367615 create table ai(id int auto_increment, a char(16), shard key(id));
2020-05-13 19:23:26.414820 insert ai(a) values (null);
2020-05-13 19:23:26.434448 insert ai(a) select a from ai;
2020-05-13 19:23:26.446610 insert ai(a) select a from ai;
2020-05-13 19:23:26.449789 insert ai(a) select a from ai;
2020-05-13 19:23:26.453297 insert ai(a) select a from ai;
2020-05-13 19:23:26.456515 insert ai(a) select a from ai;
2020-05-13 19:23:26.459757 insert ai(a) select a from ai;
2020-05-13 19:23:26.460906 insert ai(a) select a from ai;
2020-05-13 19:23:26.461789 insert ai(a) select a from ai;
2020-05-13 19:23:26.462725 insert ai(a) select a from ai;
2020-05-13 19:23:26.463983 insert ai(a) select a from ai;
2020-05-13 19:23:26.465723 insert ai(a) select a from ai;
2020-05-13 19:23:26.468323 insert ai(a) select a from ai;
2020-05-13 19:23:26.472506 insert ai(a) select a from ai;
2020-05-13 19:23:26.480496 insert ai(a) select a from ai;
2020-05-13 19:23:26.494401 insert ai(a) select a from ai;
2020-05-13 19:23:26.521402 insert ai(a) select a from ai;
2020-05-13 19:23:26.574805 insert ai(a) select a from ai;
2020-05-13 19:23:26.683676 insert ai(a) select a from ai;
2020-05-13 19:23:26.900184 insert ai(a) select a from ai;
2020-05-13 19:23:27.610899 insert ai(a) select a from ai;
2020-05-13 19:23:29.404897 insert ai(a) select a from ai;
2020-05-13 19:23:32.921005 insert ai(a) select a from ai;
2020-05-13 19:23:40.342757 insert ai(a) select a from ai;
2020-05-13 19:23:55.373456 create table rs(a char(16), shard key(a));
2020-05-13 19:23:55.575927 create table csu(a char(16), shard key(a), key(a) using clustered columnstore, unique key(a) using hash);
2020-05-13 19:23:56.265031 create table cs(a char(16), shard key(a), key(a) using clustered columnstore);
2020-05-13 19:23:56.749664 insert rs select id from ai;
2020-05-13 19:24:08.911213 insert cs select id from ai;
2020-05-13 19:24:18.652115 insert csu select id from ai;
sleep 10
2020-05-13 19:24:54.714452 select database_name, table_name, rows, format(sum(memory_use),0) mem
    from information_schema.table_statistics 
    group by 1, 2;
+---------------+------------+---------+---------------+
| database_name | table_name | rows    | mem           |
+---------------+------------+---------+---------------+
| db            | csu        | 1301393 | 1,693,712,480 |
| db            | ai         | 1049313 | 1,493,565,440 |
| db            | rs         | 1046604 | 1,419,640,832 |
| db            | cs         | 1046604 | 925,216       |
+---------------+------------+---------+---------------+
4 rows in set
2020-05-13 19:24:54.717794 select database_name, table_name, format(sum(compressed_size), 0) disk
    from information_schema.COLUMNAR_SEGMENTS
    group by 1, 2;
+---------------+------------+------------+
| database_name | table_name | disk       |
+---------------+------------+------------+
| db            | csu        | 39,049,154 |
| db            | cs         | 82,756,406 |
+---------------+------------+------------+
2 rows in set
2020-05-13 19:24:54.721979 optimize table cs flush;
2020-05-13 19:24:54.744249 optimize table csu flush;
2020-05-13 19:25:03.877798 select database_name, table_name, rows, format(sum(memory_use),0) mem
    from information_schema.table_statistics 
    group by 1, 2;
+---------------+------------+---------+---------------+
| database_name | table_name | rows    | mem           |
+---------------+------------+---------+---------------+
| db            | csu        | 1353207 | 1,306,793,248 |
| db            | ai         | 1049313 | 1,493,565,440 |
| db            | rs         | 1046604 | 1,419,640,832 |
| db            | cs         | 1046604 | 1,310,720     |
+---------------+------------+---------+---------------+
4 rows in set
2020-05-13 19:25:03.879202 select database_name, table_name, format(sum(compressed_size), 0) disk
    from information_schema.COLUMNAR_SEGMENTS
    group by 1, 2;
+---------------+------------+------------+
| database_name | table_name | disk       |
+---------------+------------+------------+
| db            | csu        | 82,800,382 |
| db            | cs         | 82,791,227 |
+---------------+------------+------------+
2 rows in set
# Wait for GC
master-agg $ memsql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1381
Server version: 5.5.58 MemSQL source distribution (compatible; MySQL Enterprise & MySQL Commercial)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

memsql> use db
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
memsql> select database_name, table_name, rows, format(sum(memory_use),0) mem
    ->     from information_schema.table_statistics 
    ->     group by 1, 2;
+---------------+------------+---------+---------------+
| database_name | table_name | rows    | mem           |
+---------------+------------+---------+---------------+
| db            | csu        | 1046604 | 0             |
| db            | ai         | 1049313 | 1,493,565,440 |
| db            | rs         | 1046604 | 1,419,640,832 |
| db            | cs         | 1046604 | 0             |
+---------------+------------+---------+---------------+
4 rows in set (0.01 sec)

memsql>
1 Like

The workaround is to load fewer rows per transaction.

1 Like