I’m sure this has been asked before, yet couldn’t find any case (via Google) specifically dealing with MemSQL.
Scenario:
Let's say I'd like to index historical (multi-year) transactional data from various files, using NodeJS & 'for loops', into MemSQL. The method I'm using today writes that data into 1 large columnstore table, reaching several million rows over time, and to keep it simple has the following info:
Table Name: tbl_transactions
Columns: transaction_id (Primary Key) | client_from | client_to | amount | timestamp
I have queries executed from a front-end website to display client specific information when provided a client_id. In the example of client_id = 100:
“SELECT * FROM tbl_transactions WHERE (client_from = '100' OR client_to = '100');
”
The Problem:
These queries often take several seconds to a few minutes to execute just from the sheer size & amount of data in that table. Adding specific columns as shards hasn't really helped either ('client_from' & 'client_to' are shards).
I’d like to greatly reduce the amount of time it takes to query my several million-row table for specific client information, as storing the table as a rowstore was not possible since that’d exceed the machine’s total available RAM.
Attempted Solution:
I'd then thought of an idea to create separate tables for each client, then populate fields whenever my NodeJS script finds data specific to each client (instead of in one large table), for ex:
tbl_transactions_client_100
tbl_transactions_client_101
…
Before I could begin, I had to increase both the maximum pooled connections & maximum MemSQL / MySQL connections:
memsql-admin update-config --key max_pooled_connections --value 100000 --all
memsql-admin restart-node --all
<login to memsql / mysql to increase the max_connections & verify changes>
SET GLOBAL max_connections = 100000;
show variables like 'max%connections';
NodeJS Code Snippet:
[FOR LOOP CODE]
client_id = i;
//==============================
// Create New Table If New Client ID
//==============================var script_01 = ‘CREATE TABLE IF NOT EXISTS ’ + ‘tb_transactions_’ + client_id + ’ (transaction_id VARCHAR(10), client_from VARCHAR(10), client_to VARCHAR(10), amount NUMERIC (11, 2), timestamp DATETIME, KEY(‘transaction_id’) USING CLUSTERED COLUMNSTORE);’;
con.query(script_01, function (err) {
if (err) throw err;//==============================
// Insert Data Into Table
//==============================var values = … // Fill these in
var script_02 = ‘INSERT INTO ‘tb_transactions_’ + client_id + ’ VALUES (?,?,?,?,?);’;con.query(script_02, values, function (err) {
if (err) throw err;
});});
[CLOSE FOR LOOP]
Resulting Error:
As you can imagine I end up with several thousands of tables generated. This provided a temporary solution, but over time, MemSQL fully utilizes all of the system's available RAM (128GB) due to the table creations and displays the following error in Terminal:
Error: ER_UNKNOWN_PARTITION: Unable to connect to leaf @127.0.0.1:3309 with user distributed, using password YES: [2004] Cannot connect to ‘127.0.0.1’:3309. Errno=111 (Connection refused)
Follow Up:
Two questions here:
- Is there a way to circumvent this error and the amount of RAM utilized by MemSQL?
- Is this a correct approach: creating multiple tables per client, to improve overall query performance & execution time, or is there a better solution for this?
Again, the end goal is to greatly reduce the amount of time it takes to query my several million row table for specific client information.
Thanks