Hi,
I’m working with a large sample of trading data from multiple years, with each row as a single trade. Columns included in this table are the trade_datetime (MM-DD-YYYY HH:mm:ss.SSS), trade_date (MM-DD-YYYY), and trade_hour (HH).
My below query works as intended with a smaller set of data; this retains the latest trade data over a 1hr interval, then deletes everything else.
However with anything larger, 8.55M rows of data, I receive the following error in MemSQL:
ERROR 1956 ER_PARAMETER_CAPACITY_EXCEEDED: Query cannot be completed because the parameter array capacity of 1048576 was exceeded
I have plenty of resources available on this machine, along with 128GB or RAM, yet was wondering if there was a setting within MemSQL placing constraints on this parameter?
Query:
DELETE FROM tbl_trades
WHERE row IN
(SELECT row
FROM tbl_trades A
WHERE trade_datetime <
(SELECT MAX(trade_datetime)
FROM tbl_trades
WHERE ticker = A.ticker
AND trade_date = A.trade_date
AND trade_hour = A.trade_hour)
GROUP BY ticker, trade_price, trade_volume, trade_datetime, trade_date, trade_hour);
Here is how the table was created:
CREATE TABLE IF NOT EXISTS tbl_trades
(
row INT AUTO_INCREMENT,
ticker TEXT NOT NULL,
trade_price DECIMAL(12,2) NOT NULL,
trade_volume DECIMAL(12,2) NOT NULL,
trade_datetime DATETIME(6) NOT NULL,
trade_date DATE NOT NULL,
trade_hour INT NOT NULL,
upload_time TIMESTAMP(6) DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
KEY(`row`) USING CLUSTERED COLUMNSTORE,
SHARD KEY (`ticker`)
);
Thanks