We just experienced a huge speed improvement (x16 times) by running ANALYZE on a few tables after debugging some queries with unusual plans using the VISUAL EXPLAIN feature in MemSQL Studio.
The huge impact was probably caused by a larger data movement/regeneration yesterday, but it made us wonder if we should schedule some ANALYZE queries to always keep the statistics up-to-date.
The documentation mentions some automatic statistics collections, but strongly recommends executing ANALYZE queries manually.
We do mostly have rowstores but also a single columnstore with histograms.
I guess we would benefit from a daily/weekly statistics collection, but wonders why this isn’t happening automatically by MemSQL. The only real draw appears to be plan invalidation, which probably isn’t the worst drawback since you gave us the INTERPRET_FIRST option.
Is this the right direction to follow, or am I missing something?
For MemSQL 6.8 and earlier, I would recommend running ANALYZE after large loads or updates, and also nightly or weekly to account for changes after continuous updates.
MemSQL 7.0 will have fully-automatic statistics gathering on all table types, so the large majority of users will never have to run ANALYZE again.
Here’s one that simply rolls through the tables in a given schema. If you are executing in mysql workbench, it may not finish because of the maximum return grids allowed in the UI. May want to comment out the “print completed” portion.
DELIMITER //
CREATE OR REPLACE PROCEDURE analyze_table_by_schema(_table_schema text ) RETURNS void AS
DECLARE
/* Use static parameterized SQL when declaring ‘q’ because _table_name
and _table_schema are placeholders for constant values. */
qry QUERY(table_name VARCHAR(64)) = SELECT TABLE_NAME FROM information_schema.TABLES WHERE TABLE_TYPE = ‘BASE TABLE’ AND TABLE_SCHEMA = _table_schema;
arr ARRAY(record(table_name VARCHAR(64)));
_table_name varchar(64);
exec_string varchar(1000);
table_completed varchar(64);
complete_string varchar(1000);
BEGIN
arr = COLLECT(qry);
FOR x IN arr LOOP
_table_name = x.table_name;
exec_string = 'analyze table ‘|| _table_schema || ‘.’ || _table_name ;
EXECUTE IMMEDIATE exec_string;
complete_string = ‘echo select ‘’’ || _table_name || ‘’’ Completed ';
EXECUTE IMMEDIATE complete_string;
I was thinking about this procedure a bit more. If you have a table with the tables you want to analyze, maybe by groups, you could drive this procedure from that list instead of all the tables. No reason to analyze code tables that don’t change.