Based on the Data Types documentation section, I was expecting MEDIUMINT (8) UNSIGNED
to consume less memory than INT (10) UNSIGNED
.
From the MemSQL Studio this seems true while exploring the Databases report:
CREATE TABLE `keywords-int` (
`id` binary(16) NOT NULL,
`location` int(10) unsigned DEFAULT null,
PRIMARY KEY (`id`),
KEY `location` (`location`)
)
+----------+------------------+--------------+
| Name | Data Type | Memory Usage |
+----------+------------------+--------------+
| id | binary(16) | 331 MB |
| location | int(10) unsigned | 165 MB |
+----------+------------------+--------------+
CREATE TABLE `keywords-mediumint` (
`id` binary(16) NOT NULL,
`location` mediumint(8) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `location` (`location`)
)
+----------+-----------------------+--------------+
| Name | Data Type | Memory Usage |
+----------+-----------------------+--------------+
| id | binary(16) | 331 MB |
| location | mediumint(8) unsigned | 83 MB | <--- 50% lower
+----------+-----------------------+--------------+
Index usage of both tables remains the same (PRIMARY = 1 GB
and location = 827 MB
).
But the memory_use
provided by INFORMATION_SCHEMA.TABLE_STATISTICS
shows otherwise:
+--------------------+----------+------------+
| table_name | rows | memory_use |
+--------------------+----------+------------+
| keywords-int | 21670444 | 2799602432 |
| keywords-mediumint | 21670444 | 2799602432 |
+--------------------+----------+------------+
I did execute ANALYZE TABLE ...
on both tables.
What could be the explanation? I guess the reported usage from MemSQL Studio per column is the most accurate?
The reason of chosing MEDIUMINT
over INT
is because of the expectation of reduced memory usage. With 50M rows and counting, this should save 200 MB per column.
Am I wrong in my understanding of storage calculation?
Note: In all 21M rows the location
is exactly a four digit number. Perhaps MemSQL doesn’t store the integers as a fixed width?
Thanks!