Hi,
We have one table that receives heavy writes each day, around 100K records via upsert. The table is currently columnstore, so the upsert is taking too much memory. Is there a way we can have a write-heavy and a read-heavy replica of the same table?
First, you should try to make the upsert as fast as you can. Are you sharding on the key column, and does it have a unique index?
Another way to avoid running out of memory is to batch the upserts so you do, say, 1,000 rows at a time instead of 100K at once.
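For example, a batched upsert could look like the following. This is a minimal sketch assuming you are using INSERT ... ON DUPLICATE KEY UPDATE; the id, url, and title values are placeholders.

-- One batch of the upsert: repeat with the next ~1,000 rows until all 100K are loaded.
INSERT INTO employees (id, url, title)
VALUES
  ('emp-0001', 'https://example.com/0001', 'Engineer'),
  ('emp-0002', 'https://example.com/0002', 'Analyst')
  -- ... up to ~1,000 rows per statement ...
ON DUPLICATE KEY UPDATE
  url = VALUES(url),
  title = VALUES(title);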
Yes, we are sharding on a column that is unique for all records (the primary key).
No, we are not setting a unique index on any column. Here is the CREATE TABLE statement for the table:
CREATE TABLE `employees` (
  `id` char(126) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL DEFAULT '',
  `url` mediumtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  `title` mediumtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  PRIMARY KEY (`id`),
  SHARD KEY `__SHARDKEY` (`id`),
  SORT KEY `__UNORDERED` ()
) AUTOSTATS_CARDINALITY_MODE=INCREMENTAL AUTOSTATS_HISTOGRAM_MODE=CREATE AUTOSTATS_SAMPLING=ON SQL_MODE='STRICT_ALL_TABLES';
Let us know if anything can be improved in the table definition to handle the upsert.
To reduce memory usage, as I mentioned before, batching your upserts into smaller chunks (say, 1,000 rows at a time instead of 100,000) could help.
That PRIMARY KEY clause will also create an index, so your `id` column is already backed by a unique index.
If you truly need a write-heavy and a read-heavy version of the same table, you could have two tables and update one while querying the other, then switch between them, keeping them in sync at the application level. You can create a view for your query apps and ALTER the view to point to (SELECT * FROM …) one table or the other as time goes by, so you don't have to change your queries (sketched below). This is more complex than just using one table, but it can be used in situations like this if you really need total isolation between read and write activity.
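Here is a minimal sketch of that view-swap approach, assuming two copies of the table named employees_a and employees_b and a view named employees_current (all hypothetical names):

-- Two identical copies of the table; load into one while the other serves reads.
CREATE TABLE employees_a LIKE employees;
CREATE TABLE employees_b LIKE employees;

-- Query apps read only through the view.
CREATE VIEW employees_current AS SELECT * FROM employees_a;

-- After the day's upserts finish in employees_b, repoint the view.
ALTER VIEW employees_current AS SELECT * FROM employees_b;

Your application would then apply the writes the idle copy missed before the next swap, so the two tables stay in sync at the app level.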