I am using MemSQL 7.1.7. I have a cluster that is basically one big database. When I tried to run an INIT backup, disk usage went from 30% to almost 95%. I then killed the backup process, because I was worried that the cluster would run out of space and crash. I am backing up to a remotely mounted NFS share. The directories that took all the space were data/blobs and data/logs, and after I stopped the backup, usage returned to normal after a while. I also have another MemSQL cluster which just replicates the main cluster with the big database. I’m using it for read operations, and during the backup its disk usage also grows, but more slowly.
My questions are:
Why do data/blobs and data/logs take so much space during the backup process when I’m backing up to a remote backup server and not locally? How can I change this?
Why do the blobs and logs increase in size on both clusters, and what can I do about it so that I can run the backup?
Was there an active write workload running during the backup? Backups start a snapshot transaction internally, which prevents files needed by the backup from being cleaned up from disk while the backup is running.
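Because files stay on disk for as long as the backup runs, it can help to watch disk usage during the backup instead of checking by hand. Below is a minimal sketch using only the Python standard library. The function names, the 90% limit, and the path passed in are assumptions for illustration and are not part of MemSQL.

```python
import shutil


def used_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total


def should_abort(path, limit=0.90):
    """Return True once usage reaches `limit`; 0.90 is an assumed threshold."""
    return used_fraction(path) >= limit


if __name__ == "__main__":
    # "." is a placeholder; point this at the partition holding the data directory.
    print(f"used: {used_fraction('.'):.0%}, abort: {should_abort('.')}")
```

A script like this could be run periodically from cron while the backup is in progress. What to do when the limit is reached (alert, or kill the backup) is left to the operator.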
Thank you, that would probably explain it, but it’s still a big problem, because we are unable to do backups for that database.
Yes, there were a lot of inserts, but on the 880GB partition only about 30% of the disk space was used before the backup started, and after 2 days of the backup running, usage had grown to almost 100%. And that 30% was taken up by data collected over the last few months.
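To put those numbers in perspective, here is a rough calculation of how much extra data was held on disk during the backup. It is only a sketch: it treats "almost 100%" as exactly 100%, so the result is an upper bound, and the function name is made up for illustration.

```python
def retained_growth(partition_gb, used_before_pct, used_after_pct, days):
    """Return (extra GB held on disk, average GB per day) over the backup window."""
    extra_gb = partition_gb * (used_after_pct - used_before_pct) / 100
    return extra_gb, extra_gb / days


# Figures from the post: 880GB partition, 30% used before, ~100% after 2 days.
extra_gb, per_day = retained_growth(880, 30, 100, 2)
print(f"{extra_gb:.0f} GB retained, about {per_day:.0f} GB/day")
# prints "616 GB retained, about 308 GB/day"
```

So a backup that runs for about two days under this write load would need roughly 600GB of free space on top of the existing data.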
I understand that it’s because of what you said, but what can we do about it? We cannot stop inserts on the production cluster during backups. Is there a way to do the backup without running into this problem?
Yes, but it probably didn’t matter, because the database was not so big before and there were fewer inserts.
Anyway, I worked out a solution. I built a secondary cluster which is initially empty. When I want to do a backup, I first start replication of the big database from the production cluster. Once the replica is fully in sync, I stop the replication and start an initial backup of the replicated database, leaving the production cluster untouched. After the backup is finished, I can start the replication again and do the diff backup the next night.
I think Nate was asking because BACKUP INIT is known to be slower than a full backup, so potentially you could use full backups instead of differentials.
It may be easier to increase the size of your disk than to go down the 2nd cluster path. I think the 2nd cluster scheme would work - but you would need to stop replicating, as backups on secondary databases are not allowed (so it would be replicate db, wait until in sync, stop replicating, backup init, replicate with differential to start replicating again [only supported on memsql 7.1]).
One other thing, Tomas: Nate mentioned that the approach of doing the INIT on a secondary cluster actually won’t work, because the cluster id is part of the init backup metadata (to make sure folks with multiple clusters don’t accidentally init + diff on different clusters). There is no mechanism to override that.
Thank you. I realized that yesterday, so right now I’m just doing FULL backups from the secondary cluster. It should not be a problem to restore that backup on any cluster. Am I right?