Hello,
We are getting the following error when trying to back up to S3: ERROR 2004 UNKNOWN_ERR_CODE: Leaf Error (:3306): Socket closed due to keepalive probe failures.
The database is about 500 GB and has 35 tables and 32 partitions. On Sunday I created a split partitions backup to match the recommended number of partitions for our setup, following the exact steps in the docs. Since then, every backup attempt has failed.
Can someone provide more information about the error code and possible solutions? Sadly I can’t find anything in the docs about this.
Things I have already tried, all of which resulted in the same error:
- increased the backup timeout (using the TIMEOUT clause in the backup query)
- increased connection_timeout
- increased subprocess_io_idle_timeout_ms
- decreased backup_max_threads (tried: 4, 8, 16, 32)
- ran the backup from different master aggregators
- stopped all pipelines
- restarted the cluster multiple times
- upgraded to 8.0.17
- ran FILL CONNECTION POOLS on each node
- used the WITH INIT clause
I also can’t create a local backup: the command immediately throws an invalid permission error, even though the directory is owned by memsql:memsql and the command actually writes data before throwing.
Thanks & best,
tom