Backup to S3 failed

sebaZ · April 15, 2021, 2:25pm

Hi guys!

I have an intermittent problem when I run a backup job to S3.

My cluster: 2 EC2
1X Master
1X Aggregator
2X Leaf

Now my issue:

“mysql.connector.errors.DatabaseError: 1970 (HY000): Leaf Error (memsql-leaf-01.xxx.local:3306): Subprocess timed out receiving data. No stderr returned.”

It always happens 5 or 7 minutes after the process starts.

Error from aggregator log:

26865465487880 2021-04-15 19:43:47.181 ERROR: Failed taking a distributed backup for database xxx to directory ‘xx-xx backups/db/2021-04-15/xxx.backup’ failed with (1970:Leaf Error (memsql-leaf-01.xxx.local:3306): Subprocess timed out receiving data. No stderr returned.)

nhoran · April 16, 2021, 9:00pm

Hello!
This error occurs when S3 takes longer to respond during the backup operation than we expect, so the operation times out.

You can control this timeout by changing the global variable ‘subprocess_io_idle_timeout_ms’. Increasing this variable will make the subprocess wait longer for S3 before timing out.
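As a rough illustration of the change described above, here is a minimal Python sketch that raises the timeout and reads it back. The helper names (build_set_timeout_sql, raise_idle_timeout) are hypothetical, and the cursor is assumed to be any DB-API style cursor, such as one from mysql.connector. Whether the new value propagates to the leaves automatically may depend on your version, so treat that as an assumption to verify on each node.

```python
# Sketch only: helper names are illustrative, not part of any library.
VARIABLE = "subprocess_io_idle_timeout_ms"


def build_set_timeout_sql(timeout_ms):
    """Return a SET GLOBAL statement for the idle timeout (milliseconds)."""
    # Reject bools, non-integers, and non-positive values before formatting,
    # since the value is interpolated directly into the statement.
    if isinstance(timeout_ms, bool) or not isinstance(timeout_ms, int):
        raise ValueError("timeout_ms must be a positive integer")
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be a positive integer")
    return f"SET GLOBAL {VARIABLE} = {timeout_ms}"


def raise_idle_timeout(cursor, timeout_ms):
    """Set the timeout through a DB-API cursor and return the value read back."""
    cursor.execute(build_set_timeout_sql(timeout_ms))
    # Read the variable back the same way as a manual SELECT @@... check.
    cursor.execute(f"SELECT @@{VARIABLE}")
    row = cursor.fetchone()
    return int(row[0])
```

With mysql.connector, the cursor would typically come from mysql.connector.connect(...).cursor() on a connection to the master aggregator, and a call such as raise_idle_timeout(cursor, 300000) should return the value that is now in effect on that node.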

Cheers,
Nate Horan

sebaZ · April 19, 2021, 12:40pm

I have on my config:

memsql> select @@subprocess_io_idle_timeout_ms;
+---------------------------------+
| @@subprocess_io_idle_timeout_ms |
+---------------------------------+
|                          120000 |
+---------------------------------+
1 row in set (0.00 sec)

But I see in your documentation that the default is now 240000, so I think I'm running an old version.
I changed it to 300000.

I will let you know the results.