Hi,
We found out that when a leaf becomes unreachable, the grace period before it is marked offline is 1200 seconds.
This is way too long. We want to be able to recover from it much faster.
Is there a way to configure this value?
By default, failover in MemSQL happens in 3 seconds if the node is “hard down” (rejecting all new connections). Otherwise, if connections are going through but are not getting a response, the node gets 30 seconds to respond to a heartbeat.
The grace period is a defense against a node that is repeatedly failing and auto-healing. In this case, failover is delayed based on the node's history of coming back online quickly after a failure. If you want to disable this, you can set the global variable failover_maximum_grace_interval_seconds to 0 on the master aggregator. In MemSQL 7, that is the default configuration if all databases are sync replicated.
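For reference, a minimal sketch of what that could look like from a SQL client connected to the master aggregator. This assumes the variable can be changed at runtime with the MySQL-style SET GLOBAL syntax; check the documentation for your version before relying on it.

```sql
-- Run on the master aggregator.
-- Check the current value first.
SHOW VARIABLES LIKE 'failover_maximum_grace_interval_seconds';

-- Assumption: the variable is settable at runtime via SET GLOBAL.
-- 0 disables the grace period, so failover is not delayed by the node's history.
SET GLOBAL failover_maximum_grace_interval_seconds = 0;
```

Running the SHOW VARIABLES statement again afterwards should confirm whether the new value took effect.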