I have a fresh install of 6.7.15 with a free license running on one host with one aggregator and one leaf.
I have the exact same setup on another host and I’m trying to test replication by replicating a database from one host (cluster) to the other. So from the secondary host I run:
replicate database test from root:'PW'@10.0.2.12;
From then on I get an infinite loop of error messages in the logs for both the leaf and the aggregator.
Below are some sample messages from the logs. Let me know if I need to look for some other messages, there are so many it’s hard for me to tell what’s relevant.
2424463364 2019-04-02 21:20:14.552 ERROR: ProcessHandshakeResponsePacket() failed. Sending back 1045: Access denied for user 'distributed'@'localhost' (using password: YES)
2424463562 2019-04-02 21:20:14.552 ERROR: Failed to connect to '127.0.0.1:3307/test_1': Bad user or password.
2412029554 2019-04-02 21:20:06.559 ERROR: ProcessNotifications Thread 115087: AdjustReferenceDatabases: Failure updating reference databases at LSN 59. Succeeded updating 1, needed 2.
2412455322 2019-04-02 21:20:06.985 ERROR: ProcessHandshakeResponsePacket() failed. Sending back 1045: Access denied for user 'distributed'@'localhost' (using password: YES)
2412455473 2019-04-02 21:20:06.985 ERROR: Thread 99999: SetConnFromMetadataNode: Connection failure connecting to node id 1: distributed:x@'127.0.0.1':3306. Failed with error: Access denied for user 'distributed'@'localhost' (using password: YES)
The errors indicate a login issue when this cluster tries to retrieve data from the other cluster. Replication uses an internal user (called distributed). It looks like you may have created a different user that is shadowing it, which would cause these login failures.
Did you create any users like distributed@127.0.0.1?
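To check, you can list the users on the primary cluster from its Master Aggregator and look for anything matching distributed (a sketch; SHOW USERS should be available on 6.7, and SHOW GRANTS works on a specific user if you find one):

show users;
-- if a user such as 'distributed'@'127.0.0.1' shows up, inspect it:
show grants for 'distributed'@'127.0.0.1';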
19552115386 2019-04-03 02:05:46.645 WARN: Attempted to set network write buffer size to 0x800000 bytes, but the value is capped at 0x34000 bytes.
19552115448 2019-04-03 02:05:46.645 WARN: Attempted to set network read buffer size to 0x800000 bytes, but the value is capped at 0x34000 bytes.
19552161376 2019-04-03 02:05:46.691 ERROR: ProcessNotifications Thread 115087: AdjustReferenceDatabases: Failure updating reference databases at LSN 69. Succeeded updating 1, needed 2.
Is there some Linux config that needs to be done? This is on CentOS 7. I haven't done all the Linux tuning recommendations yet, but none of them seem to apply to this.
The warnings in the tracelog about the network buffer limits are unrelated - they can have an effect on replication performance, but they don't affect correctness.
From looking at your original tracelogs, it seems like the DR replicas are trying to connect to '127.0.0.1' or 'localhost'. How did you set up your primary cluster? What's the output of SHOW LEAVES and SHOW AGGREGATORS?
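If the primary cluster was set up with loopback addresses, the output on the primary's Master Aggregator will show the leaf registered under 127.0.0.1, something like this (columns trimmed, values hypothetical):

show leaves;
-- Host       Port  Availability_Group  State
-- 127.0.0.1  3307  1                   online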
From the tracelogs it seems like you set up your primary cluster with non-public IPs (bootstrap aggregator '127.0.0.1' and/or add leaf root:PW@'127.0.0.1':3307). That does not work with DR: the address you provide in the REPLICATE command is only used for the initial setup, and after that the addresses used when setting up the primary cluster are what is actually used to replicate the data (the leaves in the secondary cluster connect directly to the primary cluster's leaves). The addresses don't need to be public, but they do need to be routable from the secondary cluster.
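For reference, a primary cluster set up with routable addresses would have been bootstrapped along these lines (a sketch; the IP, port, and password are placeholders based on your REPLICATE command, not values I know from your setup):

-- on the primary's Master Aggregator, using an address reachable from the secondary host
bootstrap aggregator '10.0.2.12';
add leaf root:'PW'@'10.0.2.12':3307;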
To fix the leaf's address, you remove it and re-add it with the routable address (see the sketch below). In a bigger cluster with High Availability, you would do this one availability group at a time to stay online, but for a test environment this is fine (your data should all still be there when you re-add the leaf).
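Concretely, on the primary's Master Aggregator it would look something like this (a sketch; the public IP, port, and password are placeholders to substitute for your environment):

-- drop the loopback registration, then re-register the same leaf under its routable address
remove leaf '127.0.0.1':3307;
add leaf root:'PW'@'$PUBLIC_IP':3307;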
For the MA, you can run the following command on it to change its address:
bootstrap aggregator '$PUBLIC_IP' force;
For this second command, be extremely careful, especially if running in a bigger cluster (avoid it there at all costs). If you run it on a Child Aggregator by accident, you are essentially running AGGREGATOR SET AS MASTER, with all the associated risks (data loss / cluster metadata inconsistency - it's a command meant only for emergencies), but skipping a few checks. If you run it on the Master Aggregator, it will just change its address in the metadata, so it is safe for your case.
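Once both changes are in place, you can verify the addresses from the Master Aggregator (both should now show the routable IPs rather than 127.0.0.1) before re-running the REPLICATE DATABASE command from the secondary cluster:

show aggregators;
show leaves;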