Description
If I bring the project up fresh, then bring up a second MySQL slave, and then destroy the primary, failover appears to work at the MySQL level (I can still connect to MySQL via CNS), but based on the error logs the ContainerPilot onChange handler isn't doing what it's expected to do to inform Consul of the primary change.
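For context, what I mean by "inform Consul of the primary change" is roughly the following: after failover, the new primary should acquire the primary lock/key in Consul so that other nodes and CNS consumers see the change. This is only a minimal sketch using python-consul; the key name, TTL, and function name are my assumptions, not the actual manage.py code.

```python
import consul

PRIMARY_KEY = 'mysql-primary'  # assumed Consul KV key; not necessarily what manage.py uses


def mark_as_primary(node_name, node_ip):
    """Sketch of 'informing Consul of the primary change' after failover.

    Creates a session and tries to acquire the primary key with it, so the
    lock is released automatically if this node later fails its checks.
    """
    c = consul.Consul()
    session_id = c.session.create(name=node_name, ttl=60, behavior='release')
    # acquire=... makes this an atomic lock acquisition; it returns False if
    # some other node already holds the primary key.
    acquired = c.kv.put(PRIMARY_KEY, node_ip, acquire=session_id)
    return acquired
```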
It's definitely getting further than it was before: the MySQL primary came up fine and I saw replication working. However, when I killed the master to test failover, the slave server logged the following:
Version: '5.6.34-79.1-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Percona Server (GPL), Release 79.1, Revision 1c589f9
2017/03/12 22:58:54 INFO manage Setting up replication.
2017-03-12 22:58:54 58858 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysqld-relay-bin' to avoid this problem.
2017-03-12 22:58:54 58858 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='192.168.129.171', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2017-03-12 22:58:54 58858 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2017-03-12 22:58:54 58858 [Warning] Slave SQL: If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
2017-03-12 22:58:54 58858 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './mysqld-relay-bin.000001' position: 4
2017-03-12 22:58:54 58858 [Note] Slave I/O thread: connected to master '[email protected]:3306',replication started in log 'FIRST' at position 4
2017/03/12 22:58:54 2017/03/12 22:58:54 [ERR] http: Request PUT /v1/agent/check/pass/mysql-3709f21efad3?note=ok, error: CheckID "mysql-3709f21efad3" does not have associated TTL from=127.0.0.1:45860
2017/03/12 22:58:54 Unexpected response code: 500 (CheckID "mysql-3709f21efad3" does not have associated TTL)
Service not registered, registering...
2017/03/12 22:58:54 2017/03/12 22:58:54 [INFO] agent: Synced service 'mysql-3709f21efad3'
2017/03/12 22:58:54 2017/03/12 22:58:54 [INFO] agent: Synced check 'mysql-3709f21efad3'
2017/03/12 22:58:54 2017/03/12 22:58:54 [INFO] agent: Synced check 'mysql-3709f21efad3'
2017/03/12 23:04:39 INFO manage [on_change] Executing failover with candidates: [u'192.168.129.173']
2017-03-12 23:04:39 58858 [Note] Error reading relay log event: slave SQL thread was killed
2017-03-12 23:04:39 58858 [Note] Slave I/O thread killed while reading event
2017-03-12 23:04:39 58858 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000001', position 1209
2017/03/12 23:04:39 WARNING: Using a password on the command line interface can be insecure.
2017/03/12 23:04:39 # Checking privileges.
2017/03/12 23:04:39 # Checking privileges on candidates.
2017/03/12 23:04:39 # Performing failover.
2017/03/12 23:04:39 # Candidate slave 192.168.129.173:3306 will become the new master.
2017/03/12 23:04:39 # Checking slaves status (before failover).
2017/03/12 23:04:39 # Preparing candidate for failover.
2017/03/12 23:04:39 # Creating replication user if it does not exist.
2017/03/12 23:04:39 # Stopping slaves.
2017/03/12 23:04:39 # Performing STOP on all slaves.
2017/03/12 23:04:39 # Switching slaves to new master.
2017/03/12 23:04:39 # Disconnecting new master as slave.
2017/03/12 23:04:39 # Starting slaves.
2017/03/12 23:04:39 # Performing START on all slaves.
2017/03/12 23:04:39 # Checking slaves for errors.
2017/03/12 23:04:39 # Failover complete.
2017/03/12 23:04:39 #
2017/03/12 23:04:39 # Replication Topology Health:
2017/03/12 23:04:39 +------------------+-------+---------+--------+------------+---------+
2017/03/12 23:04:39 | host | port | role | state | gtid_mode | health |
2017/03/12 23:04:39 +------------------+-------+---------+--------+------------+---------+
2017/03/12 23:04:39 | 192.168.129.173 | 3306 | MASTER | UP | ON | OK |
2017/03/12 23:04:39 +------------------+-------+---------+--------+------------+---------+
2017/03/12 23:04:39 # ...done.
2017/03/12 23:04:40 ERROR manage [on_change] this node is neither primary or replica after failover; check replication status on cluster.
2017/03/12 23:04:44 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:04:45 2017/03/12 23:04:45 [INFO] memberlist: Suspect a5b06f708612 has failed, no acks received
2017/03/12 23:04:48 2017/03/12 23:04:48 [INFO] memberlist: Suspect a5b06f708612 has failed, no acks received
2017/03/12 23:04:49 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:04:50 2017/03/12 23:04:50 [INFO] serf: EventMemberFailed: a5b06f708612 192.168.129.171
2017/03/12 23:04:50 2017/03/12 23:04:50 [INFO] memberlist: Suspect a5b06f708612 has failed, no acks received
2017/03/12 23:04:54 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:04:59 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:05:04 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:05:04 2017/03/12 23:05:04 [WARN] agent: Check 'mysql-3709f21efad3' missed TTL, is now critical
2017/03/12 23:05:04 2017/03/12 23:05:04 [INFO] agent: Synced check 'mysql-3709f21efad3'
2017/03/12 23:05:09 ERROR manage Cannot determine MySQL state; failing health check.
2017/03/12 23:05:14 ERROR manage Cannot determine MySQL state; failing health check.
[... the same ERROR repeats every 5 seconds, interleaved with periodic "serf: attempting reconnect to a5b06f708612 192.168.129.171:8301" messages ...]
2017/03/12 23:08:54 ERROR manage Cannot determine MySQL state; failing health check.
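The failure that stands out is the "[on_change] this node is neither primary or replica after failover" error: mysqlrpladmin reports a healthy new master at 192.168.129.173, yet the node then can't classify its own role, and every health check after that fails. My guess is the role check does something like the following (a sketch under my own assumptions; the helper name and exact queries are not taken from manage.py):

```python
def detect_role(conn):
    """Sketch: classify this node's role after failover.

    'conn' is an open MySQL connection (e.g. a PyMySQL connection with
    sufficient privileges to inspect replication state).
    """
    with conn.cursor() as cur:
        # A replica has a row in SHOW SLAVE STATUS pointing at its master.
        cur.execute('SHOW SLAVE STATUS')
        if cur.fetchone():
            return 'replica'
        # A primary should see connected replicas in SHOW SLAVE HOSTS.
        cur.execute('SHOW SLAVE HOSTS')
        if cur.fetchall():
            return 'primary'
    # Neither case matched: the error in the log above. Immediately after
    # mysqlrpladmin disconnects the new master as a slave, and before any
    # replica reconnects, both checks can come back empty.
    return None
```

If something like that is happening, the onChange handler would bail out before ever updating Consul with the new primary, which would match the repeated "Cannot determine MySQL state" health-check failures that follow.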