We are in the middle of switching to GTID based replication from the good old logfile & log position based replication.
But what is GTID? GTID is an abbreviation of ‘GLOBAL TRANSACTION ID’ what speaks for itself: each transaction of a mysql cluster got its globally unique transaction ID, and the DBA has not spent time with positioning slaves, as well as we don’t have to ‘freeze’ any of the servers because of a master failover. The only thing we have to care about: to know what server should be used as a replication master.
OK, what was the problem with the old file-based replication?
Check out this scenario: you have two master servers, called db-master1 and db-master2 and two slaves DB-slave1 db-slave2. All the writes are happening on db-master1 and this will be replicated to the slaves and the failover master too.
So you have your replication set up, and everything works well. But there is a point when you have to do some maintenance stuff on the servers. You have to upgrade the servers for example because you have some security patches to apply. It is not really a problem, when you do it on the slaves, it is not a problem when you do it on the failover master, but there IS a problem when you do it on the active master. The problem is, that the server must be stopped during the operation. And when there is no master to write on there is no site. But hey, we have a failover master for that reason, don’t we?
Yeah, we have. So the DBA has to make the failover, but for doing this he/she has to check out the current binary log file as well the position.
Let’s assume, that the db-master1 is writing the log-bin.002342 file at position 155432 and at the very same moment the db-master2 is at the log-bin.0022122 file at position 120. The DBA will not know where to point the replication thread from the slaves on the db-master2 as long as the database is open and gets writes.
So we have to stop the writes of the application, and while there are no changes in the db we can check out the current position of the log files (at least on the db-master2) point the replication threads to that servers exact position, and start the slaves.
And we are happy because we can start the replication, stop the db-master1 patching it, etc.
OK, we can do it with some badass scripts, we can use magnificent mysql_replication ( 😀 ) ansible module, and we can shorten this period of service stop as much as we can. But still, there is a stopped site for a while. I’m not sure if we want this.
Fortunately, we can use the GTID based replication to switchover slaves without stopping the master. As I mentioned before, the GTID identifies all the transactions in a unique way, so if a slave knows what was the last transaction of it, it can continue the replication from any master in any moment.
If we decide to use that we can issue instead the good old
CHANGE MASTER TO MASTER_HOST=18.104.22.168. MASTER_LOG_FILE=<some_file>, MASTER_LOG_POS=<some_position>
CHANGE MASTER TO MASTER_HOST=22.214.171.124, MASTER_AUTO_POSITION=1
And when we do this, we don’t have to care about any log files or positions on the master, because the slaves will know what was the id of the last transaction of they applied (because the GLOBAL transaction ID is global on the whole cluster).
So, switching to GTID replication will make our lives easier.