8 messages in com.mysql.lists.bugsRe[4]: Replication troubles
FromSent OnAttachments
Vinche08 Jun 2001 01:02 
Sasha Pachev08 Jun 2001 13:15 
Vinche18 Jul 2001 08:43 
Sasha Pachev18 Jul 2001 11:16 
Vinche18 Jul 2001 12:40 
Sasha Pachev18 Jul 2001 13:38 
Michael Widenius18 Jul 2001 13:39 
Sasha Pachev18 Jul 2001 14:25 
Subject:Re[4]: Replication troubles
From:Vinche (Vin@lavtech.ru)
Date:07/18/2001 12:40:36 PM
List:com.mysql.lists.bugs

Hello Sasha,

Wednesday, July 18, 2001, 10:17:10 PM, you wrote:

SP> On Wednesday 18 July 2001 09:44, Vinche wrote:

Hello Sasha,

Saturday, June 09, 2001, 12:15:24 AM, you wrote:

SP> On Friday 08 June 2001 02:03, Vinche wrote:

Hello!

Now about replication. I noticed for a long time ago that there is a problem with replication that exist for me(don't know about the others). Since it is not solved still i decided that this is not a common problem. Sometimes when replication fails, e.g. because of Duplication of keys, MySQL error log tells me something like

010608 10:56:08 Slave: Failed reading log event, reconnecting to retry, SP> log 'logbin.007' position 1214017 010608 10:56:08 Slave: reconnected to master

SP> 'repluser@xxxxxxxxxx:3306',replication resumed in log 'logbin.007' at SP> position 1214017

010608 10:56:08 Error reading packet from server: log event entry

SP> exceeded

SP> max_allowed_packet - increase max_allowed_packet on master (read_errno SP> 0,server_errno=65535)

I always know that this error is because of logbin cursor misposition on slave and it is always +22 bytes of replication record. I am changing cursor to -22 bytes of current location and replication resumes but not for a long time, only to the next hit of Duplicate key, on which it again have cursor +22 bytes of the real binlog record. I know that it will continue again and again until I do flush-logs and change on the slave master_log_file to the next file and master_log_pos to 4. After that it stops loosing right cursor position, but not for a long time. I don't know what is the cause of that problem.

SP> This sounds like a legitimate bug. Apparently, under some conditions, SP> the log SP> event positions gets incremented twice. Next time this happens, can you SP> FTP SP> the apropriate log to ftp://support.mysql.com/pub/mysql/secret/ ?

From the manual:

If you have decided you can skip the next query, do SET SQL_SLAVE_SKIP_COUNTER=1; SLAVE START; to skip a query that does not use auto_increment, last_insert_id or SP> timestamp, or SET SQL_SLAVE_SKIP_COUNTER=2; SLAVE START; otherwise

Can this be related to problems happening here? I mean what will happen if I do SET SQL_SLAVE_SKIP_COUNTER=1 where I should SET SQL_SLAVE_SKIP_COUNTER=2? And why should I care about this, isn't it should be recognized automatically when it must be skipped once or twice? I mean users shouldn't aware of internals, they just want to skip one query in the binlog but not skip one record in binlog.

SP> If you set it to 1 instead of 2, you will skip auto-incerement set, but not SP> the actual query. So the query will run, but possibly produce wrong results. SP> However, this should not mess up the log position. I need to fix one thing
in SP> the manual - for timestamp without auto increment, the setting should be 1, SP> not 2 - will fix this immediately.

SP> Regarding auto-detection of how much to skip - a typical user should never SP> have to skip anything - this feature is there to provide a way for manual SP> recovery in case you hit a bug. So if you ever hit a bug, you have to deal SP> with the internals to be able to get past it and know exactly what you are SP> doing. That is why I decided not to implement any smart-alec decisions on SP> SKIP_SLAVE_COUNTER ( if we hit a bug, it proves we are not as smart as we SP> thought we were anyway) and just do exactly what we are told in recovery.

SP> Howerver, your question makes me ask a question of my own - do you do SP> SKIP_SLAVE_COUNTER a lot? If yes, the fact that you have to do it at all SP> means that there is some other bug, either in your client code or in the SP> replication itself. Next time you have replication stop in the state where SP> SKIP_SLAVE_COUNTER will resume it, make sure you understand why it died - if SP> you don't, provide SHOW SLAVE STATUS, the appropriate log on the master, and SP> the error log of the slave.

Imagine the following situation. I have one mysql server and want to make the mirror of it. However I can't stop this server and can't lock all tables because it's a production server. While I'm doing sequential LOAD TABLE FROM MASTER, there are many inserts happening on primary server and part of them are already loaded from master. So when I will start slave I'll have to skip many statements. The manual assumes that one can stop master to make slaves but what if he can't? I think it would be very nice if we can make a hot replication. I think it's not easy to do: we should track all the queries that change the tables during loading tables. I think the best way is to let the slave handle master( -)),that is let the slave hotadding on master names of tables which master has to binlog. Then we can do this on slave: LOCK TABLE xxx ON MASTER; LOAD TABLE xxx FROM MASTER; MASTER_BINLOG_ADD_TABLE xxx; UNLOCK TABLE xxx ON MASTER; Before that we should clean binlog_do_db on master. Doing the procedure above in the cycle we can load all the tables from master and master will binlog only real updates which we will be able to replay on slave without any problems and the state of slave databases will be consistent in respect to the master databases. Loading tables from master is considerably quick, so locking can be harmlessly done even on production servers, say in nighthours. I understand that letting the slave to operate on master isn't a good thing on security reasons, so it may be better to make this thing: CREATE REPL_SESSION ON MASTER FOR LOADING TABLES xxx.ee,xxx.tt,yyy.gg,etc; this will do some minor binlog which is valid only for this slave (we can track the slave server id) and it won't affect the primary binlog. The master will log here all the normal binlogs EXCEPT the tables that are mentioned in the queries above. Then we can do the procedure that I mentioned above for sequential loading of tables and adding them to the new generated binlog. After that we tell the master STOP REPL_SESSION ON MASTER; This will stop binlogging on master to the slave's binlog and write at the end of it where slave should continue in the global binlog after replaying the session. So now slave can do SLAVE START WITH SESSION REPLAY; which will replay that session on master, then drop it, then continue from the marked position as in normal mode. Of course, I understand that this is far from ideal but maybe some of the ideas will be helpful to make the life of admins easyer. Then it won't be a disaster for them when the replication fail because of bug.