2 messages in com.mysql.lists.mysql: Re: replication algorithm question

| From | Sent On | Attachments |
|---|---|---|
| Sasha Pachev | 16 Dec 2000 09:26 | |
| sean | 19 Dec 2000 10:25 | |

| Subject: | Re: replication algorithm question |
|---|---|
| From: | Sasha Pachev (sas...@mysql.com) |
| Date: | 12/16/2000 09:26:27 AM |
| List: | com.mysql.lists.mysql |
On Friday 15 December 2000 12:27, Aaron Ingram wrote:
When replicating a query having an error code, the slave will fail if that same error is not encountered. Assuming there's some reason to log failed queries, why should the slave try to execute them? If they were not actually written to the master, even attempting the command on the slave seems incorrect.
Here is an example:
    create table foo(n int not null primary key);
    insert into foo values (1);
    insert into foo values (1),(2),(3);

The last query will return an error, but still modify the table, so it needs to be replicated.
-Aaron
-----Original Message-----
From: Aaron Ingram
Sent: Sunday, December 10, 2000 10:20 PM
To: 'mys...@lists.mysql.com'
Subject: Replication fails when expecting error

I've run across the following replication error on the slave:

    001209 2:08:36 Slave: did not get the expected error running query from master - expected: 'Got an error writing communication packets', got 'no error'
    001209 2:08:36 Slave: error running query 'delete from table_name'
    001209 2:08:36 Error running query, slave aborted. Fix the problem, and re-start the slave thread with mysqladmin start-slave

I've confirmed the "Got an error writing communication packets" error appears on the master. However, whatever problem caused that error on the master is not occurring on the slave. Hence the slave failure. Should a slave fail when encountering a mismatched error code of this type?
That aside, even if I can stop the communication problem from recurring on the master, I still have the existing log events to deal with. How do I work around this problem? I really would like to avoid restarting the log from a fresh dump&load, especially since there's no guarantee I can stop this error from happening again.
I'm running MySQL 3.23.28 on RedHat Linux 6.1.
You have found a rather rare bug that would be nearly impossible to repeat "at will". The query on the master has actually succeeded, but the errno in the thread structure was set because the client dropped the connection as the thread was trying to tell it that everything was cool. Here is a patch for this:
    --- 1.28/sql/sql_delete.cc  Fri Dec 8 08:04:53 2000
    +++ edited/sql/sql_delete.cc  Sat Dec 16 09:54:11 2000
    @@ -106,13 +106,13 @@
       }
       if (!error)
       {
    -    send_ok(&thd->net); // This should return record count
         mysql_update_log.write(thd,thd->query,thd->query_length);
         if (mysql_bin_log.is_open())
         {
           Query_log_event qinfo(thd, thd->query);
           mysql_bin_log.write(&qinfo);
         }
    +    send_ok(&thd->net); // This should return record count
       }
       DBUG_RETURN(error ? -1 : 0);
     }
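To see why the order of the two calls matters, here is a toy model in Python. It is only a sketch, not the server code: the `Thread` class, `write_binlog`, and the `NET_WRITE_ERROR` value are all made up for illustration, and the only assumption carried over from the explanation above is that a failed reply to the client leaves an error code in the thread structure, which the binlog event then picks up.

```python
NET_WRITE_ERROR = 1  # placeholder value, not the server's real error number


class Thread:
    """Toy stand-in for the server's per-connection thread structure."""

    def __init__(self, client_connected):
        self.client_connected = client_connected
        self.errno = 0  # 0 means "no error"


def send_ok(thd):
    """Report success to the client; leaves an error behind if it is gone."""
    if not thd.client_connected:
        thd.errno = NET_WRITE_ERROR


def write_binlog(thd, binlog, query):
    """Record the query together with the thread's current error code."""
    binlog.append((query, thd.errno))


def delete_before_patch(thd, binlog, query):
    send_ok(thd)                      # may set thd.errno
    write_binlog(thd, binlog, query)  # logs the stale network error


def delete_after_patch(thd, binlog, query):
    write_binlog(thd, binlog, query)  # logs errno 0: the delete succeeded
    send_ok(thd)                      # a failure here no longer reaches the log


old_log, new_log = [], []
delete_before_patch(Thread(client_connected=False), old_log, "delete from table_name")
delete_after_patch(Thread(client_connected=False), new_log, "delete from table_name")
print(old_log)  # [('delete from table_name', 1)]
print(new_log)  # [('delete from table_name', 0)]
```

With the old order the slave is told to expect an error that the query itself never produced; with the new order the logged error code reflects only the query.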
Resuming the replication would be a rather tricky task:

On the slave:

    SHOW SLAVE STATUS;

Figure out the name of the master log and the position.

On the master:

    od -c -j offset_on_the_slave /path/to/datadir/binlog_name
Then count the bytes and try to guess where the next log entry starts. It would be about 100 bytes ahead of the current position, and the 5th byte of the entry (4 bytes away from the start) will most likely be 0x02 (the code for a query log event). For the exact offset, look at the previous entry. Here is the format:

    offset  size  meaning
    0       4     timestamp
    4       1     event code (0x02 for query)
    5       4     originating server id
    9       4     event size

All integers are little endian.
And here is the code that creates it ( the ultimate reference :-) ):
    int Log_event::write_header(IO_CACHE* file)
    {
      // make sure to change this when the header gets bigger
      char buf[LOG_EVENT_HEADER_LEN];
      char* pos = buf;
      int4store(pos, when); // timestamp
      pos += 4;
      *pos++ = get_type_code(); // event type code
      int4store(pos, server_id);
      pos += 4;
      long tmp=get_data_size() + LOG_EVENT_HEADER_LEN;
      int4store(pos, tmp);
      pos += 4;
      return (my_b_write(file, (byte*) buf, (uint) (pos - buf)));
    }
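If you would rather not count bytes by hand, the header layout above can be decoded with a few lines of Python. This is a rough sketch that assumes the header is exactly the 13 bytes listed in the table; `pack_header` and `parse_header` are names invented here, and the timestamp and sizes in the demo are arbitrary values, not taken from a real log.

```python
import struct

# Layout from the table above: timestamp (4), event code (1),
# originating server id (4), event size (4), all little endian.
HEADER = struct.Struct("<IBII")
QUERY_EVENT = 0x02


def pack_header(when, type_code, server_id, data_size):
    """Mirror write_header(): the stored size is data size plus header length."""
    return HEADER.pack(when, type_code, server_id, data_size + HEADER.size)


def parse_header(buf):
    """Return (timestamp, type_code, server_id, event_size) from the raw bytes."""
    return HEADER.unpack(buf[:HEADER.size])


raw = pack_header(when=977000000, type_code=QUERY_EVENT, server_id=1, data_size=87)
print(len(raw))           # 13
print(parse_header(raw))  # (977000000, 2, 1, 100)
```

Note that the event size stored in the header already includes the header itself, which is why it can be added directly to the event's starting offset.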
So after you have figured out the event size of the trouble query, add it to the current slave offset and do:
mysqlbinlog -j new_offset /path/to/datadir/binlog_name | head -1
You should see the next query printed out in plain text - if you do, you got the offset right - if not, check your arithmetic and try again.
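The arithmetic can also be scripted. The sketch below assumes the 13-byte header described earlier and that the offset you pass in points at the start of the trouble event. `next_event_offset` and `looks_like_query_event` are hypothetical helpers, and the demo runs on a synthetic two-event file, not a real binlog (a real query event carries more than the query text after the header).

```python
import os
import struct
import tempfile

HEADER = struct.Struct("<IBII")  # timestamp, event code, server id, event size
QUERY_EVENT = 0x02


def next_event_offset(path, offset):
    """Read the event header at `offset` and return where the next event starts."""
    with open(path, "rb") as f:
        f.seek(offset)
        buf = f.read(HEADER.size)
    if len(buf) < HEADER.size:
        raise ValueError("no complete event header at offset %d" % offset)
    _when, _type_code, _server_id, event_size = HEADER.unpack(buf)
    if event_size < HEADER.size:
        raise ValueError("implausible event size %d; check the offset" % event_size)
    return offset + event_size


def looks_like_query_event(path, offset):
    """Heuristic from the text: the 5th byte of a query event is 0x02."""
    with open(path, "rb") as f:
        f.seek(offset + 4)
        return f.read(1) == bytes([QUERY_EVENT])


def fake_event(payload):
    """Build a synthetic event: header followed by an arbitrary payload."""
    return HEADER.pack(0, QUERY_EVENT, 1, HEADER.size + len(payload)) + payload


# Demo on a throwaway file holding two synthetic events.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(fake_event(b"delete from table_name"))
    tmp.write(fake_event(b"insert into foo values (4)"))
new_offset = next_event_offset(tmp.name, 0)
is_query = looks_like_query_event(tmp.name, new_offset)
os.remove(tmp.name)
print(new_offset, is_query)  # 35 True
```

Against a real log you would call `next_event_offset("/path/to/datadir/binlog_name", slave_offset)` and still confirm the result with mysqlbinlog as described above before pointing the slave at it.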
Once you have the offset right, on the slave:

    CHANGE MASTER TO MASTER_LOG_POS=new_offset;
    SLAVE START;
    SHOW SLAVE STATUS;

It should now be going, and your data on the slave should be ok, as the delete query that we have skipped by adjusting the offset has already happened.
In 3.23.30, I will change the code on the slave to print the next offset in case of a query error, so one could skip the trouble query in case something really terrible happens without having to do the binlog magic.
-- 
MySQL Development Team
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /   Sasha Pachev <sas...@mysql.com>
 / /|_/ / // /\ \/ /_/ / /__  MySQL AB, http://www.mysql.com/
/_/  /_/\_, /___/\___\_\___/  Provo, Utah, USA
       <___/




