Subject: Re: Replication suddenly stops on mysql 4.1.7 with Slave_IO_Running: No
From: Jan Kirchhoff (kir@gmx.de)
Date: 01/31/2005 05:16:30 AM
List: com.mysql.lists.mysql

Hi,

My problem still goes on... After having had the problem 2 more times within 1 day, I decided to redo the replication (copy the whole database onto the slave with rsync and reset master and slave). That only lasted for a little more than 1 day, and I ended up with the same error:

Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

I can look at the binlog with mysqlbinlog on the master and the slave; no errors or problems. After a simple "SLAVE START", without having made any changes to the database, the slave thread started again and caught up with the master.

I've been using MySQL's replication feature since it first came out in 1999 or 2000 and have dealt with lots of problems and workarounds, but this one is weird. Any ideas, anybody?

Jan

Hello,

I have a replication setup on two Linux boxes (Debian woody, kernel 2.4.21-xfs, mysql 4.1.7-standard official intel-compiler binary from mysql.com).

master:~# mysqladmin status
Uptime: 464848  Threads: 10  Questions: 296385136  Slow queries: 1752  Opens: 2629  Flush tables: 1  Open tables: 405  Queries per second avg: 637.596

slave:~# mysqladmin status
Uptime: 463460  Threads: 2  Questions: 292885156  Slow queries: 6  Opens: 2510  Flush tables: 1  Open tables: 327  Queries per second avg: 631.953
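
For readers comparing the two status lines: the "Queries per second avg" figure is simply Questions divided by Uptime. A minimal sketch that reproduces both averages from the numbers above (the function name is only illustrative):

```python
def queries_per_second(questions, uptime_seconds):
    """Average query rate as mysqladmin status reports it: Questions / Uptime."""
    return questions / uptime_seconds


# Figures copied from the two "mysqladmin status" lines above.
master_qps = queries_per_second(296385136, 464848)
slave_qps = queries_per_second(292885156, 463460)

print(f"master: {master_qps:.3f}  slave: {slave_qps:.3f}")
```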

Both systems have identical hardware (P4 2.4 GHz, 3 GB RAM, SCSI hardware RAID); the connection is gigabit Ethernet.

Everything used to work fine, but I wanted to get rid of InnoDB since I only used it for very big tables containing historical data, and those tables were moved to another server. I ran out of disk space, InnoDB data files can only grow but not shrink, and I didn't need it anyway, so it had to go. I stopped the slave, changed all leftover InnoDB tables to MyISAM, added skip-innodb to my.cnf on the master and the slave, restarted the server, and renewed the replication the "classical" way: flush tables with read lock, copy /var/lib/mysql to the slave (not much, just around 20GB), reset master, unlock tables. Then start the slave mysqld, reset slave, slave start.

Everything was fine and very fast for 4 days (from Saturday till Wednesday afternoon), then suddenly the slave stopped. This is where the weird stuff starts: "show slave status" tells me everything is fine, except that "Slave_IO_Running: No" is wrong. After typing "slave start", it says "Slave_IO_Running: Yes" and "Slave_SQL_Running: No". Very strange. Then I did a "slave stop;slave start;" and everything was fine again; the slave caught up and went on. Today (Thursday afternoon), the same thing happened again and could be solved again by "slave stop;slave start;". Then it happened again around 10pm. Again, the stop-start trick made it work again.

I have appended the output of my mysql shell below.

Can anybody help me with that? This is a production system under heavy load and I can't play around with different mysql versions and such... If I don't find a solution really quickly, I'll have to help myself with some shell-script daemon that checks whether replication is running and issues "stop slave;start slave" commands otherwise... not really the way it should be :(
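
A rough sketch of what such a watchdog could look like, written in Python rather than shell so the decision logic can be checked on its own. This is not from the original thread: the function names, the 60-second polling interval, and the assumption that the `mysql` client finds its credentials in `~/.my.cnf` are all hypothetical. Note that it only automates the manual workaround; it does nothing about whatever is breaking the relay log.

```python
import subprocess
import time

# The same workaround that was typed by hand in the mysql shell.
RESTART_SQL = "STOP SLAVE; START SLAVE;"


def parse_slave_status(text):
    """Turn the vertical (\\G) output of SHOW SLAVE STATUS into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        key = key.strip()
        # Skip lines without a colon (row separator, footer, blanks) and
        # wrapped error text, whose "key" part would contain spaces.
        if not sep or not key or " " in key:
            continue
        status[key] = value.strip()
    return status


def needs_restart(status):
    """True if either replication thread is reported as not running."""
    return (status.get("Slave_IO_Running") != "Yes"
            or status.get("Slave_SQL_Running") != "Yes")


def run_mysql(sql):
    """Run a statement through the mysql command-line client.

    Assumption: the client can log in without extra options,
    for example via a [client] section in ~/.my.cnf.
    """
    result = subprocess.run(["mysql", "-e", sql],
                            capture_output=True, text=True, check=True)
    return result.stdout


def watchdog(poll_seconds=60):
    """Poll the slave forever and restart its threads when one has stopped."""
    while True:
        status = parse_slave_status(run_mysql("SHOW SLAVE STATUS\\G"))
        # An empty dict means the server is not configured as a slave.
        if status and needs_restart(status):
            run_mysql(RESTART_SQL)
        time.sleep(poll_seconds)
```

Calling `watchdog()` would start the loop; `parse_slave_status` and `needs_restart` can be tried separately by pasting in one of the status dumps from this message.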

Thanks,
Jan

SLAVE:

slave:~# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.40GHz
stepping        : 7
cpu MHz         : 2392.077
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 4771.02

slave:~# free
             total       used       free     shared    buffers     cached
Mem:       3105104    2355364     749740          0        440    1514104
-/+ buffers/cache:     840820    2264284
Swap:       779144     428072     351072

MASTER:

master:~# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.40GHz
stepping        : 7
cpu MHz         : 2392.163
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 4771.02

master:~# free
             total       used       free     shared    buffers     cached
Mem:       3105104    3096016       9088          0        648    2087780
-/+ buffers/cache:    1007588    2097516
Swap:       779144     391732     387412

Slave shell:

wpdb2:~# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 23083 to server version: 4.1.7-standard

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

wpdb2 mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 192.168.10.26
Master_User: repl
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql-bin.000210
Read_Master_Log_Pos: 146168522
Relay_Log_File: wpdb2-relay-bin.000210
Relay_Log_Pos: 146168608
Relay_Master_Log_File: mysql-bin.000210
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 146168522
Relay_Log_Space: 146168608
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 4384
1 row in set (0.00 sec)

slave mysql> slave start;
Query OK, 0 rows affected (0.01 sec)

slave mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.26
Master_User: repl
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql-bin.000210
Read_Master_Log_Pos: 186399548
Relay_Log_File: slave-relay-bin.000210
Relay_Log_Pos: 146168608
Relay_Master_Log_File: mysql-bin.000210
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Skip_Counter: 0
Exec_Master_Log_Pos: 146168522
Relay_Log_Space: 186399677
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 4395
1 row in set (0.00 sec)

slave mysql> slave stop;
Query OK, 0 rows affected (0.00 sec)

slave mysql> slave start;
Query OK, 0 rows affected (0.01 sec)

slave mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.26
Master_User: repl
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql-bin.000211
Read_Master_Log_Pos: 501070714
Relay_Log_File: slave-relay-bin.000210
Relay_Log_Pos: 148765772
Relay_Master_Log_File: mysql-bin.000210
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 148765643
Relay_Log_Space: 1575017939
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 4729
1 row in set (0.00 sec)

slave mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.26
Master_User: repl
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql-bin.000211
Read_Master_Log_Pos: 501273227
Relay_Log_File: slave-relay-bin.000210
Relay_Log_Pos: 155647931
Relay_Master_Log_File: mysql-bin.000210
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 155647802
Relay_Log_Space: 1575220452
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 4729
1 row in set (0.00 sec)

slave mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.26
Master_User: repl
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql-bin.000211
Read_Master_Log_Pos: 502052054
Relay_Log_File: slave-relay-bin.000210
Relay_Log_Pos: 172407186
Relay_Master_Log_File: mysql-bin.000210
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 172407057
Relay_Log_Space: 1575999279
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 4693
1 row in set (0.00 sec)

slave mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Queueing master event to the relay log
Master_Host: 192.168.10.26
Master_User: repl
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql-bin.000211
Read_Master_Log_Pos: 987239824
Relay_Log_File: wpdb2-relay-bin.000211
Relay_Log_Pos: 987239782
Relay_Master_Log_File: mysql-bin.000211
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 987239653
Relay_Log_Space: 987239953
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
1 row in set (0.00 sec)

Now it looks like it's working again... At first it worked for 4 days, then another 24 hours, and then only 6 hours.