Hi I am following the "Installing A Web, Email & MySQL Database Cluster On Debian 6.0 With ISPConfig 3" How can I check that the replication between the to mysql-servers is working/up to date. If it is not in sync/working, what is the correct procedure to get it inline again. On my Server1: mysql> SHOW SLAVE STATUS \G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.17.2.182 Master_User: slaveuser Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000003 Read_Master_Log_Pos: 267 Relay_Log_File: mysqld-relay-bin.000014 Relay_Log_Pos: 370 Relay_Master_Log_File: mysql-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 267 Relay_Log_Space: 673 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 2 1 row in set (0.00 sec) On my Server2: mysql> SHOW SLAVE STATUS \G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.17.2.181 Master_User: slaveuser Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000006 Read_Master_Log_Pos: 11551505 Relay_Log_File: mysqld-relay-bin.000013 Relay_Log_Pos: 11551651 Relay_Master_Log_File: mysql-bin.000006 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 11551505 Relay_Log_Space: 11551851 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1 1 row in set (0.00 sec) Pls someone show me a way or a link to find 1. Is above correct or faulty and if so in what way, (some values above differ a lot)? 2. How to make it right (Shall all values be the same in above lists?) 3. Is ISPconfig monitoring and reporting the mirror set in any way? //millpark10
I use nagios to monitor a mysql-replication. There is no general solution to fix a problem with the replication. You can skip the error or rebuild the replication. See here. You replication is still running. See "Slave_IO_Running: Yes" The values for the slave-status must not be equal on each server. Currently not. But it´s a nice idea. But i will not add this to 3.0.5.x as 3.0.6 will have a much better handling for system-cron-jobs. BTW. I´m not sure if there is any statement in the tutorial regarding skipping some errors (1007 and 1008) if you´re running ispconfig in a cluster-setup. in some cases this could crash your replication.
Florian, Thank you for a very quick response! So I will read me up on nagios, heard about it before but have not had the need. Is "Slave_IO_Running: Yes" the only parameter I have to take into consideration? e.g. if it is "yes" then everything is working and I dont have to think about any other figures in the SHOW SLAVE STATUS \G output? Even if they seem to be way different as in "Exec_Master_Log_Pos" for example in my post? Quote: "3. Is ISPconfig monitoring and reporting the mirror set in any way?" "Currently not. But it´s a nice idea. But i will not add this to 3.0.5.x as 3.0.6 will have a much better handling for system-cron-jobs." I read about skipping errors, didn't understand it but now have at least a idea how to avoid a situation (1007) before it is happening. OK, good, well this was not a request for features just a way to understand how it works. I am in the process of building a ISPconfig3-system that is supposed to be as safe/secure/redundant as possible. I am using ftm, 2 HP proliant DL380 G4 with 14 disk smartarrays connected. Both servers with hw RAID5 and the Smartarray also RAID5. These two servers are identical and is in a mirror(cluster)setup. Next question is, what will I have to do if one of the servers was down for "maintenance" (e.g. change of a faulty disk) in order to get it "inline" again after it is online. //millpark10
No, you should also checK slave_io_running and slave_sql_running. Maybe Seconds_Behind_Master, too. You can give this script a try. In this setup i would skip sql-errors 1007 and 1008. If you just shut down one (mysql)server, the replication restarts automaticly after the (mysql)server is back online and connected to the other (mysql)server. But this won´t work when you remove the mysql-databases. In this case you have to rebuild the replication (e.g. dump data from the other server and import it).
Florian, again, thank you for quick response and good explanation to my newbie questions. I read your links and suggestions and is getting a crasch course about all this! "No, you should also checK slave_io_running and slave_sql_running. Maybe Seconds_Behind_Master, too. You can give this script a try." I will definitely look into the script (saw it somewere before when searching howtoforge) and find a way to implement it in my setup. Also your comment "In this setup i would skip sql-errors 1007 and 1008." I understood from your blogpost it WILL make the replication to crash in not so uncommon circumstances. So as long as I "only stop" one of my two servers, the replication will repair itself and catch up on any changes that have been done during downtime. Sounds great! But, this is only ISPconfig related stuff? Not actual data? I mean /var/www and /var/mail will not have redundance in this basic setup? I have to e.g. rsync or something similiar in order to have that data mirrored and redundant? Again, sorry for asking questions that I might have read but want to be sure I understood. My goal is to make my installation as failsafe as ever possible. (Dont know if I shall start a new thread but I have to mount my smartarray and probably do some entries in fstab in order to move /var/www /var/mail and mysql data to that disk, or is there somewhere a "howto" for that as well? Not completely sure if I have all info I need to do this) //millpark10
A mysql-replication only works for mysql. To keep the data in sync (web and mail), you can use unison or drbd. If you would to offer mail on both servers at the same time, it would use drbd, to have the data allways in sync.
Florian, again, thank you for quick response! According to the guide I've been following, "Installing A Web, Email & MySQL Database Cluster On Debian 6.0 With ISPConfig 3" I have Unison installed already. (how to change that) After reading about DRBD I understand it adds a bit to redundance, although I don't fully get everything it will do for me. But failover is one important function. Is the "http://www.howtoforge.com/setting-up-network-raid1-with-drbd-on-ubuntu-11.10" up-to-date enough for me to follow? It seems as it will format and mount the two "spare disks" I have (one on each server) and make them a network mirror-set. That is good! Is there a description somewhere (searched but didnt find) on howto move /var to this new partition. Or is it best(?) to only move /var/www and /var/mail and not whole /var ? I am searching for "best practices" here in order to build as much as possible according to "standard". (I guess in the future it will be easier to get help/support with as few specials as possible. And it probably will scale better when needed.) //millpark10
Setting up a DRBD-replication it´s quiet simple. You have a volume on each server which is connected to the other server. You can find a lot of how-to´s and on drbd.org you find also a setup for using primary-primary with ocfs. NEVER sync the whole /var. If would create to volumes r0 and r1. Then put mail on r0 and web on r1 and mount the volumes to /var/vmail and /var/www.
Hi Florian - Got help to develop an improved check-master-master-replication script from Script as starting point. Not sure how to post it here, I will happily share this script with the forum. Is it ok to just attach it to a post (as .txt)? - You said: Dont understand how this setup could offer mail on the two servers at the same time. It seems to me that the mirrored(clustered) setup is (without heartbeat/keepalived) more secure but not redundant to each other? (e.g. RAID1) - Read a lot about drbd on the site but could not be sure how to configure it in my setup, As mentioned in other places, there is a possibility, with unison for /var/vmail, to loose mail. This is what I'd like to avoid. - Last question, I read your post but I can not get the error reproduced as described in my setup. If I add SLAVE_SKIP_ERRORS=1007,1008 in my.cnf it will not do anything bad, will it? Thank you for being so kind to help us newbies here in the forum. (Will subscribe to support!) //millpark10
If you use drbd in dual-primary mode you have the same date on both servers. If you _really_ can´t re-produce this error, you can ignore the SLAVE_SKIP_ERRORS in your config. I just noted that you must stop the slave on both servers (at least with mysqld 5.5.35) if you exclude the db mysql from the replication.
Hi Florian Good catch to stop both servers! (noticed you updated your page about it) Of course I get the error then. Added: slave_skip_errors=1007,1008 in /etc/mysql/my.cnf and problem "solved" (worked around). About DRBD it will be implemented on my next servers instead. Have to read more. For now I will use unison and, as I understand, it will copy from first server, at a preset time interval e.g. every 5 min, to the second server. Not optimal at all but I really have to put this cluster online! Next server cluster i build will be HA, redundant and have somekind of heartbeat/failover. I'll be happy to get inputs on how to do this. Howtos I found so far seems a bit outdated. //millpark10
Hi Florian I'm not 100% sure how it went down, but I couldn't duplicate the error exactly as described earlier in your blog page. But the error appeard when deleting a database. And with both slaves stopped the error showed up exactly as the updated blog page described. Posting the script and regarding your comment about the value "Seconds_Behind_Master: 0" I might update the script once the server is in production if it will report "value not found a lot". I guess this will depend heavily on how much load the server will have regarding replication. Probably I will search for a value outside a predefined "normal" variation instead, and then let the script post a mail. Thanks for all god advice. So, Below is the script that check if replication is up, If not, it send you a mail. Happy to hear comments about it, or other improvements. (I am running the script on both servers in the cluster, but changed "From Server1 Replication new problem" on server 2 to "From Server2 Replication new problem") //millpark10 Code: #!/bin/bash PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin #file name of errorfile filename="error_"$(date +"%Y%m%d%H%M%S")".txt" #Directory where error file will be written cd /root/scripts #Get status of replication from MySQL once output=$((echo "show slave status \G;") | mysql -u root -pYOURMYSQLPASSWORD 2>&1) #three searches for specific strings in $output, if not found this will be written to the error file echo "If you see three outputs below and one empty \"Textfile output\" then everything is working" #remove after test run ###Check Slave_IO_Running### echo "$output" | grep "Slave_IO_Running: Yes" if [ "$?" -eq "1" ]; then echo "Did not catch: \"Slave_IO_Running: Yes\"" >> $filename fi ####Check Slave_SQL_Running### echo "$output" | grep "Slave_SQL_Running: Yes" if [ "$?" -eq "1" ]; then echo "Did not catch: \"Slave_SQL_Running: Yes\"" >> $filename fi ####Check Seconds_Behind_Master### echo "$output" | grep "Seconds_Behind_Master: 0" if [ "$?" -eq "1" ]; then echo "Did not catch: \"Seconds_Behind_Master: 0\"" >> $filename fi echo "Textfile output:" #remove after test run ###Write to logfile everytime script is run, file might be manually deleted once a year or so. Grows with ~125k/month date >> /var/log/mysql_chk_replication.log ###Send notification if replication down### if [ -f $filename ]; then date >> $filename cat $filename #remove after test and uncomment row below to start sending mail when any of the three checks fails #mail -s "From Server1 Replication new problem" [email protected] < $filename fi #### Make script run every 10 min by putting uncommented line below in #### crontab -e #### */10 * * * * /root/scripts/chk_sql_ok_howtoforgepost.sh &> /dev/null #### I placed my script in /root/scripts and renamed it to chk_sql_ok_howtoforgepost.sh chmod set to rwxr-x--x
Minor update to script Here is the updated version of the script, posting more info if Seconds_Behind_Master is not zero. Code: #!/bin/bash PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin #file name of errorfile filename="error_"$(date +"%Y%m%d%H%M%S")".txt" #Directory where error file will be written cd /root/scripts #Get status of replication from MySQL once output=$((echo "show slave status \G;") | mysql -u root -pYOURMYSQLPASSWORD 2>&1) #three searches for specific strings in $output, if not found this will be written to the error file #echo "If you see three outputs below and one empty \"Textfile output\" then everything is working" #remove after test run ###Check Slave_IO_Running### echo "$output" | grep "Slave_IO_Running: Yes" if [ "$?" -eq "1" ]; then echo "Did not catch: \"Slave_IO_Running: Yes\"" >> $filename fi ####Check Slave_SQL_Running### echo "$output" | grep "Slave_SQL_Running: Yes" if [ "$?" -eq "1" ]; then echo "Did not catch: \"Slave_SQL_Running: Yes\"" >> $filename fi ####Check Seconds_Behind_Master### #echo "$output" | grep "Seconds_Behind_Master: 0" #if [ "$?" -eq "1" ]; then #echo "Did not catch: \"Seconds_Behind_Master: 0\"" >> $filename #fi latency=$(echo "$output" | grep "Seconds_Behind_Master") echo "$latency" | grep "Seconds_Behind_Master: 0" if [ "$?" -eq "1" ]; then echo "Caught delay: $latency" >> $filename fi #echo "Textfile output:" #remove after test run ###Write to logfile everytime script is run date >> /var/log/mysql_chk_replication.log ###Send notification if replication down or Seconds_Behind_Master is not 0### if [ -f $filename ]; then date >> $filename echo "$output" >> $filename #cat $filename #remove after test and uncomment row below to start sending mail when any of the three checks fails mail -s "From Server1 Replication new problem" [email protected] < $filename fi #### Make script run every 10 min by putting uncommented line below in #### crontab -e #### */10 * * * * /root/scripts/chk_sql_ok_howtoforgepost.sh &> /dev/null #### I placed my script in /root/scripts and renamed it to chk_sql_ok_howtoforgepost.sh chmod set to rwxr-x--x Do apropriate changes and put same script on second server as well. //millpark10