Hi,

I followed the MySQL Cluster HOWTO (many compliments to the author!). The cluster nodes and the management node are set up correctly: from ndb_mgmd I can see all the nodes connected to the cluster manager.

I'm running Ubuntu 6.10. I installed the heartbeat-2 and ldirectord-2 packages from the Ubuntu repositories via apt-get. I need version 2 because my system hangs with the packages available from Ultra Monkey.

My problem comes when I try to configure the load balancer: my system doesn't show the virtual IP address when I run ip addr sh. My configuration files are listed below:

ha.cf
*******************************
logfacility local0
auto_failback off
bcast eth0
mcast eth0 225.0.0.1 694 1 0
node ron
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster

haresources
*********************************
ron \
    LVSSyncDaemonSwap::master \
    ldirectord::ldirectord.cf \
    IPaddr2::192.168.1.65/24/eth0/192.168.1.255

authkeys
*********************************
auth 3
3 md5 myauthpassword

ldirectord.cf
*********************************
# Global Directives
checktimeout=10
checkinterval=2
autoreload=no
logfile="local0"
quiescent=yes

virtual=192.168.1.65:3306
        service=mysql
        real=192.168.0.62:3306 gate
        real=192.168.0.100:3306 gate
        checktype=negotiate
        login="root"
        passwd="mysqlrootpassword"
        database="ldirectord"
        request="SELECT * FROM connectioncheck"
        scheduler=wrr

I also set net.ipv4.ip_forward=1 in /etc/sysctl.conf.

The cluster manager is on 192.168.1.61, and I want the virtual IP to be 192.168.1.65. But I can't see any virtual IP address:

#ip addr sh
...
2: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:02:3f:be:13:95 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.61/24 brd 192.168.1.255 scope global eth0
    inet6 fe80::202:3fff:febe:1395/64 scope link
       valid_lft forever preferred_lft forever

Does anyone have an idea where the problem is?

Thanks for any help,
Samuele.
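The check being done by eye here (is 192.168.1.65 listed under eth0?) can be scripted. The function below is a hypothetical helper, not part of heartbeat or ldirectord. It is a sketch that assumes the usual iproute2 output, where each IPv4 address appears on a line of the form "inet ADDRESS/PREFIX ...":

```shell
# has_vip ADDRESS
# Reads `ip addr` output on stdin and succeeds (exit status 0) only if
# ADDRESS is configured as an IPv4 address on one of the listed interfaces.
# Hypothetical helper; assumes iproute2's "inet ADDRESS/PREFIX ..." lines.
has_vip() {
    awk -v vip="$1" '
        # On IPv4 lines, field 1 is "inet" and field 2 is "address/prefix".
        $1 == "inet" { split($2, part, "/"); if (part[1] == vip) found = 1 }
        END { if (found) exit 0; exit 1 }
    '
}
```

Example: ip addr show eth0 | has_vip 192.168.1.65 && echo "VIP is up" || echo "VIP is missing". The address is compared exactly, so 192.168.1.6 does not match 192.168.1.61.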
Dear Samuele,

First of all, check the ha-log file, which you will find in the log directory, i.e. /var/log. Whenever heartbeat comes up and runs, it writes very useful information to its logs, and you can use them to diagnose the problem. Restart heartbeat and paste the output of tail -n 50 here so we can see what is going on with your configuration.

Regards,
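The log check suggested above can be narrowed to the lines that usually explain a failure. hb_problems is a hypothetical helper name, not a heartbeat tool. It assumes the log levels appear as "ERROR:" and "CRIT:", as they do in the logs posted in this thread:

```shell
# hb_problems [LOGFILE]
# Print the last 50 ERROR/CRIT lines from a heartbeat log.
# Hypothetical helper; LOGFILE defaults to /var/log/ha-log.
hb_problems() {
    logfile=${1:-/var/log/ha-log}
    # Heartbeat tags each line with a level word: info, WARN, ERROR, CRIT.
    grep -E '(ERROR|CRIT):' "$logfile" | tail -n 50
}
```

Run it right after restarting heartbeat, e.g. hb_problems /var/log/ha-log. If ha.cf has no logfile directive, the messages go to syslog through logfacility instead, so pass /var/log/messages.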
Thanks for your answer. I checked /var/log, but there's no ha-log file. I started heartbeat:

root@ron:/etc/ha.d# /etc/init.d/heartbeat start
Starting High-Availability services:
2007/01/10_13:41:29 INFO: IPaddr2 Resource is stopped
Done.

And tail -f /var/log/messages says:

Jan 10 13:41:28 localhost logd: [7047]: info: logd started with default configuration.
Jan 10 13:41:28 localhost logd: [7047]: WARN: Core dumps could be lost if multiple dumps occur
Jan 10 13:41:28 localhost logd: [7047]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Jan 10 13:41:28 localhost logd: [7048]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Jan 10 13:41:28 localhost logd: [7047]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Jan 10 13:41:29 localhost heartbeat: [7199]: WARN: Core dumps could be lost if multiple dumps occur
Jan 10 13:41:29 localhost heartbeat: [7199]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Jan 10 13:41:29 localhost heartbeat: [7199]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Jan 10 13:41:29 localhost heartbeat: [7199]: info: **************************
Jan 10 13:41:29 localhost heartbeat: [7199]: info: Configuration validated. Starting heartbeat 2.0.7
Jan 10 13:41:29 localhost heartbeat: [7200]: info: heartbeat: version 2.0.7
Jan 10 13:41:29 localhost heartbeat: [7200]: info: Heartbeat generation: 18
Jan 10 13:41:29 localhost heartbeat: [7200]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 10 13:41:29 localhost heartbeat: [7200]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 10 13:41:29 localhost heartbeat: [7200]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
Jan 10 13:41:29 localhost heartbeat: [7200]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Jan 10 13:41:29 localhost heartbeat: [7200]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Jan 10 13:41:29 localhost heartbeat: [7200]: info: glib: UDP multicast heartbeat started for group 225.0.0.1 port 694 interface eth0 (ttl=1 loop=0)
Jan 10 13:41:29 localhost heartbeat: [7200]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jan 10 13:41:29 localhost heartbeat: [7200]: info: Comm_now_up(): updating status to active
Jan 10 13:41:29 localhost heartbeat: [7200]: info: Local status now set to: 'active'

I tried these commands to check whether heartbeat is running:

root@ron:/etc/ha.d# cl_status hbstatus
Heartbeat is stopped on this machine.
root@ron:/etc/ha.d# /etc/ha.d/resource.d/LVSSyncDaemonSwap master eth0 status
master stopped
root@ron:/etc/ha.d# ldirectord /etc/ha.d/ldirectord.cf status
ldirectord is stopped for /etc/ha.d/ldirectord.cf

It seems that heartbeat is not running... but why? I've just started it...
Well, check this: it's the one I'm using for my MySQL HA setup.

Code:
#
debugfile /var/log/ha-debug
#
logfile /var/log/ha-log
#
logfacility local0
#
keepalive 2
#
deadtime 30
#
ucast eth1 10.0.0.156
#
auto_failback off
#
node db1
node db2
#
ping RouterIP
#
respawn hacluster /usr/lib/heartbeat/ipfail
#

I hope this helps with your configuration.

Regards,
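To compare a working ha.cf like this one with your own, it helps to strip comments and blank lines first. active_directives is a hypothetical helper. The sketch assumes that "#" always starts a comment in these files:

```shell
# active_directives FILE
# Print only the directives in an ha.cf-style file: comments removed,
# surrounding whitespace trimmed, empty lines dropped.
# Hypothetical helper; assumes '#' never appears inside a value.
active_directives() {
    sed -e 's/#.*$//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' "$1" | grep -v '^$'
}
```

In bash, the two configurations can then be compared with process substitution, for example: diff <(active_directives /etc/ha.d/ha.cf) <(active_directives ha.cf.working).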
I've added the following lines to my ha.cf:

debugfile /var/log/ha-debug
logfile /var/log/ha-log

Now I have the heartbeat log files ha-log and ha-debug in /var/log. I restarted heartbeat, and this is the content of the log files:

ha-log:
-------
heartbeat[13819]: 2007/01/10_22:44:42 WARN: Core dumps could be lost if multiple dumps occur
heartbeat[13819]: 2007/01/10_22:44:42 WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
heartbeat[13819]: 2007/01/10_22:44:42 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[13819]: 2007/01/10_22:44:42 info: **************************
heartbeat[13819]: 2007/01/10_22:44:42 info: Configuration validated. Starting heartbeat 2.0.7
heartbeat[13820]: 2007/01/10_22:44:42 info: heartbeat: version 2.0.7
heartbeat[13820]: 2007/01/10_22:44:42 info: Heartbeat generation: 21
heartbeat[13820]: 2007/01/10_22:44:42 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[13820]: 2007/01/10_22:44:42 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[13820]: 2007/01/10_22:44:42 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
heartbeat[13820]: 2007/01/10_22:44:42 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat[13820]: 2007/01/10_22:44:42 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
heartbeat[13820]: 2007/01/10_22:44:42 info: glib: UDP multicast heartbeat started for group 225.0.0.1 port 694 interface eth0 (ttl=1 loop=0)
heartbeat[13820]: 2007/01/10_22:44:42 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[13820]: 2007/01/10_22:44:42 info: Comm_now_up(): updating status to active
heartbeat[13820]: 2007/01/10_22:44:42 info: Local status now set to: 'active'
heartbeat[13820]: 2007/01/10_22:44:42 ERROR: socket_wait_conn_new: trying to create in /var/run/heartbeat/register bind:: No such file or directory
heartbeat[13823]: 2007/01/10_22:44:44 CRIT: Emergency Shutdown: Master Control process died.
heartbeat[13823]: 2007/01/10_22:44:44 CRIT: Killing pid 13820 with SIGTERM
heartbeat[13823]: 2007/01/10_22:44:44 CRIT: Killing pid 13824 with SIGTERM
heartbeat[13823]: 2007/01/10_22:44:44 CRIT: Killing pid 13825 with SIGTERM
heartbeat[13823]: 2007/01/10_22:44:44 CRIT: Killing pid 13826 with SIGTERM
heartbeat[13823]: 2007/01/10_22:44:44 CRIT: Killing pid 13827 with SIGTERM
heartbeat[13823]: 2007/01/10_22:44:44 CRIT: Emergency Shutdown(MCP dead): Killing ourselves.

(The content of ha-debug is identical.)

There's an error, which is why heartbeat doesn't start... but I have no idea what kind of error it is...
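The ERROR line names the socket that heartbeat tried to create. For a Unix socket, bind() reports "No such file or directory" when the directory that should hold the socket does not exist. The small function below pulls those paths out of a log and prints the directories that are missing on this machine. It is a hypothetical helper, and the sketch assumes the message keeps the "trying to create in PATH bind" wording seen above:

```shell
# missing_socket_dirs LOGFILE
# For each "trying to create in PATH bind" error in a heartbeat log,
# print the parent directory of PATH if it does not exist locally.
# Hypothetical helper; relies on the exact wording of that log message.
missing_socket_dirs() {
    sed -n 's/.*trying to create in \([^ ]*\) bind.*/\1/p' "$1" | sort -u |
    while read -r sock; do
        dir=$(dirname "$sock")
        [ -d "$dir" ] || echo "$dir"
    done
}
```

Against the log above, it would report /var/run/heartbeat on a machine where that directory is absent.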
OK, searching on Google, I found a patch for the /etc/init.d/heartbeat file that creates the directories heartbeat needs in /var/run. I've added these lines to /etc/init.d/heartbeat, inside the function StartHA():

if [ ! -d "$RUNDIR/heartbeat" ]; then
    mkdir -p "$RUNDIR"/heartbeat/{ccm,crm}
    chown -R hacluster:haclient "$RUNDIR/heartbeat"
    chmod -R 750 "$RUNDIR/heartbeat"
fi

Now, after restarting heartbeat, I get NO ERROR in the ha-log file.

root@ron:/home/sam# /etc/init.d/heartbeat start
Starting High-Availability services:
2007/01/10_23:13:17 INFO: IPaddr2 Resource is stopped
Done.

And now:

root@ron:/home/sam# ps aux | grep heartbeat
root 14866 0.0 2.4 12516 12516 ? SLs 23:13 0:00 heartbeat: master control process
nobody 14869 0.0 1.1 5920 5920 ? SL 23:13 0:00 heartbeat: FIFO reader
nobody 14870 0.0 1.1 5916 5916 ? SL 23:13 0:00 heartbeat: write: bcast eth0
nobody 14871 0.0 1.1 5916 5916 ? SL 23:13 0:00 heartbeat: read: bcast eth0
nobody 14872 0.0 1.1 5916 5916 ? SL 23:13 0:00 heartbeat: write: mcast eth0
nobody 14873 0.0 1.1 5916 5916 ? SL 23:13 0:00 heartbeat: read: mcast eth0
113 14874 0.0 0.2 4196 1424 ? S 23:13 0:00 /usr/lib/heartbeat/ipfail
root@ron:/home/sam# cl_status hbstatus
Heartbeat is running on this machine.
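The directory creation from that patch can also be kept as a standalone, idempotent function, which you can run by hand or call from another script. ensure_hb_rundirs is a hypothetical helper, not the patch itself. It takes the run directory as an argument (default /var/run) and writes the two subdirectories out explicitly. This is because brace expansion such as {ccm,crm} is a bash feature that dash (the /bin/sh on Ubuntu 6.10) does not perform. The sketch assumes the hacluster user and haclient group used in the patch, and changes ownership only when both exist:

```shell
# ensure_hb_rundirs [RUNDIR]
# Create RUNDIR/heartbeat, RUNDIR/heartbeat/ccm and RUNDIR/heartbeat/crm
# (RUNDIR defaults to /var/run). Safe to run repeatedly.
# Hypothetical helper mirroring the init-script patch.
ensure_hb_rundirs() {
    rundir=${1:-/var/run}
    # Explicit paths instead of {ccm,crm}, so this also works under dash.
    mkdir -p "$rundir/heartbeat/ccm" "$rundir/heartbeat/crm" || return 1
    # Hand the tree to hacluster:haclient only if that user and group exist.
    if id hacluster >/dev/null 2>&1 && getent group haclient >/dev/null 2>&1; then
        chown -R hacluster:haclient "$rundir/heartbeat" || return 1
    fi
    chmod -R 750 "$rundir/heartbeat"
}
```

On the cluster manager itself, it would be run as root with no argument, before starting heartbeat.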
But the problem is still not solved:

root@ron:/home/sam# /etc/ha.d/resource.d/LVSSyncDaemonSwap master eth0 status
master stopped
root@ron:/home/sam# ldirectord ldirectord.cf status
ldirectord is stopped for /etc/ha.d/ldirectord.cf
root@ron:/home/sam# ip addr sh eth0
2: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:02:3f:be:13:95 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.61/24 brd 192.168.1.255 scope global eth0
    inet6 fe80::202:3fff:febe:1395/64 scope link
       valid_lft forever preferred_lft forever

And this is the output of tail -f /var/log/messages:

Jan 10 23:13:17 localhost heartbeat: [14865]: WARN: Core dumps could be lost if multiple dumps occur
Jan 10 23:13:17 localhost heartbeat: [14865]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Jan 10 23:13:17 localhost heartbeat: [14865]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Jan 10 23:13:17 localhost heartbeat: [14865]: info: **************************
Jan 10 23:13:17 localhost heartbeat: [14865]: info: Configuration validated. Starting heartbeat 2.0.7
Jan 10 23:13:17 localhost heartbeat: [14866]: info: heartbeat: version 2.0.7
Jan 10 23:13:18 localhost heartbeat: [14866]: info: Heartbeat generation: 23
Jan 10 23:13:18 localhost heartbeat: [14866]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 10 23:13:18 localhost heartbeat: [14866]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 10 23:13:18 localhost heartbeat: [14866]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
Jan 10 23:13:18 localhost heartbeat: [14866]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Jan 10 23:13:18 localhost heartbeat: [14866]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Jan 10 23:13:18 localhost heartbeat: [14866]: info: glib: UDP multicast heartbeat started for group 225.0.0.1 port 694 interface eth0 (ttl=1 loop=0)
Jan 10 23:13:18 localhost heartbeat: [14866]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jan 10 23:13:18 localhost heartbeat: [14866]: info: Comm_now_up(): updating status to active
Jan 10 23:13:18 localhost heartbeat: [14866]: info: Local status now set to: 'active'
Jan 10 23:13:18 localhost heartbeat: [14866]: info: Starting child client "/usr/lib/heartbeat/ipfail" (113,117)
Jan 10 23:13:18 localhost heartbeat: [14866]: info: Local status now set to: 'up'
Jan 10 23:13:18 localhost heartbeat: [14874]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 113 gid 117 (pid 14874)
Jan 10 23:13:19 localhost heartbeat: [14866]: info: Link ron:eth0 up.
Jan 10 23:13:22 localhost ipfail: [14874]: info: Link Status update: Link ron/eth0 now has status up

I can't understand why it does not work... These are the config files:

ha.cf
****
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
auto_failback off
bcast eth0
mcast eth0 225.0.0.1 694 1 0
node ron
respawn hacluster /usr/lib/heartbeat/ipfail

haresources
*********
ron IPaddr2::192.168.1.65/24/eth0/192.168.1.255 LVSSyncDaemonSwap::master::eth0 ldirectord::ldirectord.cf

ldirectord.cf
*********
checktimeout=10
checkinterval=2
autoreload=no
logfile="local0"
quiescent=yes

virtual=192.168.1.65:3306
        service=mysql
        real=192.168.0.62:3306 gate
        real=192.168.0.100:3306 gate
        checktype=negotiate
        login="root"
        passwd="mysqlrootpassword"
        database="ldirectord"
        request="SELECT * FROM connectioncheck"
        scheduler=wrr

Any ideas on how to make it work?
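The virtual IP has to agree in two places: the IPaddr2 resource in haresources and the virtual= line in ldirectord.cf. A quick consistency check between them can be scripted. vip_consistency is a hypothetical helper. The sketch assumes IPv4 addresses, uses the first uncommented IPaddr2 entry in haresources, and ignores commented lines:

```shell
# vip_consistency HARESOURCES LDIRECTORD_CF
# Compare the address IPaddr2 brings up (from haresources) with the
# address of every "virtual=" service in ldirectord.cf. Prints each
# mismatch and returns 1 if there is any.
# Hypothetical helper; IPv4 only, first IPaddr2 entry only.
vip_consistency() {
    # IPaddr2::192.168.1.65/24/eth0/... -> 192.168.1.65 (comment lines skipped)
    hb_vip=$(sed -n -e '/^[[:space:]]*#/d' \
                    -e 's/.*IPaddr2::\([0-9.]*\).*/\1/p' "$1" | head -n 1)
    [ -n "$hb_vip" ] || { echo "no IPaddr2 resource found in $1"; return 1; }
    rc=0
    # virtual=192.168.1.65:3306 -> 192.168.1.65
    for ld_vip in $(sed -n 's/^[[:space:]]*virtual=\([0-9.]*\):.*/\1/p' "$2"); do
        if [ "$ld_vip" != "$hb_vip" ]; then
            echo "ldirectord virtual $ld_vip != IPaddr2 $hb_vip"
            rc=1
        fi
    done
    return $rc
}
```

For the files above, vip_consistency /etc/ha.d/haresources /etc/ha.d/ldirectord.cf should return 0 silently, because both use 192.168.1.65.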
OK, there is some problem: heartbeat is not taking the resources from the haresources file. When you start heartbeat, it says IPaddr2 is stopped.

Next, try taking the resources manually by running the following command:

/usr/lib/heartbeat/hb_takeover all

Heartbeat uses this script to take over resources. Then see what happens, and check the logs too.

Regards,
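If hb_takeover does not make the culprit obvious, you can also start the resources by hand, one at a time. Use the order heartbeat uses (left to right), so the first failing resource stands out. Heartbeat splits Name::arg1::arg2 on "::" and runs the resource script as "Name arg1 arg2 start". It looks for the script in /etc/ha.d/resource.d, and also in /etc/init.d, which this sketch ignores. hb_start_resources is a hypothetical helper. The sketch assumes every resource is written in Name::args form, without the bare IP-address shorthand:

```shell
# hb_start_resources "HARESOURCES_ENTRY" [RESOURCE_DIR]
# Start, in order, the resources of one haresources entry the way heartbeat
# does: "Name::a::b" becomes "RESOURCE_DIR/Name a b start".
# Stops at the first resource that fails. RESOURCE_DIR defaults to
# /etc/ha.d/resource.d. Hypothetical helper, not part of heartbeat.
hb_start_resources() {
    entry=$1
    rdir=${2:-/etc/ha.d/resource.d}
    skip_node=1
    for word in $entry; do
        # The first word of an haresources entry is the node name.
        if [ "$skip_node" = 1 ]; then skip_node=0; continue; fi
        # A lone backslash is a line continuation from the file.
        [ "$word" = '\' ] && continue
        # "Name::arg1::arg2" -> positional parameters: Name arg1 arg2
        set -- $(printf '%s\n' "$word" | sed 's/::/ /g')
        name=$1
        shift
        echo "+ $rdir/$name $* start"
        if ! "$rdir/$name" "$@" start; then
            echo "resource $name failed to start" >&2
            return 1
        fi
    done
}
```

With the haresources entry from this thread, the call would be: hb_start_resources "ron IPaddr2::192.168.1.65/24/eth0/192.168.1.255 LVSSyncDaemonSwap::master::eth0 ldirectord::ldirectord.cf".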