MySQL Cluster HOWTO - Cannot load Virtual IP address

Discussion in 'HOWTO-Related Questions' started by samu, Jan 10, 2007.

  1. samu

    samu New Member

    Hi, I followed the MySQL Cluster HOWTO (many compliments to the author!).
    The cluster nodes and the management node are set up correctly: from ndb_mgmd I can see all the nodes connected to the cluster manager.

    I'm running Ubuntu 6.10 and I installed the heartbeat-2 and ldirectord-2 packages from the Ubuntu repositories via apt-get (I need version 2 because my system hangs with the packages available from Ultra Monkey).

    My problem comes when I try to configure the load balancer: my system doesn't show the virtual IP address when I run ip addr sh.

    My configuration files are listed below:

    ha.cf
    *******************************
    logfacility local0
    auto_failback off
    bcast eth0
    mcast eth0 225.0.0.1 694 1 0
    node ron
    respawn hacluster /usr/lib/heartbeat/ipfail
    apiauth ipfail gid=haclient uid=hacluster

    haresources
    *********************************
    ron \
    LVSSyncDaemonSwap::master \
    ldirectord::ldirectord.cf \
    IPaddr2::192.168.1.65/24/eth0/192.168.1.255

    authkeys
    *********************************
    auth 3
    3 md5 myauthpassword

    ldirectord.cf
    *********************************
    # Global Directives
    checktimeout=10
    checkinterval=2
    autoreload=no
    logfile="local0"
    quiescent=yes

    virtual=192.168.1.65:3306
    service=mysql
    real=192.168.0.62:3306 gate
    real=192.168.0.100:3306 gate
    checktype=negotiate
    login="root"
    passwd="mysqlrootpassword"
    database="ldirectord"
    request="SELECT * FROM connectioncheck"
    scheduler=wrr

    I also set in /etc/sysctl.conf, net.ipv4.ip_forward=1.
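
    A note on this setting: editing /etc/sysctl.conf alone does not change the running kernel; the value is applied by sysctl -p or at the next boot. As a minimal sketch, the helper below reads the value configured in a sysctl.conf-style file so it can be compared with the live value in /proc/sys/net/ipv4/ip_forward. The name conf_value is made up for this example and is not part of any package.

```shell
#!/bin/sh
# conf_value FILE KEY
# Print the last value assigned to KEY in a sysctl.conf-style file.
# Comment lines (starting with # or ;) are ignored.
# (conf_value is a hypothetical helper name.)
conf_value() {
    awk -F= -v key="$2" '
        /^[ \t]*[#;]/ { next }
        { k = $1; gsub(/[ \t]/, "", k) }
        k == key { v = $2; gsub(/[ \t]/, "", v); val = v }
        END { if (val != "") print val }
    ' "$1"
}

# Example use on a real system:
#   conf_value /etc/sysctl.conf net.ipv4.ip_forward
#   cat /proc/sys/net/ipv4/ip_forward
```

    If the two values differ, the file has been edited but not yet loaded.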

    I have the cluster manager on 192.168.1.61 and I want the virtual IP to be 192.168.1.65.
    But I'm not able to see any virtual IP address.

    #ip addr sh
    ...
    2: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:02:3f:be:13:95 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.61/24 brd 192.168.1.255 scope global eth0
    inet6 fe80::202:3fff:febe:1395/64 scope link
    valid_lft forever preferred_lft forever
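
    The check above can also be scripted. This is only a sketch: has_vip is a made-up helper name, and it simply looks for an exact "inet ADDRESS/..." entry in ip addr output read from standard input.

```shell
#!/bin/sh
# has_vip ADDRESS
# Read `ip addr show` style output on stdin and succeed (exit 0) only if
# ADDRESS appears as an IPv4 "inet" entry. The prefix length is ignored.
# (has_vip is a hypothetical helper name.)
has_vip() {
    awk -v ip="$1" '
        $1 == "inet" { split($2, a, "/"); if (a[1] == ip) found = 1 }
        END { exit !found }
    '
}

# Example use on a real system:
#   ip addr show dev eth0 | has_vip 192.168.1.65 && echo "virtual IP is up"
```

    With the output shown above, the check succeeds for 192.168.1.61 and fails for 192.168.1.65.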


    Does anyone have any idea where the problem is?

    Thanks for any help,
    Samuele.
     
  2. sohaileo

    sohaileo New Member

    Dear samuele,

    First of all, check the ha-log file, which you will find in the log directory, i.e. /var/log. Whenever heartbeat starts up, it writes very useful information to this log, and with its help you can diagnose the problem.
    You can also restart heartbeat and paste the output of tail -n 50 for that file here, so we can see what is going on with your configuration.
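
    To narrow the log down to the lines that matter, something like the sketch below may help. It assumes the log is at /var/log/ha-log (which depends on the logfile setting in ha.cf) and that problem lines carry an ERROR: or CRIT: tag; hb_problems is a made-up helper name.

```shell
#!/bin/sh
# hb_problems FILE [N]
# Print the ERROR and CRIT lines found in the last N lines of FILE
# (N defaults to 50). (hb_problems is a hypothetical helper name.)
hb_problems() {
    tail -n "${2:-50}" "$1" | grep -E ' (ERROR|CRIT): '
}

# Example use on a real system, after restarting heartbeat:
#   /etc/init.d/heartbeat restart
#   hb_problems /var/log/ha-log
```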

    Regards,
     
  3. samu

    samu New Member

    Thanks for your answer.
    I checked /var/log but there's no ha-log file.

    I started heartbeat:
    root@ron:/etc/ha.d# /etc/init.d/heartbeat start
    Starting High-Availability services:
    2007/01/10_13:41:29 INFO: IPaddr2 Resource is stopped
    Done.

    And tail -f /var/log/messages says:
    Jan 10 13:41:28 localhost logd: [7047]: info: logd started with default configuration.
    Jan 10 13:41:28 localhost logd: [7047]: WARN: Core dumps could be lost if multiple dumps occur
    Jan 10 13:41:28 localhost logd: [7047]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
    Jan 10 13:41:28 localhost logd: [7048]: info: G_main_add_SignalHandler: Added signal handler for signal 15
    Jan 10 13:41:28 localhost logd: [7047]: info: G_main_add_SignalHandler: Added signal handler for signal 15
    Jan 10 13:41:29 localhost heartbeat: [7199]: WARN: Core dumps could be lost if multiple dumps occur
    Jan 10 13:41:29 localhost heartbeat: [7199]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
    Jan 10 13:41:29 localhost heartbeat: [7199]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
    Jan 10 13:41:29 localhost heartbeat: [7199]: info: **************************
    Jan 10 13:41:29 localhost heartbeat: [7199]: info: Configuration validated. Starting heartbeat 2.0.7
    Jan 10 13:41:29 localhost heartbeat: [7200]: info: heartbeat: version 2.0.7
    Jan 10 13:41:29 localhost heartbeat: [7200]: info: Heartbeat generation: 18
    Jan 10 13:41:29 localhost heartbeat: [7200]: info: G_main_add_TriggerHandler: Added signal manual handler
    Jan 10 13:41:29 localhost heartbeat: [7200]: info: G_main_add_TriggerHandler: Added signal manual handler
    Jan 10 13:41:29 localhost heartbeat: [7200]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
    Jan 10 13:41:29 localhost heartbeat: [7200]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
    Jan 10 13:41:29 localhost heartbeat: [7200]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
    Jan 10 13:41:29 localhost heartbeat: [7200]: info: glib: UDP multicast heartbeat started for group 225.0.0.1 port 694 interface eth0 (ttl=1 loop=0)
    Jan 10 13:41:29 localhost heartbeat: [7200]: info: G_main_add_SignalHandler: Added signal handler for signal 17
    Jan 10 13:41:29 localhost heartbeat: [7200]: info: Comm_now_up(): updating status to active
    Jan 10 13:41:29 localhost heartbeat: [7200]: info: Local status now set to: 'active'


    I tried these commands to check whether heartbeat is running:

    root@ron:/etc/ha.d# cl_status hbstatus
    Heartbeat is stopped on this machine.

    root@ron:/etc/ha.d# /etc/ha.d/resource.d/LVSSyncDaemonSwap master eth0 status
    master stopped

    root@ron:/etc/ha.d# ldirectord /etc/ha.d/ldirectord.cf status
    ldirectord is stopped for /etc/ha.d/ldirectord.cf

    It seems that heartbeat is not running... but why? I've just started it...
     
  4. sohaileo

    sohaileo New Member

    Well, have a look at this; it's the configuration I'm using for my MySQL HA setup.

    Code:
    #
    debugfile /var/log/ha-debug
    #
    logfile /var/log/ha-log
    #
    logfacility local0
    #
    keepalive 2
    #
    deadtime 30
    #
    ucast eth1 10.0.0.156
    #
    auto_failback off
    #
    node db1
    node db2
    #
    ping RouterIP
    #
    respawn hacluster /usr/lib/heartbeat/ipfail
    #
    
    I hope this helps with your configuration.

    Regards,
     
  5. samu

    samu New Member

    I've added to my ha.cf the following lines:
    debugfile /var/log/ha-debug
    logfile /var/log/ha-log

    And now I have the heartbeat log files ha-log and ha-debug in /var/log.

    I restarted heartbeat and this is the content of the log files:

    ha-log:
    -------

    heartbeat[13819]: 2007/01/10_22:44:42 WARN: Core dumps could be lost if multiple dumps occur
    heartbeat[13819]: 2007/01/10_22:44:42 WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
    heartbeat[13819]: 2007/01/10_22:44:42 WARN: Logging daemon is disabled --enabling logging daemon is recommended
    heartbeat[13819]: 2007/01/10_22:44:42 info: **************************
    heartbeat[13819]: 2007/01/10_22:44:42 info: Configuration validated. Starting heartbeat 2.0.7
    heartbeat[13820]: 2007/01/10_22:44:42 info: heartbeat: version 2.0.7
    heartbeat[13820]: 2007/01/10_22:44:42 info: Heartbeat generation: 21
    heartbeat[13820]: 2007/01/10_22:44:42 info: G_main_add_TriggerHandler: Added signal manual handler
    heartbeat[13820]: 2007/01/10_22:44:42 info: G_main_add_TriggerHandler: Added signal manual handler
    heartbeat[13820]: 2007/01/10_22:44:42 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
    heartbeat[13820]: 2007/01/10_22:44:42 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
    heartbeat[13820]: 2007/01/10_22:44:42 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
    heartbeat[13820]: 2007/01/10_22:44:42 info: glib: UDP multicast heartbeat started for group 225.0.0.1 port 694 interface eth0 (ttl=1 loop=0)
    heartbeat[13820]: 2007/01/10_22:44:42 info: G_main_add_SignalHandler: Added signal handler for signal 17
    heartbeat[13820]: 2007/01/10_22:44:42 info: Comm_now_up(): updating status to active
    heartbeat[13820]: 2007/01/10_22:44:42 info: Local status now set to: 'active'
    heartbeat[13820]: 2007/01/10_22:44:42 ERROR: socket_wait_conn_new: trying to create in /var/run/heartbeat/register bind:: No such file or directory
    heartbeat[13823]: 2007/01/10_22:44:44 CRIT: Emergency Shutdown: Master Control process died.
    heartbeat[13823]: 2007/01/10_22:44:44 CRIT: Killing pid 13820 with SIGTERM
    heartbeat[13823]: 2007/01/10_22:44:44 CRIT: Killing pid 13824 with SIGTERM
    heartbeat[13823]: 2007/01/10_22:44:44 CRIT: Killing pid 13825 with SIGTERM
    heartbeat[13823]: 2007/01/10_22:44:44 CRIT: Killing pid 13826 with SIGTERM
    heartbeat[13823]: 2007/01/10_22:44:44 CRIT: Killing pid 13827 with SIGTERM
    heartbeat[13823]: 2007/01/10_22:44:44 CRIT: Emergency Shutdown(MCP dead): Killing ourselves.

    (the content of the ha-debug file is the same)
    There's an error, that's why heartbeat doesn't start...
    But I have no idea of what kind of error it is...
     
  6. samu

    samu New Member

    OK, searching on Google I found a patch to apply to the /etc/init.d/heartbeat file so that it creates the directories heartbeat needs in /var/run.

    I've added these lines to /etc/init.d/heartbeat, inside the StartHA() function:
    if [ ! -d $RUNDIR/heartbeat ]; then
        mkdir -p $RUNDIR/heartbeat/{ccm,crm}
        chown -R hacluster:haclient $RUNDIR/heartbeat
        chmod -R 750 $RUNDIR/heartbeat
    fi
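
    One caveat about the lines above: {ccm,crm} is brace expansion, which bash supports but plain POSIX shells such as dash do not. If the init script happens to run under such a shell, mkdir would create a single directory literally named {ccm,crm}. A portable variant might look like the sketch below. The function name ensure_hb_rundir and the optional owner argument are inventions for this example; the hacluster:haclient owner comes from the patch itself.

```shell
#!/bin/sh
# ensure_hb_rundir RUNDIR [OWNER:GROUP]
# Create RUNDIR/heartbeat with its ccm and crm subdirectories if the
# directory is missing, without relying on brace expansion.
# Ownership is changed only when OWNER:GROUP is given.
# (ensure_hb_rundir is a hypothetical helper name.)
ensure_hb_rundir() {
    hb_dir="$1/heartbeat"
    if [ ! -d "$hb_dir" ]; then
        mkdir -p "$hb_dir/ccm" "$hb_dir/crm" || return 1
        if [ -n "$2" ]; then
            chown -R "$2" "$hb_dir" || return 1
        fi
        chmod -R 750 "$hb_dir"
    fi
}

# Example use inside the init script (needs root):
#   ensure_hb_rundir /var/run hacluster:haclient
```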

    OK, now when I restart heartbeat I get NO ERROR in the ha-log file.
    root@ron:/home/sam# /etc/init.d/heartbeat start
    Starting High-Availability services:
    2007/01/10_23:13:17 INFO: IPaddr2 Resource is stopped
    Done.

    And now running:
    root@ron:/home/sam# ps aux | grep heartbeat
    root 14866 0.0 2.4 12516 12516 ? SLs 23:13 0:00 heartbeat: master control process
    nobody 14869 0.0 1.1 5920 5920 ? SL 23:13 0:00 heartbeat: FIFO reader
    nobody 14870 0.0 1.1 5916 5916 ? SL 23:13 0:00 heartbeat: write: bcast eth0
    nobody 14871 0.0 1.1 5916 5916 ? SL 23:13 0:00 heartbeat: read: bcast eth0
    nobody 14872 0.0 1.1 5916 5916 ? SL 23:13 0:00 heartbeat: write: mcast eth0
    nobody 14873 0.0 1.1 5916 5916 ? SL 23:13 0:00 heartbeat: read: mcast eth0
    113 14874 0.0 0.2 4196 1424 ? S 23:13 0:00 /usr/lib/heartbeat/ipfail
    root@ron:/home/sam# cl_status hbstatus
    Heartbeat is running on this machine.

    But the problem is still not solved. In fact:
    root@ron:/home/sam# /etc/ha.d/resource.d/LVSSyncDaemonSwap master eth0 status
    master stopped
    root@ron:/home/sam# ldirectord ldirectord.cf status
    ldirectord is stopped for /etc/ha.d/ldirectord.cf
    root@ron:/home/sam# ip addr sh eth0
    2: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:02:3f:be:13:95 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.61/24 brd 192.168.1.255 scope global eth0
    inet6 fe80::202:3fff:febe:1395/64 scope link
    valid_lft forever preferred_lft forever

    And this is the tail -f /var/log/messages:
    Jan 10 23:13:17 localhost heartbeat: [14865]: WARN: Core dumps could be lost if multiple dumps occur
    Jan 10 23:13:17 localhost heartbeat: [14865]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
    Jan 10 23:13:17 localhost heartbeat: [14865]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
    Jan 10 23:13:17 localhost heartbeat: [14865]: info: **************************
    Jan 10 23:13:17 localhost heartbeat: [14865]: info: Configuration validated. Starting heartbeat 2.0.7
    Jan 10 23:13:17 localhost heartbeat: [14866]: info: heartbeat: version 2.0.7
    Jan 10 23:13:18 localhost heartbeat: [14866]: info: Heartbeat generation: 23
    Jan 10 23:13:18 localhost heartbeat: [14866]: info: G_main_add_TriggerHandler: Added signal manual handler
    Jan 10 23:13:18 localhost heartbeat: [14866]: info: G_main_add_TriggerHandler: Added signal manual handler
    Jan 10 23:13:18 localhost heartbeat: [14866]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
    Jan 10 23:13:18 localhost heartbeat: [14866]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
    Jan 10 23:13:18 localhost heartbeat: [14866]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
    Jan 10 23:13:18 localhost heartbeat: [14866]: info: glib: UDP multicast heartbeat started for group 225.0.0.1 port 694 interface eth0 (ttl=1 loop=0)
    Jan 10 23:13:18 localhost heartbeat: [14866]: info: G_main_add_SignalHandler: Added signal handler for signal 17
    Jan 10 23:13:18 localhost heartbeat: [14866]: info: Comm_now_up(): updating status to active
    Jan 10 23:13:18 localhost heartbeat: [14866]: info: Local status now set to: 'active'
    Jan 10 23:13:18 localhost heartbeat: [14866]: info: Starting child client "/usr/lib/heartbeat/ipfail" (113,117)
    Jan 10 23:13:18 localhost heartbeat: [14866]: info: Local status now set to: 'up'
    Jan 10 23:13:18 localhost heartbeat: [14874]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 113 gid 117 (pid 14874)
    Jan 10 23:13:19 localhost heartbeat: [14866]: info: Link ron:eth0 up.
    Jan 10 23:13:22 localhost ipfail: [14874]: info: Link Status update: Link ron/eth0 now has status up

    I can't understand why it does not work...

    These are the config files:

    ha.cf
    ****
    debugfile /var/log/ha-debug
    logfile /var/log/ha-log
    logfacility local0
    auto_failback off
    bcast eth0
    mcast eth0 225.0.0.1 694 1 0
    node ron
    respawn hacluster /usr/lib/heartbeat/ipfail

    haresources
    *********
    ron IPaddr2::192.168.1.65/24/eth0/192.168.1.255 LVSSyncDaemonSwap::master::eth0 ldirectord::ldirectord.cf

    ldirectord.cf
    *********
    checktimeout=10
    checkinterval=2
    autoreload=no
    logfile="local0"
    quiescent=yes
    virtual=192.168.1.65:3306
    service=mysql
    real=192.168.0.62:3306 gate
    real=192.168.0.100:3306 gate
    checktype=negotiate
    login="root"
    passwd="mysqlrootpassword"
    database="ldirectord"
    request="SELECT * FROM connectioncheck"
    scheduler=wrr

    Any ideas to make it work?
     
  7. sohaileo

    sohaileo New Member

    OK, the problem seems to be that heartbeat is not taking its resources from the haresources file, because when you start heartbeat it says IPaddr2 is stopped. Do the following to take over the resources manually.
    Run this command:

    /usr/lib/heartbeat/hb_takeover all

    Heartbeat uses this script to take over resources. Then see what happens, and check the logs too.
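
    Besides hb_takeover, each haresources entry can be tried by hand. As far as I know, heartbeat runs an entry such as name::arg1::arg2 as a script called name, with the ::-separated fields as arguments followed by start, stop or status; this matches the LVSSyncDaemonSwap master eth0 status call earlier in the thread. The sketch below only prints the resulting command line. It assumes the scripts live in /etc/ha.d/resource.d, and resource_cmd is a made-up helper name.

```shell
#!/bin/sh
# resource_cmd ENTRY ACTION
# Print the command line that corresponds to one haresources entry,
# assuming the resource scripts are in /etc/ha.d/resource.d.
# (resource_cmd is a hypothetical helper name.)
resource_cmd() {
    printf '/etc/ha.d/resource.d/%s %s\n' \
        "$(printf '%s' "$1" | sed 's/::/ /g')" "$2"
}

# Example:
#   resource_cmd 'IPaddr2::192.168.1.65/24/eth0/192.168.1.255' status
# prints:
#   /etc/ha.d/resource.d/IPaddr2 192.168.1.65/24/eth0/192.168.1.255 status
```

    Running the printed command with start instead of status shows whether the resource script itself fails, independently of heartbeat.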

    Regards,
     
