Load-Balanced MySQL Cluster Error with Cluster

Discussion in 'HOWTO-Related Questions' started by stylez, Sep 25, 2006.

  1. stylez

    stylez New Member

    Hello,

    I've followed the tutorial on how to set up a load-balanced MySQL cluster, and everything seemed to be working fine. But when I recently checked on the services, one of the MySQL cluster nodes wasn't being recognized by the ndb_mgm app. I've had this problem twice before; I thought I had misconfigured something and reinstalled the whole system on VMs. I thought I had solved it, but it seems to recur a few days after completing the setup.

    Here is my configuration for the 5 machines (note: all are VMs):

    sql-1 172.30.0.7 (runs ndbd and mysql)
    sql-2 172.30.0.8 (runs ndbd and mysql)
    loadb-1 172.30.0.110 (runs lb1 and ndb_mgm) [active]
    loadb-2 172.30.0.9 (runs lb2) [passive]

    virtual IP for cluster: 172.30.0.111

    I can ping the virtual IP, and I can access the MySQL databases from 0.7 and 0.8, but when I try from 0.111, I get an error trying to connect.

    Here's the output from show in ndb_mgm

    Cluster Configuration
    ---------------------
    [ndbd(NDB)] 2 node(s)
    id=2 @172.30.0.7 (Version: 4.1.21, Nodegroup: 0, Master)
    id=3 @172.30.0.8 (Version: 4.1.21, Nodegroup: 0)

    [ndb_mgmd(MGM)] 1 node(s)
    id=1 @172.30.0.110 (Version: 4.1.21)

    [mysqld(API)] 2 node(s)
    id=4 (not connected, accepting connect from any host)
    id=5 @172.30.0.8 (Version: 4.1.21)


    I've restarted mysql on 0.7 and it seems to run fine, but ndb_mgm doesn't see it. Even so, 0.8 is running fine, yet I still can't connect. Everything worked last week when I completed the setup, and I don't know what else to check to find out why the cluster isn't working. loadb-1 is the active load balancer and it should direct database connections to sql-2, but it doesn't seem to.

    I ran all the checks found on http://www.howtoforge.com/loadbalanced_mysql_cluster_debian_p8 and they all check out fine; the active loadb-1 has 172.30.0.111 as the virtual IP. If anyone has experienced this or could shed some light on what I might be doing wrong, that would be great. As I said, everything worked 100% when I completed the initial install. I even tested taking down a single cluster node and a load balancer, and it worked as the tutorial stated.
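
    The "not connected" state shown in the ndb_mgm output above can be picked out mechanically. What follows is only a minimal sketch: the helper name disconnected_nodes is made up for illustration, and the sample text is copied from the show output in this post.

```shell
# Hypothetical helper: print the IDs of nodes that ndb_mgm reports as
# "not connected". Reads the text of "show" on stdin.
disconnected_nodes() {
    awk '/not connected/ {
        # Find the "id=N" field on the line and print only N.
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^id=[0-9]+$/) {
                sub(/^id=/, "", $i)
                print $i
            }
        }
    }'
}

# Sample input copied from the show output above.
show_sample='[mysqld(API)] 2 node(s)
id=4 (not connected, accepting connect from any host)
id=5 @172.30.0.8 (Version: 4.1.21)'

printf '%s\n' "$show_sample" | disconnected_nodes
```

    On a live management host the same filter could be fed from the client itself, for example ndb_mgm -e show | disconnected_nodes, assuming the installed client accepts the -e option.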
     
  2. falko

    falko Super Moderator Howtoforge Staff

  3. stylez

    stylez New Member

    Command "ip addr sh eth0"

    loadb-1:
    2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:a7:30:cf brd ff:ff:ff:ff:ff:ff
    inet 172.30.0.110/24 brd 172.30.0.255 scope global eth0
    inet 172.30.0.111/24 brd 172.30.0.255 scope global secondary eth0

    loadb-2:
    2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:1f:46:fd brd ff:ff:ff:ff:ff:ff
    inet 172.30.0.9/24 brd 172.30.0.255 scope global eth0


    Command "ldirectord ldirectord.cf status"

    loadb-1:
    ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 919


    loadb-2:
    ldirectord is stopped for /etc/ha.d/ldirectord.cf

    Command: "ipvsadm -L -n"

    loadb-1:
    IP Virtual Server version 1.0.11 (size=4096)
    Prot LocalAddress:port Scheduler Flags
    -> RemoteAddress:port Forward Weight ActiveConn InActConn
    TCP 172.30.0.111:3306 wrr
    -> 172.30.0.8:3306 Route 0 0 0
    -> 172.30.0.7:3306 Route 0 0 0

    loadb-2:
    IP Virtual Server version 1.0.11 (size=4096)
    Prot LocalAddress:port Scheduler Flags
    -> RemoteAddress:port Forward Weight ActiveConn InActConn


    Command: "/etc/ha.d/resource.d/LVSSyncDaemonSwap master status"

    loadb-1:
    master running
    (ipvs_syncmaster pid: 1046)


    loadb-2:
    master stopped



    Everything seems to check out, but I'm still unable to connect. When I first installed everything and tested with ndb_mgm, both NDB nodes showed up, the MGM node showed up, and so did both mysqld nodes. Now when I run show, I get the following:

    [ndbd(NDB)] 2 node(s)
    id=2 @172.30.0.7 (Version: 4.1.21, Nodegroup: 0)
    id=3 @172.30.0.8 (Version: 4.1.21, Nodegroup: 0, Master)

    [ndb_mgmd(MGM)] 1 node(s)
    id=1 @172.30.0.110 (Version: 4.1.21)

    [mysqld(API)] 2 node(s)
    id=4 @172.30.0.8 (Version: 4.1.21)
    id=5 (not connected, accepting connect from any host)


    You can see that the mysqld on 172.30.0.7 isn't showing up, but it's running on 0.7 and I can access MySQL directly on it.
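
    One thing the ipvsadm listing above does show is a Weight of 0 for both real servers. In LVS, a real server with weight 0 receives no new connections, and ldirectord can set the weight to 0 when its health check against that server fails, so this may be related to the failed connections through 172.30.0.111. The sketch below only illustrates how to flag such entries; the helper name zero_weight_servers is hypothetical and the sample is copied from the loadb-1 listing.

```shell
# Hypothetical helper: list real servers whose weight is 0 in
# "ipvsadm -L -n" output read from stdin.
zero_weight_servers() {
    # Real-server lines look like:
    #   -> address:port forward-method weight active-conn inactive-conn
    # The header line also starts with "->", so require a numeric address.
    awk '$1 == "->" && $2 ~ /^[0-9]/ && $4 == 0 { print $2 }'
}

# Sample input copied from the loadb-1 listing above.
ipvs_sample='IP Virtual Server version 1.0.11 (size=4096)
Prot LocalAddress:port Scheduler Flags
-> RemoteAddress:port Forward Weight ActiveConn InActConn
TCP 172.30.0.111:3306 wrr
-> 172.30.0.8:3306 Route 0 0 0
-> 172.30.0.7:3306 Route 0 0 0'

printf '%s\n' "$ipvs_sample" | zero_weight_servers
```

    Against a running director this would be ipvsadm -L -n | zero_weight_servers, run as root.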
     
  4. falko

    falko Super Moderator Howtoforge Staff

    What's the output of
    Code:
    netstat -tap
    and
    Code:
    df -h
    on 172.30.0.7? Are there any errors in the logs on 172.30.0.7?
     
  5. stylez

    stylez New Member

    sql-1:~# netstat -tap
    Active Internet connections (servers and established)
    Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
    tcp 0 0 *:mysql *:* LISTEN 27158/mysqld
    tcp 0 0 *:www *:* LISTEN 813/apache2
    tcp 0 0 *:ssh *:* LISTEN 800/sshd
    tcp 0 0 sql-1.localdomain:2202 *:* LISTEN 27099/ndbd
    tcp 0 0 sql-1.localdomain:35463 172.30.0.110:1186 ESTABLISHED 27098/ndbd
    tcp 0 0 sql-1.localdomain:35466 172.30.0.110:1186 ESTABLISHED 27158/mysqld
    tcp 0 0 sql-1.localdomain:mysql 172.30.0.110:56547 TIME_WAIT -
    tcp 0 0 sql-1.localdomain:2202 172.30.0.8:49152 ESTABLISHED 27099/ndbd
    tcp 0 0 sql-1.localdomain:mysql 172.30.0.110:56521 TIME_WAIT -
    tcp 0 148 sql-1.localdomain:ssh 172.30.0.2:1800 ESTABLISHED 18132/0
    tcp 0 0 sql-1.localdomain:35465 172.30.0.110:2202 ESTABLISHED 27099/ndbd
    tcp 0 0 sql-1.localdomain:2202 172.30.0.8:49149 ESTABLISHED 27099/ndbd
    tcp 0 0 sql-1.localdomain:35468 172.30.0.8:2202 ESTABLISHED 27158/mysqld

    sql-1:~# df -h
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda1 883M 424M 412M 51% /
    tmpfs 126M 0 126M 0% /dev/shm

    (The SQL data I'm storing will be < 1 MB in total; it's just users' FTP login information.)

    I've checked the logs and nothing seems out of place, there are no errors being thrown.
     
  6. falko

    falko Super Moderator Howtoforge Staff

    What's in /etc/fstab? I could imagine it's a problem with your disk space or memory as a MySQL cluster needs lots of memory...
     
  7. stylez

    stylez New Member

    sql-1:~# cat /etc/fstab
    # /etc/fstab: static file system information.
    #
    # <file system> <mount point> <type> <options> <dump> <pass>
    proc /proc proc defaults 0 0
    /dev/sda1 / ext3 defaults,errors=remount-ro 0 1
    /dev/sda5 none swap sw 0 0
    /dev/hda /media/cdrom0 iso9660 ro,user,noauto 0 0
    /dev/fd0 /media/floppy0 auto rw,user,noauto 0 0
     
  8. falko

    falko Super Moderator Howtoforge Staff

    You don't have much swap (only 126MB). And if your memory is low that could cause a problem... What's the output of
    Code:
    cat /proc/meminfo
    ?
     
  9. stylez

    stylez New Member

    sql-1:~# cat /proc/meminfo
    total: used: free: shared: buffers: cached:
    Mem: 263208960 256610304 6598656 0 25546752 80146432
    Swap: 82210816 0 82210816
    MemTotal: 257040 kB
    MemFree: 6444 kB
    MemShared: 0 kB
    Buffers: 24948 kB
    Cached: 78268 kB
    SwapCached: 0 kB
    Active: 59388 kB
    Inactive: 163688 kB
    HighTotal: 0 kB
    HighFree: 0 kB
    LowTotal: 257040 kB
    LowFree: 6444 kB
    SwapTotal: 80284 kB
    SwapFree: 80284 kB


    So you think I should bump up the memory? I default these VMs to about 256 MB of RAM. I didn't think the cluster would require much since it's not holding much information.
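
    When reading the numbers above, note that MemFree alone understates what is available, because on Linux buffers and page cache can largely be reclaimed under memory pressure. The sketch below adds the three fields as a rough estimate only (not all cache is necessarily reclaimable); the helper name reclaimable_kb is made up, and the sample lines are copied from the meminfo output in this post.

```shell
# Hypothetical helper: rough "free + buffers + cache" total in kB,
# computed from /proc/meminfo-style text on stdin.
reclaimable_kb() {
    awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
         END { print sum }'
}

# Sample input copied from the meminfo output above.
mem_sample='MemTotal: 257040 kB
MemFree: 6444 kB
Buffers: 24948 kB
Cached: 78268 kB
SwapCached: 0 kB'

printf '%s\n' "$mem_sample" | reclaimable_kb   # 6444 + 24948 + 78268 = 109660
```

    On the machine itself this would be reclaimable_kb < /proc/meminfo.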
     
  10. stylez

    stylez New Member

    So I bumped up the memory on both sql-1 and sql-2 to 512 MB of RAM.

    sql-1:~# cat /proc/meminfo
    total: used: free: shared: buffers: cached:
    Mem: 528752640 223784960 304967680 0 14512128 72069120
    Swap: 82210816 0 82210816
    MemTotal: 516360 kB
    MemFree: 297820 kB
    MemShared: 0 kB
    Buffers: 14172 kB
    Cached: 70380 kB
    SwapCached: 0 kB
    Active: 40808 kB
    Inactive: 161348 kB
    HighTotal: 0 kB
    HighFree: 0 kB
    LowTotal: 516360 kB
    LowFree: 297820 kB
    SwapTotal: 80284 kB
    SwapFree: 80284 kB


    Still no change.
     
  11. stylez

    stylez New Member

    So I've looked into the issue a bit more. When I try to access the connectioncheck table, I get the following message:

    ERROR 1105 (HY000): Failed to open 'connectioncheck', error while unpacking from engine

    Also, since I'm running VMs, I always ssh to the machines and didn't realize there was an error being printed to the console:

    DBI connect('database=ldirectordb;host=172.30.0.140:port3306','ldirector',...) failed: Unknown database 'ldirectordb' at /etc/ha.d/resource.d/ldirector line 1950

    I saw that people were having this issue after restarting their cluster

    http://forums.mysql.com/read.php?25,80009,80009

    I wasn't sure if you've seen this before. When you start from scratch it works, but after one reboot the database seems to get corrupted somehow. I've tried dropping the database, but it still doesn't work.
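
    One detail that may or may not matter: the host in the DBI error above is 172.30.0.140, which is not one of the addresses listed in the first post. It could simply be a slip made while copying the message off the console. The sketch below just extracts the host from the error text and compares it with the listed addresses; the helper name dsn_host and the known_hosts list are made up for illustration.

```shell
# Hypothetical helper: pull the value after "host=" out of a DBI error line.
dsn_host() {
    # Capture everything after "host=" up to the next ":", ";" or "'".
    sed -n "s/.*host=\([^:;']*\).*/\1/p"
}

# Error text copied from the console message above.
dbi_err="DBI connect('database=ldirectordb;host=172.30.0.140:port3306','ldirector',...) failed: Unknown database 'ldirectordb'"

# Addresses taken from the configuration in the first post.
known_hosts="172.30.0.7 172.30.0.8 172.30.0.9 172.30.0.110 172.30.0.111"

host=$(printf '%s\n' "$dbi_err" | dsn_host)
case " $known_hosts " in
    *" $host "*) echo "$host is a listed address" ;;
    *)           echo "$host is not among the listed addresses" ;;
esac
```

    If the address really is 172.30.0.140 on the console, the real-server entries in /etc/ha.d/ldirectord.cf would be the first place to compare against.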
     
  12. falko

    falko Super Moderator Howtoforge Staff

    Strange, I never had these problems after a reboot... Maybe you need to update your Perl-DBI module?
     
  13. stylez

    stylez New Member

    The Perl DBI modules are the latest version. This really sucks, because it works when I first set it up. It's only after a restart that the SQL database seems to get corrupted and the active load balancer starts throwing the error about the connectioncheck table.
     
