Load-Balanced MySQL Cluster Error with Cluster

    I followed the tutorial on how to set up a load-balanced MySQL Cluster and everything seemed to be working fine, but when I recently checked on the services, one of the MySQL cluster nodes wasn't being recognized by ndb_mgm. I've had this problem twice before; both times I thought I had misconfigured something and reinstalled the whole system on VMs. I thought I had solved it, but it seems to recur a few days after completing the setup.

    Here is my configuration for the 5 machines (note: all are VMs):

    sql-1 (runs ndbd and mysql)
    sql-2 (runs ndbd and mysql)
    loadb-1 (runs lb1 and ndb_mgm) [active]
    loadb-2 (runs lb2) [passive]

    virtual IP for cluster:

    I can ping the virtual IP and I can access the MySQL databases from 0.7 and 0.8, but when I try from 0.111, I get an error trying to connect.
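A ping only shows that the address answers ICMP; it says nothing about whether something accepts connections on the MySQL port. As a rough sketch of a complementary check, the snippet below tries a plain TCP connect to each address given on the command line. The function name `can_connect` and the default port 3306 are assumptions for illustration (3306 is the usual MySQL port, but this setup may differ), and a successful connect still does not prove that a MySQL login would work.

```python
import socket
import sys


def can_connect(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Addresses are taken from the command line, e.g. the two SQL nodes
    # and the virtual IP (none are hard-coded here).
    for host in sys.argv[1:]:
        state = "reachable" if can_connect(host) else "NOT reachable"
        print(f"{host}: port 3306 {state}")
```

If the real servers answer but the virtual IP does not, that would point at the load balancer rather than at mysqld itself.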

    Here's the output from show in ndb_mgm

    Cluster Configuration
    [ndbd(NDB)] 2 node(s)
    id=2 @ (Version: 4.1.21, Nodegroup: 0, Master)
    id=3 @ (Version: 4.1.21, Nodegroup: 0)

    [ndb_mgmd(MGM)] 1 node(s)
    id=1 @ (Version: 4.1.21)

    [mysqld(API)] 2 node(s)
    id=4 (not connected, accepting connect from any host)
    id=5 @ (Version: 4.1.21)
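Since the problem comes back after a few days, it may help to check this output automatically instead of by eye. The following is a minimal sketch, assuming the `show` text has been captured into a string by some means; the sample is copied from the output above (with the addresses missing, as they are there), and the function name `disconnected_nodes` is made up for this example.

```python
import re

# Sample text copied from the `show` output in the post.
SHOW_OUTPUT = """\
Cluster Configuration
[ndbd(NDB)] 2 node(s)
id=2 @ (Version: 4.1.21, Nodegroup: 0, Master)
id=3 @ (Version: 4.1.21, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @ (Version: 4.1.21)

[mysqld(API)] 2 node(s)
id=4 (not connected, accepting connect from any host)
id=5 @ (Version: 4.1.21)
"""

# Matches section headers such as "[mysqld(API)] 2 node(s)".
SECTION_RE = re.compile(r"^\[(\w+)\((\w+)\)\]")
# Matches node lines such as "id=4 (not connected, ...)".
NODE_RE = re.compile(r"^id=(\d+)\s+(.*)$")


def disconnected_nodes(show_text):
    """Return (node_type, node_id) pairs for nodes shown as not connected."""
    missing = []
    section = None
    for line in show_text.splitlines():
        line = line.strip()
        header = SECTION_RE.match(line)
        if header:
            section = header.group(2)  # NDB, MGM or API
            continue
        node = NODE_RE.match(line)
        if node and "not connected" in node.group(2):
            missing.append((section, int(node.group(1))))
    return missing


if __name__ == "__main__":
    for node_type, node_id in disconnected_nodes(SHOW_OUTPUT):
        print(f"{node_type} node id={node_id} is not connected")
```

For the sample above this reports the API node with id=4, which matches what the post describes.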

    I've restarted mysql on 0.7 and it seems to run fine, but ndb_mgm doesn't see it. Meanwhile 0.8 is running fine, but I still can't connect. Everything worked last week when I completed the setup, and I don't know what else I can check to find out why the cluster isn't working. loadb-1 is the active load balancer and it should direct the database traffic to sql-2, but it doesn't seem to. I ran all the checks found on http://www.howtoforge.com/loadbalanced_mysql_cluster_debian_p8 and they all check out fine; the active loadb-1 has the virtual IP. If anyone has experienced this or could shed some light on what I might be doing wrong, that would be great. As I said, everything worked 100% when I completed the initial install, and I even tested what happens when a single cluster node or load balancer goes down, and it worked as the tutorial stated.
    Command "ip addr sh eth0"

    2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:a7:30:cf brd ff:ff:ff:ff:ff:ff
    inet brd scope global eth0
    inet brd scope global secondary eth0

    2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:1f:46:fd brd ff:ff:ff:ff:ff:ff
    inet brd scope global eth0

    Command "ldirectord ldirectord.cf status"

    ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 919

    ldirectord is stopped for /etc/ha.d/ldirectord.cf

    Command: "ipvsadm -L -n"

    loadb-1:
    IP Virtual Server version 1.0.11 (size=4096)
    Prot LocalAddress:port Scheduler Flags
    -> RemoteAddress:port Forward Weight ActiveConn InActConn
    TCP wrr
    -> Route 0 0 0
    -> Route 0 0 0

    IP Virtual Server version 1.0.11 (size=4096)
    Prot LocalAddress:port Scheduler Flags
    -> RemoteAddress:port Forward Weight ActiveConn InActConn

    Command: "/etc/ha.d/resource.d/LVSSyncDaemonSwap master status"

    master running
    (ipvs_syncmaster pid: 1046)

    master stopped

    Everything seems to check out, but I'm still unable to connect. When I first installed everything and tested ndb_mgm, both NDB nodes showed up, the MGM node showed up, and so did both mysqld nodes. Now when I run show, all I get is the following:

    [ndbd(NDB)] 2 node(s)
    id=2 @ (Version: 4.1.21, Nodegroup: 0)
    id=3 @ (Version: 4.1.21, Nodegroup: 0, Master)

    [ndb_mgmd(MGM)] 1 node(s)
    id=1 @ (Version: 4.1.21)

    [mysqld(API)] 2 node(s)
    id=4 @ (Version: 4.1.21)
    id=5 (not connected, accepting connect from any host)

    You can see that one mysqld isn't showing up, but it's running on 0.7 and I can access MySQL on it directly.
    What's the output of
    netstat -tap
    df -h
    on Are there any errors in the logs on
    sql-1:~# netstat -tap
    Active Internet connections (servers and established)
    Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
    tcp 0 0 *:mysql *:* LISTEN 27158/mysqld
    tcp 0 0 *:www *:* LISTEN 813/apache2
    tcp 0 0 *:ssh *:* LISTEN 800/sshd
    tcp 0 0 sql-1.localdomain:2202 *:* LISTEN 27099/ndbd
    tcp 0 0 sql-1.localdomain:35463 ESTABLISHED27098/ndbd
    tcp 0 0 sql-1.localdomain:35466 ESTABLISHED27158/mysqld
    tcp 0 0 sql-1.localdomain:mysql TIME_WAIT -
    tcp 0 0 sql-1.localdomain:2202 ESTABLISHED27099/ndbd
    tcp 0 0 sql-1.localdomain:mysql TIME_WAIT -
    tcp 0 148 sql-1.localdomain:ssh ESTABLISHED18132/0
    tcp 0 0 sql-1.localdomain:35465 ESTABLISHED27099/ndbd
    tcp 0 0 sql-1.localdomain:2202 ESTABLISHED27099/ndbd
    tcp 0 0 sql-1.localdomain:35468 ESTABLISHED27158/mysqld

    sql-1:~# df -h
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda1 883M 424M 412M 51% /
    tmpfs 126M 0 126M 0% /dev/shm

    (The SQL data I'm storing will be < 1 MB in total; it's just users' FTP login information.)

    I've checked the logs and nothing seems out of place; there are no errors being thrown.
    What's in /etc/fstab? I could imagine it's a problem with your disk space or memory as a MySQL cluster needs lots of memory...
    sql-1:~# cat /etc/fstab
    # /etc/fstab: static file system information.
    # <file system> <mount point> <type> <options> <dump> <pass>
    proc /proc proc defaults 0 0
    /dev/sda1 / ext3 defaults,errors=remount-ro 0 1
    /dev/sda5 none swap sw 0 0
    /dev/hda /media/cdrom0 iso9660 ro,user,noauto 0 0
    /dev/fd0 /media/floppy0 auto rw,user,noauto 0 0
    You don't have much swap (only 126MB). And if your memory is low that could cause a problem... What's the output of
    cat /proc/meminfo
    sql-1:~# cat /proc/meminfo
    total: used: free: shared: buffers: cached:
    Mem: 263208960 256610304 6598656 0 25546752 80146432
    Swap: 82210816 0 82210816
    MemTotal: 257040 kB
    MemFree: 6444 kB
    MemShared: 0 kB
    Buffers: 24948 kB
    Cached: 78268 kB
    SwapCached: 0 kB
    Active: 59388 kB
    Inactive: 163688 kB
    HighTotal: 0 kB
    HighFree: 0 kB
    LowTotal: 257040 kB
    LowFree: 6444 kB
    SwapTotal: 80284 kB
    SwapFree: 80284 kB
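The MemFree figure alone can look more alarming than it is, because Linux uses otherwise idle memory for buffers and page cache. As a rough sketch, the snippet below adds MemFree, Buffers and Cached from the output above to estimate what could be made available, and computes how much swap is in use. Treating buffers and cache as fully reclaimable is an approximation, and the function names are made up for this example.

```python
# Values copied from the `cat /proc/meminfo` output in the post (256 MB VM).
MEMINFO = """\
MemTotal: 257040 kB
MemFree: 6444 kB
Buffers: 24948 kB
Cached: 78268 kB
SwapTotal: 80284 kB
SwapFree: 80284 kB
"""


def parse_meminfo(text):
    """Parse 'Key: <number> kB' lines into a dict of integers (in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Only accept lines of the exact form "Key: 1234 kB"; this skips the
        # older "total: used: free:" summary rows, which are in bytes.
        if sep and len(fields) == 2 and fields[0].isdigit() and fields[1] == "kB":
            values[key.strip()] = int(fields[0])
    return values


def reclaimable_kb(info):
    """Approximate memory that is free or could be freed from caches."""
    return info["MemFree"] + info["Buffers"] + info["Cached"]


if __name__ == "__main__":
    info = parse_meminfo(MEMINFO)
    swap_used = info["SwapTotal"] - info["SwapFree"]
    print(f"free + buffers + cached: {reclaimable_kb(info)} kB")  # 109660 kB
    print(f"swap in use: {swap_used} kB")  # 0 kB
```

With these numbers roughly 107 MB is free or cached and no swap is in use, which suggests (without proving it) that the machine was not under heavy memory pressure at the moment the output was taken.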

    So you think I should bump up the memory? I default these VMs to about 256 MB of RAM. I didn't think the cluster would require much since it's not holding much information.
    So I bumped up the memory on both sql-1 and sql-2 to 512 MB of RAM.

    sql-1:~# cat /proc/meminfo
    total: used: free: shared: buffers: cached:
    Mem: 528752640 223784960 304967680 0 14512128 72069120
    Swap: 82210816 0 82210816
    MemTotal: 516360 kB
    MemFree: 297820 kB
    MemShared: 0 kB
    Buffers: 14172 kB
    Cached: 70380 kB
    SwapCached: 0 kB
    Active: 40808 kB
    Inactive: 161348 kB
    HighTotal: 0 kB
    HighFree: 0 kB
    LowTotal: 516360 kB
    LowFree: 297820 kB
    SwapTotal: 80284 kB
    SwapFree: 80284 kB

    Still no change.
    So I've looked into the issue a bit more. When I try to access the connectioncheck table, I get the following message:

    ERROR 1105 (HY000): Failed to open 'connectioncheck', error while unpacking from engine

    Also, since I'm running VMs, I always ssh to the machines and didn't realize there was an error being printed to the console:

    DBI connect('database=ldirectordb;host=','ldirector',...) failed: Unknown database 'ldirectordb' at /etc/ha.d/resource.d/ldirector line 1950

    I saw that people were having this issue after restarting their cluster


    I wasn't sure if you've seen this before. When you start from scratch it works, but after one reboot the database somehow seems to get corrupted. I've tried dropping the database, but it still doesn't work.
    Strange, I never had these problems after a reboot... Maybe you need to update your Perl-DBI module?
    The Perl DBI modules are the latest version. This really sucks because it works when I first set it up. It's only after a restart that the SQL database seems to get corrupted and the active load balancer starts throwing the error about the connectioncheck table.

