High Availability with heartbeat and ldirectord

Discussion in 'Server Operation' started by tate_harmann, Jun 14, 2006.

  1. tate_harmann

    tate_harmann New Member

    I am setting up a highly available, load-balancing Apache cluster. I think I have everything in place, and everything works except the load balancing. Heartbeat is used for the failover and works fine. I am using source hashing (sh) as the scheduling method for ldirectord. Ldirectord does see the two nodes, as the output of "ipvsadm -L -n" shows:
    SLES9-CLUSTER1:~ # ipvsadm -L -n
    IP Virtual Server version 1.2.0 (size=4096)
    Prot LocalAddress:port Scheduler Flags
    -> RemoteAddress:port Forward Weight ActiveConn InActConn
    TCP sh
    -> Route 1 0 0
    -> Local 1 0 0

    And when I shut down one of the boxes, it is pulled from the pool and the master role rolls over to the other box, as it is supposed to. However, the actual web requests on port 80 fail when they go to the non-local node (the "Route" entry in the above example), while they come through fine on the local node. So about half of the web requests fail. I did enable IP forwarding; is there anything else I need to do? Oh, it is SUSE Linux Enterprise Server 9, and the service address gets bound to eth0 as eth0:0. I don't know if this is right, but most of the examples I found online set up the service address as lo:0.
    I can post some config files if needed.

    thank you,
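    To see why "about half" of the requests failed, recall that source hashing pins each client address to one real server. The Bash sketch below is only a toy model: the hash is invented for illustration (it is not the kernel's ip_vs_sh code), and node77/node78 are hypothetical names. It shows that, with two real servers, one group of clients always lands on the local node and the rest always on the other one, so a broken non-local node fails for those clients every time.

```bash
#!/usr/bin/env bash
# Toy model of source-hash ("sh") scheduling. The hash below is invented
# for illustration; it is NOT the exact function used by the kernel.
# Usage: sh_pick CLIENT_IP SERVER1 SERVER2 ...
sh_pick() {
    local client=$1
    shift
    local servers=("$@")
    local a b c d
    IFS=. read -r a b c d <<< "$client"
    # Fold the four octets (forced to base 10) into one of 256 buckets.
    local bucket=$(( (10#$a * 29791 + 10#$b * 961 + 10#$c * 31 + 10#$d) % 256 ))
    # Buckets are shared out across the servers in turn, so a given
    # client address always maps to the same server.
    echo "${servers[bucket % ${#servers[@]}]}"
}
```

    For example, sh_pick 10.0.0.1 node77 node78 prints node78 on every call, while sh_pick 10.0.0.2 node77 node78 always prints node77.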
  2. tate_harmann

    tate_harmann New Member

    I guess I just modified the config found here:

    All I need is step 6; everything else works. However, I did mine with only two boxes instead of four. Each has a load balancer and an HTTP service on it. Only one load balancer is active at a time, but I still want both boxes to serve the balanced HTTP requests. There is an article I found on doing this very setup:

    However, I needed to tweak mine a little as I am running SLES 9. I am basically using a hybrid config between the two tutorials.

  3. noahlau

    noahlau New Member

    In an HA cluster, only one load balancer is active at a time. The other one is a standby load balancer, which becomes active when the primary load balancer fails.
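    In a heartbeat (v1-style) setup this is usually expressed in /etc/ha.d/haresources, which must be identical on both nodes: the line names the preferred node, followed by the resources heartbeat starts there and moves as one group on failover. A small sketch that prints such a line; the node name, the VIP and the choice of the IPaddr resource (which brings the address up as an alias such as eth0:0) are assumptions for illustration:

```bash
# Prints a heartbeat v1 haresources line (node name and VIP are hypothetical).
# Whichever node owns this resource group starts ldirectord and brings up
# the VIP (IPaddr::ADDR/PREFIX/IFACE); the other node stays on standby
# until heartbeat fails the whole group over to it.
haresources_line() {
    local node=$1 vip=$2 prefix=$3 iface=$4
    printf '%s ldirectord::ldirectord.cf IPaddr::%s/%s/%s\n' \
        "$node" "$vip" "$prefix" "$iface"
}
```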
  4. falko

    falko Super Moderator Howtoforge Staff

    I only see local IP addresses in your post...
  5. tate_harmann

    tate_harmann New Member

    Sorry, what I mean is that the output of the "ipvsadm -L -n" command lists the nodes as Local or Route:
    TCP sh
    -> Route 1 0 0
    -> Local 1 0 0

    They are all private IP addresses. is my virtual address, .77 is the active load balancer but is also an available node to receive HTTP requests, and .78 is the other node, but I'm not sure whether the load balancer is passing requests to that node or not. Since half of my requests were failing, I assumed the ones that failed were the ones getting forwarded to .78 and then getting dropped.
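    For anyone reading that output later: the "Forward" column says how IPVS hands each connection to that real server. The lookup below is only an illustrative sketch, but the correspondence it encodes (ldirectord's gate/masq/ipip keywords and ipvsadm's -g/-m/-i options) is the standard one; "Local" means the real server address belongs to the director itself, so those connections are handled on the director.

```bash
# Decodes the "Forward" column of `ipvsadm -L -n` and names the matching
# ldirectord keyword and ipvsadm option.
lvs_forward_info() {
    case $1 in
        Route)  echo "direct routing: ldirectord 'gate', ipvsadm -g" ;;
        Masq)   echo "NAT: ldirectord 'masq', ipvsadm -m" ;;
        Tunnel) echo "IP-IP tunnelling: ldirectord 'ipip', ipvsadm -i" ;;
        Local)  echo "real server address is local to the director; handled on the director" ;;
        *)      echo "unknown forwarding method: $1" >&2; return 1 ;;
    esac
}
```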
  6. tate_harmann

    tate_harmann New Member

    OK, I think I found my problem here:
    The Linux Virtual Server has three different ways of forwarding packets: Network Address Translation (NAT), IP-IP encapsulation or tunnelling and Direct Routing.

    * Direct Routing: Packets from end users are forwarded directly to the real server. The IP packet is not modified, so the real servers must be configured to accept traffic for the virtual server's IP address. This can be done using a dummy interface, or packet filtering to redirect traffic addressed to the virtual server's IP address to a local port. The real server may send replies directly back to the end user. That is, if a host-based layer 4 switch is used, it may not be in the return path.
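    On Linux real servers this usually comes down to two things: keep the node from answering ARP for the VIP, and hold the VIP on a loopback alias. A dry-run sketch that only prints the commands (the VIP is a hypothetical placeholder; the sysctl names are the same ones posted further down in this thread):

```bash
# Dry-run helper: prints (does not run) the commands a Direct Routing real
# server typically needs. VIP is a placeholder supplied by the caller.
dr_realserver_cmds() {
    local vip=$1
    # 1. Stop eth0 from answering ARP for addresses held on lo, and from
    #    using the VIP as the source of its own ARP requests, so only the
    #    director answers ARP for the VIP.
    echo "sysctl -w net.ipv4.conf.all.arp_ignore=1"
    echo "sysctl -w net.ipv4.conf.all.arp_announce=2"
    # 2. Hold the VIP on a loopback alias with a host netmask, so the node
    #    accepts packets addressed to the VIP without claiming a subnet.
    echo "ifconfig lo:0 $vip netmask 255.255.255.255 up"
}
```

    Printing instead of executing keeps the sketch safe to try; once the output looks right, it can be piped to a root shell (for example dr_realserver_cmds 192.0.2.100 | sh).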

    I need to set up an IP alias on my loopback interface (lo:0) so that the Apache web server accepts connections for the virtual IP address. However, the tutorial explains how to do it in Debian; do you know how it is done on SLES 9? I'll check in the meantime. Thanks,
  7. tate_harmann

    tate_harmann New Member

    That was the problem. I just did:

    ifconfig lo:0

    to add the alias, and the server started accepting requests.

  8. _stephan_

    _stephan_ New Member

    Hi there!

    So, I did what the howto described, but when I nmap the VIP, the HTTP and MySQL ports show as filtered. Is there anything else I have to do, e.g. change the default route or add a new route on the real servers?

    Last edited: Aug 28, 2007
  9. _stephan_

    _stephan_ New Member

    No ideas?

    Hm, I thought it would be easier to get some hints...

  10. falko

    falko Super Moderator Howtoforge Staff

    What do you mean by "filtered"?
  11. _stephan_

    _stephan_ New Member


    nmap-ing the VIP shows only the HTTP and MySQL ports as filtered. But now they are only sometimes filtered: it works for a couple of hours, then after a restart of one real server (RS), the ports change to filtered. Strange, isn't it?

  12. Tenebris

    Tenebris New Member

    Loopback alias

    I've been trying to do this and the real server loses all contact with the outside world.
    In fact, the server won't respond to any requests after I add such a loopback alias.
    Is anyone else here having the same issue?


  13. falko

    falko Super Moderator Howtoforge Staff

    Which distribution are you using, and which tutorial (URL) did you follow?
  14. Tenebris

    Tenebris New Member

    Re: Loopback alias

    I'm using CentOS 5 and I was following tutorials from several sources:

    First, the O'Reilly Book, Linux System Administrator's Guide, under the chapter for load balancers.
    Second, http://www.jedi.com/obiwan/technology/ultramonkey-rhel4.html, which followed pretty much the same logic.
    Third, http://www.ultramonkey.org/3/topologies/sl-ha-lb-eg.html

    I even used the "correction" script from http://classcast.blogspot.com/2006/12/two-node-lvs-dr-setup-on-centos.html that was supposed to solve the loopback alias problem...
    However, the "correction" script locks out everything once it tries to raise the loopback alias. The correction script also wants an executable that doesn't exist: /etc/ha.d/rc.d/arptables-noarp-addr_takeip. (I did a yum search for arptables and ended up installing arptables_jf, but that didn't install such an executable either.)

    I've tried experimenting with different configurations in ldirectord.cf, including changing gate to masq and (gasp!) ipip.

    I'm pretty sure my sysctl settings are correct, but here they are:
    On my load balancer:
    net.ipv4.ip_forward = 1
    net.ipv4.conf.default.rp_filter = 1
    net.ipv4.conf.default.accept_source_route = 0
    kernel.sysrq = 0
    kernel.core_uses_pid = 1
    net.ipv4.tcp_syncookies = 1
    kernel.msgmnb = 65536
    kernel.msgmax = 65536
    kernel.shmmax = 68719476736
    kernel.shmall = 4294967296

    ...and on my nodes:
    net.ipv4.ip_forward = 1
    net.ipv4.conf.default.rp_filter = 1
    net.ipv4.conf.default.accept_source_route = 0
    net.ipv4.conf.all.arp_ignore = 1
    net.ipv4.conf.eth0.arp_ignore = 1
    net.ipv4.conf.all.arp_announce = 2
    net.ipv4.conf.eth0.arp_announce = 2
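    Of these, the node-side values that matter most for Direct Routing are the arp_ignore/arp_announce lines (for both settings the kernel uses the larger of the "all" and per-interface values). A small sketch for checking a sysctl.conf-style file for them, assuming the simple KEY = VALUE format shown above; the function name is hypothetical:

```bash
# Reads KEY = VALUE lines from a sysctl.conf-style file and reports whether
# the ARP settings a Direct Routing real server relies on are present.
# Returns non-zero if any of them is missing or has another value.
check_dr_arp() {
    local file=$1 pair key want got rc=0
    for pair in net.ipv4.conf.all.arp_ignore=1 net.ipv4.conf.all.arp_announce=2; do
        key=${pair%%=*}
        want=${pair#*=}
        # Strip blanks around "="; the last matching line wins, as it does
        # when the file is loaded in order. Commented lines never match.
        got=$(awk -F= -v k="$key" '
            { gsub(/[ \t]/, "", $1); gsub(/[ \t]/, "", $2) }
            $1 == k { v = $2 }
            END { print v }' "$file")
        if [ "$got" = "$want" ]; then
            echo "ok      $key = $got"
        else
            echo "MISSING $key (want $want, found '${got:-unset}')"
            rc=1
        fi
    done
    return $rc
}
```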

    ...and my LB's ldirectord.cf is as follows:
    real= gate
    real= gate
    receive="I'm alive!"
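    For comparison, here is a sketch of a complete virtual-service block of this kind, printed by a small Bash function. The addresses are hypothetical placeholders, scheduler=rr and the timing values are assumed choices, and the request/receive pair mirrors the ldirectord.html check described below. The lines under virtual= must be indented for ldirectord to treat them as part of that service.

```bash
# Prints an ldirectord.cf sketch for two Direct Routing ("gate") real
# servers. All addresses are placeholders supplied by the caller.
ldirectord_cf() {
    local vip=$1 rip1=$2 rip2=$3
    cat <<EOF
checktimeout=10
checkinterval=30

virtual=$vip:80
    real=$rip1:80 gate
    real=$rip2:80 gate
    service=http
    request="ldirectord.html"
    receive="I'm alive!"
    scheduler=rr
    protocol=tcp
    checktype=negotiate
EOF
}
```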

    There is an "ldirectord.html" on each of the nodes that is successfully acknowledged... if the node is not running with a loopback alias. If I do set my node's loopback alias as follows:
    ifconfig lo:0 netmask
    ...the node stops responding to the load balancer. However, I can still hit the node from anywhere else except the load balancer.

    If I take the loopback alias down on the nodes, ldirectord says it can see the nodes, but any attempt to hit the virtual IP now times out.
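    One thing worth checking in this situation is the netmask on the loopback alias. On Linux, an address added to lo with a wider netmask makes the kernel treat the whole subnet as local, so replies to a load balancer on that subnet never leave the node, while hosts reached through the default gateway are unaffected. The pure-Bash sketch below (all addresses hypothetical) only does the subnet arithmetic, to show which peers a given lo alias would capture; a 255.255.255.255 alias covers the VIP alone.

```bash
# Converts a dotted-quad IPv4 address or netmask to a 32-bit integer.
ip2int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# Succeeds if PEER falls inside ADDR/NETMASK, i.e. if an alias
# "ifconfig lo:0 ADDR netmask NETMASK" would make PEER look local.
# Usage: covers ADDR NETMASK PEER
covers() {
    local addr mask peer
    addr=$(ip2int "$1")
    mask=$(ip2int "$2")
    peer=$(ip2int "$3")
    (( (addr & mask) == (peer & mask) ))
}
```

    With a director at a hypothetical 192.0.2.10, covers 192.0.2.100 255.255.255.0 192.0.2.10 succeeds (the alias would swallow the director's address), whereas the same call with 255.255.255.255 fails.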
  15. falko

    falko Super Moderator Howtoforge Staff

    Instead of setting up a loopback alias, you can try this on the nodes (in /etc/sysctl.conf):

    This allows the nodes (and therefore Apache) to listen to IPs that are currently not bound to them.
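    The sysctl line itself did not survive in this copy of the post. The kernel setting that matches the description is net.ipv4.ip_nonlocal_bind; assuming that is what was meant, here is a sketch that adds it to a sysctl.conf-style file only once (the path is an argument, so it can be tried on a copy first; the function name is hypothetical):

```bash
# Appends "net.ipv4.ip_nonlocal_bind = 1" to FILE unless an
# ip_nonlocal_bind line is already present (an existing "= 0" line is
# left alone). Apply afterwards with `sysctl -p FILE` as root.
enable_nonlocal_bind() {
    local file=$1
    if ! grep -q '^[[:space:]]*net\.ipv4\.ip_nonlocal_bind' "$file" 2>/dev/null; then
        echo 'net.ipv4.ip_nonlocal_bind = 1' >> "$file"
    fi
}
```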
  16. Tenebris

    Tenebris New Member

    Tried that just now, but...

    ...still no dice. However, since then, I've noticed some interesting other behavior...

    I tried setting the LB's "checkinterval" value to 30, so that it checks whether it can access the nodes every 30 seconds. (Or a "tick" in old MUD parlance.) At this point, the loopback interface on every node is down.

    Then I fire up ldirectord, and let it see the nodes. (If the loopback alias on the nodes is currently up, then it won't get a response from the nodes, and will flag those nodes as unavailable.)

    If I were to hit the Virtual IP from a web browser it'll time out.
    However, if I turn on the loopback aliases on the nodes right now, everything works perfectly - the requests successfully route to a random node.

    At least, until the next tick, maximum 30 seconds later, at which point, the load balancer cannot make a request of the node and marks it as being nonfunctional.

    It is almost as if the load balancer does forward packets to the node, but cannot receive confirmation that it has done so. ldirectord marks the node disabled after "checkinterval" seconds have passed, because requests to the node don't come back. It is obvious that the node is listening, but it is unable to respond to the LB because the node's loopback alias is set to the virtual IP.

    Any help would be appreciated.

    (From a loopback standpoint, I don't understand how a node is ever expected to communicate with another server when the node loopback alias is set to be the same as that other server.)

    Solomon Chang
