Hello. I have a question about the great HOW-TO article about setting up a high-availability Apache cluster. I am following this tutorial, but I am using Ubuntu 6.06 LTS Server instead of Debian Sarge. It has the necessary virtual server support in the kernel.

The tutorial sets up the cluster entirely on a local, publicly inaccessible network. I'm trying to put the load balancers on the public network, with "real" IPs. What I am not clear on is exactly how the cluster servers should be networked, given that the load balancers need to be publicly accessible using a "real" IP address, while the cluster nodes themselves need to be on a local 192.168.0.XXX network.

I have my Internet connection going to a router, then to my two load balancers on eth0 (NIC 1). Then, on eth1 (NIC 2) on each load balancer, I have network cables going to another router. Two Apache nodes are plugged in to that router.

Is this the proper physical network setup? The tutorial didn't cover this, and I haven't been able to find it spelled out on the Linux Virtual Server site either. If it isn't proper, can someone tell me how to network the cluster using a real address and a local network?
Output from tests on page 3

Presently, pinging or SSHing to my "real" IP address works for about a minute, then stops. It starts again a few minutes later, then stops, and the cycle repeats. It looks like ldirectord is exiting. Browsing to the "real" address during the time when ldirectord is running never gets a web page response.

Using telnet to port 80, I can see that, while I am on the load balancer console, the web server on the load balancer and the two cluster nodes respond to requests.

Code:
# telnet 192.168.0.101 80
Trying 199.32.87.62...
Connected to 199.32.87.62.
Escape character is '^]'.
GET /ldirector.html {RETURN KEY PRESSED}
Test Page

However, if I put a unique document on the two web cluster nodes and try to request it through the real IP (the load balancer's public address), it is not found. Evidently the request is not being passed to the cluster nodes.

Code:
# telnet 192.168.0.101 80
Trying 199.32.87.62...
Connected to 199.32.87.62.
Escape character is '^]'.
GET /different_file.html {RETURN KEY PRESSED}
{404 error message from web server returned}

199.32.87.62 is the public address (altered for posting).
199.32.87.254 is the gateway.
192.168.0.101 = cluster 1
192.168.0.102 = cluster 2
192.168.0.103 = loadb1
192.168.0.104 = loadb2

Here is the output from page three of the tutorial.

Code:
loadb1:~$ ip addr sh
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:11:43:35:e1:a2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.103/24 brd 192.168.0.255 scope global eth0
    inet6 fe80::211:43ff:fe35:e1a2/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:11:43:35:e1:a4 brd ff:ff:ff:ff:ff:ff
    inet 199.32.87.62/24 brd 199.32.87.254 scope global eth1
    inet6 2001:18e8:2:330:211:43ff:fe35:e1a4/64 scope global dynamic
       valid_lft 2591985sec preferred_lft 604785sec
    inet6 fe80::211:43ff:fe35:e1a4/64 scope link
       valid_lft forever preferred_lft forever
4: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0

Code:
loadb1:~$ sudo -s
Password:
root@loadb1:~# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  199.32.87.62:80 rr
  -> 192.168.0.101:80             Route   1      0          0
  -> 192.168.0.102:80             Route   1      0          0

Code:
root@loadb1:~# ldirectord ldirectord.cf status
ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 4288

Code:
loadb1:~$ /etc/ha.d/resource.d/LVSSyncDaemonSwap master status
master running
(ipvs_syncmaster pid: 4427)

Code:
loadb1:~$ cd /etc/ha.d
loadb1:/etc/ha.d$ cat ha.cf
logfacility local0
bcast eth0 # Linux
mcast eth0 225.0.0.1 694 1 0
auto_failback off
node loadb1
node loadb2
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
loadb1:/etc/ha.d$ cat haresources
loadb1 \
        ldirectord::ldirectord.cf \
        LVSSyncDaemonSwap::master \
        IPaddr2::199.32.87.62/24/eth1/199.32.87.254

Code:
loadb1:/etc/ha.d$ cat ldirectord.cf
checktimeout=10
checkinterval=2
autoreload=no
logfile="/var/log/ldirector-local0"
quiescent=yes

virtual=199.32.87.62:80
        real=192.168.0.101:80 gate
        real=192.168.0.102:80 gate
        fallback=127.0.0.1:80 gate
        service=http
        request="ldirector.html"
        receive="Test Page"
        scheduler=rr
        protocol=tcp
        checktype=negotiate
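For anyone who wants to try by hand what the negotiate check in the ldirectord.cf above asks for (fetch request= from each real server and look for the receive= text), here is a rough shell sketch. It assumes curl is installed; the function names body_matches and check_real_server are made up for illustration, and a plain fixed-string match stands in for whatever comparison ldirectord performs internally.

```shell
# body_matches EXPECTED
# Reads a response body on stdin; succeeds if it contains EXPECTED.
# (Fixed-string match: an approximation, not ldirectord's own logic.)
body_matches() {
    grep -qF -- "$1"
}

# check_real_server HOST PORT PATH EXPECTED
# Fetches http://HOST:PORT/PATH and tests the body.
# --max-time 10 mirrors checktimeout=10 from the config above.
check_real_server() {
    curl -s --max-time 10 "http://$1:$2/$3" | body_matches "$4"
}
```

Run from loadb1, something like `check_real_server 192.168.0.101 80 ldirector.html "Test Page" && echo up` should print "up" for a node that ldirectord would also consider healthy.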
ldirectord log output

Code:
loadb1:/etc/ha.d$ cat /var/log/ldirector-local0
[Mon Jul 9 14:05:02 2007|ldirectord.cf] Removed real server: 192.168.0.101:80 ( x 199.32.87.62:80
[Mon Jul 9 14:05:02 2007|ldirectord.cf] Removed real server: 192.168.0.102:80 ( x 199.32.87.62:80
[Mon Jul 9 14:05:02 2007|ldirectord.cf] Removed virtual server: 199.32.87.62:80
[Mon Jul 9 14:05:02 2007|ldirectord.cf] Linux Director Daemon terminated on signal: TERM
[Mon Jul 9 14:07:02 2007|ldirectord.cf] ldirectord is stopped for /etc/ha.d/ldirectord.cf
[Mon Jul 9 14:07:02 2007|ldirectord.cf] Exiting with exit_status 3: Exiting from ldirectord status
[Mon Jul 9 14:07:33 2007|ldirectord.cf] ldirectord is stopped for /etc/ha.d/ldirectord.cf
[Mon Jul 9 14:07:33 2007|ldirectord.cf] Exiting with exit_status 3: Exiting from ldirectord status
[Mon Jul 9 14:07:34 2007|ldirectord.cf] ldirectord is stopped for /etc/ha.d/ldirectord.cf
[Mon Jul 9 14:07:34 2007|ldirectord.cf] Exiting with exit_status 3: Exiting from ldirectord status
[Mon Jul 9 14:07:35 2007|ldirectord.cf] Starting Linux Director v1.77.2.36 as daemon
[Mon Jul 9 14:07:35 2007|ldirectord.cf] Added virtual server: 199.32.87.62:80
[Mon Jul 9 14:07:35 2007|ldirectord.cf] Added fallback server: 127.0.0.1:80 ( x 199.32.87.62:80) (Weight set to 1)
[Mon Jul 9 14:07:35 2007|ldirectord.cf] Quiescent real server: 192.168.0.102:80 mapped from 192.168.0.102:80 ( x 199.32.87.62:80) (Weight set to 0)
[Mon Jul 9 14:07:35 2007|ldirectord.cf] Quiescent real server: 192.168.0.101:80 mapped from 192.168.0.101:80 ( x 199.32.87.62:80) (Weight set to 0)
[Mon Jul 9 14:07:35 2007|ldirectord.cf] Restored real server: 192.168.0.101:80 ( x 199.32.87.62:80) (Weight set to 1)
[Mon Jul 9 14:07:35 2007|ldirectord.cf] Deleted fallback server: 127.0.0.1:80 ( x 199.32.87.62:80)
[Mon Jul 9 14:07:35 2007|ldirectord.cf] Restored real server: 192.168.0.102:80 ( x 199.32.87.62:80) (Weight set to 1)
[Mon Jul 9 14:11:07 2007|ldirectord.cf] Removed real server: 192.168.0.101:80 ( x 199.32.87.62:80
[Mon Jul 9 14:11:07 2007|ldirectord.cf] Removed real server: 192.168.0.102:80 ( x 199.32.87.62:80
[Mon Jul 9 14:11:07 2007|ldirectord.cf] Removed virtual server: 199.32.87.62:80
[Mon Jul 9 14:11:07 2007|ldirectord.cf] Linux Director Daemon terminated on signal: TERM
[Mon Jul 9 14:12:49 2007|ldirectord.cf] ldirectord is stopped for /etc/ha.d/ldirectord.cf
[Mon Jul 9 14:12:49 2007|ldirectord.cf] Exiting with exit_status 3: Exiting from ldirectord status
[Mon Jul 9 14:12:49 2007|ldirectord.cf] ldirectord is stopped for /etc/ha.d/ldirectord.cf
[Mon Jul 9 14:12:49 2007|ldirectord.cf] Exiting with exit_status 3: Exiting from ldirectord status
[Mon Jul 9 14:12:50 2007|ldirectord.cf] Starting Linux Director v1.77.2.36 as daemon
[Mon Jul 9 14:12:50 2007|ldirectord.cf] Added virtual server: 199.32.87.62:80
[Mon Jul 9 14:12:50 2007|ldirectord.cf] Added fallback server: 127.0.0.1:80 ( x 199.32.87.62:80) (Weight set to 1)
[Mon Jul 9 14:12:50 2007|ldirectord.cf] Quiescent real server: 192.168.0.102:80 mapped from 192.168.0.102:80 ( x 199.32.87.62:80) (Weight set to 0)
[Mon Jul 9 14:12:50 2007|ldirectord.cf] Quiescent real server: 192.168.0.101:80 mapped from 192.168.0.101:80 ( x 199.32.87.62:80) (Weight set to 0)
[Mon Jul 9 14:12:50 2007|ldirectord.cf] Restored real server: 192.168.0.101:80 ( x 199.32.87.62:80) (Weight set to 1)
[Mon Jul 9 14:12:50 2007|ldirectord.cf] Deleted fallback server: 127.0.0.1:80 ( x 199.32.87.62:80)
[Mon Jul 9 14:12:50 2007|ldirectord.cf] Restored real server: 192.168.0.102:80 ( x 199.32.87.62:80) (Weight set to 1)
[Mon Jul 9 14:13:18 2007|ldirectord.cf] Configuration file '/etc/ha.d/ldirectord.cf' has changed on disk
[Mon Jul 9 14:13:18 2007|ldirectord.cf] - ignore new configuration
[Mon Jul 9 14:13:27 2007|ldirectord.cf] Removed real server: 192.168.0.101:80 ( x 199.32.87.62:80
[Mon Jul 9 14:13:27 2007|ldirectord.cf] Removed real server: 192.168.0.102:80 ( x 199.32.87.62:80
[Mon Jul 9 14:13:27 2007|ldirectord.cf] Removed virtual server: 199.32.87.62:80
[Mon Jul 9 14:13:27 2007|ldirectord.cf] Linux Director Daemon terminated on signal: TERM
[Mon Jul 9 14:15:25 2007|ldirectord.cf] ldirectord is stopped for /etc/ha.d/ldirectord.cf
[Mon Jul 9 14:15:25 2007|ldirectord.cf] Exiting with exit_status 3: Exiting from ldirectord status
[Mon Jul 9 14:15:56 2007|ldirectord.cf] ldirectord is stopped for /etc/ha.d/ldirectord.cf
[Mon Jul 9 14:15:56 2007|ldirectord.cf] Exiting with exit_status 3: Exiting from ldirectord status
[Mon Jul 9 14:15:56 2007|ldirectord.cf] ldirectord is stopped for /etc/ha.d/ldirectord.cf
[Mon Jul 9 14:15:56 2007|ldirectord.cf] Exiting with exit_status 3: Exiting from ldirectord status
[Mon Jul 9 14:15:57 2007|ldirectord.cf] Starting Linux Director v1.77.2.36 as daemon
[Mon Jul 9 14:15:57 2007|ldirectord.cf] Added virtual server: 199.32.87.62:80
[Mon Jul 9 14:15:57 2007|ldirectord.cf] Added fallback server: 127.0.0.1:80 ( x 199.32.87.62:80) (Weight set to 1)
[Mon Jul 9 14:15:57 2007|ldirectord.cf] Quiescent real server: 192.168.0.102:80 mapped from 192.168.0.102:80 ( x 199.32.87.62:80) (Weight set to 0)
[Mon Jul 9 14:15:57 2007|ldirectord.cf] Quiescent real server: 192.168.0.101:80 mapped from 192.168.0.101:80 ( x 199.32.87.62:80) (Weight set to 0)
[Mon Jul 9 14:15:57 2007|ldirectord.cf] Restored real server: 192.168.0.101:80 ( x 199.32.87.62:80) (Weight set to 1)
[Mon Jul 9 14:15:57 2007|ldirectord.cf] Deleted fallback server: 127.0.0.1:80 ( x 199.32.87.62:80)
[Mon Jul 9 14:15:57 2007|ldirectord.cf] Restored real server: 192.168.0.102:80 ( x 199.32.87.62:80) (Weight set to 1)

Code:
loadb1:/etc/ha.d$ tail /var/log/messages
Jul 9 14:15:57 loadb1 heartbeat: info: /sbin/ip -f inet addr add 199.32.87.62/24 brd 199.32.87.254 dev eth1
Jul 9 14:15:57 loadb1 heartbeat: info: /sbin/ip link set eth1 up
Jul 9 14:15:57 loadb1 kernel: [42949421.020000] ADDRCONF(NETDEV_UP): eth1: link is not ready
Jul 9 14:15:57 loadb1 heartbeat: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-199.32.87.62 eth1 199.32.87.62 auto 199.32.87.62 ffffffffffff
Jul 9 14:15:57 loadb1 kernel: [42949421.070000] NET: Registered protocol family 17
Jul 9 14:16:00 loadb1 kernel: [42949424.000000] tg3: eth1: Link is up at 10 Mbps, half duplex.
Jul 9 14:16:00 loadb1 kernel: [42949424.000000] tg3: eth1: Flow control is off for TX and off for RX.
Jul 9 14:16:00 loadb1 kernel: [42949424.020000] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
Jul 9 14:16:07 loadb1 heartbeat[4097]: info: Local Resource acquisition completed. (none)
Jul 9 14:16:07 loadb1 heartbeat[4097]: info: local resource transition completed.
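The ldirectord log above is easier to read once it is reduced to its stop and start events: the daemon is terminated by a TERM signal at 14:05:02, 14:11:07 and 14:13:27, and is started again at 14:07:35, 14:12:50 and 14:15:57, the last one in the same second that heartbeat brings up 199.32.87.62 in /var/log/messages. A small shell sketch for pulling that timeline out of the log follows; the function name ldirectord_cycles is made up, and the field position of the timestamp assumes the log format shown above.

```shell
# ldirectord_cycles
# Reads an ldirectord log on stdin and prints one line per stop/start event,
# followed by a summary count. In the log format above, whitespace-separated
# field 4 is the HH:MM:SS timestamp.
ldirectord_cycles() {
    awk '
        /terminated on signal: TERM/ { stops++;  print "stop  " $4 }
        /Starting Linux Director/    { starts++; print "start " $4 }
        END { printf "stops=%d starts=%d\n", stops, starts }
    '
}
```

Usage would be `ldirectord_cycles < /var/log/ldirector-local0`. Comparing the resulting times against the heartbeat lines in /var/log/messages may show whether heartbeat is the one stopping and restarting ldirectord, which is only a guess from the timestamps posted here.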
Ask your provider (where you have your servers) to help you with this. They should be able to give you a virtual IP address.
Thanks for your reply!

I have the following configuration: two real servers with VMware Server installed. I'm just looking at one of the two servers.

1st real server (host):
- 1st interface: public IPs, connected to the Internet over eth0
- 2nd interface: 1 private IP (192.168.1.2), connected over a switch to the other virtual server host

I followed your tutorial, but used a configuration like this:
- two virtual clients, loadb1 (192.168.0.11) and loadb2 (192.168.0.12), bridged to eth1 (local net), and instead of the local virtual IP a public virtual IP bridged to eth0
- two virtual clients, web1 (192.168.0.61) and web2 (192.168.0.62), bridged to eth1 (local net)

I can access websites on web1/2 on the local net directly. But I can't connect to the web servers through the load balancer over the public IP.

Configuration on load1 (load2 is the same, just with the other local IP):

/etc/ha.d/haresources:
Code:
load1 \
        ldirectord::ldirectord.cf \
        LVSSyncDaemonSwap::master \
        IPaddr2::aaa.aaa.aaa.aaa/24/eth0 \
        IPaddr2::bbb.bbb.bbb.bbb/24/eth0

/etc/ha.d/ldirectord.cf:
Code:
checktimeout=10
checkinterval=2
autoreload=no
logfile="local0"
quiescent=yes

virtual=aaa.aaa.aaa.aaa:80
        real=192.168.0.61:80 gate
        real=192.168.0.62:80 gate
        fallback=127.0.0.1:80 gate
        service=http
        request="ldirector.html"
        receive="Test Page"
        scheduler=wrr
        protocol=tcp
        checktype=negotiate

virtual=bbb.bbb.bbb.bbb:80
        real=192.168.0.61:80 gate
        real=192.168.0.62:80 gate
        fallback=127.0.0.1:80 gate
        service=http
        request="ldirector.html"
        receive="Test Page"
        scheduler=wrr
        protocol=tcp
        checktype=negotiate

Where aaa.aaa.aaa.aaa and bbb.bbb.bbb.bbb are public IPs.

sysctl -p:
Code:
net.ipv4.ip_forward = 1

ipvsadm:
Code:
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  bbb.bbb.bbb.bbb:www wrr
  -> 192.168.0.62:www             Route   1      0          0
  -> 192.168.0.61:www             Route   1      0          0
TCP  aaa.aaa.aaa.aaa:www wrr
  -> 192.168.0.62:www             Route   1      0          0
  -> 192.168.0.61:www             Route   1      0          0

Connection from a client (Internet) over the load balancer's public IP:

ipvsadm -L -c -n:
Code:
IPVS connection entries
pro expire state       source             virtual            destination
TCP 00:51  SYN_RECV    clientip:61625     aaa.aaa.aaa.aaa:80 192.168.0.62:80

The state "SYN_RECV" never changes and the client gets a timeout.

Configuration on web1/2:

ifconfig:
Code:
eth0      Link encap:Ethernet  HWaddr 00:0C:29:25:96:2E
          inet addr:192.168.0.61  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe25:962e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1671007 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1352975 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:148462185 (141.5 MiB)  TX bytes:166755807 (159.0 MiB)
          Interrupt:177 Base address:0x1400

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

lo:0      Link encap:Local Loopback
          inet addr:aaa.aaa.aaa.aaa  Mask:255.255.255.255
          UP LOOPBACK RUNNING  MTU:16436  Metric:1

lo:1      Link encap:Local Loopback
          inet addr:bbb.bbb.bbb.bbb  Mask:255.255.255.255
          UP LOOPBACK RUNNING  MTU:16436  Metric:1

sysctl -p:
Code:
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_announce = 2

The gateway for the web servers:

route:
Code:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.0.0     *               255.255.255.0   U     0      0        0 eth0
default         192.168.0.1     0.0.0.0         UG    0      0        0 eth0

When the client connects:

netstat:
Code:
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 aaa.aaa.aaa.aaa:www     clientIP:61676          SYN_RECV

And here, too, it stays in the "SYN_RECV" state.

In my opinion the packets from the client are forwarded through the load balancer to the web server, but Apache doesn't get the packets... I don't know how to check each step to track down the error.

Do you have any clue what the problem could be?

Thank you very much and best regards,
Markus
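One way to check each step, sketched under some assumptions: with the "Route" (gate) forwarding shown in the ipvsadm output, the real server answers the client itself through its own default gateway (192.168.0.1 in the routing table above) rather than back through the load balancer. A connection that sits in SYN_RECV on both machines would be consistent with the SYN arriving but the reply never reaching the client. The helper below only filters the connection table for such entries; the name stuck_syn_recv is made up for illustration.

```shell
# stuck_syn_recv
# Reads `ipvsadm -L -c -n` output on stdin and prints "client -> real server"
# for every TCP entry that is still in the SYN_RECV state.
# Columns: pro expire state source virtual destination
stuck_syn_recv() {
    awk '$1 == "TCP" && $3 == "SYN_RECV" { print $4 " -> " $6 }'
}
```

On the active load balancer this would be run as root with `ipvsadm -L -c -n | stuck_syn_recv`. For each real server it lists, watching `tcpdump -n -i eth0 tcp port 80` on that server while a client connects should show whether the SYN arrives and whether a SYN/ACK leaves; if it leaves, the next thing to look at is whether 192.168.0.1 can route it to the Internet.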
Hi,

I got it to work. LVS with two interfaces is only possible with LVS-NAT. I still have one problem: adding the gateways for the external interface through heartbeat (IPaddr2).

Regards,
Markus
Same problem

Hi Markus,

I am interested in your solution, because I am having the same problem here. I do not have a separate firewall, but my load balancers have the public IPs. I have 2 web servers behind them and configured them with private IPs as well as with the virtual public IPs on lo. Unfortunately, I only get HTTP access if they also have a public interface configured. All the rest is the same as yours. Can you give me a hint?

Thanks,
Andreas
Set gateway and iptables so the lb forwards traffic

Hi,

My problem was that I forgot to set the gateway for the public interfaces, as the load balancer's heartbeat config only sets the one for the private interface. So when the server came up I ran:

Code:
route add default gw aaa.bbb.ccc.ddd eth1

where aaa.bbb.ccc.ddd is the gateway and eth1 is the interface alias of the public interface.

My next problem was that the virtual web servers had no Internet connection (they only have a private network), and I didn't want to set up a gateway just for them. So I used the active load balancer's private IP as the gateway for the virtual servers, and ran the following iptables command on the load balancer:

Code:
iptables -t nat -A POSTROUTING -j MASQUERADE -s 192.168.0.0/24

to forward traffic from the 192.168.0.0 net.

Just ask if you have more questions.

Regards
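The two steps from this post can be collected in one place. The sketch below only prints the commands so they can be reviewed before anything is changed; the function name lb_gateway_commands and its three arguments are placeholders invented for this example, and the ip_forward line is taken from the sysctl output posted earlier in the thread rather than from this post.

```shell
# lb_gateway_commands PUBLIC_GW PUBLIC_IF PRIVATE_NET
# Prints (does not run) the commands for the active load balancer:
#   1. allow the kernel to forward packets between interfaces
#   2. set the default route through the public gateway
#   3. masquerade traffic coming from the private network
lb_gateway_commands() {
    public_gw=$1
    public_if=$2
    private_net=$3
    echo "sysctl -w net.ipv4.ip_forward=1"
    echo "route add default gw $public_gw $public_if"
    echo "iptables -t nat -A POSTROUTING -s $private_net -j MASQUERADE"
}
```

For example, `lb_gateway_commands aaa.bbb.ccc.ddd eth1 192.168.0.0/24` prints the three lines with the placeholder gateway filled in; once they look right they can be run as root. None of this survives a reboot unless it is also added to the boot configuration.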
Thanks a lot, this works for me as well. To summarize, there are 2 possibilities for a setup:

1. 2 interfaces, 1 public, 1 private. The load balancer (lb) forwards the request to the web server and the web server sends its answer directly through the public interface. In this case the web server needs a public IP and can therefore be accessed from outside (although you can close all incoming ports via iptables). In this case you have to assign the virtual public IPs to the loopback device.

2. LVS-NAT: In this case the answer will be sent back through the load balancer (as you described above). Here, I do not need any public interfaces for the web server. I guess I also do not need to assign the public virtual IP to the loopback interface, right?

Thanks,
Andreas
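In configuration terms, the difference between the two setups comes down to the forwarding keyword at the end of each real= line in ldirectord.cf: the configs posted in this thread use gate (direct routing, shown as "Route" by ipvsadm), while LVS-NAT uses masq. As a small reference, here is a shell sketch that maps each ldirectord keyword to the matching ipvsadm option; the function name ipvsadm_flag is made up for illustration.

```shell
# ipvsadm_flag KEYWORD
# Maps an ldirectord forwarding keyword to the ipvsadm option that selects
# the same packet-forwarding method. Fails for anything else.
ipvsadm_flag() {
    case "$1" in
        gate) echo "-g" ;;   # direct routing; listed as "Route" by ipvsadm -L
        masq) echo "-m" ;;   # NAT / masquerading; listed as "Masq"
        ipip) echo "-i" ;;   # IP-in-IP tunnelling; listed as "Tunnel"
        *)    echo "unknown forwarding method: $1" >&2; return 1 ;;
    esac
}
```

So a NAT setup would have lines such as `real=192.168.0.61:80 masq`, and after a reload `ipvsadm -L -n` should show "Masq" instead of "Route" in the Forward column. With masq the real servers route their replies back through the load balancer, which is why they should not need the virtual IP on lo in that mode.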
Ah, another question: does the passive load balancer take over the private IP from the active one in case of a failover? Otherwise you will lose your gateway then ...
Hi,

1. That's correct, IMHO.
2. Ditto.

Yes, it takes over the public IP, as you configured it in heartbeat. But you can enable routing and add the default gateway on the passive load balancer at the beginning. So if anything goes wrong with the active load balancer, it should work once the public IP is taken over by the passive load balancer.

I cannot tell you if it's really working, because I stopped using this configuration in a production environment on VMware. I had difficulties with the load balancer: it stopped distributing the traffic, but heartbeat wasn't aware of it... So there was no access to the websites even with two load balancers...

Regards