Hello, I followed this howto (http://www.howtoforge.com/set-up-a-loadbalanced-ha-apache-cluster-ubuntu8.04) and it ran smoothly for a while. From time to time, though, the ldirectord node started sending connections to one web server without closing the old ones, leaving that web server with over 20,000 TCP connections on port 80 in TIME_WAIT. Things only returned to normal after I restarted the web server.
The error started at 8:51, and by 9:02 I already had more than 5,000 connections on port 80.

[APACHE] /var/log/ha-log
heartbeat[4646]: 2009/01/30_17:45:25 WARN: Gmain_timeout_dispatch: Dispatch function for check for signals was delayed 4396940 ms (> 1010 ms) before being called (GSource: 0x811b0e8)
heartbeat[4646]: 2009/01/30_17:45:25 info: Gmain_timeout_dispatch: started at 4307228265 should have started at 4306788571
heartbeat[4646]: 2009/01/30_17:45:25 WARN: Gmain_timeout_dispatch: Dispatch function for update msgfree count was delayed 4397590 ms (> 10000 ms) before being called (GSource: 0x811b1b8)
heartbeat[4646]: 2009/01/30_17:45:25 info: Gmain_timeout_dispatch: started at 4307228265 should have started at 4306788506
heartbeat[4646]: 2009/01/30_17:45:25 WARN: Gmain_timeout_dispatch: Dispatch function for client audit was delayed 4391760 ms (> 5000 ms) before being called (GSource: 0x811b018)
heartbeat[4646]: 2009/01/30_17:45:25 info: Gmain_timeout_dispatch: started at 4307228265 should have started at 4306789089
heartbeat[4646]: 2009/01/31_08:51:12 info: Daily informational memory statistics
heartbeat[4646]: 2009/01/31_08:51:12 info: MSG stats: 250/94109 ms age 0 [pid4646/MST_CONTROL]
heartbeat[4646]: 2009/01/31_08:51:12 info: cl_malloc stats: 7885/3053206 580260/233792 [pid4646/MST_CONTROL]
heartbeat[4646]: 2009/01/31_08:51:12 info: RealMalloc stats: 596708 total malloc bytes. pid [4646/MST_CONTROL]
heartbeat[4646]: 2009/01/31_08:51:12 info: Current arena value: 0
heartbeat[4646]: 2009/01/31_08:51:12 info: MSG stats: 0/3 ms age 172679140 [pid4677/HBFIFO]
heartbeat[4646]: 2009/01/31_08:51:12 info: cl_malloc stats: 315/412 30556/13710 [pid4677/HBFIFO]
heartbeat[4646]: 2009/01/31_08:51:12 info: RealMalloc stats: 32660 total malloc bytes. pid [4677/HBFIFO]
heartbeat[4646]: 2009/01/31_08:51:12 info: Current arena value: 0
heartbeat[4646]: 2009/01/31_08:51:12 info: MSG stats: 0/0 ms age 172559120 [pid4678/HBWRITE]
heartbeat[4646]: 2009/01/31_08:51:12 info: cl_malloc stats: 334/97649 33048/15406 [pid4678/HBWRITE]
heartbeat[4646]: 2009/01/31_08:51:12 info: RealMalloc stats: 41584 total malloc bytes. pid [4678/HBWRITE]
heartbeat[4646]: 2009/01/31_08:51:12 info: Current arena value: 0
heartbeat[4646]: 2009/01/31_08:51:12 info: MSG stats: 0/0 ms age 172559120 [pid4679/HBREAD]
heartbeat[4646]: 2009/01/31_08:51:12 info: cl_malloc stats: 334/386 24920/11353 [pid4679/HBREAD]
heartbeat[4646]: 2009/01/31_08:51:12 info: RealMalloc stats: 25004 total malloc bytes. pid [4679/HBREAD]
heartbeat[4646]: 2009/01/31_08:51:12 info: Current arena value: 0
heartbeat[4646]: 2009/01/31_08:51:12 info: MSG stats: 0/176837 ms age 1900 [pid4680/HBWRITE]
heartbeat[4646]: 2009/01/31_08:51:12 info: cl_malloc stats: 346/4527045 34568/16438 [pid4680/HBWRITE]
heartbeat[4646]: 2009/01/31_08:51:12 info: RealMalloc stats: 46624 total malloc bytes. pid [4680/HBWRITE]
heartbeat[4646]: 2009/01/31_08:51:12 info: Current arena value: 0
heartbeat[4646]: 2009/01/31_08:51:12 info: MSG stats: 0/1466 ms age 169877960 [pid4681/HBREAD]
heartbeat[4646]: 2009/01/31_08:51:12 info: cl_malloc stats: 347/29731 34652/16482 [pid4681/HBREAD]
heartbeat[4646]: 2009/01/31_08:51:12 info: RealMalloc stats: 36068 total malloc bytes. pid [4681/HBREAD]
heartbeat[4646]: 2009/01/31_08:51:12 info: Current arena value: 0
heartbeat[4646]: 2009/01/31_08:51:12 info: These are nothing to worry about.
heartbeat[4657]: 2009/02/02_11:12:28 WARN: Core dumps could be lost if multiple dumps occur.
heartbeat[4657]: 2009/02/02_11:12:28 WARN: Consider setting non-default value in /proc/sys/kernel/core_patt

[UltraMonkey] /var/log/ultramonkey.log
[Thu Jan 29 08:53:00 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 1)
[Thu Jan 29 08:53:00 2009|ldirectord|4995] Restored real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 1)
[Thu Jan 29 08:53:00 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 1)
[Thu Jan 29 08:53:00 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 1)
[Thu Jan 29 09:41:52 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.132:80 (10.0.0.143:80) (Weight set to 0)
[Thu Jan 29 09:42:02 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.132:443 (10.0.0.143:443) (Weight set to 0)
[Thu Jan 29 09:42:12 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.132:10002 (10.0.0.143:10002) (Weight set to 0)
[Sat Jan 31 09:05:08 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 0)
[Sat Jan 31 09:05:08 2009|ldirectord|4995] Quiescent real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 0)
[Sat Jan 31 09:05:21 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 0)
[Sat Jan 31 09:05:21 2009|ldirectord|4995] Quiescent real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 0)
[Sat Jan 31 09:05:34 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 0)
[Sat Jan 31 09:05:34 2009|ldirectord|4995] Quiescent real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 0)
[Sat Jan 31 09:12:17 2009|ldirectord|4995] Restored real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 1)
[Sat Jan 31 09:12:17 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 1)
[Sat Jan 31 09:12:20 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 1)
[Sat Jan 31 09:12:20 2009|ldirectord|4995] Restored real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 1)
[Sat Jan 31 09:12:25 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 1)
[Sat Jan 31 09:12:25 2009|ldirectord|4995] Restored real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 1)
[Mon Feb 2 11:17:33 2009|ldirectord|4995] Quiescent real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 0)
[Mon Feb 2 11:17:33 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 0)
[Mon Feb 2 11:17:47 2009|ldirectord|4995] Quiescent real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 0)
[Mon Feb 2 11:17:47 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 0)
[Mon Feb 2 11:17:50 2009|ldirectord|4995] Quiescent real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 0)
[Mon Feb 2 11:17:50 2009|ldirectord.cf|4483] Quiescent real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 0)
[Mon Feb 2 11:18:07 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 1)
[Mon Feb 2 11:18:07 2009|ldirectord|4995] Restored real server: 10.0.0.122:80 (10.0.0.143:80) (Weight set to 1)
[Mon Feb 2 11:19:44 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 1)
[Mon Feb 2 11:19:44 2009|ldirectord|4995] Restored real server: 10.0.0.122:10002 (10.0.0.143:10002) (Weight set to 1)
[Mon Feb 2 11:19:52 2009|ldirectord|4995] Restored real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 1)
[Mon Feb 2 11:19:52 2009|ldirectord.cf|4483] Restored real server: 10.0.0.122:443 (10.0.0.143:443) (Weight set to 1)

[Ultramonkey]
root@vsrv123:/etc/ha.d# cat ldirectord.cf
checktimeout=10
checkinterval=2
autoreload=no
logfile="/var/log/ultramonkey.log"
quiescent=yes
virtual = 10.0.0.143:80
        real = 10.0.0.122:80 gate
        real = 10.0.0.132:80 gate
        service = http
        request = "ldirector.html"
        receive = "Test Page"
        scheduler = rr
        protocol = tcp
        checktype = negotiate
virtual = 10.0.0.143:443
        real = 10.0.0.122:443 gate
        real = 10.0.0.132:443 gate
        service = https
        request = "ldirector.html"
        receive = "Test Page"
        scheduler = rr
        protocol = tcp
        checktype = negotiate
virtual = 10.0.0.143:10002
        real = 10.0.0.122:10002 gate
        real = 10.0.0.132:10002 gate
        service = https
        request = "ldirector.html"
        receive = "Test Page"
        scheduler = rr
        protocol = tcp
        checktype = negotiate
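
For reference, the negotiate check in the ldirectord.cf above can be reproduced by hand to see whether a real server would be marked quiescent. The following is only a rough Python sketch of that idea, not ldirectord's own code: the function names are made up for illustration, it covers plain HTTP only, and it assumes the receive string is applied as a regular expression against the response body.

```python
import http.client
import re
import urllib.request


def response_matches(body, receive="Test Page"):
    """Return True if the expected pattern occurs anywhere in the body."""
    return re.search(receive, body) is not None


def check_real_server(host, port=80, request="ldirector.html",
                      receive="Test Page", timeout=10):
    """Fetch the test page from one real server and look for the pattern.

    The defaults mirror the request, receive and checktimeout values from
    the configuration above. Any network or HTTP error counts as a failure.
    """
    url = "http://%s:%d/%s" % (host, port, request)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        return False
    return response_matches(body, receive)
```

Calling check_real_server("10.0.0.122") from the load balancer while the problem is happening would show whether the test page is still being served within the 10 second timeout.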
I ran the following test: I shut down the UltraMonkey nodes, and even then Apache continued to show the same problem. I had thought the problem was the connections piling up in TIME_WAIT, but this time there were no connections left and the server still did not respond to web requests. I tried to run "top" and it did not respond either. I believe it may be something related to memory. What do you think? Have you ever seen "top" stop responding under load?
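
To tell whether the port 80 connections really are sitting in TIME_WAIT the next time this happens, the socket table can be counted by state. Below is a minimal Python sketch, assuming a Linux web server where /proc/net/tcp is readable (IPv4 sockets only); the function name count_states is just for illustration.

```python
from collections import Counter

# TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text, local_port=80):
    """Count sockets per TCP state for one local port.

    proc_net_tcp_text is the full text of /proc/net/tcp, header included.
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address looks like "8F00000A:0050" (hex IP, hex port).
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if port == local_port:
            state = fields[3].upper()
            counts[TCP_STATES.get(state, state)] += 1
    return counts
```

Reading the file with open("/proc/net/tcp").read() and passing the text to count_states gives a per-state tally for port 80, which can be logged every minute or so to see when the numbers start to climb.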