I have a server that serves media ads (graphics files) with heavy traffic, plus a few low-traffic websites. ISPConfig 3 (version 3.0.2.2) is installed on it, running on Ubuntu 8.04.4. Lately this server has become very unresponsive. I don't know what happened; HTTP suddenly became very unresponsive. I looked through all the log files but was unable to find the cause. The Apache service goes up and down. One thing I found in the log is Apache reporting that MaxClients was reached and asking me to increase it. However, MaxClients was already at 700. I increased it to 800 and then 1000, and I still get the error within minutes of restarting Apache. Then I looked at Apache server-status and saw lots of Apache slots in state "K", which means they are still in keepalive. That is probably why MaxClients was used up so quickly. So I turned KeepAlive off. There are no more Apache PIDs in "K", but HTTP is still up and down. Does anyone have any idea what else to look at? Is someone attacking my Apache service?
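For anyone who wants to track the "K" count over time instead of eyeballing server-status, here is a minimal Python sketch. It assumes mod_status is enabled and that you feed it the machine-readable text from server-status?auto, which includes a "Scoreboard:" line. The sample string and the function name are made up for illustration and are not taken from the server in this thread.

```python
from collections import Counter

def scoreboard_counts(status_text):
    """Count slot states from the 'Scoreboard:' line of server-status?auto text."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            # Each character is one slot: K = keepalive, W = sending reply,
            # R = reading request, _ = waiting, . = open slot, and so on.
            return Counter(line.split(":", 1)[1].strip())
    return Counter()

# Hypothetical sample output, NOT from the server discussed here.
sample = "BusyWorkers: 6\nIdleWorkers: 2\nScoreboard: KKKKW_R_...."
counts = scoreboard_counts(sample)
print(counts["K"], "of", sum(counts.values()), "slots in keepalive")
```

If "K" dominates the scoreboard, keepalive is what is eating the slots; if "W" or "R" dominates after KeepAlive is turned off, the bottleneck is probably elsewhere.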
Hi! How is your I/O? What HDDs are you using? What is the Apache memory footprint (and the total memory), and are you using the prefork or worker MPM?
I/O seems to be fine. I am not sure what HDDs we are using, but this is a server hosted in a datacenter, so they should be SAS or SCSI. We are using the default apache2 that comes with Ubuntu 8.04, which should be the prefork MPM. The Apache memory footprint is around 9-11 MB. Total memory is 4 GB. As of today, the server is more responsive in my SSH sessions (no more SSH timeouts), and typing commands is also more responsive. However, the Apache server is still hit and miss. We have a load balancer (haproxy) and also nagios, and both keep detecting that this server fails to respond to HTTP every few checks. BTW, there is also this entry in the Apache error log: [Tue Aug 09 16:24:23 2011] [notice] mod_fcgid: process /var/www/domain.com/web/wing/index.php(21176) exit(server exited), terminated by calling exit(), return code: 0 Could that be the reason for the problem? As far as I have read, this is not a serious problem. I did change that domain to use mod_php instead, which stopped this notice, but the problem still persists.
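A quick back-of-envelope check on those numbers, as a Python sketch. It assumes the 9-11 MB figure is per Apache child and that roughly 1 GB is kept back for MySQL, the OS and the page cache. Both are my assumptions, not measurements from this server, and per-process figures usually overstate real usage because children share memory pages.

```python
def max_prefork_children(total_ram_mb, reserved_mb, per_child_mb):
    """Rough ceiling on prefork children that fit in RAM without swapping."""
    return int((total_ram_mb - reserved_mb) // per_child_mb)

total_ram_mb = 4096   # 4 GB total memory, from the post
per_child_mb = 10     # midpoint of 9-11 MB, ASSUMED to be per Apache child
reserved_mb = 1024    # ASSUMPTION: memory left for MySQL, the OS and page cache

ceiling = max_prefork_children(total_ram_mb, reserved_mb, per_child_mb)
worst_case_mb = 1000 * per_child_mb   # MaxClients 1000, all children alive

print("children that fit:", ceiling)
print("MaxClients 1000 worst case (MB):", worst_case_mb)
```

Under these assumptions a MaxClients of 1000 could ask for more than twice the installed RAM, so raising it further is unlikely to help and may push the box into swap.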
With SSH timeouts, check your interfaces:
Code:
ifconfig | grep errors
See if they show a lot of errors. Check dmesg as well and, as Falko said, install munin to track everything.
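If you would rather log the error counters than grep them by hand, the same numbers are in /proc/net/dev on Linux. Below is a small Python sketch that parses that file's text. The sample data and the function name are hypothetical; on a real box you would pass in open("/proc/net/dev").read() instead.

```python
def interface_errors(proc_net_dev_text):
    """Return {interface: (rx_errs, tx_errs)} from /proc/net/dev contents."""
    result = {}
    for line in proc_net_dev_text.splitlines()[2:]:   # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue
        # Receive columns come first (errs is the 3rd), transmit columns
        # start at index 8 (errs is the 11th overall).
        result[name.strip()] = (int(fields[2]), int(fields[10]))
    return result

# Hypothetical sample, NOT taken from the server in this thread.
sample = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 987654 4321 17 3 0 0 0 0 123456 2100 2 0 0 0 0 0
"""
errors = interface_errors(sample)
print(errors)
```

Counters that keep climbing between two runs are the thing to look for; a fixed non-zero value left over from boot is usually harmless.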
SSH seems to be stable again now, and ifconfig shows no errors. I have narrowed the problem down to HTTP only. Server load is not even high, barely 0.23. I am really pulling my hair out on this one (not that I have much hair left). I have already installed munin and will wait for the data to come in. Any other suggestions?
Have you tried the suggestions from this tutorial? http://www.howtoforge.com/configuring_apache_for_maximum_performance
Here is the login to the munin server. The URL is http://174.143.149.122/munin/vn239/vn239.html. The login is tempo and tempo112. Please take a look and let me know. Note the broken graphs, which are there because the server connection is not very good. I am not sure if that is due to the existing HTTP problem or to the server's connection in general. If I put the munin master on the same server, it would be worse, as I would have trouble loading the graphs because HTTP keeps timing out.
I do not think it's an Apache performance issue, because you can see from the munin graph that Apache is not exactly maxed out. Plus, we have set up a lot of servers with similar Apache settings.
No output from that command. We have moved out one of the websites that has the highest traffic, and the Apache service seems to be back to normal. Actually, there is virtually no traffic right now, and HTTP is responding as it should. So is the problem definitely caused by this website having too much traffic? If you look at the munin graph, you can see the Apache process count dip to almost zero. That is when we moved the website out. But how is that possible? If that site were causing load, there would have been CPU load, memory load, I/O load and so on. The thing is, the website with that traffic has been running fine for a year already. We moved the site to another server very similar to this one, and that server can handle the site with no problem. The only difference is that it does not run ISPConfig 3. Now, I am not blaming ISPConfig; I just want to put everything on the table and look at it from all perspectives. Any idea on how to proceed to solve this problem is very much appreciated. Here is the web traffic for the two most active websites on the server:
Code:
Domain            This month   Last month   This year    Last year
vn.domain.com     5 821 MB     906 MB       31 535 MB    27 567 MB
asvn.domain.com   12 100 MB    55 765 MB    320 022 MB   0 MB
We moved asvn.domain.com out 2 days ago and still had the same problem. vn.domain.com was moved out just 6-7 hours ago.
sysctl.conf
Code:
net.ipv6.conf.eth0.autoconf = 0
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_no_metrics_save=1
net.core.netdev_max_backlog = 2500
net.ipv4.ip_local_port_range = 1024 65535
fs.file-max = 65555
net.ipv4.netfilter.ip_conntrack_max = 131072
Looking at your graphs now, I still see gaps after you moved the busy site. I'm putting my money on a broken network adapter.
Ah, thanks for this suggestion. It made me realize that maybe the hosting provider is rate limiting our bandwidth. Let me check with them for an MRTG graph. Also, how do I prove to the hosting provider that the network adapter is causing problems? Is there a way to check from my side? Edit: I just got the MRTG graphs from my hosting company. The third graph (FastEthernet0/10) is the server I said is having the problem. The first (FastEthernet0/14) and second (FastEthernet0/9) graphs are the other two servers, which are starting to give problems now. It looks like traffic is going through the roof, up to 80 Mb. But it looks very suspicious because it just spikes up and down. Any idea?
The spikes are weird. Do you have cron jobs running at those times? The spikes are blue on the switch graph, which means outgoing traffic from the switch on a FastEthernet port. That port tops out at 100 Mbit going towards your server. So there's your problem: you're hitting your maximum port speed. Now to find out why! You could install iptraf and keep it open in a screen session (in case it spikes again). When the connection is back, check iptraf to see who sent you all the traffic and on which port.
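To put a number on "hitting your maximum port speed" from the server side, you can sample an interface byte counter twice and convert the difference to Mbit/s. A rough Python sketch follows. The counter values are invented for illustration, the helper names are mine, and the 100 Mbit default simply reflects the FastEthernet port mentioned above.

```python
def mbit_per_s(bytes_before, bytes_after, seconds):
    """Average rate in Mbit/s between two byte-counter samples."""
    return (bytes_after - bytes_before) * 8 / seconds / 1_000_000

def port_utilization(rate_mbit, port_speed_mbit=100):
    """Fraction of the port speed in use (1.0 means saturated)."""
    return rate_mbit / port_speed_mbit

# Hypothetical samples taken 60 seconds apart, NOT real data from this thread.
rate = mbit_per_s(bytes_before=0, bytes_after=600_000_000, seconds=60)
usage = port_utilization(rate)
print("rate (Mbit/s):", rate)
print("share of a 100 Mbit port:", usage)
```

Keep in mind that a 5-minute MRTG average hides short bursts, so a port can be saturated for seconds at a time while the graph still shows well under 100 Mbit.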
I think you have too much iowait (see http://174.143.149.122/munin/vn239/vn239-cpu.html ). Normally you shouldn't see iowait at all. Also, it might be possible to optimize MySQL. You have lots of selects, inserts, updates, and deletes...
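For reference, the iowait that munin graphs comes from the "cpu" line in /proc/stat. Here is a minimal Python sketch of how the percentage can be derived from two samples of that line. The sample values and the function name are hypothetical; on a real system you would read the first line of /proc/stat twice, a few seconds apart.

```python
def iowait_percent(cpu_line_before, cpu_line_after):
    """Percentage of CPU time spent in iowait between two /proc/stat 'cpu' lines."""
    # Columns after the 'cpu' label: user nice system idle iowait irq softirq steal
    # (later columns such as guest are already counted in user, so they are skipped).
    before = [int(x) for x in cpu_line_before.split()[1:9]]
    after = [int(x) for x in cpu_line_after.split()[1:9]]
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0

# Hypothetical samples, NOT taken from the server in this thread.
first = "cpu  1000 0 500 8000 400 50 50 0"
second = "cpu  1100 0 550 8600 600 60 90 0"
pct = iowait_percent(first, second)
print("iowait %:", pct)
```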
I thought of that as well, but iowait can come from almost anything a process has to wait on: a MySQL reply, a file operation, a slow NFS mount, a busy network card, etc. He's topping out his max switch port speed. I bet he moved the busy website to the machine that's attached to Fa0/9 .. max 97mbit. Maybe some script kiddie is DoSing him, since the traffic is OUT on the switch port, which means IN for the server.
Wow! Lots of suggestions. Let's take them one by one. Yes, we do have some rsync scripts running that sync data onto the servers. These are mostly media files that we serve to users. The rsync runs between servers in different countries, but it has been running for a year and nothing has changed recently, so there is no reason for it to suddenly spike the traffic. Our MySQL and Apache are actually already optimized. The same setup has been rolled out region-wide on servers in other parts of Asia. However, I am not ruling out that more improvements can be made. As for I/O wait, what else can be done besides changing to faster HDDs? Use a faster file system? Anyway, we have completely restructured our system across these 3 servers. Haproxy is now balancing between all three servers, all running multi-backend balancing with ACLs. Theoretically the problem should still be there, because the same traffic is being directed to the same three servers. I do not understand why it is okay now. However, I will ask for the MRTG graphs tomorrow to see if the traffic trend has changed. BTW, speaking of haproxy, we used the haproxy howto on this website. Totally great stuff. Thanks to the howtoforge team.
What version of MySQL are you running? There are some bugs in 5.5.12+ that won't show any load issues but will kill your server. As a test, put up phpinfo.php on a domain (a script with no DB access) and see how fast the response is. If it's normal, then chances are MySQL is your issue. I jumped in because this is my issue on Fedora 15...time to downgrade.
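A rough way to do that comparison without a stopwatch is to time both fetches from a script. The Python sketch below is only an outline: the helper names are mine, and the URLs in the usage note are placeholders you would replace with your own no-database test page and a page that does hit the database.

```python
import time
import urllib.request

def fetch(url, timeout=10.0):
    """Download the whole response body for url."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()

def timed(action):
    """Run action() once and return the elapsed wall-clock seconds."""
    start = time.monotonic()
    action()
    return time.monotonic() - start
```

Usage would look something like timed(lambda: fetch("http://example.com/phpinfo.php")) versus timed(lambda: fetch("http://example.com/")), where both URLs are hypothetical. Run each a few times, since a single request on a flaky connection proves little.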
There are some measures you can take. For example, you could use tmpfs ( http://www.howtoforge.com/storing-files-directories-in-memory-with-tmpfs ) for your cache directories (if you do caching) or use memcached to store your cache in memory. You could switch off access logging in Apache if you don't need it (for example if you use Google Analytics), but I don't recommend switching off error logging. Disable .htaccess files by setting AllowOverride None and placing the .htaccess directives directly in your vhosts. Allow browsers to cache static files (like images, CSS, JS) so that they don't have to fetch them from your server after the first access (see http://www.howtoforge.com/make-brow...es-with-mod_expires-on-apache2-debian-squeeze ).
We are using the latest version of MySQL that comes with Ubuntu 8.04, which is 5.0.51a-3ubuntu5.8. This should be okay, as we are using this version on a lot of our other servers. We do not use bleeding-edge versions of packages unless they have a feature that we need.