Today I started getting loads of "Error connecting to MySQL server at localhost: Too many connections" error messages on our production server. After connecting via SSH, I noticed that the server was completely swamped (load average >50). I barely managed to reboot it. After a closer examination, I found out that for the past month or so, swap usage has been persistently growing. Today it shot up to more than 2GB (see attachments) and "committed" memory reached 5G (the server has only 512M of RAM). The strange part is that "apps" usage never exceeded 300M and was around 100M on average. The server is running ISPConfig 3.0.1.3 on Ubuntu Jaunty. Does anyone have any idea why this has been happening? I can post additional information about the server if it will help determine the cause. Any help will be much appreciated.
Memory leak. Any unstable software? Are you running anything from jaunty-proposed or backports, or anything that is not part of Jaunty? Have you looked at the processes to see what is actually eating up RAM and getting pushed to swap? Do you have something writing to tmpfs?
I don't think I'm running any unstable software. It's basically a clean ISPConfig system, with some extras like munin, subversion and some PHP modules (see attachment for a full list of packages). I have not added any repositories either. This is the full list of repositories (with deb-src counterparts removed):
Code:
deb http://si.archive.ubuntu.com/ubuntu/ jaunty main restricted
deb http://si.archive.ubuntu.com/ubuntu/ jaunty-updates main restricted
deb http://si.archive.ubuntu.com/ubuntu/ jaunty universe
deb http://si.archive.ubuntu.com/ubuntu/ jaunty-updates universe
deb http://si.archive.ubuntu.com/ubuntu/ jaunty multiverse
deb http://si.archive.ubuntu.com/ubuntu/ jaunty-updates multiverse
deb http://security.ubuntu.com/ubuntu jaunty-security main restricted
deb http://security.ubuntu.com/ubuntu jaunty-security universe
deb http://security.ubuntu.com/ubuntu jaunty-security multiverse

As mentioned in my first post, I don't think the processes are eating up RAM at all. The 'apps' memory usage is about 100M on average and does not rise over time. The only suspicious process I noticed in htop is /usr/sbin/console-kit-daemon. There are about 65 active instances, each reportedly using 1.2% of RAM. How can I check this? I don't believe the problem is actually in MySQL. I think the "too many connections" error is a consequence of this problem rather than its cause. I can change the max_connections setting though, if you think it'll help.
Please do what I suggested and you will see that your problem is solved. The reason for this is simply a lot of spam or a similar incident, which causes postfix to open more connections than your MySQL settings allow. That leaves a lot of waiting processes, which then fill up your swap.
I've set the following settings in my.cnf:
Code:
max_connections = 500
max_user_connections = 500

Was there anything else you had in mind? What about postfix?
OK, have already done that, thanks. Just one observation: Could the max connection limit affect websites? The way I see it is that if postfix still tries to open up a lot of connections, legitimate connections from websites could get blocked when the limit is reached.
Just to clarify my reasoning as to why I thought MySQL was not the culprit here, but rather a consequence. As you can see in the (first two) attachments to this post, the MySQL thread count was more or less stable at about 2 throughout the past month, and never exceeded 10. When the overload happened last night at around midnight, it shot up to over 100. In the third attachment (memory usage), you can see a strange thing happening. Apparently last night after I rebooted the server, swap usage grew rapidly to about 1G, then dropped at around 5 AM. P.S.: All times are GMT+2 (Paris time).
Hi, till, Changing max_connections and max_user_connections did not help. Swap usage is currently at 150M and still gradually rising at the same rate as before. Any other suggestions? I'm at a loss as to the reason for this behaviour.
The thing is, no process uses an excessive amount of memory. In fact, there is always a significant amount of free memory on the server. That's why this is so bewildering to me. Here is an excerpt from the top command (sorted by memory usage):
Code:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10960 mysql     20   0  237m  45m 3744 S  0.0  9.3  76:52.11 mysqld
 2447 root      20   0  244m  38m 1140 S  0.0  7.9  28:14.47 console-kit-dae
 2627 root      20   0  278m  13m 7240 S  0.0  2.7   1:15.99 apache2
17871 www-data  20   0  278m 8420 1496 S  0.0  1.7   0:00.55 apache2
22976 www-data  20   0  278m 8404 1492 S  0.0  1.7   0:00.11 apache2
22977 www-data  20   0  279m 8364 1496 S  0.0  1.7   0:00.11 apache2
22276 www-data  20   0  279m 8356 1456 S  0.0  1.7   0:00.16 apache2
32108 www-data  20   0  279m 8344 1496 S  0.0  1.7   0:00.06 apache2
32107 www-data  20   0  279m 8324 1460 S  0.3  1.7   0:00.06 apache2
 6905 www-data  20   0  279m 8284 1488 S  0.0  1.7   0:00.00 apache2
 7404 www-data  20   0  279m 8284 1444 S  0.0  1.7   0:00.03 apache2
 7279 www-data  20   0  279m 8272 1452 S  0.0  1.7   0:00.01 apache2
 5044 www-data  20   0  278m 8176 1492 S  0.0  1.6   0:00.07 apache2
 6906 www-data  20   0  278m 8128 1452 S  0.0  1.6   0:00.03 apache2
 7407 www-data  20   0  278m 8120 1452 S  0.0  1.6   0:00.01 apache2
 7410 www-data  20   0  278m 8060 1420 S  0.0  1.6   0:00.00 apache2
 7408 www-data  20   0  278m 7780 1304 S  0.0  1.6   0:00.01 apache2
 7409 www-data  20   0  278m 7780 1292 S  0.0  1.6   0:00.00 apache2
 7402 www-data  20   0  278m 7728 1272 S  0.0  1.5   0:00.00 apache2
 7403 www-data  20   0  278m 7592 1160 S  0.0  1.5   0:00.00 apache2
29980 www-data  20   0  162m 5824  464 S  0.0  1.2   0:00.02 apache2
 2905 root      20   0 55100 4132 1080 S  0.0  0.8   0:39.71 fail2ban-server
29979 root      20   0 20304 3804 1892 S  0.0  0.8   0:08.65 vlogger
21009 root      20   0 76688 3452 2696 S  0.0  0.7   0:00.04 sshd
26665 root      20   0 76688 3452 2696 R  0.0  0.7   0:00.07 sshd
 7384 postfix   20   0 56576 3256 2508 S  0.0  0.7   0:00.00 smtp
21399 root      20   0 30692 3108 1964 S  0.0  0.6   0:00.02 mc
 2759 root      20   0 40552 2312  644 S  0.0  0.5   0:15.48 munin-node
26814 root      20   0 20080 2212 1540 S  0.0  0.4   0:00.03 bash
 2158 nobody    20   0 27756 2208  528 S  0.0  0.4   0:17.54 mydns
21017 root      20   0 20080 2208 1540 S  0.0  0.4   0:00.01 bash
 7387 postfix   20   0 39136 2180 1712 S  0.0  0.4   0:00.01 bounce
21401 root      20   0 20064 2176 1528 S  0.0  0.4   0:00.05 bash
21474 postfix   20   0 39104 2136 1680 S  0.0  0.4   0:00.00 pickup
 7294 root      20   0 18984 1308  988 R  0.3  0.3   0:00.18 top
12292 root      20   0 76688 1032  808 S  0.0  0.2   0:05.58 sshd
23820 root      20   0 76688 1032  808 S  0.0  0.2   0:05.58 sshd
 1842 messageb  20   0 22596  868  420 S  0.0  0.2   0:45.53 dbus-daemon

I've also created a temporary login for munin here: http://munin.protobit.net/protobit.net/prod.protobit.net.html username: test password: test You can check the graphs for yourself, to see if there's anything out of the ordinary.
It does to me as well. What I can't get my head around is why swap is being used, when there's plenty of RAM left. Should I try to reduce MaxClients in apache2.conf? I think this shouldn't be a problem, since the server hosts no site with very high traffic. It is currently set to 150 (the default). Any other ideas?
Hi Mrm, take a look at your current 'swappiness' kernel parameter:
# cat /proc/sys/vm/swappiness
I don't know your distro (the Debian default is 60). Maybe you can test a lower value for a day or two, setting it to 20 (or 0, even better):
# echo "0" > /proc/sys/vm/swappiness
(These settings are applied instantly by the kernel and do not persist after a reboot.) You asked for 'another idea', so here you have one. Bye.. bajodel.
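As a small sketch of the suggestion above, the following shell helpers read the current value and sanity-check a new one before it is written. The function names `show_swappiness` and `valid_swappiness` are made up for illustration, and the 0 to 100 range is an assumption that holds for kernels of this era; the `sysctl` key `vm.swappiness` and the file `/etc/sysctl.conf` are the standard places to apply and persist the setting.

```shell
#!/bin/sh
# Hypothetical helper: print the current swappiness. The path argument is
# optional and defaults to the real kernel file.
show_swappiness() {
    cat "${1:-/proc/sys/vm/swappiness}"
}

# Hypothetical helper: succeed only for an integer between 0 and 100
# (assumed valid range for kernels of this era).
valid_swappiness() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 0 ] && [ "$1" -le 100 ]
}

# Apply at runtime (needs root, lost on reboot):
#   valid_swappiness 10 && sysctl -w vm.swappiness=10
# Persist across reboots by adding this line to /etc/sysctl.conf:
#   vm.swappiness = 10
```

Validating first is only a convenience; the kernel would reject an out-of-range value anyway.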
Hi, bajodel, I do appreciate any ideas at this point. First of all, let me say that I was not aware this setting existed. After reading up on it a bit, I'm pretty sure I understand what it does. So, if I understand it correctly, I'm not sure how it could have an effect in my situation, where the swap size gradually and persistently grows over time: Again, if I understand it correctly, increasing this value should increase swap usage (with the same memory usage), while decreasing it should decrease swap usage. However, by my understanding, this setting should not affect the growth of swap over time. But, as I said at the beginning, I do appreciate every idea, so I'm going to set it to 10 and report back in a couple of days, when I see if it will have had any effect.
The "Swap FAQ" you have read looks like a 'simple explanation', but yes, that parameter is (in a few words) the kernel's "tendency" (over time) to swap. Just try it, but '0' is better in my opinion. I've re-read the entire thread: your distro is Ubuntu (server) and the default is the same as Debian's (60), which is quite good, but I've also seen more swap than I was used to on my (Debian) test server with ISPConfig on board. I've also experienced bad performance related to clamav scanning, in particular with gunzipped attachments (cpio is the hungry app). Try to find out which app is swapped:
# top
then press: SHIFT-O
then press: P <enter>
You will see the apps sorted by swap usage. Copy & paste the result here. Bye.. bajodel.
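As a cross-check on what top reports, here is a rough sketch that adds up the `Swap:` lines of each process's `/proc/<pid>/smaps`. It assumes the running kernel exposes a `Swap:` field there (not verified for the Jaunty kernel) and that it is run as root so every process is readable. The function names `swap_kb` and `list_swap` are made up for this example.

```shell
#!/bin/sh
# Sum the "Swap:" lines (values in kB) of one smaps-style file.
swap_kb() {
    awk '/^Swap:/ { total += $2 } END { print total + 0 }' "$1"
}

# Print "kB<TAB>PID<TAB>name" for every process with pages in swap,
# largest first. Processes that vanish or cannot be read are skipped.
list_swap() {
    for dir in /proc/[0-9]*; do
        [ -r "$dir/smaps" ] || continue
        kb=$(swap_kb "$dir/smaps" 2>/dev/null) || continue
        [ "${kb:-0}" -gt 0 ] || continue
        name=$(awk '/^Name:/ { print $2; exit }' "$dir/status" 2>/dev/null)
        printf '%s\t%s\t%s\n' "$kb" "${dir#/proc/}" "$name"
    done | sort -rn
}

# Usage (as root):
#   list_swap | head -n 20
```

If the per-process totals add up to roughly the swap usage munin shows, the figures can be trusted; if they are far below what top's SWAP column claims, that column is probably not measuring real swap.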
OK, I've changed it to 0. EDIT: Should the swap usage decrease without rebooting the server? Or will it only stop growing (in case this setting helps)? The default was indeed 60. I think the problem here isn't so much that swap is being used; the more serious problem is that it keeps growing. This is what I really don't understand. Since this is basically not a mail server (it only has postfix installed for the websites to use it), I have disabled clamav, spamassassin, pop3 and imap. Especially clamav was indeed a huge memory hog, which was the primary reason I disabled it.
Code:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  SWAP COMMAND
 5761 www-data  20   0  278m 8516 1488 S  0.0  1.7   0:00.41 270m apache2
 5652 www-data  20   0  279m 8728 1544 S  0.0  1.7   0:00.39 270m apache2
 5731 www-data  20   0  279m 8716 1536 S  0.0  1.7   0:00.37 270m apache2
 4456 www-data  20   0  279m 8732 1544 S  0.0  1.7   0:00.52 270m apache2
 6223 www-data  20   0  278m 8536 1496 S  0.0  1.7   0:00.33 270m apache2
 6347 www-data  20   0  278m 8576 1488 S  0.0  1.7   0:00.36 270m apache2
 6229 www-data  20   0  278m 8332 1488 S  0.3  1.7   0:00.32 270m apache2
 6225 www-data  20   0  278m 8408 1516 S  0.0  1.7   0:00.32 270m apache2
 5780 www-data  20   0  278m 8520 1532 S  0.0  1.7   0:00.40 270m apache2
 1867 www-data  20   0  278m 8496 1548 S  0.0  1.7   0:00.36 270m apache2
16034 root      20   0  278m  14m 8700 S  0.0  3.1   0:15.43 263m apache2
 2447 root      20   0  259m  37m 1140 S  0.0  7.6  35:49.74 221m console-kit-dae
10960 mysql     20   0  237m  45m 3788 S  0.7  9.4  84:01.18 191m mysqld
25706 www-data  20   0  162m 5816  468 S  0.0  1.2   0:00.01 156m apache2
26718 postfix   20   0  106m 5188 3840 S  0.0  1.0   0:00.01 101m smtpd
12292 root      20   0 76688 1012  788 S  0.0  0.2   0:06.99  73m sshd
23820 root      20   0 76688 1012  788 S  0.0  0.2   0:06.95  73m sshd
25557 root      20   0 76688 3444 2696 R  0.0  0.7   0:00.10  71m sshd
 2294 root      20   0 56428  332  328 S  0.0  0.1   0:00.05  54m saslauthd
 2295 root      20   0 56428  332  328 S  0.0  0.1   0:00.04  54m saslauthd
 2299 root      20   0 56428  332  328 S  0.0  0.1   0:00.02  54m saslauthd
 2301 root      20   0 56428  332  328 S  0.0  0.1   0:00.02  54m saslauthd
 2302 root      20   0 56428  332  328 S  0.0  0.1   0:00.03  54m saslauthd
 2905 root      20   0 55100 3336 1016 S  0.0  0.7   0:44.08  50m fail2ban-server
 8520 postfix   20   0 52220 1668 1020 S  0.0  0.3   0:00.03  49m qmgr
 1861 root      20   0 48940  396  284 S  0.0  0.1   0:00.23  47m sshd
 8894 postfix   20   0 41612 2156 1248 S  0.0  0.4   0:00.04  38m tlsmgr
 2759 root      20   0 40552 2156  644 S  0.0  0.4   0:17.27  37m munin-node
 2819 postfix   20   0 39104 2132 1680 S  0.0  0.4   0:00.00  36m pickup
26734 postfix   20   0 39104 2136 1684 S  0.0  0.4   0:00.01  36m showq
 2230 root      20   0 37048  724  472 S  0.0  0.1   0:11.92  35m master
30926 root      20   0 31796  384  316 S  0.0  0.1   0:01.50  30m pure-ftpd-mysql
 2157 nobody    20   0 26192  308  200 S  0.0  0.1   0:00.36  25m mydns
 2158 nobody    20   0 27756 2256  520 S  0.0  0.5   0:19.08  24m mydns
 1842 messageb  20   0 22596  836  420 S  0.0  0.2   0:49.90  21m dbus-daemon
14166 ntp       20   0 21384 1188  776 S  0.0  0.2   0:00.11  19m ntpd
 2403 root      20   0 19972  536  384 S  0.0  0.1   0:11.72  18m cron
12350 root      20   0 19056  304  300 S  0.0  0.1   0:00.04  18m bash

I don't think I quite understand the swap column here. What does 270m mean? 270MB? Surely not?!
(In case it helps) I think it should at least stop growing. No reboot is required. But if you want to test the 'trend' starting from a low-swap state, you can run:
# echo "0" > /proc/sys/vm/swappiness (change swappiness behaviour)
# sync (recommended before dropping cached memory)
# echo "3" > /proc/sys/vm/drop_caches (drop cached memory)
# swapoff -a (disable swap)
# swapon -a (re-enable swap)
It's heavy for your server to swallow, but I think it's (roughly) equivalent to rebooting. At worst, you can put that in a script and run it from cron if you cannot find a solution.
For 'top', the default unit is kb (when not explicit); in your case it is surely mb. But consider:
p: SWAP -- Swapped size (kb) The swapped out portion of a task's total virtual memory image.
o: VIRT -- Virtual Image (kb) The total amount of virtual memory used by the task. It includes all code, data and shared libraries plus pages that have been swapped out. VIRT = SWAP + RES.
q: RES -- Resident size (kb) The non-swapped physical memory a task has used. RES = CODE + DATA.
Bye.. bajodel.
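The VIRT = SWAP + RES identity quoted above suggests that this version of top does not measure pages on the swap device at all, but simply shows VIRT minus RES. A quick arithmetic check against the first apache2 row in the earlier listing (VIRT 278m, RES 8516 kb) seems to bear that out. The helper names below are made up for the example, and 278m is itself a rounded figure, so treat the result as approximate.

```shell
#!/bin/sh
# Figure that top appears to show under SWAP: VIRT minus RES, both in kB.
derived_swap_kb() {
    echo $(( $1 - $2 ))
}

# Convert kB to whole MB (1024 kB each), rounding to nearest.
kb_to_mb() {
    echo $(( ($1 + 512) / 1024 ))
}

# First apache2 row: VIRT 278m = 278 * 1024 = 284672 kB, RES = 8516 kB.
kb_to_mb "$(derived_swap_kb 284672 8516)"   # prints 270
```

That matches the 270m in the SWAP column, so the column most likely includes mapped but untouched memory and shared libraries, not just pages that were really swapped out. It would also explain why the per-process values add up to far more than the total swap in use.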