Swap usage issue

Discussion in 'General' started by MrM, Aug 11, 2009.

  1. MrM

    MrM New Member

    Today I started getting loads of Error connecting to MySQL server at localhost: Too many connections error messages on our production server. After connecting via SSH, I noticed that the server was completely swamped (load average >50). I barely managed to reboot it.

    After a closer examination, I found out that for the past month or so, swap usage has been persistently growing. Today it shot up to more than 2GB (see attachments) and "committed" to 5G (the server has only 512M of RAM). The strange part is that "apps" usage never exceeded 300M and it was more or less around 100M on average.
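
    The "committed" figure in the munin graph most likely corresponds to Committed_AS in /proc/meminfo (an assumption about the plugin). That field counts address space promised to processes, not memory in use, so it can exceed RAM plus swap. A minimal sketch for reading the raw numbers behind the graph; the function names are made up for illustration, and the file argument exists only so the functions can be tried on a saved copy:

```shell
#!/bin/sh
# meminfo_kb KEY [MEMINFO_FILE]
# Print the value (in kB) of one field from a /proc/meminfo-style file.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# swap_used_kb [MEMINFO_FILE]
# Swap actually in use, computed as SwapTotal - SwapFree.
swap_used_kb() {
    swap_total=$(meminfo_kb SwapTotal "$1")
    swap_free=$(meminfo_kb SwapFree "$1")
    echo $(( swap_total - swap_free ))
}

# Usage on a live system:
#   meminfo_kb Committed_AS
#   swap_used_kb
```

    Comparing swap_used_kb over a few days against the graph would show whether the growth is real swap usage or only committed address space.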

    The server is running ISPConfig 3.0.1.3 on Ubuntu Jaunty. Does anyone have any idea why this has been happening?

    I can post additional information about the server if it helps determine the cause.

    Any help will be much appreciated.
     

    Attached Files:

  2. syadnom

    syadnom New Member

    Memory leak. Any unstable software?

    Are you running anything from jaunty-proposed or backports, or anything that is not part of jaunty?

    Have you looked at the processes and seen what is actually eating up RAM and getting pushed to swap?

    Do you have something writing to tmpfs?
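
    Files written to a tmpfs live in memory and can be pushed out to swap, so one way to answer this is to list the tmpfs mounts and see how full they are. A minimal sketch; the function name list_tmpfs is made up for illustration, and the mounts file is a parameter only so the function can be tried on a copy:

```shell
#!/bin/sh
# list_tmpfs [MOUNTS_FILE]
# Print the mount point of every tmpfs filesystem listed in a
# /proc/mounts-style file (fields: device mountpoint fstype options ...).
list_tmpfs() {
    awk '$3 == "tmpfs" { print $2 }' "${1:-/proc/mounts}"
}

# Usage on a live system (df shows how much of each tmpfs is in use):
#   list_tmpfs | while read -r mp; do df -h "$mp"; done
```

    With GNU df, `df -h -t tmpfs` gives a similar overview in one command.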
     
  3. till

    till Super Moderator

  4. MrM

    MrM New Member

    I don't think I'm running any unstable software. It's basically a clean ISPConfig system, with some extras like munin, subversion and some PHP modules (see attachment for a full list of packages). I have not added any repositories either. This is the full list of repositories (with deb-src counterparts removed):
    Code:
    deb http://si.archive.ubuntu.com/ubuntu/ jaunty main restricted
    deb http://si.archive.ubuntu.com/ubuntu/ jaunty-updates main restricted
    deb http://si.archive.ubuntu.com/ubuntu/ jaunty universe
    deb http://si.archive.ubuntu.com/ubuntu/ jaunty-updates universe
    deb http://si.archive.ubuntu.com/ubuntu/ jaunty multiverse
    deb http://si.archive.ubuntu.com/ubuntu/ jaunty-updates multiverse
    deb http://security.ubuntu.com/ubuntu jaunty-security main restricted
    deb http://security.ubuntu.com/ubuntu jaunty-security universe
    deb http://security.ubuntu.com/ubuntu jaunty-security multiverse
    As mentioned in my first post, I don't think the processes are eating up RAM at all. The 'apps' memory usage is about 100M on average and does not rise over time. The only suspicious process I noticed in htop is /usr/sbin/console-kit-daemon. There are about 65 active instances, each reportedly using 1.2% of RAM.

    How can I check this?

    I don't believe the problem is actually in MySQL. I think the too many connections error is a consequence rather than a reason for this problem. I can set the max_connections setting though, if you think it'll help.
     

    Attached Files:

  5. till

    till Super Moderator

    Please do what I suggested and you will see that your problem is solved. The reason for this is simply a lot of spam or a similar incident, which causes postfix to open more connections than your MySQL settings allow. That causes a lot of waiting processes, which then fill up your swap.
     
  6. MrM

    MrM New Member

    I've set the following settings in my.cnf:
    Code:
    max_connections = 500
    max_user_connections = 500
    Was there anything else you had in mind? What about postfix?
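
    Before restarting, it can help to confirm what the file really says. A minimal sketch; cnf_value is a made-up helper, the default path /etc/mysql/my.cnf is an assumption about this Ubuntu install, and the parser ignores [section] headers, so it simply reports the last assignment of the key anywhere in the file:

```shell
#!/bin/sh
# cnf_value KEY [CNF_FILE]
# Print the last value assigned to KEY in a my.cnf-style file,
# ignoring comment lines and whitespace around "=".
cnf_value() {
    awk -F= -v key="$1" '
        /^[ \t]*[#;]/ { next }                      # skip comment lines
        {
            k = $1; gsub(/[ \t]/, "", k)            # key without blanks
            if (k == key) { v = $2; gsub(/[ \t]/, "", v); val = v }
        }
        END { if (val != "") print val }
    ' "${2:-/etc/mysql/my.cnf}"
}

# Usage:
#   cnf_value max_connections
# After a restart, the running server can be asked directly:
#   mysql -e "SHOW VARIABLES LIKE 'max_connections'"
#   mysql -e "SHOW STATUS LIKE 'Max_used_connections'"
```

    Since max_user_connections limits simultaneous connections per MySQL account, keeping it below max_connections would stop a single account from taking every slot.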
     
  7. till

    till Super Moderator

    Just restart mysql. Nothing has to be changed in postfix.
     
  8. MrM

    MrM New Member

    OK, have already done that, thanks.

    Just one observation: Could the max connection limit affect websites? The way I see it is that if postfix still tries to open up a lot of connections, legitimate connections from websites could get blocked when the limit is reached.
     
  9. MrM

    MrM New Member

    Just to clarify why I thought MySQL was not the culprit here but rather a consequence: as you can see in the (first two) attachments to this post, the MySQL thread count was more or less stable at about 2 throughout the past month, and never exceeded 10. When the overload happened last night at around midnight, it shot up to over 100.

    In the third attachment (memory usage), you can see a strange thing happening. Apparently last night after I rebooted the server, swap usage grew rapidly to about 1G, then dropped at around 5 AM.

    P.S.: All times are GMT+2 (Paris time).
     

    Attached Files:

  10. MrM

    MrM New Member

    Hi, till,

    Changing max_connections and max_user_connections did not help. Swap usage is currently at 150M and still gradually rising at the same rate as before.

    Any other suggestions? I'm at a loss as to the reason for this behaviour. :confused:
     
  11. till

    till Super Moderator

    Check with ps aux which processes consume the memory.
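
    A minimal sketch of that check, sorting the ps aux listing by resident memory (RSS, column 6, in kB). The helper name top_rss is made up for illustration:

```shell
#!/bin/sh
# top_rss [N]
# Read `ps aux` output on stdin and print the N processes (default 10)
# with the largest resident set size (RSS, column 6, in kB).
top_rss() {
    awk 'NR > 1' | sort -k6,6nr | head -n "${1:-10}"
}

# Usage on a live system:
#   ps aux | top_rss 5
```

    The procps version of ps can also do the sorting itself with `ps aux --sort=-rss`.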
     
  12. MrM

    MrM New Member

    The thing is, no process uses an excessive amount of memory. In fact, there is always a significant amount of free memory on the server. That's why this is so bewildering to me.

    Here is an excerpt from the top command (sorted by memory usage):
    Code:
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    10960 mysql     20   0  237m  45m 3744 S  0.0  9.3  76:52.11 mysqld
     2447 root      20   0  244m  38m 1140 S  0.0  7.9  28:14.47 console-kit-dae
     2627 root      20   0  278m  13m 7240 S  0.0  2.7   1:15.99 apache2
    17871 www-data  20   0  278m 8420 1496 S  0.0  1.7   0:00.55 apache2
    22976 www-data  20   0  278m 8404 1492 S  0.0  1.7   0:00.11 apache2
    22977 www-data  20   0  279m 8364 1496 S  0.0  1.7   0:00.11 apache2
    22276 www-data  20   0  279m 8356 1456 S  0.0  1.7   0:00.16 apache2
    32108 www-data  20   0  279m 8344 1496 S  0.0  1.7   0:00.06 apache2
    32107 www-data  20   0  279m 8324 1460 S  0.3  1.7   0:00.06 apache2
     6905 www-data  20   0  279m 8284 1488 S  0.0  1.7   0:00.00 apache2
     7404 www-data  20   0  279m 8284 1444 S  0.0  1.7   0:00.03 apache2
     7279 www-data  20   0  279m 8272 1452 S  0.0  1.7   0:00.01 apache2
     5044 www-data  20   0  278m 8176 1492 S  0.0  1.6   0:00.07 apache2
     6906 www-data  20   0  278m 8128 1452 S  0.0  1.6   0:00.03 apache2
     7407 www-data  20   0  278m 8120 1452 S  0.0  1.6   0:00.01 apache2
     7410 www-data  20   0  278m 8060 1420 S  0.0  1.6   0:00.00 apache2
     7408 www-data  20   0  278m 7780 1304 S  0.0  1.6   0:00.01 apache2
     7409 www-data  20   0  278m 7780 1292 S  0.0  1.6   0:00.00 apache2
     7402 www-data  20   0  278m 7728 1272 S  0.0  1.5   0:00.00 apache2
     7403 www-data  20   0  278m 7592 1160 S  0.0  1.5   0:00.00 apache2
    29980 www-data  20   0  162m 5824  464 S  0.0  1.2   0:00.02 apache2
     2905 root      20   0 55100 4132 1080 S  0.0  0.8   0:39.71 fail2ban-server
    29979 root      20   0 20304 3804 1892 S  0.0  0.8   0:08.65 vlogger
    21009 root      20   0 76688 3452 2696 S  0.0  0.7   0:00.04 sshd
    26665 root      20   0 76688 3452 2696 R  0.0  0.7   0:00.07 sshd
     7384 postfix   20   0 56576 3256 2508 S  0.0  0.7   0:00.00 smtp
    21399 root      20   0 30692 3108 1964 S  0.0  0.6   0:00.02 mc
     2759 root      20   0 40552 2312  644 S  0.0  0.5   0:15.48 munin-node
    26814 root      20   0 20080 2212 1540 S  0.0  0.4   0:00.03 bash
     2158 nobody    20   0 27756 2208  528 S  0.0  0.4   0:17.54 mydns
    21017 root      20   0 20080 2208 1540 S  0.0  0.4   0:00.01 bash
     7387 postfix   20   0 39136 2180 1712 S  0.0  0.4   0:00.01 bounce
    21401 root      20   0 20064 2176 1528 S  0.0  0.4   0:00.05 bash
    21474 postfix   20   0 39104 2136 1680 S  0.0  0.4   0:00.00 pickup
     7294 root      20   0 18984 1308  988 R  0.3  0.3   0:00.18 top
    12292 root      20   0 76688 1032  808 S  0.0  0.2   0:05.58 sshd
    23820 root      20   0 76688 1032  808 S  0.0  0.2   0:05.58 sshd
     1842 messageb  20   0 22596  868  420 S  0.0  0.2   0:45.53 dbus-daemon
    I've also created a temporary login for munin here:
    http://munin.protobit.net/protobit.net/prod.protobit.net.html
    username: test
    password: test

    You can check the graphs for yourself, to see if there's anything out of the ordinary.
     
  13. till

    till Super Moderator

    Looks quite normal.
     
  14. MrM

    MrM New Member

    It does to me as well. What I can't get my head around is why swap is being used, when there's plenty of RAM left.

    Should I try to reduce MaxClients in apache2.conf? I think this shouldn't be a problem, since the server hosts no site with very high traffic. It is currently set to 150 (the default).

    Any other ideas?
     
  15. bajodel

    bajodel New Member

    Hi MrM .. take a look at your current 'swappiness' kernel parameter ..

    # cat /proc/sys/vm/swappiness

    I don't know your distro .. (debian default is 60)

    .. maybe you can test a lower value (for a day or two) setting to 20 (or 0 ..even better)

    # echo "0" > /proc/sys/vm/swappiness

    (These settings are applied instantly by the kernel and are not persistent after a reboot)
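
    If the lower value turns out to help, it can be made permanent through /etc/sysctl.conf. A minimal sketch, not a tested procedure for this server; persist_swappiness is a made-up helper, and the file argument exists so it can be tried on a copy first:

```shell
#!/bin/sh
# persist_swappiness VALUE [SYSCTL_FILE]
# Set vm.swappiness in a sysctl.conf-style file: replace an existing
# (uncommented) vm.swappiness line, or append one if none is present.
persist_swappiness() {
    value=$1
    file=${2:-/etc/sysctl.conf}
    tmp=$(mktemp) || return 1
    awk -v v="$value" '
        /^[ \t]*vm\.swappiness[ \t]*=/ { print "vm.swappiness = " v; found = 1; next }
        { print }
        END { if (!found) print "vm.swappiness = " v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage (as root); `sysctl -p` reloads the file without a reboot:
#   persist_swappiness 10 && sysctl -p
```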

    ..you asked for 'another idea' ..here you have :)

    Bye..

    bajodel.
     
  16. MrM

    MrM New Member

    Hi, bajodel,

    I do appreciate any ideas at this point. First of all, let me say that I was not aware this setting existed. After reading up on it a bit, I'm pretty sure I understand what it does.

    So, if I understand it correctly, I'm not sure how it could have an effect in my situation, where the swap size gradually and persistently grows over time:
    [​IMG]

    Again, if I understand it correctly, increasing this value should increase swap usage (with the same memory usage), while decreasing it should decrease swap usage. However, by my understanding, this setting should not affect the growth of swap over time.

    But, as I said at the beginning, I do appreciate every idea, so I'm going to set it to 10 and report back in a couple of days, when I see if it will have had any effect.
     
    Last edited: Aug 25, 2009
  17. bajodel

    bajodel New Member

    The "Swap FAQ" you have read looks like a 'simple explanation' ..but ..yes .. that parameter is (in a few words) the kernel's tendency (over time) to swap.

    Just try it :) .. but '0' is better in my opinion.

    I've re-read the entire thread ..your distro is Ubuntu (server) and the default is the same as Debian (60) .. which is quite good, but I've also experienced more swap than I was used to on my (Debian) test server with ISPConfig on board :)

    I've also experienced bad performance related to clamav scanning .. in particular if you have gunzipped attachments (cpio is the hungry app).

    Try to understand which app is swapped:
    # top
    then press: SHIFT-O
    then press: P <enter>

    you can see sorted apps for swap usage .. copy & paste here.
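
    As a cross-check on top's SWAP column, the kernel's own per-process accounting can be summed from /proc/<pid>/smaps. This is only a sketch: it assumes the running kernel writes "Swap:" lines into smaps, which may not hold on every kernel of that era, and smaps_swap_kb is a made-up helper name:

```shell
#!/bin/sh
# smaps_swap_kb SMAPS_FILE
# Sum the "Swap:" lines (kB) of one /proc/<pid>/smaps-style file.
smaps_swap_kb() {
    awk '$1 == "Swap:" { sum += $2 } END { print sum + 0 }' "$1"
}

# Usage on a live system (as root), largest consumers first:
#   for d in /proc/[0-9]*; do
#       echo "$(smaps_swap_kb "$d/smaps" 2>/dev/null) kB pid ${d#/proc/}"
#   done | sort -nr | head
```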

    Bye..

    bajodel.
     
    Last edited: Aug 25, 2009
  18. MrM

    MrM New Member

    OK, I've changed it to 0.

    EDIT: Should the swap usage decrease without rebooting the server? Or will it only stop growing (in case this setting helps)?

    The default was indeed 60.

    I think the problem here isn't so much that swap is being used, the more serious problem is that it keeps growing. This is what I really don't understand.

    Since this is basically not a mail server (it only has postfix installed for the websites to use it), I have disabled clamav, spamassassin, pop3 and imap. Especially clamav was indeed a huge memory hog, which was the primary reason I disabled it.

    Code:
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  SWAP COMMAND
     5761 www-data  20   0  278m 8516 1488 S  0.0  1.7   0:00.41 270m apache2
     5652 www-data  20   0  279m 8728 1544 S  0.0  1.7   0:00.39 270m apache2
     5731 www-data  20   0  279m 8716 1536 S  0.0  1.7   0:00.37 270m apache2
     4456 www-data  20   0  279m 8732 1544 S  0.0  1.7   0:00.52 270m apache2
     6223 www-data  20   0  278m 8536 1496 S  0.0  1.7   0:00.33 270m apache2
     6347 www-data  20   0  278m 8576 1488 S  0.0  1.7   0:00.36 270m apache2
     6229 www-data  20   0  278m 8332 1488 S  0.3  1.7   0:00.32 270m apache2
     6225 www-data  20   0  278m 8408 1516 S  0.0  1.7   0:00.32 270m apache2
     5780 www-data  20   0  278m 8520 1532 S  0.0  1.7   0:00.40 270m apache2
     1867 www-data  20   0  278m 8496 1548 S  0.0  1.7   0:00.36 270m apache2
    16034 root      20   0  278m  14m 8700 S  0.0  3.1   0:15.43 263m apache2
     2447 root      20   0  259m  37m 1140 S  0.0  7.6  35:49.74 221m console-kit-dae
    10960 mysql     20   0  237m  45m 3788 S  0.7  9.4  84:01.18 191m mysqld
    25706 www-data  20   0  162m 5816  468 S  0.0  1.2   0:00.01 156m apache2
    26718 postfix   20   0  106m 5188 3840 S  0.0  1.0   0:00.01 101m smtpd
    12292 root      20   0 76688 1012  788 S  0.0  0.2   0:06.99  73m sshd
    23820 root      20   0 76688 1012  788 S  0.0  0.2   0:06.95  73m sshd
    25557 root      20   0 76688 3444 2696 R  0.0  0.7   0:00.10  71m sshd
     2294 root      20   0 56428  332  328 S  0.0  0.1   0:00.05  54m saslauthd
     2295 root      20   0 56428  332  328 S  0.0  0.1   0:00.04  54m saslauthd
     2299 root      20   0 56428  332  328 S  0.0  0.1   0:00.02  54m saslauthd
     2301 root      20   0 56428  332  328 S  0.0  0.1   0:00.02  54m saslauthd
     2302 root      20   0 56428  332  328 S  0.0  0.1   0:00.03  54m saslauthd
     2905 root      20   0 55100 3336 1016 S  0.0  0.7   0:44.08  50m fail2ban-server
     8520 postfix   20   0 52220 1668 1020 S  0.0  0.3   0:00.03  49m qmgr
     1861 root      20   0 48940  396  284 S  0.0  0.1   0:00.23  47m sshd
     8894 postfix   20   0 41612 2156 1248 S  0.0  0.4   0:00.04  38m tlsmgr
     2759 root      20   0 40552 2156  644 S  0.0  0.4   0:17.27  37m munin-node
     2819 postfix   20   0 39104 2132 1680 S  0.0  0.4   0:00.00  36m pickup
    26734 postfix   20   0 39104 2136 1684 S  0.0  0.4   0:00.01  36m showq
     2230 root      20   0 37048  724  472 S  0.0  0.1   0:11.92  35m master
    30926 root      20   0 31796  384  316 S  0.0  0.1   0:01.50  30m pure-ftpd-mysql
     2157 nobody    20   0 26192  308  200 S  0.0  0.1   0:00.36  25m mydns
     2158 nobody    20   0 27756 2256  520 S  0.0  0.5   0:19.08  24m mydns
     1842 messageb  20   0 22596  836  420 S  0.0  0.2   0:49.90  21m dbus-daemon
    14166 ntp       20   0 21384 1188  776 S  0.0  0.2   0:00.11  19m ntpd
     2403 root      20   0 19972  536  384 S  0.0  0.1   0:11.72  18m cron
    12350 root      20   0 19056  304  300 S  0.0  0.1   0:00.04  18m bash
    I don't think I quite understand the swap column here. What does 270m mean? 270MB? Surely not?!
     
    Last edited: Aug 25, 2009
  19. bajodel

    bajodel New Member

    (in case) ..i think it should stop growing.. at least.
    No reboot is required.

    But ..if you want to test the 'trend' from the initial status (low swap) you can:
    # echo "0" > /proc/sys/vm/swappiness (modify swappiness behaviour)
    # sync (recommended before dropping cached memory)
    # echo "3" > /proc/sys/vm/drop_caches (drop cached memory)
    # swapoff -a (disable swap)
    # swapon -a (re-enable swap)

    It's heavy to swallow for your server :rolleyes: ..but i think it's (quite) equivalent to rebooting. At worst you can cron that in a script :) if you cannot find a solution :D
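
    One caution with this sequence: swapoff -a has to pull everything currently in swap back into RAM, which cannot work if more is swapped out than the machine can hold. A minimal sketch of a pre-check, under the rough assumption that MemFree + Buffers + Cached approximates the RAM that can be freed up; swapoff_is_safe is a made-up helper name:

```shell
#!/bin/sh
# swapoff_is_safe [MEMINFO_FILE]
# Succeed (exit 0) only if the swap currently in use would fit into
# MemFree + Buffers + Cached, a rough estimate of reclaimable RAM.
swapoff_is_safe() {
    awk '
        $1 == "MemFree:"   { free = $2 }
        $1 == "Buffers:"   { buf = $2 }
        $1 == "Cached:"    { cached = $2 }
        $1 == "SwapTotal:" { st = $2 }
        $1 == "SwapFree:"  { sf = $2 }
        END {
            if ((st - sf) < (free + buf + cached)) exit 0
            exit 1
        }
    ' "${1:-/proc/meminfo}"
}

# Usage (as root):
#   if swapoff_is_safe; then
#       swapoff -a && swapon -a
#   else
#       echo "swap in use exceeds free RAM, not cycling swap" >&2
#   fi
```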

    For 'top' the default view is kb (when not explicit) ..in your case is surely mb .. but consider:


    p: SWAP -- Swapped size (kb)
    The swapped out portion of a task's total virtual memory image.

    o: VIRT -- Virtual Image (kb)
    The total amount of virtual memory used by the task. It includes
    all code, data and shared libraries plus pages that have been
    swapped out.

    VIRT = SWAP + RES.

    q: RES -- Resident size (kb)
    The non-swapped physical memory a task has used.

    RES = CODE + DATA.
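
    The relation VIRT = SWAP + RES suggests that this version of top derives the SWAP column by subtraction rather than measuring what is really in the swap file, which would explain why small daemons show tens of MB "swapped" on a machine with 512M of RAM. A minimal sketch of that arithmetic using rows from the listing earlier in the thread; the helper names are made up for illustration:

```shell
#!/bin/sh
# to_kb SIZE
# Convert a top-style size ("8516" = kB, "278m" = MB, "1g" = GB) to kB.
to_kb() {
    echo "$1" | awk '
        /[mM]$/ { printf "%d\n", $0 * 1024; next }
        /[gG]$/ { printf "%d\n", $0 * 1024 * 1024; next }
        { printf "%d\n", $0 }'
}

# derived_swap_mb VIRT RES
# SWAP as the quoted man page defines it: VIRT - RES, in whole MB
# (truncated, which matches the figures in the listing).
derived_swap_mb() {
    virt_kb=$(to_kb "$1")
    res_kb=$(to_kb "$2")
    echo $(( (virt_kb - res_kb) / 1024 ))
}

derived_swap_mb 56428 332     # saslauthd row: prints 54
derived_swap_mb 76688 1012    # sshd row: prints 73
```

    Both results match the SWAP column of the listing (54m and 73m). For the apache2 rows the same subtraction gives roughly 270m, with the small difference coming from VIRT already being displayed in whole MB.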


    Bye..

    bajodel.
     
    Last edited: Aug 25, 2009
  20. bajodel

    bajodel New Member

    So.. MrM ..how are things going?
    Is swap devouring your expensive server ? :p

    Bye..

    bajodel.
     
