CPU load locks up box. Apache or MYSQL related.

Discussion in 'General' started by crypted, Oct 12, 2010.

  1. crypted

    crypted New Member

    Code:
    top - 14:53:01 up 2 days,  1:09,  1 user,  load average: 52.22, 68.57, 37.83
    Tasks: 346 total,   1 running, 343 sleeping,   0 stopped,   2 zombie
    Cpu(s): 26.4%us, 11.2%sy,  0.2%ni,  0.0%id, 61.7%wa,  0.0%hi,  0.4%si,  0.0%st
    Mem:   2063384k total,  1940664k used,   122720k free,    12108k buffers
    Swap:  1951856k total,   932880k used,  1018976k free,   142528k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                
    29613 web5      20   0  202m  49m 7656 S   18  2.5   0:02.42 php-cgi                                                                
     6500 mysql     20   0  502m  27m 2940 S   12  1.4  24:26.55 mysqld                                                                 
    29458 web5      20   0  202m  49m 7656 S   11  2.5   0:00.96 php-cgi                                                                
    28987 web8      20   0  195m  31m 7660 D    1  1.6   0:03.24 php-cgi                                                                
    29201 www-data  20   0     0    0    0 Z    1  0.0   0:00.06 apache2 <defunct>                                                      
    29335 web29     20   0  190m  33m 7636 S    1  1.7   0:00.48 php-cgi                                                                
    29408 web8      20   0  203m  47m 7808 D    1  2.4   0:00.64 php-cgi                                                                
    29515 web5      20   0  202m  50m 7684 S    1  2.5   0:00.78 php-cgi                                                                
       46 root      15  -5     0    0    0 S    1  0.0   0:07.94 kblockd/0                                                              
     3630 root      20   0 57384 4100  960 S    1  0.2  13:12.69 collectl                                                               
    29269 web5      20   0  203m  45m 7640 D    1  2.3   0:00.96 php-cgi                                                                
    29273 web5      20   0  204m  49m 7636 D    1  2.4   0:01.94 php-cgi                                                                
    29294 web29     20   0  190m  34m 7684 S    1  1.7   0:00.52 php-cgi                                                                
    29306 web5      20   0  204m  48m 7584 S    1  2.4   0:01.02 php-cgi                                                                
    29326 web29     20   0  190m  33m 7636 S    1  1.7   0:00.48 php-cgi                                                                
    29409 web8      20   0  201m  46m 7628 S    1  2.3   0:00.58 php-cgi                                                                
    29412 web8      20   0  201m  46m 7624 D    1  2.3   0:00.62 php-cgi                                                                
    29474 web8      20   0  201m  48m 7592 D    1  2.4   0:00.68 php-cgi                                                                
    29514 web1      20   0  187m  34m 7788 D    1  1.7   0:01.00 php-cgi                                                                
    29734 root      20   0 19216 1532  940 R    1  0.1   0:00.32 top                 
    Debian Lenny x64, dual 3 GHz P4, 2 GB RAM, 500 GB HDD.

    I've noticed that the websites get hit like crazy, which causes massive load spikes (inter5.org and areyouliberal.com, namely). I put an 8-second crawl limit in robots.txt to stop the massive Googlebot floods. I've also added caching to all WordPress sites.

    I've been trying to tweak my.cnf for better response times because I think this is going to end up being MySQL-related. The tuning-primer.sh recommendations have been implemented almost entirely.

    It still locks up. It comes back to responding after about 15 minutes, but it keeps a CPU load of 50+ for about half an hour before things settle, which causes bad lag.

    Error logs don't really help too much.

    mydns log
    Code:
    mydns[24690]: mydns: error finding NS type resource records for name `ns1' in zone 12: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) (errno=2)
    mydns[24690]: mydns: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2): error during query: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) (errno=2)
    mydns[24690]: last message repeated 2 times
    mydns[24690]: mydns: error finding NS type resource records for name `' in zone 12: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) (errno=2)
    mydns[24690]: mydns: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2): error during query: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) (errno=2)
    mydns[24690]: last message repeated 2 times
    mydns[24690]: mydns: error finding NS type resource records for name `ns2' in zone 12: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) (errno=2)
    mydns[24690]: mydns: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2): error during query: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) (errno=2)
    mydns[24690]: last message repeated 2 times
    mydns[24690]: mydns: error finding NS type resource records for name `' in zone 12: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) (errno=2)
    mydns[24690]: mydns: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2): error during query: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) (errno=2)
    mydns[24690]: last message repeated 2 times
    mydns[24690]: mydns: ns3.derekgordon.com.: error loading SOA: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) (errno=2)
    mydns[24690]: mydns: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2): error during query: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) (errno=2)
    mydns[24690]: last message repeated 2 times
    mydns[24690]: mydns: ns3.derekgordon.com.: error loading SOA: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) (errno=2)
    mail.err
    Code:
    Oct 12 14:48:24 my imapd-ssl: authentication error: Input/output error
    Oct 12 14:49:45 my authdaemond: failed to connect to mysql server (server=localhost, userid=ispconfig): Lost connection to MySQL server at 'reading authorization packet', system error: 104
    Oct 12 14:50:58 my imapd: authentication error: Input/output error
    Oct 12 14:51:28 my imapd: authentication error: Input/output error
    Oct 12 15:00:46 my imapd: authentication error: Input/output error
    Oct 12 15:00:51 my authdaemond: failed to connect to mysql server (server=localhost, userid=ispconfig): Lost connection to MySQL server at 'sending authentication information', system error: 32
    kern.log
    Code:
    Oct 12 10:15:45 my kernel: [164931.197729] php-cgi[24767]: segfault at 411347f0 ip 676429 sp 7fff02289320 error 4 in php5-cgi[400000+506000]
    Oct 12 12:42:34 my kernel: [173901.308359] php-cgi[11310]: segfault at 44afc280 ip 676429 sp 7fffca89e5f0 error 4 in php5-cgi[400000+506000]
    error.log web8 / inter5.org
    Code:
    [Tue Oct 12 15:04:16 2010] [warn] (103)Software caused connection abort: mod_fcgid: ap_pass_brigade failed in handle_request functi$
    [Tue Oct 12 15:04:43 2010] [warn] (104)Connection reset by peer: mod_fcgid: read data from fastcgi server error.
    [Tue Oct 12 15:04:43 2010] [warn] (104)Connection reset by peer: mod_fcgid: ap_pass_brigade failed in handle_request function       
    [Tue Oct 12 15:04:43 2010] [warn] (104)Connection reset by peer: mod_fcgid: read data from fastcgi server error.
    [Tue Oct 12 15:04:44 2010] [error] [client 66.249.71.105] Premature end of script headers: index.php
    [Tue Oct 12 15:05:19 2010] [warn] (103)Software caused connection abort: mod_fcgid: ap_pass_brigade failed in handle_request functi$
    [Tue Oct 12 15:05:36 2010] [warn] mod_fcgid: read data timeout in 360 seconds
    [Tue Oct 12 15:05:37 2010] [error] [client 98.158.20.230] Premature end of script headers: index.php
    [Tue Oct 12 15:06:33 2010] [warn] (104)Connection reset by peer: mod_fcgid: read data from fastcgi server error.
    [Tue Oct 12 15:06:34 2010] [warn] (104)Connection reset by peer: mod_fcgid: ap_pass_brigade failed in handle_request function       
    [Tue Oct 12 15:07:50 2010] [warn] mod_fcgid: read data timeout in 360 seconds
    [Tue Oct 12 15:07:50 2010] [error] [client 72.14.199.155] Premature end of script headers: index.php
    [Tue Oct 12 15:08:04 2010] [warn] (103)Software caused connection abort: mod_fcgid: ap_pass_brigade failed in handle_request functi$
    [Tue Oct 12 15:08:06 2010] [warn] (103)Software caused connection abort: mod_fcgid: ap_pass_brigade failed in handle_request functi$
    [Tue Oct 12 15:08:12 2010] [warn] (103)Software caused connection abort: mod_fcgid: ap_pass_brigade failed in handle_request functi$
    [Tue Oct 12 15:08:12 2010] [warn] (103)Software caused connection abort: mod_fcgid: ap_pass_brigade failed in handle_request functi$
    [Tue Oct 12 15:08:21 2010] [warn] (103)Software caused connection abort: mod_fcgid: ap_pass_brigade failed in handle_request functi$
    
    The access.log for web8 / inter5.org shows at least 10 requests a second, and these all use MySQL (WordPress installs).

    Nothing is being written to the MySQL log files in /var/log/, damnit.

    Any thoughts on tuning this thing? The errors indicate that the daemons can't reach MySQL, so they can't produce any information and lock up....

    Code:
    # The MySQL database server configuration file.
    #
    # You can copy this to one of:
    # - "/etc/mysql/my.cnf" to set global options,
    # - "~/.my.cnf" to set user-specific options.
    # 
    # One can use all long options that the program supports.
    # Run program with --help to get a list of available options and with
    # --print-defaults to see which it would actually understand and use.
    #
    # For explanations see
    # http://dev.mysql.com/doc/mysql/en/server-system-variables.html
    
    # This will be passed to all mysql clients
    # It has been reported that passwords should be enclosed with ticks/quotes
    # escpecially if they contain "#" chars...
    # Remember to edit /etc/mysql/debian.cnf when changing the socket location.
    [client]
    port            = 3306
    socket          = /var/run/mysqld/mysqld.sock
    
    # Here is entries for some specific programs
    # The following values assume you have at least 32M ram
    
    # This was formally known as [safe_mysqld]. Both versions are currently parsed.
    [mysqld_safe]
    socket          = /var/run/mysqld/mysqld.sock
    nice            = 0
    
    [mysqld]
    #
    # * Basic Settings
    #
    user            = mysql
    pid-file        = /var/run/mysqld/mysqld.pid
    socket          = /var/run/mysqld/mysqld.sock
    port            = 3306
    basedir         = /usr
    datadir         = /var/lib/mysql
    tmpdir          = /tmp
    language        = /usr/share/mysql/english
    skip-external-locking
    #
    # Instead of skip-networking the default is now to listen only on
    # localhost which is more compatible and is not less secure.
    #bind-address           = 127.0.0.1
    #
    # * Fine Tuning
    #
    key_buffer              = 16M
    max_allowed_packet      = 16M
    thread_stack            = 128K
    thread_cache_size       = 8
    # This replaces the startup script and checks MyISAM tables if needed
    # the first time they are touched
    myisam-recover          = BACKUP
    max_connections        = 150
    table_cache            = 300
    #thread_concurrency     = 10
    #
    # * Query Cache Configuration
    #
    query_cache_limit       = 2M
    query_cache_size        = 32M
    #
    # * Logging and Replication
    #
    # Both location gets rotated by the cronjob.
    # Be aware that this log type is a performance killer.
    #log            = /var/log/mysql/mysql.log
    #
    # Error logging goes to syslog. This is a Debian improvement :)
    #
    # Here you can see queries with especially long duration
    log_slow_queries        = /var/log/mysql/mysql-slow.log
    #long_query_time = 2
    log-queries-not-using-indexes
    #
    # The following can be used as easy to replay backup logs or for replication.
    # note: if you are setting up a replication slave, see README.Debian about
    #       other settings you may need to change.
    #server-id              = 1
    #log_bin                        = /var/log/mysql/mysql-bin.log
    expire_logs_days        = 10
    max_binlog_size         = 100M
    #binlog_do_db           = include_database_name
    #binlog_ignore_db       = include_database_name
    #
    # * BerkeleyDB
    #
    # Using BerkeleyDB is now discouraged as its support will cease in 5.1.12.
    skip-bdb
    #
    # * InnoDB
    #
    # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.
    # Read the manual for more InnoDB related options. There are many!
    # You might want to disable InnoDB to shrink the mysqld process by circa 100MB.
    #skip-innodb
    #
    # * Security Features
    #
    # Read the manual, too, if you want chroot!
    # chroot = /var/lib/mysql/
    #
    # For generating SSL certificates I recommend the OpenSSL GUI "tinyca".
    #
    # ssl-ca=/etc/mysql/cacert.pem
    # ssl-cert=/etc/mysql/server-cert.pem
    # ssl-key=/etc/mysql/server-key.pem
    
    
    
    [mysqldump]
    quick
    quote-names
    max_allowed_packet      = 16M
    
    [mysql]
    #no-auto-rehash # faster start of mysql but no tab completition
    
    [isamchk]
    key_buffer              = 16M
    
    #
    # * NDB Cluster
    #
    # See /usr/share/doc/mysql-server-*/README.Debian for more information.
    #
    # The following configuration is read by the NDB Data Nodes (ndbd processes)
    # not from the NDB Management Nodes (ndb_mgmd processes).
    #
    # [MYSQL_CLUSTER]
    # ndb-connectstring=127.0.0.1
    
    
    #
    # * IMPORTANT: Additional settings that can override those from this file!
    #   The files must end with '.cnf', otherwise they'll be ignored.
    #
    !includedir /etc/mysql/conf.d/
    
    join_buffer_size = 2M
    max_heap_table_size = 80M
    tmp_table_size = 80M
    
    low_priority_updates = 1
    
    concurrent_insert=2
    
     
    Last edited: Oct 12, 2010
  2. till

    till Super Moderator Staff Member ISPConfig Developer

    Your server uses a lot of swap; that's normally an indication that it does not have enough RAM. MySQL and Apache need a lot of RAM to run fast, as MySQL caches tables and queries in RAM, and when it has to use swap the performance drops rapidly. Currently you have 2 GB RAM installed. Can you increase it to 4, 6 or 8 GB?
     
  3. Turbanator

    Turbanator Member HowtoForge Supporter

    Memory is probably your main issue, but you may also want to increase your MySQL max_connections to much higher than 150, e.g. 500 (and ignore tuning-primer.sh stating that you have it set too high or are allocating too much memory to MySQL). I say that because mydns lookups and email/spam fighting also hit MySQL (I think)... it solved many of my problems at least.
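
    For a sense of scale, here is a rough Python sketch of the worst-case style estimate that tuning scripts tend to use: global buffers plus per-thread buffers multiplied by max_connections. It only uses values from the my.cnf posted above; per-thread buffers that are not listed there (sort and read buffers, for example) are left out, so the result is a lower bound and not a full accounting. The function name is made up for illustration.

```python
MIB = 1024 * 1024
KIB = 1024

# Values copied from the my.cnf posted earlier in the thread.
GLOBAL_BUFFERS = {"key_buffer": 16 * MIB, "query_cache_size": 32 * MIB}
PER_THREAD_BUFFERS = {"join_buffer_size": 2 * MIB, "thread_stack": 128 * KIB}


def mysql_memory_estimate(global_buffers, per_thread_buffers, max_connections):
    """Rough worst case in bytes: global buffers + per-thread buffers * connections."""
    per_thread = sum(per_thread_buffers.values())
    return sum(global_buffers.values()) + max_connections * per_thread


if __name__ == "__main__":
    # Compare the current setting (150) with the suggested one (500).
    for conns in (150, 500):
        total = mysql_memory_estimate(GLOBAL_BUFFERS, PER_THREAD_BUFFERS, conns)
        print(conns, "connections:", round(total / MIB, 1), "MiB")
```

    Real connections rarely allocate every buffer at once, so actual usage is normally well below this figure; the point is only that the ceiling grows linearly with max_connections.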


    The segfaults you get are an ongoing issue that many of us are still trying to fix. I think falko pointed us to some articles just recently but I haven't had a chance to research them.
     
  4. crypted

    crypted New Member

    Till, I don't know about memory. The standard price at the datacenter is a $180 one-time fee for an additional 2GB, or an extra $18 monthly. If they will let me submit my own memory and pay an installation fee of like $20, then I'd be more apt to do it.

    Right now, there is 512MB free on the memory and almost all SWAP is free. It just goes crazy for some reason at random points and they all spike.

    I'm 99.99999% sure it's SQL-related. I've tweaked it a bit more and will give it 24 hrs before I run those tests again to determine if I need to tweak further.
     
  5. MarkSeger

    MarkSeger New Member

    The good news is I can see from your TOP display that you have collectl running. Download/install collectl-utils and plot everything with colplot. Maybe something will jump out at you as to which resource is being starved.
    -mark
     
  6. e100

    e100 New Member

    When I see high load averages the first thing I always check is disk IO.
    Your load average should be less than or equal to the number of CPU cores you have. If the load is higher than the number of cores then you have a bottleneck somewhere and it is usually disk IO.
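
    As a minimal sketch of that rule of thumb in Python (the function names are made up for illustration, and the core count of 2 is an assumption based on the "dual 3ghz P4" description earlier in the thread):

```python
import os


def load_per_core(load_avg, cores):
    """Return the load average divided by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg / cores


def is_overloaded(load_avg, cores):
    """Rule of thumb: a load above the core count points to a bottleneck."""
    return load_per_core(load_avg, cores) > 1.0


def current_load_per_core():
    """Same check for the machine this runs on (os.getloadavg is Unix-only)."""
    one_minute, _, _ = os.getloadavg()
    return load_per_core(one_minute, os.cpu_count() or 1)


if __name__ == "__main__":
    # 1-minute load from the first top output in this thread; 2 cores assumed.
    print(is_overloaded(52.22, 2))
```

    With the 1-minute load of 52.22 from the first post, the box was carrying far more runnable or blocked tasks than it has cores, whatever the exact core count is.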

    When this issue is happening run: vmstat 1
    Ctrl+C will exit
    This will print lots of data including blocks read/written each second.
    You will see that IO in/out will be rather high.

    Till and Turbanator are right, you need more RAM.
    You were using nearly 1GB of swap and your CPUs were spending 61% of their time waiting on disk IO:

    Code:
    Cpu(s): 26.4%us, 11.2%sy,  0.2%ni,  0.0%id, 61.7%wa,  0.0%hi,  0.4%si,  0.0%st
    Mem:   2063384k total,  1940664k used,   122720k free,    12108k buffers
    Swap:  1951856k total,   932880k used,  1018976k free,   142528k cached
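
    The same two figures can be pulled out of a saved top header with a few lines of Python. This is only a sketch that assumes the header layout shown in this thread; other top versions print these lines differently, and the function name is hypothetical.

```python
import re

# Summary lines copied from the first top output in this thread.
TOP_HEADER = """\
Cpu(s): 26.4%us, 11.2%sy,  0.2%ni,  0.0%id, 61.7%wa,  0.0%hi,  0.4%si,  0.0%st
Mem:   2063384k total,  1940664k used,   122720k free,    12108k buffers
Swap:  1951856k total,   932880k used,  1018976k free,   142528k cached
"""


def parse_top_header(text):
    """Extract the iowait percentage and swap usage from top's summary lines."""
    wait = re.search(r"([\d.]+)%wa", text)
    swap = re.search(r"Swap:\s+(\d+)k total,\s+(\d+)k used", text)
    if wait is None or swap is None:
        raise ValueError("not a recognised top header")
    total_k, used_k = int(swap.group(1)), int(swap.group(2))
    return {
        "iowait_pct": float(wait.group(1)),
        "swap_used_k": used_k,  # in the unit top prints, i.e. "k"
        "swap_used_pct": 100.0 * used_k / total_k,
    }


if __name__ == "__main__":
    print(parse_top_header(TOP_HEADER))
```

    High iowait together with swap in use is the combination to look for; either one alone is less conclusive.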
    
    
     
  7. crypted

    crypted New Member

    I understand that principle and I thank you all for the commentary. My only curiosity and confusion is why this occurs so randomly and when the server is generally not having high use (web, imap, ftp, etc.. are not being hit too hard).

    It seems as if there's some memleak or something somewhere.

    RAM has been on this list and I'll get some more put in tonight to test it out at least temporarily.
     
  8. crypted

    crypted New Member

    Also, what would a good WAIT generally be?

    I'm at 4GB as of now. wait has been as high as 15.4% but just for a second or two.

    Code:
    top - 22:41:33 up 19 min,  1 user,  load average: 0.71, 0.72, 0.72
    Tasks: 175 total,   2 running, 171 sleeping,   0 stopped,   2 zombie
    Cpu(s): 16.9%us, 11.8%sy,  0.0%ni, 70.9%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   4063804k total,  1659392k used,  2404412k free,    49000k buffers
    Swap:  1951856k total,        0k used,  1951856k free,   435708k cached
     
  9. e100

    e100 New Member

    I've never paid attention to wait unless I'm having an issue, so I guess I would say if it is mostly low that's a good sign.

    The server was overloaded when you were running top.
    15 php-cgi processes and one mysqld process were all fighting for swap and CPU time.
    Based on the info you provided I would say this issue was caused by too much web traffic.

    If you have email and other processes on this server that only adds to the problem.

    My suggestion is to edit your apache configs and reduce the number of php processes that are allowed to run at a time. You need to limit the php processes to a number that your hardware can handle. When you have too many processes running, the CPU will waste a considerable amount of time just switching from one process to another.

    Also, reducing the number of php processes will reduce the amount of memory needed.
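
    To put rough numbers on the memory side, here is a small Python sketch using figures from the first top output: about 49 MB resident per php-cgi process and 2063384k of RAM. The 1024 MB reserve for MySQL, mail and the OS is a made-up placeholder, the function names are hypothetical, and RES counts shared pages once per process, so treat the results as a ballpark only.

```python
def php_cgi_footprint_mb(processes, res_mb_each):
    """Approximate resident memory used by a group of php-cgi processes."""
    return processes * res_mb_each


def max_processes(ram_mb, reserved_mb, res_mb_each):
    """How many php-cgi processes fit in RAM after setting aside a reserve."""
    usable_mb = max(ram_mb - reserved_mb, 0)
    return int(usable_mb // res_mb_each)


if __name__ == "__main__":
    ram_mb = 2063384 / 1024  # "Mem: 2063384k total" from the first top output
    # 15 php-cgi processes were visible in that top output.
    print("15 php-cgi processes:", php_cgi_footprint_mb(15, 49), "MB")
    # Reserve of 1024 MB is a placeholder, not a measured value.
    print("fit with 1024 MB reserved:", max_processes(ram_mb, 1024, 49))
```

    Plugging in your own measured per-process size and a realistic reserve gives a starting value for the process limit, which can then be adjusted while watching swap usage.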
     
  10. crypted

    crypted New Member

    Agreed. But, what is the best method to edit such settings with ISPC3 in use?

    I'd love to limit PHP processes for my two popular websites (each raking in several thousand hits a day).
     
  11. till

    till Super Moderator Staff Member ISPConfig Developer

    Which php method do you use in your sites, cgi, fcgi, mod_php or suphp?
     
  12. MarkSeger

    MarkSeger New Member

    Trying to figure out what is going on by looking at a few data points over a handful of seconds is a real good way to make the wrong decision. Like I said, you have collectl running, so if you look at /var/log/collectl you should see a number of files, probably one per day, that contain samples of almost every performance metric on your system. One set every 10 seconds! All you need to do is 'play it back' with the right parameters. You'll be able to see cpu, disk, network, memory, nfs (if you use it), page faults, interrupts and a whole lot more. Even detailed process and slab memory data.

    To get started just type:

    collectl -p /var/log/collectl/filename -scdnm -oT

    The -scdnm will display 'brief' format for cpu, disk, network and memory.
    The -oT switch will include optional timestamps.

    You can change the subsystems to see individual CPU, NETWORK or DISK loads by specifying them in uppercase, but you'll get far less compact data.

    You can even run with --vmstat instead of -s if you prefer that format.

    Check out http://collectl.sourceforge.net to learn more.

    -mark
     
  13. crypted

    crypted New Member

    Mark, I'll look into that collectl after while.

    Till, I'm using fcgi for the main websites as they're wordpress. The rest use mod-php.
     
  14. till

    till Super Moderator Staff Member ISPConfig Developer

    That's a good choice!

    Which value have you set for "FastCGI Children" under System > Server Config on the FastCGI tab? If it is > 1, set it to 1. You will then have to change a setting in every website (e.g. the quota) and click Save so that the new value is applied.
     
  15. crypted

    crypted New Member

    FCGI children is at 8 with max requests of 5000. Should it be just less than 8, or 1?

    Changed children, went to the two big sites and changed hdd quota by 1mb.

    Now the waiting game.
     
  16. MarkSeger

    MarkSeger New Member

    while you're waiting, just type "collectl<return>" and watch the output. very compact, very low overhead. <0.1%.
    -mark
     
  17. crypted

    crypted New Member

    Till, I made those changes. Now it takes about 4 seconds to access the website inter5.org and areyouliberal.com.

    Code:
    top - 08:37:57 up 10:15,  2 users,  load average: 2.17, 2.10, 1.67
    Tasks: 185 total,   2 running, 182 sleeping,   0 stopped,   1 zombie
    Cpu(s): 11.9%us,  8.5%sy,  0.0%ni, 79.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   4063804k total,  2898012k used,  1165792k free,   619852k buffers
    Swap:  1951856k total,        0k used,  1951856k free,   670860k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                
    15614 web5      20   0  206m  53m 7768 S   36  1.4   0:29.66 php-cgi                                                                
    16287 root      20   0     0    0    0 Z    3  0.0   0:00.10 miniserv.pl <defunct>                                                  
     2933 mysql     20   0  251m  79m 5904 S    1  2.0  13:44.49 mysqld                                                                 
    15630 www-data  20   0  248m  10m 1796 S    1  0.3   0:00.10 apache2                                                                
        1 root      20   0 10316  752  620 S    0  0.0   0:01.14 init                                                                   
        2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd                                                               
        3 root      RT  -5     0    0    0 S    0  0.0   0:00.08 migration/0                                                            
        4 root      15  -5     0    0    0 S    0  0.0   0:00.14 ksoftirqd/0  
    Code:
    waiting for 1 second sample...
    #<--------CPU--------><----------Disks-----------><----------Network---------->
    #cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut 
     100  28    59   8282      0      0    168     20      5     52      4      44 
      96   9    49   9013     28      3    172     10      4     27      4      17 
     100   9   100  11693      0      0    332      8      3     28      5      21 
      87  14    71   2287      0      0    704     30      4     44      5      37 
      50  18    49   1217      0      0     80     14      3     22      3      16 
      24  12   101    857      0      0      0      0      3     31      7      28 
       0   0    80    148      0      0    196     27      5     52     37      56 
       0   0    75    150      4      1    148     14      4     29      7      19 
      45   2    77   5209      0      0    252     18      5     36      8      28 
      82   5    53  10375      0      0    216     12      5     39      3      21 
      79   5    40   9537      0      0     52      8      3     27      2      13 
      48   3    61   7862      0      0    296     17      2     21      5      12 
       0   0    35    112      0      0     64     10      3     24      3      18 
       0   0    33    100      0      0      0      0      1     17      1       9 
      24   2   178    104      0      0    928    151      1     16      2      12 
      87  11    52   2327      0      0    496      8      2     22      1      13 
    Ouch!
     
  18. MarkSeger

    MarkSeger New Member

    well I see you got collectl going. ;)

    btw - if you want to see what your memory is doing at the same time just add the switch -s+m. if you want time stamps just include -oT. more suggestions later if you care.

    clearly you don't have a network or disk problem. All your CPUs are certainly getting hammered though, since the CPU number reported is an average of all of them. In fact if you have 4 and collectl reports 25%, 1 could be at 100%. To see individual CPU loads, use "collectl -sC". Unfortunately this can be a pain to view, so if you add the --home switch it will provide a display similar to top, but with no history.
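
    The averaging point can be shown with a couple of lines of Python. This is just arithmetic, not anything collectl itself exposes, and the function names are made up for illustration.

```python
def average_cpu(per_core_pct):
    """Average utilisation across cores, i.e. the single figure a summary shows."""
    if not per_core_pct:
        raise ValueError("need at least one core")
    return sum(per_core_pct) / len(per_core_pct)


def worst_case_single_core(average_pct, cores):
    """Highest load one core could be carrying given only the overall average."""
    return min(100.0, average_pct * cores)


if __name__ == "__main__":
    # One core pegged, three idle: the summary still reads as a quarter busy.
    print(average_cpu([100, 0, 0, 0]))
    # Reverse view: what a 25% average on 4 cores could be hiding.
    print(worst_case_single_core(25, 4))
```

    So a summary figure around 50% on a two-core box is consistent with a single process saturating one core, which is why the per-CPU view is worth checking.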

    one thing that is curious is I've never seen interrupts so low! Typically you see 1000 on an idle system because the clock interrupts 1K times/second. being a web hosted environment, perhaps you're running in a VM and the clock is being processed by the hypervisor? no big deal, just a curiosity.

    getting back to the high cpu load, it feels like this is indeed a case where the application needs to be tuned or simply needs more cpu. in any event, as you try to tune you can always run collectl in another window and be able to observe immediate results.

    enjoy
    -mark
     
  19. crypted

    crypted New Member

    I'll tell you I notice a response difference when I kill off the most popular website, inter5.org. (just deactivated it again to watch)

    it's online in this one
    Code:
    top - 10:38:55 up 12:16,  2 users,  load average: 0.83, 1.06, 1.14
    Tasks: 180 total,   2 running, 178 sleeping,   0 stopped,   0 zombie
    Cpu(s): 62.7%us, 18.7%sy,  0.0%ni, 18.1%id,  0.3%wa,  0.0%hi,  0.2%si,  0.0%st
    Mem:   4063804k total,  1847300k used,  2216504k free,   245612k buffers
    Swap:  1951856k total,     1792k used,  1950064k free,   370840k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                
    24176 web5      20   0  207m  53m 7992 S   55  1.3   4:28.24 php-cgi                                                                
    23646 web8      20   0  194m  40m 7984 S   54  1.0   4:25.35 php-cgi                                                                
    20516 web5      20   0  209m  55m 7996 R   39  1.4   6:05.86 php-cgi                                                                
     2933 mysql     20   0  261m  81m 5988 S   14  2.1  20:36.26 mysqld     
    it's offline in this one
    Code:
    top - 10:41:34 up 12:19,  2 users,  load average: 0.35, 0.81, 1.03
    Tasks: 164 total,   1 running, 163 sleeping,   0 stopped,   0 zombie
    Cpu(s): 50.0%us,  0.0%sy,  0.0%ni, 50.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   4063804k total,  1443496k used,  2620308k free,   246112k buffers
    Swap:  1951856k total,     1792k used,  1950064k free,   372280k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                
    30509 root      20   0 18960 1324  940 R  254  0.0   0:00.46 top                                                                    
        1 root      20   0 10316  752  620 S    0  0.0   0:01.20 init                                                                   
        2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd                                                               
        3 root      RT  -5     0    0    0 S    0  0.0   0:00.10 migration/0                                                            
        4 root      15  -5     0    0    0 S    0  0.0   0:00.20 ksoftirqd/0         
    collectl showing where i turned the website back on. you will see the cpu spike and some disk spike.
    Code:
       0   0    65    125      0      0    576     17      1     16      1       5 
       0   0    53    127      0      0      0      0      2     27      2      21 
       1   0    44    127      4      1    108     10      3     38      3      33 
       0   0     9     42      0      0      0      0      1     10      1       4 
       1   0    62    157      0      0      0      0      1      8      0       2 
       0   0    44     89      0      0      0      0      3     38      3      32 
       0   0    36     88      0      0     88      9      3     24      3      15 
      37   5    30    334     16      4      0      0      2     20      1      12 
      49  11    33    472      8      2      0      0      1     15      1      12 
      50  24    24   1673      0      0      0      0      1     17      1      13 
       0   0    43     82      0      0      0      0      1     12      4      10 
       1   0    68    111      0      0    476     16      2     34     33      30 
       0   0    29     68      0      0      0      0      3     28     13      23 
       0   0    26     95      0      0      0      0      2     19      2      10 
       1   0    25     61      0      0      0      0      1     15      1       8 
    #<--------CPU--------><----------Disks-----------><----------Network---------->
    #cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut 
      24   6    45    488      0      0    660     14      2     20      1      10 
      14   7    43    214      0      0      0      0      3     21      3      17 
      32   3    12    152      4      1      0      0      2     21      1      10 
       5   1    16     75      0      0      0      0      1      9      0       3 
      57  17    32    728      0      0      0      0      1     11      1       5 
      84  25   142   1585    340     72   1604     30      2     25      1      12 
      97  15    70   4485     32      6      0      0      2     24      7      20 
     100  29    65   5916      0      0      0      0      4     53     27      40 
      76  29    41   4397      8      2      0      0      2     30      2      19 
      62  16    55    904      0      0      0      0      2     32     24      23 
      34  18    41   1225      0      0    464     13      3     37     39      35 
      20   0    30    106      0      0      0      0      1     19      1       9 
      96  16    31   4941     24      3      0      0      2     20      5      14 
      84  25    58   7904      0      0      4      1      2     19      2       8 
    Also, check out the graphs for anything interesting I'm not seeing:
    http://monitor.derekgordon.com/munin/derekgordon.com/index.html
     
    Last edited: Oct 13, 2010
  20. MarkSeger

    MarkSeger New Member

    I took a quick look at your graphs and I'm afraid they're not going to be very useful. The sample times for the data are far too infrequent to see anything meaningful such as spikes. Furthermore, you're using RRD which normalizes the data it plots - a fancy name for 'it lies!!!'.

    Perhaps the next thing you might want to look into is downloading collectl-utils which contains a tool called colplot, which uses gnuplot and a web interface to display very detailed and accurate plots from collectl data. The way that works is you use collectl to turn the collected data into something plottable (or even loadable into a spreadsheet). Just use the playback command like this:

    collectl -p /var/log/collectl/filename -P -f/tmp

    and that will create a plottable file in /tmp. Then you run colplot, point it to /tmp and tell it to draw all plots. There is a sample of one of the plots on the collectl webpage as well as more info here:
    http://collectl-utils.sourceforge.net/colplot.html

    -mark
     
