HowtoForge Forums | HowtoForge - Linux Howtos and Tutorials


wxman 17th November 2009 03:41

Whole server went down
 
I only have a minute because I'm trying to get my server back up!

It went down about an hour ago and the only clue I see is in the apache log:
Code:

DBI connect('database=dbispconfig;host=localhost:3306','ispconfig',...) failed: Too many connections at /usr/local/ispconfig/server/scripts/vlogger line 255
DBI Error:  at /usr/local/ispconfig/server/scripts/vlogger line 255.
piped log program ' /usr/local/ispconfig/server/scripts/vlogger -s access.log -t "%Y%m%d-access.log" -d "/etc/vlogger-dbi.conf" /var/log/ispconfig/httpd' failed unexpectedly

This shows up over and over.

till 17th November 2009 09:08

Increase the max_connections and max_user_connections setting to e.g. 500 in your mysql my.cnf and restart mysql.

wxman 17th November 2009 22:08

I gave it a try, but I can't tell if it helped. I've been stumped all day today. The server keeps locking up after being up for an hour or so. When that happens I can't log in even to get to the logs, and all I can do is reboot. Stopping apache and mysql does nothing. While it was in that state I was able to run top at a command line, and it showed the load averages all over 100. They usually hang around 1.
I get the feeling it has something to do with either apache or mysql, but I can't find anything clear in the logs.

till 17th November 2009 22:09

Which processes cause the load?

wxman 17th November 2009 22:30

Quote:

Originally Posted by till (Post 211060)
Which processes cause the load?

That's half my problem. I'm obviously not as good at running a server as I hoped I was. I don't know where to find that out.

I've got top open right now, and the load is back to around 1. I do know when it last went crazy.
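
One way to answer the "which processes" question, as a minimal sketch rather than a definitive tool: on Linux the load average counts tasks that are runnable (state R) or in uninterruptible sleep (state D, usually waiting on disk), so listing the processes in those two states shows what is driving the number. The script below reads /proc/<pid>/stat directly. The function names are made up for this example, and it only looks at each process's main thread, which is an assumption that fits a non-threaded apache setup.

```python
import os
from collections import Counter


def parse_stat(stat_text):
    """Return (command, state) from the contents of /proc/<pid>/stat."""
    # The command is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the last ')' rather than on whitespace.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    command = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return command, state


def load_contributors(proc_root="/proc"):
    """Count processes per (command, state), keeping only states R and D."""
    counts = Counter()
    if not os.path.isdir(proc_root):
        return counts  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                command, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or its stat file was unreadable
        if state in ("R", "D"):
            counts[(command, state)] += 1
    return counts


if __name__ == "__main__":
    for (command, state), count in load_contributors().most_common():
        print(f"{count:5d}  {state}  {command}")
```

Run it while the load is climbing. A long list of apache2 entries in state D would point at disk or swap I/O rather than CPU.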

wxman 18th November 2009 00:55

It just went down again. While I couldn't get to anything top looked like this:
Code:

top - 18:49:56 up  3:15,  2 users,  load average: 69.95, 71.26, 48.81
Tasks: 260 total,  1 running, 259 sleeping,  0 stopped,  0 zombie
Cpu(s):  0.8%us,  1.7%sy,  0.0%ni, 38.0%id, 59.1%wa,  0.0%hi,  0.0%si,  0.3%st
Mem:  1575132k total,  1562748k used,    12384k free,    8300k buffers
Swap:  7815580k total,  4844568k used,  2971012k free,    57876k cached
PID to renice:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20918 www-data  20  0  270m  43m 3456 S    0  2.9  0:00.66 apache2
20824 www-data  20  0  270m  32m 3444 S    0  2.1  0:00.60 apache2
 6392 mysql    20  0  446m  30m 2924 S    0  2.0  1:17.98 mysqld
20801 www-data  20  0  270m  30m 3424 S    0  2.0  0:00.52 apache2
20749 www-data  20  0  270m  28m 3424 D    0  1.9  0:00.74 apache2
20730 www-data  20  0  270m  26m 3444 S    0  1.7  0:00.76 apache2
20675 www-data  20  0  269m  25m 3416 S    0  1.7  0:00.92 apache2
20710 www-data  20  0  270m  25m 3536 D    0  1.7  0:00.80 apache2
20707 www-data  20  0  270m  25m 3476 S    0  1.6  0:00.56 apache2
20690 www-data  20  0  270m  24m 3428 S    0  1.6  0:00.84 apache2
20684 www-data  20  0  270m  22m 3480 D    0  1.4  0:00.68 apache2
20641 www-data  20  0  270m  20m 3476 S    0  1.4  0:00.82 apache2
20653 www-data  20  0  270m  20m 3480 D    0  1.3  0:00.92 apache2
20539 www-data  20  0  270m  19m 4084 D    0  1.3  0:01.30 apache2
20622 www-data  20  0  269m  18m 3460 S    0  1.2  0:01.06 apache2
20624 www-data  20  0  269m  18m 3464 S    0  1.2  0:00.98 apache2
20553 www-data  20  0  279m  18m 3544 D    0  1.2  0:01.16 apache2
20623 www-data  20  0  270m  16m 3424 S    0  1.1  0:00.94 apache2
20676 www-data  20  0  269m  16m 3416 S    0  1.1  0:00.98 apache2
20302 www-data  20  0  270m  16m 3512 D    0  1.1  0:01.30 apache2
20556 www-data  20  0  269m  15m 3812 S    0  1.0  0:00.94 apache2
19881 www-data  20  0  270m  15m 4292 D    0  1.0  0:03.30 apache2
21225 www-data  20  0  237m  15m 3360 S    0  1.0  0:00.16 apache2
20682 www-data  20  0  270m  14m 3428 S    0  1.0  0:00.92 apache2
20651 www-data  20  0  270m  14m 3428 S    0  1.0  0:00.98 apache2
20225 www-data  20  0  268m  14m 3536 D    1  0.9  0:01.52 apache2
20552 www-data  20  0  274m  14m 3524 S    0  0.9  0:00.92 apache2
20373 www-data  20  0  279m  14m 3500 S    0  0.9  0:01.32 apache2
19903 www-data  20  0  272m  14m 3920 D    0  0.9  0:03.46 apache2

It took less than 5 minutes to go from running normally to overload. The traffic in and out stayed the same until it overloaded. After it did, the traffic died completely. There were a lot more apache2 processes running, but I didn't include them here.
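
The header of that top output can be summarized with a short script. This is only a sketch: the header lines are copied from the output above, the function name is invented for the example, and the regular expressions assume the older top layout shown here (values suffixed with "k", CPU fields written like "59.1%wa").

```python
import re

# Header lines copied from the top output in this post.
TOP_HEADER = """\
top - 18:49:56 up  3:15,  2 users,  load average: 69.95, 71.26, 48.81
Cpu(s):  0.8%us,  1.7%sy,  0.0%ni, 38.0%id, 59.1%wa,  0.0%hi,  0.0%si,  0.3%st
Mem:  1575132k total,  1562748k used,    12384k free,    8300k buffers
Swap:  7815580k total,  4844568k used,  2971012k free,    57876k cached
"""


def parse_top_header(text):
    """Extract CPU percentages and memory/swap figures (in KiB) from top's header."""
    summary = {}
    for line in text.splitlines():
        if line.startswith("Cpu(s):"):
            summary["cpu"] = {name: float(value)
                              for value, name in re.findall(r"([\d.]+)%(\w+)", line)}
        elif line.startswith(("Mem:", "Swap:")):
            label = line.split(":")[0].lower()
            summary[label] = {name: int(value)
                              for value, name in re.findall(r"(\d+)k (\w+)", line)}
    return summary


summary = parse_top_header(TOP_HEADER)
io_wait = summary["cpu"]["wa"]
mem_used = summary["mem"]["used"] / summary["mem"]["total"]
swap_used = summary["swap"]["used"] / summary["swap"]["total"]

print(f"I/O wait:  {io_wait:.1f}% of CPU time")
print(f"RAM used:  {mem_used:.1%}")
print(f"Swap used: {swap_used:.1%} ({summary['swap']['used'] / 1024 ** 2:.1f} GiB)")
```

Nearly all RAM in use, several GB of swap in use and most CPU time spent in I/O wait would be consistent with the machine swapping heavily, which would also fit the many apache2 processes in state D. That is an interpretation of the numbers, not a confirmed diagnosis.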

edge 18th November 2009 08:10

Wow..... load average: 69.95

You could install munin (a howto is in the Howtos section).
This might give you an indication of what is causing the high load.

Too little RAM combined with a lot of traffic could also cause high load.

wxman 18th November 2009 15:44

Quote:

Originally Posted by edge (Post 211074)
Wow..... load average: 69.95

You could install munin (a howto is in the Howtos section).
This might give you an indication of what is causing the high load.

Too little RAM combined with a lot of traffic could also cause high load.

At its worst the average was around 170!
I've looked into munin, and I'll look again. I've got 2GB of RAM, which I thought should be plenty. I forgot to say that I'm already running Ganglia, which looks like it does as much as Munin.

till 18th November 2009 15:46

Please check the apache access log. Do you see a specific URL that causes this high number of simultaneous connections to your server?
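
A quick way to run that check, sketched under assumptions: the script below counts requests per client IP and per URL path. It assumes the access log is in Apache's common or combined format (client address first, request line in quotes). The sample lines, addresses and paths are invented for illustration and are not taken from this server.

```python
import re
from collections import Counter

# Matches the start of a common/combined log line:
# client, identd, user, [timestamp], "METHOD path ..."
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)')


def summarize(lines, top=10):
    """Return the most frequent client IPs and request paths."""
    clients, paths = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are in another format
        client, _timestamp, _method, path = match.groups()
        clients[client] += 1
        paths[path] += 1
    return clients.most_common(top), paths.most_common(top)


# Hypothetical sample data; replace with the lines of a real access log.
SAMPLE = [
    '192.0.2.10 - - [17/Nov/2009:18:45:01 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [17/Nov/2009:18:45:02 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.22 - - [17/Nov/2009:18:45:02 +0000] "GET /forecast.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

top_clients, top_paths = summarize(SAMPLE)
for client, hits in top_clients:
    print(f"{hits:6d}  {client}")
for path, hits in top_paths:
    print(f"{hits:6d}  {path}")
```

To use it on a real log, pass an open file instead of SAMPLE, since summarize() accepts any iterable of lines. If one address or one path dominates the counts for the minutes before a lock-up, that is the place to look.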

wxman 18th November 2009 16:52

Quote:

Originally Posted by till (Post 211189)
Please check the apache access log. Do you see a specific URL that causes this high number of simultaneous connections to your server?

No, there isn't. My first thought was a DoS attack, so I checked the log. I don't see any one IP or URL that jumps out. Also, the traffic seems steady right up to when it locks up. You can see it on Ganglia: the graph goes from the bottom to over the top in 5 minutes. The only thing I saw that shocked me was while watching top. You can see the number of apache processes almost double in an instant.
There was an increase in the total traffic for the day. On the 16th the total was 70890 hits, and on the 17th it went up to 113208.
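
To put those two totals in perspective, here is the arithmetic as a small sketch. It only uses the two daily figures quoted above. The per-second value is a plain daily average and says nothing about short bursts.

```python
hits_nov_16 = 70890
hits_nov_17 = 113208

# Relative day-over-day growth and the average request rate on the 17th.
increase = (hits_nov_17 - hits_nov_16) / hits_nov_16
avg_per_second = hits_nov_17 / (24 * 60 * 60)

print(f"Day-over-day increase: {increase:.1%}")
print(f"Average rate on the 17th: {avg_per_second:.2f} hits/s")
```

An average this low means the daily total alone does not explain the lock-ups. A per-minute breakdown around the time of each lock-up would show whether the load arrives in bursts.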


All times are GMT +2.