HowtoForge Forums | HowtoForge - Linux Howtos and Tutorials


smartcall 11th March 2007 09:13

Server dies every night at 4:00
 
Hi,

I have had this issue for two nights now. It started at 4:00 am on Saturday and happened again at 4:00 am on Sunday (today).
I have FC6 with ISPConfig 2.2.9. I haven't made any changes to the system for months, and it is a production server with many sites.
The server dies completely. Only a hardware reboot fixes the problem. I can't identify what causes it.
It must be connected to the scripts that start running at 4:00 am, but I don't know where to look. None of the usual log files show anything.
I also monitor the server with snmpd and the graphs are normal; nothing special. Right after 4:00 am there is no more data for the graphs, because the server is dead.

Please HELP. I need to resolve this before 4:00 am tomorrow.

djtremors 11th March 2007 11:27

I'm using Fedora 5 and 6 too, but I have no cron-related issues.

cron.daily runs at 4 am, and so does ISPConfig's webalizer job.
Code:

# cd /etc/cron.daily
# ls -l
-rwxr-xr-x 1 root root  577 Feb 27 00:40 000-delay.cron
-rwxr-xr-x 1 root root  379 Oct 30 18:37 0anacron
-rwxr-xr-x 1 root root 2936 Nov 29 00:16 beagle-crawl-system
-rwxr-xr-x 1 root root  118 Jan 25 01:06 cups
-rwxr-xr-x 1 root root  180 Feb  9 01:45 logrotate
-rwxr-xr-x 1 root root  418 Jan  9 20:56 makewhatis.cron
-rwxr-xr-x 1 root root  137 Nov 26 23:04 mlocate.cron
-rwxr-xr-x 1 root root 2181 Jun 21  2006 prelink
-rwxr-xr-x 1 root root  114 Sep  7  2006 rpm
-rwxr-xr-x 1 root root  290 Jul 13  2006 tmpwatch 

Code:

# crontab -e
0 4 * * * /root/ispconfig/php/php /root/ispconfig/scripts/shell/webalizer.php &> /dev/null 

You can try commenting these out or moving them away one at a time to see which one is causing it.
I would start by checking /var/log/cron to see what the last message before the crash/hang was.
Also, was the console sitting at a login prompt, or were there any kernel messages?
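A minimal sketch of the "one at a time" idea, assuming bash: instead of disabling scripts, run each executable in a directory such as /etc/cron.daily in turn and write a START/END marker to a log, calling `sync` after each write so the markers reach the disk even if the machine hangs hard. After a hang, the last START without a matching END points at the script that was running. The function name `run_each_logged` and any log path you pass it are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: run every executable file in DIR one by one,
# logging START/END markers to LOG so a hard hang leaves a trail on disk.
run_each_logged() {
    local dir=$1 log=$2 script rc
    for script in "$dir"/*; do
        # Skip anything that is not an executable regular file.
        [ -f "$script" ] && [ -x "$script" ] || continue
        echo "$(date '+%F %T') START $script" >> "$log"
        sync                      # flush the marker before the script runs
        rc=0
        "$script" || rc=$?        # record a failure without aborting the loop
        echo "$(date '+%F %T') END $script rc=$rc" >> "$log"
        sync
    done
}
```

For one night you could call it from a small wrapper script, e.g. `run_each_logged /etc/cron.daily /var/log/cron-trace.log`, in place of the usual `run-parts /etc/cron.daily` line.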

martinfst 11th March 2007 11:30

As you say, a couple of scripts are started at 4:00. Could it be a hardware memory problem? Or are you running out of memory in general (swap full)?
Code:

vmstat -s
might be useful. For a real hardware problem, you will have to run vendor-specific memory tests; often you need to boot from a diagnostics CD.
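The "swap full" question can also be answered from /proc/meminfo, which is where vmstat gets its memory numbers. A minimal sketch, assuming the usual Linux MemFree, Buffers, Cached, SwapTotal and SwapFree fields (all in kB); it prints free memory, a rough "available" figure (free + buffers + page cache) and swap in use. The name `mem_report` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: summarize memory and swap from a meminfo-style file.
# Defaults to /proc/meminfo; all values are in kB, as in that file.
mem_report() {
    awk '
        /^MemFree:/   { free  = $2 }
        /^Buffers:/   { buf   = $2 }
        /^Cached:/    { cache = $2 }   # anchored, so SwapCached is not matched
        /^SwapTotal:/ { stot  = $2 }
        /^SwapFree:/  { sfree = $2 }
        END {
            # free + buffers + cache is only a rough estimate of usable memory
            printf "free=%d avail_approx=%d swap_used=%d\n", free, free + buf + cache, stot - sfree
        }' "${1:-/proc/meminfo}"
}
```

Appending its output to a log every minute between 3:50 and 4:10 would show whether memory or swap is running out just before the hang.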

till 11th March 2007 11:42

If you don't find anything in the logs, then it's most likely hardware-related. Several cron jobs run at 4 AM and may put a higher load on your server; if there is, e.g., some bad RAM or a weak power supply, the server might die.

smartcall 11th March 2007 13:09

This is the last thing I see in /var/log/cron.1:

Code:

Mar 11 04:00:01 ns1 crond[21239]: (root) CMD (/usr/bin/rdate -s ntp3.fau.de)
Mar 11 04:00:01 ns1 crond[21240]: (root) CMD (/root/ispconfig/php/php /root/ispconfig/scripts/shell/check_services.php &> /dev/null)
Mar 11 04:00:01 ns1 crond[21241]: (root) CMD (/root/ispconfig/php/php /root/ispconfig/scripts/shell/webalizer.php &> /dev/null)

And I have an Intel Dual Core CPU.
All the scripts except the webalizer one are run again after I boot the server, and nothing happens.
Could it be the webalizer script?
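To list exactly which jobs crond started in a given minute, a small filter over the cron log is handy. A minimal sketch, assuming the log format shown above (`Mon DD HH:MM:SS host crond[pid]: (user) CMD (command)`); the prefix is matched literally at the start of the line, so single-digit days need syslog's padding (e.g. "Mar  1"). The name `cron_cmds_at` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the commands crond started at times beginning
# with PREFIX (e.g. "Mar 11 04:00") in a cron log file.
cron_cmds_at() {
    local log=$1 prefix=$2
    # Keep lines that start with the prefix and contain a CMD entry,
    # then strip everything except the command inside "CMD (...)".
    awk -v p="$prefix" 'index($0, p) == 1 && index($0, " CMD (") > 0' "$log" |
        sed -n 's/.* CMD (\(.*\))$/\1/p'
}
```

For example, `cron_cmds_at /var/log/cron.1 'Mar 11 04:00'` would print the rdate, check_services.php and webalizer.php commands from the excerpt above.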

Thanks

dlpc 11th March 2007 13:16

Could be a heat problem; have a look at the CPU cooler.
I had the same problem here: cron starts, the server shuts down, and there is no entry in any log.
The CPU cooler wasn't running :(

smartcall 11th March 2007 13:46

The cooler is working. I'm currently running the webalizer script manually to see what's happening. It's taking a long time to finish because I have more than 300 sites, but I don't see any significant load on the CPU.
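Since webalizer's run time grows with the size of the access logs, listing the biggest logs first shows where the time goes. A minimal sketch, assuming GNU find (for `-printf`) and that the per-site logs are files named web.log somewhere under a web root; ISPConfig 2 commonly uses /var/www, but check your installation. The name `big_logs` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list files named web.log under ROOT that are larger
# than MIN_MB mebibytes (default 100), biggest first, as "<bytes> <path>".
big_logs() {
    local root=$1 min_mb=${2:-100}
    # -size +NM means "more than N MiB" (GNU find rounds sizes up to whole units);
    # -printf is a GNU find extension.
    find "$root" -type f -name web.log -size +"${min_mb}"M -printf '%s %p\n' |
        sort -rn
}
```

For example, `big_logs /var/www 100` would list every web.log above 100 MiB.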

martinfst 11th March 2007 14:34

Any read/write errors on your disks? If it runs for a long time, something is the bottleneck: CPU, disk, or memory. Find out what's (over)used and you'll probably have an indication of where to look for a possible hardware problem.
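For the read/write-error question, the kernel usually reports failing I/O in the system log. A minimal sketch, assuming a syslog file such as /var/log/messages and the kernel's usual wording ("I/O error", "Medium Error", the IDE driver's "UncorrectableError"); the pattern list is a starting point rather than a complete list, and `disk_errors` is a hypothetical name. If smartmontools is installed, `smartctl -H /dev/sda` also reports the drive's own SMART health verdict.

```shell
#!/bin/bash
# Hypothetical helper: print log lines that look like disk I/O errors.
# The patterns are a heuristic; extend them for your controller/driver.
disk_errors() {
    grep -E 'I/O error|Medium Error|UncorrectableError' "${1:-/var/log/messages}"
}
```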

smartcall 11th March 2007 14:52

Webalizer has been running for an hour already. Memory, CPU and disks are OK. The reason it takes so long is the extremely big web.log files: some sites on my server have web.log files over 150MB. But I'm monitoring it now, while the webalizer script runs, and I don't see anything strange.

Code:

top - 15:48:13 up  5:48,  2 users,  load average: 0.15, 0.51, 0.65
Tasks: 151 total,  1 running, 150 sleeping,  0 stopped,  0 zombie
Cpu(s):  2.9%us,  0.8%sy,  0.0%ni, 92.2%id,  3.9%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  2074448k total,  1992100k used,    82348k free,  194392k buffers
Swap:  2939868k total,        0k used,  2939868k free,  1500392k cached

I may move cron.daily to run later in the morning, so I can be there and watch the console output; right now the screensaver prevents me from seeing it.
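Moving the nightly jobs is a crontab change. A sketch, assuming the stock Fedora /etc/crontab (where `run-parts /etc/cron.daily` is scheduled at 04:02) and the root crontab entry for webalizer quoted earlier in this thread; 9:00 is only an example time.

```
# /etc/crontab: cron.daily moved from 04:02 to 09:02
02 9 * * * root run-parts /etc/cron.daily

# root crontab (crontab -e): ISPConfig webalizer moved from 04:00 to 09:00
0 9 * * * /root/ispconfig/php/php /root/ispconfig/scripts/shell/webalizer.php &> /dev/null
```

If the "screensaver" is the text console's blanking, running `setterm -blank 0` on that console turns the blanking timeout off, so the last messages stay visible after a hang.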

martinfst 11th March 2007 16:27

Quote:

3.9%wa
It's waiting on data to be retrieved from disk, so I'd suspect disk problems. On my system, 200 MB log files are processed within 10 minutes on a 2.8 GHz dual core.
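The %wa figure in top is the share of CPU time spent idle while waiting for I/O. The same number can be computed over any interval from two samples of the `cpu` line in /proc/stat, whose fifth value is iowait. A minimal sketch, assuming bash and the Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal, then guest fields that are already counted in user); the name `iowait_pct` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: given two "cpu ..." lines from /proc/stat, print the
# percentage of CPU time spent in iowait between the two samples.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum user..steal ($2..$9); guest time is already included in user.
            t = 0
            for (i = 2; i <= 9 && i <= NF; i++) t += $i
            tot[NR] = t
            io[NR] = $6      # fifth value after "cpu" is iowait
        }
        END {
            dt = tot[2] - tot[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (io[2] - io[1]) / dt
        }'
}
```

Usage: `a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat); iowait_pct "$a" "$b"`. Sampling this while webalizer runs shows whether the disk wait stays low or climbs.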


All times are GMT +2.