#11
22nd June 2010, 22:20
Torsson
Member
 
Join Date: Mar 2006
Posts: 62
Thanks: 0
Thanked 3 Times in 3 Posts

falko,

I think you are right; it has to be some cron job that is messed up.

When I thought hard about it, I realized that I have tried to move everything to the servers twice.

Both times it was the night between Thursday and Friday, and both times the servers died between 06:00 and 07:00.

Since this has happened twice, it has to be cron.

I'll let you know when I have tried that!
#12
26th July 2010, 14:34
Torsson

I have now stopped cron.daily and created a script that writes 40 GB of files of different sizes, then tars everything, untars it, and starts over. Now I just need to wait and see what happens.
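
For anyone who wants to try the same kind of load test, here is a minimal sketch of such a create/tar/untar loop in Python. The original script was not posted, so everything here is an assumption made for illustration: the function names, the file sizes, and the scratch directory. The total is kept to about 1 MB per cycle so it is safe to run; a real test would use far larger numbers and loop indefinitely.

```python
import os
import random
import shutil
import tarfile
import tempfile


def write_random_files(directory, total_bytes, min_size, max_size, rng):
    """Fill `directory` with files of varying size until total_bytes is written."""
    os.makedirs(directory, exist_ok=True)
    written = 0
    index = 0
    while written < total_bytes:
        # Clamp the last file so the total is exactly total_bytes.
        size = min(rng.randint(min_size, max_size), total_bytes - written)
        path = os.path.join(directory, f"file_{index:05d}.bin")
        with open(path, "wb") as handle:
            handle.write(os.urandom(size))
        written += size
        index += 1
    return index


def tar_untar_cycle(workdir, total_bytes, min_size, max_size, rng):
    """Create files, tar them, untar them, clean up. Returns bytes extracted."""
    source = os.path.join(workdir, "data")
    archive = os.path.join(workdir, "data.tar")
    restored = os.path.join(workdir, "restored")

    write_random_files(source, total_bytes, min_size, max_size, rng)
    with tarfile.open(archive, "w") as tar:
        tar.add(source, arcname="data")
    with tarfile.open(archive, "r") as tar:
        tar.extractall(restored)

    extracted = sum(
        os.path.getsize(os.path.join(root, name))
        for root, _dirs, files in os.walk(restored)
        for name in files
    )
    # Remove everything so the next cycle starts from an empty directory.
    for path in (source, restored):
        shutil.rmtree(path)
    os.remove(archive)
    return extracted


if __name__ == "__main__":
    rng = random.Random(0)
    with tempfile.TemporaryDirectory() as workdir:
        for cycle in range(3):  # a real stress test would loop until stopped
            done = tar_untar_cycle(workdir, 1_000_000, 1_000, 200_000, rng)
            print(f"cycle {cycle}: {done} bytes round-tripped")
```

Comparing the extracted byte count with the requested total gives a cheap sanity check that each round trip completed.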
#13
28th July 2010, 14:00
Torsson

I stopped everything in cron.daily, cron.hourly, cron.weekly and cron.monthly, and everything seemed to work OK, but this morning server 2 died. I then looked at server 1 and found that update-motd and vnstat were still in the cron.d folder, so I stopped them on server 1, and it has not died yet.
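
Since jobs in cron.d are easy to overlook, it can help to list every place cron reads from in one go. A rough Python sketch follows; the directory list assumes a Debian/Ubuntu-style layout, and the function name `find_cron_entries` and its `root` parameter are made up for illustration.

```python
import os

# Locations cron typically reads on Debian/Ubuntu-style systems.
# This list is an assumption; other distributions may differ.
CRON_DIRS = [
    "etc/cron.d",
    "etc/cron.hourly",
    "etc/cron.daily",
    "etc/cron.weekly",
    "etc/cron.monthly",
    "var/spool/cron/crontabs",
]
CRON_FILES = ["etc/crontab"]


def find_cron_entries(root="/"):
    """Return a sorted list of files under `root` that cron may read or run."""
    found = []
    for relative in CRON_FILES:
        path = os.path.join(root, relative)
        if os.path.isfile(path):
            found.append(path)
    for relative in CRON_DIRS:
        directory = os.path.join(root, relative)
        if not os.path.isdir(directory):
            continue
        try:
            names = os.listdir(directory)
        except PermissionError:
            continue  # e.g. user crontabs when not running as root
        for name in names:
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    for path in find_cron_entries():
        print(path)
```

Run as root, this also picks up per-user crontabs, which are otherwise skipped when the spool directory is not readable.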

So I started digging in the logs on server 1 for information from the recent crash and found the following. After these entries the logs say nothing until the manual reboot. It would be nice if someone had the time to look at the logs and see why it dies.

Kernel log:
Quote:
Jun 11 06:26:40 wendecoserver1 kernel: [298628.412913] tg3: peth1: Link is down.
Jun 11 06:26:40 wendecoserver1 kernel: [298628.425390] eth1: port 1(peth1) entering disabled state
Jun 11 06:26:42 wendecoserver1 kernel: [298629.970014] tg3: peth1: Link is up at 100 Mbps, full duplex.
Jun 11 06:26:42 wendecoserver1 kernel: [298629.970020] tg3: peth1: Flow control is off for TX and off for RX.
Jun 11 06:26:42 wendecoserver1 kernel: [298629.971316] eth1: port 1(peth1) entering learning state
Jun 11 06:26:42 wendecoserver1 kernel: [298629.971870] eth1: topology change detected, propagating
Messages log:
Quote:
Jun 11 06:26:40 wendecoserver1 kernel: [298628.412913] tg3: peth1: Link is down.
Jun 11 06:26:40 wendecoserver1 kernel: [298628.425390] eth1: port 1(peth1) entering disabled state
Jun 11 06:26:42 wendecoserver1 kernel: [298629.970014] tg3: peth1: Link is up at 100 Mbps, full duplex.
Jun 11 06:26:42 wendecoserver1 kernel: [298629.970020] tg3: peth1: Flow control is off for TX and off for RX.
Jun 11 06:26:42 wendecoserver1 kernel: [298629.971316] eth1: port 1(peth1) entering learning state
Jun 11 06:26:42 wendecoserver1 kernel: [298629.971870] eth1: topology change detected, propagating
Jun 11 06:26:42 wendecoserver1 kernel: [298629.971872] eth1: port 1(peth1) entering forwarding state
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: WARN: node wendecoserver2: is dead
Jun 11 06:27:29 wendecoserver1 ipfail: [6196]: info: Status update: Node wendecoserver2 now has status dead
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: WARN: No STONITH device configured.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: WARN: Shared disks are not protected.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: info: Resources being acquired from wendecoserver2.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: info: Link wendecoserver2:eth0 dead.
Jun 11 06:27:29 wendecoserver1 harc[23566]: info: Running /etc/ha.d/rc.d/status status
Jun 11 06:27:29 wendecoserver1 mach_down[23591]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Jun 11 06:27:29 wendecoserver1 mach_down[23591]: info: mach_down takeover complete for node wendecoserver2.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: info: mach_down takeover complete.
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: info: NS: We are dead. :<
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: info: Link Status update: Link wendecoserver2/eth0 now has status dead
Jun 11 06:27:30 wendecoserver1 heartbeat: [23567]: info: Local Resource acquisition completed.
Jun 11 06:27:30 wendecoserver1 harc[23651]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Jun 11 06:27:30 wendecoserver1 ip-request-resp[23651]: received ip-request-resp ldirectord::ldirectord.cf OK yes
Jun 11 06:27:30 wendecoserver1 ResourceManager[23670]: info: Acquiring resource group: wendecoserver1 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::91.142.186.70/24/eth0/91.142.186.71
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: info: We are dead. :<
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: info: Asking other side for ping node count.
Jun 11 06:27:30 wendecoserver1 ResourceManager[23670]: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start
Jun 11 06:27:31 wendecoserver1 IPaddr2[23763]: INFO: Running OK
Jun 11 06:44:24 wendecoserver1 -- MARK --
Syslog:
Quote:
Jun 11 06:25:01 wendecoserver1 /USR/SBIN/CRON[23508]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 11 06:25:01 wendecoserver1 /USR/SBIN/CRON[23509]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
Jun 11 06:26:40 wendecoserver1 kernel: [298628.412913] tg3: peth1: Link is down.
Jun 11 06:26:40 wendecoserver1 kernel: [298628.425390] eth1: port 1(peth1) entering disabled state
Jun 11 06:26:42 wendecoserver1 kernel: [298629.970014] tg3: peth1: Link is up at 100 Mbps, full duplex.
Jun 11 06:26:42 wendecoserver1 kernel: [298629.970020] tg3: peth1: Flow control is off for TX and off for RX.
Jun 11 06:26:42 wendecoserver1 kernel: [298629.971316] eth1: port 1(peth1) entering learning state
Jun 11 06:26:42 wendecoserver1 kernel: [298629.971870] eth1: topology change detected, propagating
Jun 11 06:26:42 wendecoserver1 kernel: [298629.971872] eth1: port 1(peth1) entering forwarding state
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: WARN: node wendecoserver2: is dead
Jun 11 06:27:29 wendecoserver1 ipfail: [6196]: info: Status update: Node wendecoserver2 now has status dead
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: WARN: No STONITH device configured.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: WARN: Shared disks are not protected.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: info: Resources being acquired from wendecoserver2.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: info: Link wendecoserver2:eth0 dead.
Jun 11 06:27:29 wendecoserver1 heartbeat: [23566]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Jun 11 06:27:29 wendecoserver1 harc[23566]: info: Running /etc/ha.d/rc.d/status status
Jun 11 06:27:29 wendecoserver1 mach_down[23591]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Jun 11 06:27:29 wendecoserver1 mach_down[23591]: info: mach_down takeover complete for node wendecoserver2.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: info: mach_down takeover complete.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: debug: StartNextRemoteRscReq(): child count 1
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: info: NS: We are dead. :<
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: info: Link Status update: Link wendecoserver2/eth0 now has status dead
Jun 11 06:27:30 wendecoserver1 heartbeat: [23567]: info: Local Resource acquisition completed.
Jun 11 06:27:30 wendecoserver1 heartbeat: [6037]: debug: StartNextRemoteRscReq(): child count 1
Jun 11 06:27:30 wendecoserver1 heartbeat: [23651]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Jun 11 06:27:30 wendecoserver1 harc[23651]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Jun 11 06:27:30 wendecoserver1 ip-request-resp[23651]: received ip-request-resp ldirectord::ldirectord.cf OK yes
Jun 11 06:27:30 wendecoserver1 ResourceManager[23670]: info: Acquiring resource group: wendecoserver1 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::91.142.186.70/24/eth0/91.142.186.71
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: info: We are dead. :<
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: info: Asking other side for ping node count.
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: debug: Message [num_ping] sent.
Jun 11 06:27:30 wendecoserver1 ResourceManager[23670]: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start
Jun 11 06:27:30 wendecoserver1 ResourceManager[23670]: debug: Starting /etc/ha.d/resource.d/ldirectord ldirectord.cf start
Jun 11 06:27:31 wendecoserver1 ResourceManager[23670]: debug: /etc/ha.d/resource.d/ldirectord ldirectord.cf start done. RC=0
Jun 11 06:27:31 wendecoserver1 IPaddr2[23763]: INFO: Running OK
Jun 11 06:30:02 wendecoserver1 /USR/SBIN/CRON[23838]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 11 06:30:02 wendecoserver1 /USR/SBIN/CRON[23837]: (root) CMD ([ -x /usr/sbin/update-motd ] && /usr/sbin/update-motd 2>/dev/null)
Jun 11 06:35:01 wendecoserver1 /USR/SBIN/CRON[23922]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 11 06:40:02 wendecoserver1 /USR/SBIN/CRON[23945]: (root) CMD ([ -x /usr/sbin/update-motd ] && /usr/sbin/update-motd 2>/dev/null)
Jun 11 06:40:02 wendecoserver1 /USR/SBIN/CRON[23946]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 11 06:43:53 wendecoserver1 crontab[24024]: (root) BEGIN EDIT (root)
Jun 11 06:45:01 wendecoserver1 /USR/SBIN/CRON[24035]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 11 06:46:12 wendecoserver1 crontab[24024]: (root) END EDIT (root)
Jun 11 06:50:01 wendecoserver1 /USR/SBIN/CRON[24058]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 11 06:50:01 wendecoserver1 /USR/SBIN/CRON[24059]: (root) CMD ([ -x /usr/sbin/update-motd ] && /usr/sbin/update-motd 2>/dev/null)
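
To check whether a cron job ran shortly before each link failure, the syslog lines can be scanned mechanically. Below is a minimal Python sketch, assuming the classic syslog timestamp format shown in the quote above. The function names, the 5-minute window, and the default year are all assumptions (syslog lines of this kind carry no year).

```python
import re
from datetime import datetime, timedelta

# "Jun 11 06:26:40 hostname rest-of-message"
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) (.*)$")


def parse_line(line, year=2010):
    """Split a classic syslog line into (timestamp, message); None if no match."""
    match = LINE_RE.match(line)
    if not match:
        return None
    stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
    return stamp, match.group(3)


def cron_before_link_down(lines, window=timedelta(minutes=5)):
    """For each 'Link is down' event, list cron commands run within `window` before it."""
    entries = [parsed for parsed in map(parse_line, lines) if parsed]
    results = []
    for stamp, message in entries:
        if "Link is down" not in message:
            continue
        recent = [
            (other_stamp, other_message)
            for other_stamp, other_message in entries
            if "CRON[" in other_message
            and timedelta(0) <= stamp - other_stamp <= window
        ]
        results.append((stamp, recent))
    return results


if __name__ == "__main__":
    sample = [
        "Jun 11 06:25:01 wendecoserver1 /USR/SBIN/CRON[23509]: (root) CMD "
        "(test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))",
        "Jun 11 06:26:40 wendecoserver1 kernel: [298628.412913] tg3: peth1: Link is down.",
    ]
    for down_at, jobs in cron_before_link_down(sample):
        print(f"link down at {down_at:%H:%M:%S}")
        for ran_at, command in jobs:
            print(f"  {ran_at:%H:%M:%S}  {command}")
```

Applied to the syslog quoted above, this would flag the vnstat and cron.daily entries at 06:25:01 as the cron activity preceding the link-down at 06:26:40.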
#14
28th July 2010, 14:23
Torsson

And now it seems that server 1 is dead too. When I try to SSH to the Xen virtual server on server 1, it asks for the password, but after I type the password nothing happens. Trying to SSH to server 1 itself does nothing at all: no timeout or anything like that.
#15
29th July 2010, 17:46
Torsson

I have been looking around on the Xen lists and have noticed other people with similar problems. It seems to be a bug in the Xen kernel on Dell R200-R300 machines, so I think I need to fix that or upgrade to 4.0 or something.
The Following User Says Thank You to Torsson For This Useful Post:
falko (30th July 2010)
All times are GMT +2.