I stopped everything in cron.daily, cron.hourly, cron.weekly and cron.monthly, and everything seemed to work OK, but this morning server 2 died. Then I looked on server 1 and found that update-motd and vnstat were in the cron.d folder, so I stopped them on server 1, and it has not died yet.
So I started digging in the logs on server 1 for information about the recent crash and found the entries below. After these entries the logs say nothing until the manual reboot. It would be nice if someone had the time to look at the logs and see whether you can tell why it dies.
Kernel log:
Quote:
Jun 11 06:26:40 wendecoserver1 kernel: [298628.412913] tg3: peth1: Link is down.
Jun 11 06:26:40 wendecoserver1 kernel: [298628.425390] eth1: port 1(peth1) entering disabled state
Jun 11 06:26:42 wendecoserver1 kernel: [298629.970014] tg3: peth1: Link is up at 100 Mbps, full duplex.
Jun 11 06:26:42 wendecoserver1 kernel: [298629.970020] tg3: peth1: Flow control is off for TX and off for RX.
Jun 11 06:26:42 wendecoserver1 kernel: [298629.971316] eth1: port 1(peth1) entering learning state
Jun 11 06:26:42 wendecoserver1 kernel: [298629.971870] eth1: topology change detected, propagating
Messages log:
Quote:
Jun 11 06:26:40 wendecoserver1 kernel: [298628.412913] tg3: peth1: Link is down.
Jun 11 06:26:40 wendecoserver1 kernel: [298628.425390] eth1: port 1(peth1) entering disabled state
Jun 11 06:26:42 wendecoserver1 kernel: [298629.970014] tg3: peth1: Link is up at 100 Mbps, full duplex.
Jun 11 06:26:42 wendecoserver1 kernel: [298629.970020] tg3: peth1: Flow control is off for TX and off for RX.
Jun 11 06:26:42 wendecoserver1 kernel: [298629.971316] eth1: port 1(peth1) entering learning state
Jun 11 06:26:42 wendecoserver1 kernel: [298629.971870] eth1: topology change detected, propagating
Jun 11 06:26:42 wendecoserver1 kernel: [298629.971872] eth1: port 1(peth1) entering forwarding state
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: WARN: node wendecoserver2: is dead
Jun 11 06:27:29 wendecoserver1 ipfail: [6196]: info: Status update: Node wendecoserver2 now has status dead
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: WARN: No STONITH device configured.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: WARN: Shared disks are not protected.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: info: Resources being acquired from wendecoserver2.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: info: Link wendecoserver2:eth0 dead.
Jun 11 06:27:29 wendecoserver1 harc[23566]: info: Running /etc/ha.d/rc.d/status status
Jun 11 06:27:29 wendecoserver1 mach_down[23591]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Jun 11 06:27:29 wendecoserver1 mach_down[23591]: info: mach_down takeover complete for node wendecoserver2.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: info: mach_down takeover complete.
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: info: NS: We are dead. :<
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: info: Link Status update: Link wendecoserver2/eth0 now has status dead
Jun 11 06:27:30 wendecoserver1 heartbeat: [23567]: info: Local Resource acquisition completed.
Jun 11 06:27:30 wendecoserver1 harc[23651]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Jun 11 06:27:30 wendecoserver1 ip-request-resp[23651]: received ip-request-resp ldirectord::ldirectord.cf OK yes
Jun 11 06:27:30 wendecoserver1 ResourceManager[23670]: info: Acquiring resource group: wendecoserver1 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::91.142.186.70/24/eth0/91.142.186.71
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: info: We are dead. :<
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: info: Asking other side for ping node count.
Jun 11 06:27:30 wendecoserver1 ResourceManager[23670]: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start
Jun 11 06:27:31 wendecoserver1 IPaddr2[23763]: INFO: Running OK
Jun 11 06:44:24 wendecoserver1 -- MARK --
Syslog:
Quote:
Jun 11 06:25:01 wendecoserver1 /USR/SBIN/CRON[23508]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 11 06:25:01 wendecoserver1 /USR/SBIN/CRON[23509]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
Jun 11 06:26:40 wendecoserver1 kernel: [298628.412913] tg3: peth1: Link is down.
Jun 11 06:26:40 wendecoserver1 kernel: [298628.425390] eth1: port 1(peth1) entering disabled state
Jun 11 06:26:42 wendecoserver1 kernel: [298629.970014] tg3: peth1: Link is up at 100 Mbps, full duplex.
Jun 11 06:26:42 wendecoserver1 kernel: [298629.970020] tg3: peth1: Flow control is off for TX and off for RX.
Jun 11 06:26:42 wendecoserver1 kernel: [298629.971316] eth1: port 1(peth1) entering learning state
Jun 11 06:26:42 wendecoserver1 kernel: [298629.971870] eth1: topology change detected, propagating
Jun 11 06:26:42 wendecoserver1 kernel: [298629.971872] eth1: port 1(peth1) entering forwarding state
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: WARN: node wendecoserver2: is dead
Jun 11 06:27:29 wendecoserver1 ipfail: [6196]: info: Status update: Node wendecoserver2 now has status dead
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: WARN: No STONITH device configured.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: WARN: Shared disks are not protected.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: info: Resources being acquired from wendecoserver2.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: info: Link wendecoserver2:eth0 dead.
Jun 11 06:27:29 wendecoserver1 heartbeat: [23566]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Jun 11 06:27:29 wendecoserver1 harc[23566]: info: Running /etc/ha.d/rc.d/status status
Jun 11 06:27:29 wendecoserver1 mach_down[23591]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Jun 11 06:27:29 wendecoserver1 mach_down[23591]: info: mach_down takeover complete for node wendecoserver2.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: info: mach_down takeover complete.
Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: debug: StartNextRemoteRscReq(): child count 1
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: info: NS: We are dead. :<
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: info: Link Status update: Link wendecoserver2/eth0 now has status dead
Jun 11 06:27:30 wendecoserver1 heartbeat: [23567]: info: Local Resource acquisition completed.
Jun 11 06:27:30 wendecoserver1 heartbeat: [6037]: debug: StartNextRemoteRscReq(): child count 1
Jun 11 06:27:30 wendecoserver1 heartbeat: [23651]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Jun 11 06:27:30 wendecoserver1 harc[23651]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Jun 11 06:27:30 wendecoserver1 ip-request-resp[23651]: received ip-request-resp ldirectord::ldirectord.cf OK yes
Jun 11 06:27:30 wendecoserver1 ResourceManager[23670]: info: Acquiring resource group: wendecoserver1 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::91.142.186.70/24/eth0/91.142.186.71
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: info: We are dead. :<
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: info: Asking other side for ping node count.
Jun 11 06:27:30 wendecoserver1 ipfail: [6196]: debug: Message [num_ping] sent.
Jun 11 06:27:30 wendecoserver1 ResourceManager[23670]: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start
Jun 11 06:27:30 wendecoserver1 ResourceManager[23670]: debug: Starting /etc/ha.d/resource.d/ldirectord ldirectord.cf start
Jun 11 06:27:31 wendecoserver1 ResourceManager[23670]: debug: /etc/ha.d/resource.d/ldirectord ldirectord.cf start done. RC=0
Jun 11 06:27:31 wendecoserver1 IPaddr2[23763]: INFO: Running OK
Jun 11 06:30:02 wendecoserver1 /USR/SBIN/CRON[23838]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 11 06:30:02 wendecoserver1 /USR/SBIN/CRON[23837]: (root) CMD ([ -x /usr/sbin/update-motd ] && /usr/sbin/update-motd 2>/dev/null)
Jun 11 06:35:01 wendecoserver1 /USR/SBIN/CRON[23922]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 11 06:40:02 wendecoserver1 /USR/SBIN/CRON[23945]: (root) CMD ([ -x /usr/sbin/update-motd ] && /usr/sbin/update-motd 2>/dev/null)
Jun 11 06:40:02 wendecoserver1 /USR/SBIN/CRON[23946]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 11 06:43:53 wendecoserver1 crontab[24024]: (root) BEGIN EDIT (root)
Jun 11 06:45:01 wendecoserver1 /USR/SBIN/CRON[24035]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 11 06:46:12 wendecoserver1 crontab[24024]: (root) END EDIT (root)
Jun 11 06:50:01 wendecoserver1 /USR/SBIN/CRON[24058]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi)
Jun 11 06:50:01 wendecoserver1 /USR/SBIN/CRON[24059]: (root) CMD ([ -x /usr/sbin/update-motd ] && /usr/sbin/update-motd 2>/dev/null)
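For anyone who wants to line up the entries around the link drop without scrolling through all the cron noise, here is a rough Python sketch that keeps only the lines within a couple of minutes of the first "Link is down" message. The function names, the two-minute window and the placeholder year are my own assumptions (syslog lines carry no year), and the sample lines are copied from the logs above; it assumes an English locale for the month names.

```python
import re
from datetime import datetime, timedelta

# Classic syslog prefix: "Jun 11 06:26:40 hostname rest-of-message"
LINE_RE = re.compile(r"^(\w{3})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2}) (\S+) (.*)$")

# Assumption: syslog timestamps have no year, so a placeholder is used.
ASSUMED_YEAR = 2000


def parse_line(line):
    """Return the timestamp of a syslog line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    month, day, clock, _host, _rest = m.groups()
    return datetime.strptime(
        f"{ASSUMED_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
    )


def lines_around(lines, marker,
                 before=timedelta(minutes=2), after=timedelta(minutes=2)):
    """Keep the lines logged within a window around the first line containing marker."""
    stamped = []
    for line in lines:
        ts = parse_line(line)
        if ts is not None:
            stamped.append((ts, line))
    hits = [ts for ts, line in stamped if marker in line]
    if not hits:
        return []
    start, end = hits[0] - before, hits[0] + after
    return [line for ts, line in stamped if start <= ts <= end]


# A few lines copied from the syslog excerpt above.
SAMPLE = [
    "Jun 11 06:25:01 wendecoserver1 /USR/SBIN/CRON[23509]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))",
    "Jun 11 06:26:40 wendecoserver1 kernel: [298628.412913] tg3: peth1: Link is down.",
    "Jun 11 06:27:29 wendecoserver1 heartbeat: [6037]: WARN: node wendecoserver2: is dead",
    "Jun 11 06:44:24 wendecoserver1 -- MARK --",
]

for line in lines_around(SAMPLE, "Link is down"):
    print(line)
```

On a real box the list would come from reading /var/log/syslog instead of SAMPLE, and the window can be widened if the interesting entries sit further from the link drop.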