Slave DNS server hangs during restart (Bind)

Discussion in 'Server Operation' started by mangia, Mar 29, 2009.

  mangia

    I have two DNS servers in a master-slave configuration.

    When I restart the master, everything is OK, but the slave server sometimes hangs during restart.


    Mar  7 01:55:27 ns2 named[1712]: shutting down: flushing changes
    Mar  7 01:55:27 ns2 named[1712]: stopping command channel on
    Mar  7 01:55:27 ns2 named[1712]: stopping command channel on ::1#953
    Mar  7 01:55:27 ns2 named[1712]: no longer listening on
    Mar  7 01:55:27 ns2 named[1712]: no longer listening on
    After that, BIND stops working: the process is still running, but it does not respond to queries.

    [root@ns2 ~]# ps ax|grep named
     1712 ?        Ssl    0:02 /usr/sbin/named -u named -t /var/named/chroot
     2488 pts/0    R+     0:00 grep named
    [root@ns2 ~]#
    The logs from the master server are similar, but after I stop BIND there is one more line:

    Mar  6 23:58:39 ns1 named[10781]: exiting
    As you can see above, this line is missing from the slave server's log.

    The commands pkill named, service named restart and service named stop don't help. The only option is to force a server reboot, which is not a real solution because the server needs 5-10 minutes to boot up again.

    CentOS, BIND 9.3.4, chrooted.

    Any ideas?
  falko

    What's the output of
    netstat -tap
    after you've tried to restart BIND? I guess there's still a BIND process running.
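
A minimal sketch of the check being asked for here, assuming netstat output in the `-tap` format used in this thread. The function name named_is_listening is made up for illustration; it only tests whether any LISTEN socket is owned by a process called named.

```shell
# Sketch only: succeeds if the netstat output on stdin contains at least one
# listening TCP socket owned by a process named "named".
# named_is_listening is a hypothetical helper name, not part of BIND or netstat.
named_is_listening() {
    grep -Eq 'LISTEN[[:space:]]+[0-9]+/named[[:space:]]*$'
}

# Possible usage on the affected host (run as root so the PID/program column is filled in):
#   if netstat -tap 2>/dev/null | named_is_listening; then
#       echo "named still owns listening sockets"
#   else
#       echo "no listening sockets owned by named"
#   fi
```

If the process shows up in ps but this check fails, named is alive without any open listeners, which matches the state described in the first post.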
  mangia


    Hi... Sorry for digging this up, but the problem is still here :(

    The output of netstat -tap doesn't show named (or bind) as running. All I can see is http, mysql and a few other services.
    Last edited: Dec 23, 2009
  falko

    Can you post the full output?
  mangia


    The only difference between when BIND works and when it doesn't is the following few lines, which are visible right after a reboot, when everything works fine:

    tcp 0 0 localhost6.localdomain:rndc *:* LISTEN 2129/named
    tcp 0 0 localhost.localdomain:rndc *:* LISTEN 2129/named
    tcp 0 0 localhost.localdomai:domain *:* LISTEN 2129/named
    tcp 0 0 *:* LISTEN 2129/named
    tcp 0 0 ns2.DOMAIN.COM:domain *:* LISTEN 2129/named
  falko

  mangia


    The lines posted above are the difference: I see them when named works, and when it doesn't work they are missing.
  falko


    Does
    netstat -tap | grep domain
    give you any output when BIND isn't working?
  mangia


    Tried that, but neither "domain" nor "named" appears in the list any more.
  falko

    And there are no errors in your logs?
  mangia


    Please see my first post for the difference between the working primary and the non-working secondary DNS server.
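
The difference described in the first post (shutdown messages on the slave, but no final "exiting" line) can be checked for mechanically. This is a rough sketch, assuming syslog lines in the format quoted in this thread; hung_named_pids is a hypothetical helper name, and /var/log/messages is only an assumed log location.

```shell
# Sketch only: read syslog lines on stdin and print the PID of every named
# instance that logged "shutting down" but never logged "exiting" afterwards.
# hung_named_pids is a made-up name for illustration.
hung_named_pids() {
    awk '
        match($0, /named\[[0-9]+\]/) {
            # Pull the digits out of "named[PID]".
            pid = substr($0, RSTART + 6, RLENGTH - 7)
            if ($0 ~ /: shutting down/) pending[pid] = 1
            if ($0 ~ /: exiting/)       delete pending[pid]
        }
        END { for (p in pending) print p }
    '
}

# Possible usage (the log path is an assumption, adjust to your syslog setup):
#   hung_named_pids < /var/log/messages
```

A PID printed by this function that still appears in ps points at a named process stuck partway through shutdown, as on ns2 above.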

