Issue resolving to correct domain name with telnet

Discussion in 'Installation/Configuration' started by chillifire, Feb 26, 2008.

  1. chillifire

    chillifire New Member

    Hi,

    I have installed the munin package on one of my Ubuntu 7.10 servers, and I run munin agents on all other servers (2 x Ubuntu 7.10). These munin agents report to that one server, on which I can see the performance graphs for all servers and domains. Well, all but one server.

    One server shows up as a node in munin, but shows no graphs. To rule out a firewall or connection issue, I installed the munin server on that physical server as well. Still no graphs.

    I found this in the munin wiki:
    I have tried all combinations of host_name and use_node_name, to no avail. I then tested the connections with telnet and got this result:
    Code:
    root@finch:/etc/bind# telnet login02.chillifire.net 4949
    Trying 210.48.62.11...
    Connected to finch.chillifire.net.
    Escape character is '^]'.
    # munin node at login02.chillifire.net
    Connection closed by foreign host.
    root@finch:/etc/bind# telnet login03.chillifire.net 4949
    Trying 210.48.62.36...
    Connected to login03.chillifire.net.
    Escape character is '^]'.
    Connection closed by foreign host.
    root@finch:/etc/bind# telnet login01.chillifire.net 4949
    Trying 210.48.62.43...
    Connected to login01.chillifire.net.
    Escape character is '^]'.
    Connection closed by foreign host.
    root@finch:/etc/bind#
    'login02.chillifire.net' is the culprit. See how telnet reports that domain not as 'login02.chillifire.net' but as 'finch.chillifire.net'? finch is the server's hostname, by the way. When I run the same test on one of the other servers, the behaviour is as expected:
    Code:
    root@blackbird:/etc# telnet login01.chillifire.net 4949
    Trying 210.48.62.43...
    Connected to login01.chillifire.net.
    Escape character is '^]'.
    Connection closed by foreign host.
    root@blackbird:/etc# telnet login02.chillifire.net 4949
    Trying 210.48.62.11...
    Connected to login02.chillifire.net.
    Escape character is '^]'.
    Connection closed by foreign host.
    root@blackbird:/etc# telnet login03.chillifire.net 4949
    Trying 210.48.62.36...
    Connected to login03.chillifire.net.
    Escape character is '^]'.
    Connection closed by foreign host.
    So it seems quite likely that the problems I am observing come from the finch server resolving 210.48.62.11 to finch.chillifire.net instead of login02.chillifire.net.

    So the $60000 question is this: Where do telnet (and munin) get the idea the server's domain name is finch.chillifire.net instead of login02.chillifire.net?


    I checked the /etc/hosts files and could see no significant differences. No DNS record for finch.chillifire.net exists, so Bind cannot be the culprit.

    Please help (I am getting desperate).

    Cheers

    chillifire

    Attachment

    hosts file on finch (which behaves incorrectly)
    Code:
    127.0.0.1       localhost.localdomain   localhost
    210.48.62.11    finch.chillifire.net    finch
    210.48.62.11    radius02.chillifire.net radius02
    210.48.62.11    login02.chillifire.net  login02
    210.48.62.11    mysql02.chillifire.net  mysql02
    
    ::1     ip6-localhost   ip6-loopback finch.chillifire.net
    fe00:0  ip6-localnet
    ff00::0 ip6-macastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    ff02::3 ip6-allhosts
    hosts file on blackbird (which behaves correctly)
    Code:
    127.0.0.1       localhost.localadmin            localhost
    210.48.62.30    blackbird.chillifire.net        blackbird
    
    ::1     ip6-localhost   ip6-loopback blackbird.chillifire.net
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    ff03::3 ip6-allhosts
     
  2. topdog

    topdog HowtoForge Supporter

    Either the reverse DNS returns that, or the service you are talking to thinks that is the hostname.
     
  3. chillifire

    chillifire New Member

    What do you mean?

    Thanks for your fast response.
    You say it is either what the reverse DNS returns, or what ...? I did not quite understand your response.
    Can I ask you to elaborate just a little bit?
    Thanks

    chillifire
     
  4. topdog

    topdog HowtoForge Supporter

    The service running on port 4949 thinks that is its hostname, possibly due to a configuration option.
     
  5. chillifire

    chillifire New Member

    reverse DNS

    I see, thanks.

    The service running on 4949 is munin, more precisely munin-node on this server. There is nothing in this service's config that would lead it to believe its hostname is 'finch.chillifire.net'. In fact, in my desperation I copied the config file from the working blackbird server (as all agents connect back to the same reporting server, the config is interchangeable between them), to no avail.

    That leaves the 'reverse DNS entries' you talked about. How would I find out about those? As I wrote, there are no DNS entries for that hostname on the authoritative BIND DNS server.
    Code:
    $  dig finch.chillifire.net
    will give you 0 answers, as it should.
    Code:
    $  dig login02.chillifire.net
    will give you 2 valid DNS entries, as it should.
    I am not quite clear how there could be reverse DNS entries if there are no DNS entries in the first place.
    Could you explain where to look for these reverse DNS entries, and what to look for, please?

    Thanks, your help is appreciated

    chillifire
     
  6. topdog

    topdog HowtoForge Supporter

    DNS maps names to IP addresses; reverse DNS maps IP addresses to names. I have had a similar problem before with telnet. I think it is the way it handles nsswitch.
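
    To make the reverse direction concrete: for IPv4, a reverse lookup queries PTR records under a name built from the address octets in reverse order, followed by in-addr.arpa. A small sketch of that mapping (the function name reverse_name is made up for illustration):

```shell
# Hypothetical helper: build the reverse-lookup name for an IPv4 address.
reverse_name() {
    # $1 = dotted-quad IPv4 address, e.g. 210.48.62.11
    echo "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

# Usage (prints the name that a PTR query for this address would ask about):
# reverse_name 210.48.62.11
```

    In practice `dig -x 210.48.62.11 +short` builds this name itself and queries the PTR records for it.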

    Try running it under strace; you should be able to see how it is resolving names.
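
    As a rough sketch of what that could look like, assuming strace is installed: the first helper records the file and network system calls telnet makes, the second lists which files under /etc show up in the trace. Both function names and the trace file path are made up for illustration.

```shell
# Hypothetical helper: trace file and network system calls made by telnet.
trace_telnet() {
    # $1 = host, $2 = port, $3 = file to write the trace to
    strace -f -e trace=file,network -o "$3" telnet "$1" "$2"
}

# Hypothetical helper: list the distinct /etc paths that appear in a trace file.
etc_files_in_trace() {
    # $1 = trace file written by trace_telnet
    grep -o '"/etc/[^"]*"' "$1" | tr -d '"' | sort -u
}

# Usage:
# trace_telnet login02.chillifire.net 4949 /tmp/telnet.trace
# etc_files_in_trace /tmp/telnet.trace
```

    Files such as /etc/nsswitch.conf, /etc/hosts and /etc/resolv.conf in that list show which name sources the lookup actually touched.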
     
  7. chillifire

    chillifire New Member

    reverse DNS record looks all right

    Hi topdog,

    Server finch is a slave DNS server to blackbird (and has been for a long time). blackbird's reverse DNS record is attached below. As you can probably guess from the last line, it was not even written by me but generated by ISPConfig.
    As you can see, there is no reference to finch.chillifire.net in there. Also, if this were a reverse DNS problem coming from this setup, the command telnet login02.chillifire.net should point to finch.chillifire.net everywhere. In fact it does so only on finch. There must be some configuration on finch that exists nowhere else. I just cannot think what that configuration could be outside /etc/hosts and BIND DNS (neither of which seems problematic).

    Any other ideas?

    Code:
    $TTL        86400
    @               IN      SOA     ns01.chillifire.net. hostmaster.chillifire.net. (
                                    2008022301       ; serial, todays date + todays serial #
                                    28800   ; Refresh
                                    7200    ; Retry
                                    604800  ; Expire
                                    86400)  ; Minimum TTL
                            NS      ns01.chillifire.net.
                            NS      ns02.chillifire.net.
    30       PTR     chillifire.net.
    30       PTR     www.chillifire.net.
    30       PTR     mail.chillifire.net.
    30       PTR     ns01.chillifire.net.
    11       PTR     ns02.chillifire.net.
    11       PTR     radius02.chillifire.net.
    36       PTR     radius03.chillifire.net.
    11       PTR     mysql02.chillifire.net.
    30       PTR     mysql01.chillifire.net.
    11       PTR     login02.chillifire.net.
    43       PTR     login01.chillifire.net.
    30       PTR     radius01.chillifire.net.
    36       PTR     mysql03.chillifire.net.
    30       PTR     admin01.chillifire.net.
    36       PTR     login03.chillifire.net.
    36       PTR     prewikka.chillifire.net.
    30       PTR     onlinecellardoor.com.
    30       PTR     www.onlinecellardoor.com.
    30       PTR     mail.onlinecellardoor.com.
    30       PTR     chillifire.co.nz.
    30       PTR     www.chillifire.co.nz.
    30       PTR     mail.chillifire.co.nz.
    
    ;;;; MAKE MANUAL ENTRIES BELOW THIS LINE! ;;;;
     
  8. topdog

    topdog HowtoForge Supporter

    The last time I had a similar problem, I used strace to see what system calls telnet was making. It turned out to be nscd, which had cached the wrong name.

    Do you have nscd running?

    Running strace will help you to get to the bottom of the problem.
     
  9. chillifire

    chillifire New Member

    no nscd

    I don't have nscd installed on my server. I also have no strace yet. I will give that a try to see what it tells me. You say it should tell me how IP addresses are resolved to hostnames and vice versa?
     
  10. topdog

    topdog HowtoForge Supporter

    It will show you the system calls being made by the telnet command.
     
  11. chillifire

    chillifire New Member

    strace result

    I had a look at the trace, and the only fishy thing I saw was a read of the file /etc/resolv.conf, which pointed at my hosting provider's nameservers. I replaced those IP addresses with my own nameservers, in case theirs had cached something incorrect, but it did not change anything.

    I then compared the strace results of the server that works correctly with the one that does not. I noticed that the one that does not work correctly fell back on the loopback interface 127.0.0.1, while the other one properly went for the proper domain name. That made me think the extra lines in the /etc/hosts file might be confusing the system, so I deleted all lines other than the loopback interface and the line for the server name. Lo and behold, since then telnet resolves correctly.
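
    A minimal sketch of why the extra lines could matter, assuming the resolver consults /etc/hosts before DNS (the common "hosts: files dns" order in /etc/nsswitch.conf) and that an address-to-name lookup returns the first line matching the address. The function name first_name_for_ip is made up for illustration.

```shell
# Hypothetical helper: print the first name a hosts file lists for an address.
first_name_for_ip() {
    # $1 = IP address, $2 = hosts file (defaults to /etc/hosts)
    awk -v ip="$1" '
        { sub(/#.*/, "") }            # drop comments
        $1 == ip { print $2; exit }   # stop at the first matching line
    ' "${2:-/etc/hosts}"
}

# Usage:
# first_name_for_ip 210.48.62.11
```

    With the hosts file originally posted for finch, this prints finch.chillifire.net for 210.48.62.11, because that line comes before the login02 line. `getent hosts 210.48.62.11` asks the real resolver the same question.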

    The bad news is: munin still does not work, although according to the configuration it now should. The effect is the same: a webpage is generated with logo and domain name, but no link to any graphs. munin-update.log shows that no data is read, regardless.

    What can I do?
     
  12. falko

    falko Super Moderator

    Any errors in your logs? What's in /etc/munin/munin.conf and /etc/munin/munin-node.conf?
     
  13. chillifire

    chillifire New Member

    Never found error - reinstalled server

    This was consuming more time than it was worth. As this was the first Linux server I ever built, I wiped it and reinstalled it from scratch, as I expect there was still some 'experimental' stuff on there. Of course, on a clean install it all works fine.
    Thanks for the help
     
