Issue resolving to correct domain name with telnet


chillifire
26th February 2008, 09:32
Hi,

I have installed the munin package on one of my Ubuntu 7.10 servers, and I run munin agents on all other servers (2 x Ubuntu 7.10). These munin agents link through to the one server, on which I can see the performance graphs for all servers and all domains. Well, all but one server.

One server shows as node in munin, but shows no graphs. To ensure it is not a firewall or connection issue, I have installed the munin server on that physical server as well - to no avail - still no graphs.

I found this in the munin wiki:
No graphs at all
Problem: Munin does not create any graphs. There is only an almost empty web page.
Solution: The hostname that the Munin master expects is probably different from the hostname reported by the node. Normally, the hostname in the host definition [host.example.com] in the master's munin.conf must be the same that the node reports when telnetting to port 4949. You can change the reported hostname with the host_name directive in the node's munin-node.conf file. Another solution is to add use_node_name yes to the host section in the master's munin.conf file.
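
The comparison the wiki describes can be sketched in shell. This is only a minimal sketch: the function names munin_banner_name and banner_matches are made up for illustration, and it assumes the node greets with a line of the form "# munin node at <name>", as seen in the telnet output in this thread.

```shell
# Print the hostname a munin-node announces in its greeting banner.
# The banner text is read from stdin, e.g.
#   # munin node at login02.chillifire.net
munin_banner_name() {
    # Strip a possible carriage return, then keep only the announced name.
    tr -d '\r' | sed -n 's/^# munin node at \(.*\)$/\1/p' | head -n 1
}

# Succeed only if the announced name equals the name the master expects.
# $1 = expected name (the [section] name used in the master's munin.conf)
# The banner text is read from stdin.
banner_matches() {
    expected=$1
    announced=$(munin_banner_name)
    [ "$announced" = "$expected" ]
}
```

Feeding it the banner line copied from a telnet session, for example `echo '# munin node at login02.chillifire.net' | banner_matches login02.chillifire.net && echo same`, tells you whether the two names agree.
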
Now, I tried all combinations of host_name and use_node_name, to no avail. I tested connections with telnet and got this result:

root@finch:/etc/bind# telnet login02.chillifire.net 4949
Trying 210.48.62.11...
Connected to finch.chillifire.net.
Escape character is '^]'.
# munin node at login02.chillifire.net
Connection closed by foreign host.
root@finch:/etc/bind# telnet login03.chillifire.net 4949
Trying 210.48.62.36...
Connected to login03.chillifire.net.
Escape character is '^]'.
Connection closed by foreign host.
root@finch:/etc/bind# telnet login01.chillifire.net 4949
Trying 210.48.62.43...
Connected to login01.chillifire.net.
Escape character is '^]'.
Connection closed by foreign host.
root@finch:/etc/bind#
'login02.chillifire.net' is the culprit. See how telnet reports that it is connected not to 'login02.chillifire.net' but to 'finch.chillifire.net'? finch is the hostname of that server, by the way. Now, when I run the same test on one of the other servers, the behaviour is as expected:
root@blackbird:/etc# telnet login01.chillifire.net 4949
Trying 210.48.62.43...
Connected to login01.chillifire.net.
Escape character is '^]'.
Connection closed by foreign host.
root@blackbird:/etc# telnet login02.chillifire.net 4949
Trying 210.48.62.11...
Connected to login02.chillifire.net.
Escape character is '^]'.
Connection closed by foreign host.
root@blackbird:/etc# telnet login03.chillifire.net 4949
Trying 210.48.62.36...
Connected to login03.chillifire.net.
Escape character is '^]'.
Connection closed by foreign host.
So it seems quite likely that the problems I am observing come from the finch server resolving 210.48.62.11 to finch.chillifire.net instead of login02.chillifire.net.

So the $60000 question is this: Where do telnet (and munin) get the idea the server's domain name is finch.chillifire.net instead of login02.chillifire.net?


I checked the /etc/hosts files and could see no significant differences. No DNS record for finch.chillifire.net exists, so Bind cannot be the culprit.

Please help (I am beginning to get desperate).

Cheers

chillifire

Attachment

hosts file on finch (which behaves incorrectly)
127.0.0.1 localhost.localdomain localhost
210.48.62.11 finch.chillifire.net finch
210.48.62.11 radius02.chillifire.net radius02
210.48.62.11 login02.chillifire.net login02
210.48.62.11 mysql02.chillifire.net mysql02

::1 ip6-localhost ip6-loopback finch.chillifire.net
fe00:0 ip6-localnet
ff00::0 ip6-macastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

hosts file on blackbird (which behaves correctly)
127.0.0.1 localhost.localadmin localhost
210.48.62.30 blackbird.chillifire.net blackbird

::1 ip6-localhost ip6-loopback blackbird.chillifire.net
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff03::3 ip6-allhosts

topdog
26th February 2008, 10:55
It is either what reverse DNS returns, or the service you are talking to thinks that is the hostname.

chillifire
26th February 2008, 10:59
Thanks for your fast response.
You say, it is either the reverse dns returns, or what ...? I did not quite understand your response.
Can I ask you to elaborate just a little bit?
Thanks

chillifire

topdog
26th February 2008, 11:03
The service running on port 4949 thinks that is its hostname possibly due to a configuration option.

chillifire
26th February 2008, 11:42
I see, thanks.

The service running on 4949 is munin, more precisely munin-node on this server. There is nothing in this service's config that would lead it to believe its hostname is 'finch.chillifire.net'. In fact, in my desperation I copied the config file from the working blackbird server (as they are all connecting back to the same reporting server, the config is interchangeable between agents), to no avail.

That leaves the 'reverse DNS entries' you talked about. How would I find out about those? As I wrote, there are no DNS entries in existence for that hostname in the authoritative Bind DNS server.
($ dig finch.chillifire.net will give you 0 answers, as it should.
$ dig login02.chillifire.net will give you 2 valid DNS entries, as it should.)
I am not quite clear how there could be reverse DNS entries if there are no DNS entries in the first place.
Could you explain where and how to look for these reverse DNS entries, please?

Thanks, your help is appreciated

chillifire

topdog
26th February 2008, 11:48
DNS maps names to IPs; reverse DNS maps IPs to names. I have had a similar problem before with telnet; I think it is the way it handles nsswitch.
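
To make the reverse direction concrete: a reverse lookup is an ordinary DNS query for a PTR record under a name built from the address with its octets reversed. The helper below is a small sketch of that construction; the name ptr_name is made up for illustration.

```shell
# Build the reverse-lookup (PTR) name for an IPv4 address.
# $1 = dotted-quad address, e.g. 210.48.62.11
ptr_name() {
    # Reverse the four octets and append the in-addr.arpa suffix.
    echo "$1" | awk -F. 'NF == 4 { print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}
```

`dig -x 210.48.62.11` builds this name itself and asks DNS only, while `getent hosts 210.48.62.11` follows the order given in /etc/nsswitch.conf, so comparing the two shows whether the answer comes from DNS or from somewhere else.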

Try running it under strace; you should be able to see how it is resolving names.

chillifire
26th February 2008, 18:46
Hi topdog,

Server finch is a slave DNS server to blackbird (and has been for a long time). blackbird's reverse DNS zone file is attached below. As you can probably guess from the last line, it was not even constructed by me but generated by ISPConfig.
As you can see, there is no reference to finch.chillifire.net in there. Also, if it were a reverse DNS problem coming from this setup, the command telnet login02.chillifire.net should point to finch.chillifire.net everywhere. The fact is that it does so only on finch. There must be some configuration on finch somewhere that is nowhere else. I just cannot think of what that configuration could be outside /etc/hosts and Bind DNS (neither of which seems problematic).

Any other ideas?

$TTL 86400
@ IN SOA ns01.chillifire.net. hostmaster.chillifire.net. (
2008022301 ; serial, todays date + todays serial #
28800 ; Refresh
7200 ; Retry
604800 ; Expire
86400) ; Minimum TTL
NS ns01.chillifire.net.
NS ns02.chillifire.net.
30 PTR chillifire.net.
30 PTR www.chillifire.net.
30 PTR mail.chillifire.net.
30 PTR ns01.chillifire.net.
11 PTR ns02.chillifire.net.
11 PTR radius02.chillifire.net.
36 PTR radius03.chillifire.net.
11 PTR mysql02.chillifire.net.
30 PTR mysql01.chillifire.net.
11 PTR login02.chillifire.net.
43 PTR login01.chillifire.net.
30 PTR radius01.chillifire.net.
36 PTR mysql03.chillifire.net.
30 PTR admin01.chillifire.net.
36 PTR login03.chillifire.net.
36 PTR prewikka.chillifire.net.
30 PTR onlinecellardoor.com.
30 PTR www.onlinecellardoor.com.
30 PTR mail.onlinecellardoor.com.
30 PTR chillifire.co.nz.
30 PTR www.chillifire.co.nz.
30 PTR mail.chillifire.co.nz.

;;;; MAKE MANUAL ENTRIES BELOW THIS LINE! ;;;;

topdog
26th February 2008, 18:59
The last time I had a similar problem, I used strace to see what system calls telnet was making. It turned out to be nscd, which had cached the wrong name.

Do you have nscd running?

Running strace will help you get to the bottom of the problem.

chillifire
26th February 2008, 20:36
I don't have nscd installed on my server. I also have no strace yet. I will give that a try to see what it tells me. You say it should tell me how IP addresses are resolved to hostnames and vice versa?

topdog
26th February 2008, 21:14
It will show you the system calls being made by the telnet command.
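
A trace is long, so it can help to reduce it to the configuration files that were consulted. The filter below is a rough sketch: the name etc_files_opened is made up, and it assumes strace's usual output format for open and openat calls, with the path in double quotes.

```shell
# Read strace output on stdin and list, once each, the files under /etc
# that the traced program opened or tried to open.
etc_files_opened() {
    sed -n 's|^.*open[a-z]*(.*"\(/etc/[^"]*\)".*$|\1|p' | sort -u
}
```

A trace can be written to a file with `strace -f -o telnet.trace telnet login02.chillifire.net 4949` and then filtered with `etc_files_opened < telnet.trace`. Files such as /etc/nsswitch.conf, /etc/hosts and /etc/resolv.conf in the result show which name sources were consulted.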

chillifire
27th February 2008, 02:18
I had a look at the trace, and the only fishy thing I saw was a read of the file /etc/resolv.conf, which pointed at my hosting provider's nameservers. I replaced those IP addresses with my own nameservers, in case theirs had cached something incorrect, but it did not change anything.

I then compared the strace results of the server that works correctly with those of the one that does not. I noticed that the one that does not work correctly fell back on the loopback interface 127.0.0.1, while the other one properly tried to go for the proper domain name. That made me think the extra lines in the /etc/hosts file might confuse the system, so I deleted all lines other than the loopback interface and the line for the server name. Lo and behold, since then telnet resolves correctly.
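
One possible explanation for why the extra lines mattered, sketched in shell. It rests on an assumption not confirmed in this thread: that the resolver consults /etc/hosts before DNS and answers an address lookup with the first line that lists that address. The function name first_hosts_name is made up for illustration.

```shell
# Print the first name on the first line of a hosts file that lists the
# given address (assumed first-match behaviour, see the note above).
# $1 = address, $2 = hosts file (defaults to /etc/hosts)
first_hosts_name() {
    awk -v ip="$1" '
        { sub(/#.*/, "") }                      # drop comments
        $1 == ip && NF >= 2 { print $2; exit }  # first match wins
    ' "${2:-/etc/hosts}"
}
```

With the original hosts file on finch, `first_hosts_name 210.48.62.11` prints finch.chillifire.net, because that line comes before the login02 line for the same address.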

The bad news is: Munin still does not work, although according to the configuration it now should. The effect is the same: a web page is generated with the logo and domain name, but no link to any graphs. munin-update.log shows that no data is read, regardless.

What can I do?

falko
27th February 2008, 13:14
The bad news is: Munin still does not work, although according to the configuration it now should. The effect is the same: a web page is generated with the logo and domain name, but no link to any graphs. munin-update.log shows that no data is read, regardless.

What can I do?
Any errors in your logs? What's in /etc/munin/munin.conf and /etc/munin/munin-node.conf?

chillifire
1st March 2008, 01:05
This was consuming more time than it was worth. As this was the first Linux server I ever built, I wiped it and reinstalled it from scratch, as I expect there was still some 'experimental' stuff on there. Of course, on a clean install it all works fine.
Thanks for the help