Hi,

I have installed the munin package on one of my Ubuntu 7.10 servers, and I run munin agents on all the other servers (2 x Ubuntu 7.10). These agents report to that one server, where I can see the performance graphs for all servers and all domains. Well, all but one server: it shows up as a node in munin, but without any graphs.

To rule out a firewall or connection issue, I installed the munin server on that physical server as well. No luck, still no graphs.

I found this in the munin wiki:

Now, I tried all combinations of host_name and use_node_directive, to no avail. I tested the connections with telnet and got this result:

Code:
root@finch:/etc/bind# telnet login02.chillifire.net 4949
Trying 210.48.62.11...
Connected to finch.chillifire.net.
Escape character is '^]'.
# munin node at login02.chillifire.net
Connection closed by foreign host.
root@finch:/etc/bind# telnet login03.chillifire.net 4949
Trying 210.48.62.36...
Connected to login03.chillifire.net.
Escape character is '^]'.
Connection closed by foreign host.
root@finch:/etc/bind# telnet login01.chillifire.net 4949
Trying 210.48.62.43...
Connected to login01.chillifire.net.
Escape character is '^]'.
Connection closed by foreign host.
root@finch:/etc/bind#

'login02.chillifire.net' is the culprit. Notice how telnet does not report the connection as 'login02.chillifire.net' but as 'finch.chillifire.net'?
finch is the hostname, by the way. Now, when I run the same test on one of the other servers, the behaviour is as expected:

Code:
root@blackbird:/etc# telnet login01.chillifire.net 4949
Trying 210.48.62.43...
Connected to login01.chillifire.net.
Escape character is '^]'.
Connection closed by foreign host.
root@blackbird:/etc# telnet login02.chillifire.net 4949
Trying 210.48.62.11...
Connected to login02.chillifire.net.
Escape character is '^]'.
Connection closed by foreign host.
root@blackbird:/etc# telnet login03.chillifire.net 4949
Trying 210.48.62.36...
Connected to login03.chillifire.net.
Escape character is '^]'.
Connection closed by foreign host.

So it seems quite likely that the problems I am observing come from the finch server resolving 210.48.62.11 to finch.chillifire.net instead of login02.chillifire.net.

So the $60000 question is: where do telnet (and munin) get the idea that the server's domain name is finch.chillifire.net instead of login02.chillifire.net? I checked the /etc/hosts files and could see no significant differences. No DNS record for finch.chillifire.net exists, so Bind cannot be the culprit.

Please help (I am beginning to get desperate).

Cheers
chillifire

Attachment

hosts file on finch (which behaves incorrectly):

Code:
127.0.0.1 localhost.localdomain localhost
210.48.62.11 finch.chillifire.net finch
210.48.62.11 radius02.chillifire.net radius02
210.48.62.11 login02.chillifire.net login02
210.48.62.11 mysql02.chillifire.net mysql02
::1 ip6-localhost ip6-loopback finch.chillifire.net
fe00:0 ip6-localnet
ff00::0 ip6-macastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

hosts file on blackbird (which behaves correctly):

Code:
127.0.0.1 localhost.localadmin localhost
210.48.62.30 blackbird.chillifire.net blackbird
::1 ip6-localhost ip6-loopback blackbird.chillifire.net
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff03::3 ip6-allhosts
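One plausible explanation for "Connected to finch.chillifire.net" is an address-to-name lookup answered from /etc/hosts. The sketch below is a simplified model of the "files" source, assuming it returns the first name on the first line whose address matches; it is not glibc's actual code. `reverse_lookup` and `FINCH_HOSTS` are hypothetical names, and the text is the IPv4 part of finch's hosts file above.

```python
# Simplified model (an assumption, not glibc's code) of a reverse lookup
# against /etc/hosts: scan top to bottom, first matching address wins.

FINCH_HOSTS = """\
127.0.0.1 localhost.localdomain localhost
210.48.62.11 finch.chillifire.net finch
210.48.62.11 radius02.chillifire.net radius02
210.48.62.11 login02.chillifire.net login02
210.48.62.11 mysql02.chillifire.net mysql02
"""


def reverse_lookup(hosts_text, ip):
    """Return the canonical name listed for ip, or None if no line matches."""
    for line in hosts_text.splitlines():
        # Drop comments, then split into address, canonical name, aliases.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == ip:
            return fields[1]  # first name on the first matching line
    return None


print(reverse_lookup(FINCH_HOSTS, "210.48.62.11"))  # finch.chillifire.net
```

In this model, whichever 210.48.62.11 line comes first wins, so finch.chillifire.net hides login02.chillifire.net even though both are listed. blackbird's file has no 210.48.62.11 line at all, so its lookups presumably fall through to DNS.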
It is either what reverse DNS returns, or the service you are talking to thinks that is its hostname.
What do you mean?

Thanks for your fast response. You say it is either what reverse DNS returns, or what...? I did not quite understand your response. Could I ask you to elaborate just a little bit?

Thanks
chillifire
reverse DNS

I see, thanks. The service running on port 4949 is munin, more precisely munin-node on this server. There is nothing in this service's config that would lead it to believe its hostname is 'finch.chillifire.net'. In fact, in my desperation I copied the config file from the working blackbird server (as they all connect back to the same reporting server, the config is interchangeable between agents), to no avail.

That leaves the 'reverse DNS entries' you talked about. How would I find out about those? As I wrote, there are no DNS entries for that hostname in the authoritative Bind DNS server.

Code:
$ dig finch.chillifire.net

gives 0 answers, as it should.

Code:
$ dig login02.chillifire.net

gives 2 valid DNS entries, as it should.

I am not quite clear how there could be reverse DNS entries if there are no DNS entries in the first place. Could you explain where, and for what, I should look for these reverse DNS entries, please?

Thanks, your help is appreciated
chillifire
DNS maps names to IPs; reverse DNS maps IPs to names. I have had a similar problem with telnet before, and I think it is down to the way it handles nsswitch. Try running it under strace; you should be able to see how it is resolving names.
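This also answers how reverse entries can exist without forward ones: PTR records live in their own zone under in-addr.arpa, keyed by the address with its octets reversed, and they are maintained separately from the A records. The sketch below uses Python's standard `ipaddress` module to show which name a reverse query for an address asks about (the same query `dig -x 210.48.62.11` would send); `ptr_name` is a hypothetical helper name.

```python
import ipaddress


def ptr_name(ip):
    """Return the in-addr.arpa name that a reverse (PTR) lookup for ip queries."""
    return ipaddress.ip_address(ip).reverse_pointer


for ip in ("210.48.62.11", "210.48.62.30"):
    print(ip, "->", ptr_name(ip))
# 210.48.62.11 -> 11.62.48.210.in-addr.arpa
# 210.48.62.30 -> 30.62.48.210.in-addr.arpa
```

Because the forward and reverse zones are independent, a name can appear in one without appearing in the other.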
reverse DNS record looks all right

Hi topdog,

server finch is a slave DNS server to blackbird (and has been for a long time). blackbird's reverse DNS record looks as attached. As you can probably guess from the last line, it was not even written by me but generated by ISPConfig. As you can see, there is no reference to finch.chillifire.net in there.

Also, if it were a reverse DNS problem coming from this setup, the command telnet login02.chillifire.net should point to finch.chillifire.net everywhere. In fact, it does so only on finch. There must be some configuration on finch, somewhere, that exists nowhere else. I just cannot think of what that configuration could be outside /etc/hosts and Bind DNS (neither of which seems problematic). Any other ideas?

Code:
$TTL 86400
@ IN SOA ns01.chillifire.net. hostmaster.chillifire.net. (
        2008022301 ; serial, todays date + todays serial #
        28800 ; Refresh
        7200 ; Retry
        604800 ; Expire
        86400) ; Minimum TTL
        NS ns01.chillifire.net.
        NS ns02.chillifire.net.
30 PTR chillifire.net.
30 PTR www.chillifire.net.
30 PTR mail.chillifire.net.
30 PTR ns01.chillifire.net.
11 PTR ns02.chillifire.net.
11 PTR radius02.chillifire.net.
36 PTR radius03.chillifire.net.
11 PTR mysql02.chillifire.net.
30 PTR mysql01.chillifire.net.
11 PTR login02.chillifire.net.
43 PTR login01.chillifire.net.
30 PTR radius01.chillifire.net.
36 PTR mysql03.chillifire.net.
30 PTR admin01.chillifire.net.
36 PTR login03.chillifire.net.
36 PTR prewikka.chillifire.net.
30 PTR onlinecellardoor.com.
30 PTR www.onlinecellardoor.com.
30 PTR mail.onlinecellardoor.com.
30 PTR chillifire.co.nz.
30 PTR www.chillifire.co.nz.
30 PTR mail.chillifire.co.nz.
;;;; MAKE MANUAL ENTRIES BELOW THIS LINE! ;;;;
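The zone can be checked mechanically. The hypothetical helper `group_ptr` below collects PTR targets by owner name, handling only the simple "owner PTR target" lines this zone uses (it skips SOA, NS, and comment lines and is not a full zone-file parser). The input is a subset of the PTR lines above; the owners are relative names, presumably under 62.48.210.in-addr.arpa. It shows that 210.48.62.11 carries four PTR records, none of them finch. A reverse query returns the whole set in no guaranteed order, so a program that keeps only one name could show any of the four, but never finch.

```python
from collections import defaultdict

# Subset of the PTR lines from the zone above.
PTR_LINES = """\
11 PTR ns02.chillifire.net.
11 PTR radius02.chillifire.net.
11 PTR mysql02.chillifire.net.
11 PTR login02.chillifire.net.
43 PTR login01.chillifire.net.
36 PTR login03.chillifire.net.
"""


def group_ptr(zone_text):
    """Map each owner (last octet) to its PTR targets, in file order."""
    records = defaultdict(list)
    for line in zone_text.splitlines():
        fields = line.split(";", 1)[0].split()  # strip zone-file comments
        if len(fields) == 3 and fields[1] == "PTR":
            records[fields[0]].append(fields[2].rstrip("."))  # drop trailing dot
    return dict(records)


for octet, names in group_ptr(PTR_LINES).items():
    print(f"210.48.62.{octet}: {len(names)} PTR record(s): {', '.join(names)}")
```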
The last time I had a similar problem, I used strace to see what system calls telnet was making. It turned out to be nscd, which had cached the wrong name. Do you have nscd running? Running strace will help you get to the bottom of the problem.
no nscd

I don't have nscd installed on my server. I also don't have strace yet; I will give that a try and see what it tells me. You say it should show me how IP addresses are resolved to hostnames and vice versa?
strace result

I had a look at the trace, and the only fishy thing I saw was a read of /etc/resolv.conf, which pointed at my hosting provider's nameservers. I replaced those IP addresses with my own nameservers, in case the provider's servers cache something incorrect, but it did not change anything.

I then compared the strace output of the server that works correctly with that of the one that does not. I noticed that the one that does not work correctly fell back on the loopback interface 127.0.0.1, while the other one properly went for the correct domain name. That made me think the extra lines in the /etc/hosts file might confuse the system, so I deleted all lines other than the loopback interface and the line for the server name. Lo and behold, since then telnet resolves correctly.

Bad news: munin still does not work, although according to the configuration it now should. Same effect: a web page is generated with the logo and domain name, but no links to any graphs. munin-update.log shows that no data is read, regardless. What can I do?
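The strace finding fits how the "hosts:" line of /etc/nsswitch.conf works: glibc tries the listed sources from left to right and, by default, stops at the first one that returns an answer. If "files" comes before "dns", the duplicate 210.48.62.11 lines in finch's /etc/hosts answer the query and Bind is never asked. The sketch below only extracts that order; `host_sources` and `NSSWITCH_SAMPLE` are hypothetical names, and the "files dns" line is an assumed example, not finch's actual file.

```python
import re

# Assumed excerpt; the real file on finch may differ.
NSSWITCH_SAMPLE = """\
# /etc/nsswitch.conf (assumed excerpt)
passwd:         compat
hosts:          files dns
networks:       files
"""


def host_sources(nsswitch_text):
    """Return the sources on the 'hosts:' line, in the order they are tried."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and surrounding blanks
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            # Remove [STATUS=action] groups such as [NOTFOUND=return]; keep source names.
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            return rest.split()
    return []


print(host_sources(NSSWITCH_SAMPLE))  # ['files', 'dns']
# On a real system: print(host_sources(open("/etc/nsswitch.conf").read()))
```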
Never found the error - reinstalled server

This was consuming more time than it was worth. As this was the first Linux server I ever built, I wiped it and reinstalled from scratch, since I expect there was still some 'experimental' stuff on there. Of course, on a clean install it all works fine.

Thanks for the help