Local DNS?

Discussion in 'ISPConfig 3 Priority Support' started by BobGeorge, Sep 30, 2017.

  1. BobGeorge

    BobGeorge Member

    I'm playing around with ISPConfig and DNS (Bind9) on my testing machine - an Ubuntu laptop - before I try to make anything work in production.

    I've created a DNS zone in ISPConfig - just something random for testing, so it's for "fish.com" - then I've gone to the command line and tried "nslookup fish.com", and it's giving me the IP address of the "fish.com" website that does actually exist on the Internet (though it is saying that the name server is 127.0.1.1, so it is passing through my local DNS server).

    But I'd set up the dhcp config file for this laptop to have "prepend 127.0.1.1" in it, making the localhost DNS server the primary DNS and then in Bind's configuration, I've added the ISP's name servers as forwarders.

    What I want and was expecting is that the "nslookup" would query the localhost DNS server at 127.0.1.1 and ask it for "fish.com". As I've created a DNS zone in ISPConfig for it, then I wanted and expected this to be returned. Only if the localhost DNS cannot answer the query should it then be forwarded onto my ISP's name servers (as I would still like to retain general Internet access on the laptop and, even in production, I'd still want to be able to access update servers and the like, of course).

    I did also try "supersede 127.0.1.1" in the "dhclient.conf", so that only the local DNS server would be queried (but it would forward on any queries it can't answer to my ISP). But there's no response at all when I do this. It just times out.

    How would I set this up, on my Ubuntu laptop, so that my own DNS server will be authoritative about its own DNS zones - both locally and if it was queried externally - and then forwards onto the ISP name servers for everything else?

    So that a query for "fish.com" would actually be answered by my own local DNS zone, even though there is a "fish.com" on the Internet. This is both for testing purposes - to ensure that my DNS server is providing the answers and not some other DNS server in the chain - but it'd also be useful to retain this ability anyway for the purposes of creating "fake" DNS zones, websites and emails locally to get things up and running, before actually pointing a real registered domain name at it all and making it public.
     
  2. HSorgYves

    HSorgYves Active Member HowtoForge Supporter

    Is your DNS server running? What is the output of
    Code:
    ps -Af | grep named
     
  3. BobGeorge

    BobGeorge Member

    It's running. "/usr/sbin/named -f -u bind" is coming up in the process list.

    But your response made me check "service bind9 status" and I noticed that, once running, it was getting errors with DNSSEC on my ISP's name servers (and a few of the root servers). Insecure response. Because they apparently don't run DNSSEC.

    So I've changed it to "dnssec-enable no;" and "dnssec-validation no;" - I'll turn it off for now - and removed any DNSSEC signing from my local zones, then restarted again and I don't see any errors in "journalctl -xe" for "named" now ("all zones loaded. running" and I see my local zone files being loaded up in the lines just before this).

    But it's still not doing what's needed. An "nslookup" or "dig" for "fish.com" is coming back with the Internet "fish.com", not the localhost "fish.com". I've also tried it with a domain name that I do own and that's running on our servers, and that's coming back as the server IP address and not the localhost version. And I did the other test of "supersede 127.0.1.1" so that it's the only DNS server and then, again, it's not responding at all.

    I guess when I have "prepend 127.0.1.1" in "dhclient.conf" then this means my ISP's name servers are being made secondary and tertiary name servers for the laptop and, when my local DNS server fails to respond, it's just going to those. Hence the results here.

    But why is BIND not responding with authority on my local DNS zones? Indeed, not responding at all, it seems, even though "named" is certainly running, according to both "ps" and the "journalctl -xe" log.
     
  4. HSorgYves

    HSorgYves Active Member HowtoForge Supporter

    Try
    Code:
    dig @localhost fish.com
     
  5. BobGeorge

    BobGeorge Member

    Okay, that works. I get back the correct authoritative A record.

    That's what I want it to do all the time.

    So it is running and it is capable of answering correctly, but it's just not doing this by default (because "dig fish.com" is still returning the Internet answer and not the localhost answer, unless I specifically tell it "@localhost").
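
One way to tell the two answers apart without reading the addresses by eye is to look for the "aa" (authoritative answer) flag in the header that "dig" prints. The following is only a sketch: the function name has_aa_flag is made up for illustration, and it assumes the usual ";; flags: ..." header line in dig's default output.

```shell
#!/bin/sh
# has_aa_flag: read `dig` output on stdin and succeed only if the header's
# ";; flags:" line lists "aa" (authoritative answer).
# The function name is hypothetical; it is not part of dig or BIND.
has_aa_flag() {
    grep -E '^;; flags:[^;]* aa[ ;]' >/dev/null
}

# Example usage (commented out because it needs a running name server):
# dig @127.0.0.1 fish.com | has_aa_flag && echo "answered from a local zone"
# dig fish.com | has_aa_flag || echo "default resolver answer is not authoritative"
```

An answer that BIND merely forwarded or fetched from the Internet should not carry the "aa" flag, so the second command printing its message would match the behaviour described in this post.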
     
  6. HSorgYves

    HSorgYves Active Member HowtoForge Supporter

    I know Debian but not Ubuntu... let's try the following:
    Code:
    cat /etc/resolv.conf 
     
  7. BobGeorge

    BobGeorge Member

    Code:
    # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
    #     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
    nameserver 127.0.1.1
    search laptop.com
    (Where "laptop.com" is just the hostname of the laptop and doesn't equate to any real domain.)

    Yeah, on Ubuntu, it auto-generates "resolv.conf". So, as the comment warns, you don't hand-edit it because it'll just get overwritten by the next auto-generation of the file.

    (That's why I'm making my changes to "/etc/dhcp/dhclient.conf" because that's where you can configure the DHCP options. "prepend domain-name-servers 127.0.1.1" sticks 127.0.1.1 at the top of the list as the primary DNS and "supersede domain-name-servers 127.0.1.1" overwrites the NS list so that there's only the 127.0.1.1 name server.)
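
Since plain "dig fish.com" asks whatever "resolv.conf" lists, while "dig @localhost" bypasses that list, it can help to print the addresses the resolver will actually try. A minimal sketch, assuming a standard resolv.conf layout; the function name list_nameservers is hypothetical.

```shell
#!/bin/sh
# list_nameservers FILE: print the address from every "nameserver" line of a
# resolv.conf-style file, in the order the resolver would try them.
# Defaults to /etc/resolv.conf when no file is given.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Example usage (commented out so the sketch has no side effects):
# list_nameservers                 # what plain `dig fish.com` will ask
# dig @"$(list_nameservers | head -n 1)" fish.com
```

If the first address printed is not the one BIND is listening on, the default lookup and the "@localhost" lookup can legitimately disagree.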
     
  8. HSorgYves

    HSorgYves Active Member HowtoForge Supporter

    Damn I was blind. Localhost is 127.0.0.1 and not 127.0.1.1... try with 127.0.0.1 there!
     
  9. BobGeorge

    BobGeorge Member

    Right, I changed "resolv.conf" to "127.0.0.1" - even though it does get overwritten on a reboot, it'll remain good for a quick test - and, yeah, it worked. I guess I got confused because I saw "127.0.1.1" in responses and thought that maybe the DNS server uses a different IP address to the usual "127.0.0.1" or something. Obviously not.

    As "resolv.conf" is overwritten, I've made the change to the network configuration, so it ought to be permanent. And the other test - making localhost the only DNS server - also works. Well, if you see this then it's working, as I'm using these settings.

    And I'll just do a quick reboot after this to check that it really has been set permanently.

    Thank you. As usual, it often boils down to some simple dumb mistake, eh?
     
  10. BobGeorge

    BobGeorge Member

    Okay, after my testing with a laptop on the weekend, I'm setting up the DNS servers on the cluster at work.

    "ns1" is working fine. I can create a DNS zone in ISPConfig then do a "dig @localhost" for it and I get back the correct records. And I've got Google's DNS servers set up as forwarders (because my ISP-provided DNS does not support DNSSEC, apparently), so for that which my DNS server is not authoritative for, Google will handle it. Which it does.

    The problem is "ns2". In ISPConfig, I've set up "ns2" as a mirror of "ns1" - is that correct? - and when I look in the "/etc/bind" directory, I do see the zone files (and DNSSEC keys) for the ISPConfig-created DNS zones, just like on "ns1". But when I "dig @localhost" on it, I'm getting back records from the Internet (I'm using the same testing trick as with the laptop of creating a DNS zone that does actually exist on the Internet and I can tell if my DNS server is working or not by whether I see the local or Internet records being returned).

    (And now you can see why I wanted the DNS servers to work both locally and externally, as the cluster itself can use the DNS server internally for locating resources, as well as for the DNS server to answer the outside world, in the usual manner.)

    With "ns2", I want to be entirely sure it's working - via "dig @localhost" - before I make it the actual DNS server for the node, as "ns2" is also "lb1" - that is, load balancer #1 - and it needs to stay up and running.

    (I've done a little criss-cross configuration with the load balancing and DNS, you see. "lb1" is also "ns2" and "lb2" is also "ns1". So when both nodes are up and running, one node is load balancing and the other is serving DNS. But they are each other's backup server too, so should one or the other go down, then either of them can take on both the load balancing and DNS duties.

    Basically, in an active / passive load balancing configuration, the passive node really isn't doing anything but waiting for the active node to fail, so by criss-crossing the duties like this, I've given it the primary DNS duties to do. Which should keep it busy.)

    I guess what I'm wondering is how does one correctly set up a secondary name server with respect to ISPConfig? Is it correct to just make it a mirror of the primary DNS? Does one have to set up a special configuration between "ns1" and "ns2" to make it all function correctly?
     
  11. till

    till Super Moderator Staff Member ISPConfig Developer

    Yes, setting "ns2" up as a mirror of "ns1" is correct.

    No, there is no special configuration needed between "ns1" and "ns2". When you set ns2 to be a mirror of ns1 in ISPConfig, it will do exactly the same thing that ns1 is doing to write the DNS config files. Basically, it polls all changes from the master "where server_id = ID of ns2 or server_id = ID of ns1".

    The dig @localhost test should work on ns2 as well. Check the named.conf file if it includes the named.conf.local file and then check named.conf.local if it contains the zones.
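
The two checks suggested here can be scripted roughly as follows. This is a sketch under assumptions: the function names are made up, the patterns only recognise the common one-statement-per-line layout, and the /etc/bind paths in the usage lines are taken from the directory mentioned earlier in the thread.

```shell
#!/bin/sh
# includes_local CONF: succeed if CONF has an uncommented include line
# that pulls in named.conf.local.
includes_local() {
    grep -Eq '^[[:space:]]*include[[:space:]]+"[^"]*named\.conf\.local"' "$1"
}

# count_zones FILE: print how many zone statements FILE declares.
# grep -c exits non-zero when the count is 0, so that status is masked.
count_zones() {
    grep -Ec '^[[:space:]]*zone[[:space:]]+"' "$1" || true
}

# Example usage (commented out; paths are assumptions):
# includes_local /etc/bind/named.conf || echo "named.conf.local is not included"
# echo "zones declared: $(count_zones /etc/bind/named.conf.local)"
```

A count of 0 on the mirror while the primary reports one or more would point at the zone declarations never being written, rather than at BIND failing to load them.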
     
  12. BobGeorge

    BobGeorge Member

    Okay, I've found a difference between them.

    On "ns1", the file "named.conf.local" has the entry that points to the DNS zone file. But, with "ns2", then "named.conf.local" is an empty file.

    So I can see why it's not working. Though ISPConfig has copied the actual zone files and DNSSEC keys to "ns2", the "named.conf.local" file remains empty, so BIND on "ns2" doesn't know about these files to actually load them.

    Why isn't "named.conf.local" being updated on "ns2"?

    Edit: Just tried a quick resync, but "named.conf.local" is still an empty file on "ns2".
     
  13. till

    till Super Moderator Staff Member ISPConfig Developer

    DNSSEC is not supported in DNS mirroring in ISPConfig yet; it gets disabled in the UI as soon as you create a mirror. Maybe you created the DNSSEC records before you enabled mirroring. The reason why it is not available at the moment is a mistake in its implementation which causes it to fail on mirrors. It is planned to reimplement that for 3.2, as this cannot be solved with a small fix. When you need DNSSEC right now, the only option at the moment is to disable mirroring for the DNS servers and instead create a DNS slave zone for the DNS zones in the ISPConfig DNS manager (on the ns2 server). This way, BIND itself will take care of mirroring the zone and future changes.
     
  14. BobGeorge

    BobGeorge Member

    Ah, right. I did notice that the "DNSSEC" option had disappeared. And, yes, I did create a DNSSEC-based zone when I was testing "ns1" before I moved to "ns2" and made it a mirror.

    I don't need DNSSEC right now. It's more important to have the two name servers - our registrar insists on two - than it is to have DNSSEC. I can wait for the 3.2 fix.

    So how would I undo this? Just manually delete these files myself? As ISPConfig doesn't remove the DNSSEC files when you delete a zone.
     
  15. till

    till Super Moderator Staff Member ISPConfig Developer

    That's probably the only way to achieve that at the moment. If mirroring still does not work after that, enable the debug log level and run server.sh on the mirror slave to see why the zones don't get included in named.conf.local.
     
  16. BobGeorge

    BobGeorge Member

    DEBUG - Calling function 'check_phpini_changes' from plugin 'webserver_plugin' raised by action 'server_plugins_loaded'.
    DEBUG - Found 16 changes, starting update process.
    DEBUG - Remove Lock: /usr/local/ispconfig/server/temp/.ispconfig.lock

    What does this mean?
     
  17. till

    till Super Moderator Staff Member ISPConfig Developer

    Seems as if no module or plugin is bound to these changes, which can be ok e.g. when the master hosts more services than the slave. Either these are not dns changes and therefore correctly ignored or the dns module is not activated by symlink in /usr/local/ispconfig/server/mods-enabled/ or the BIND plugin is not active in /usr/local/ispconfig/server/plugins-enabled/
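
Checking the two directories named here could look like the sketch below. The helper name is_enabled is made up, and the file names dns_module.inc.php and bind_plugin.inc.php in the usage lines are assumptions about how the module and plugin are named on disk; adjust them to whatever the matching "available" directories actually contain.

```shell
#!/bin/sh
# is_enabled DIR NAME: succeed if DIR/NAME exists. `test -e` follows
# symlinks, so a dangling symlink (target missing) counts as not enabled.
is_enabled() {
    [ -e "$1/$2" ]
}

# Example usage (commented out; file names are assumptions):
# base=/usr/local/ispconfig/server
# is_enabled "$base/mods-enabled" dns_module.inc.php || echo "dns module not enabled"
# is_enabled "$base/plugins-enabled" bind_plugin.inc.php || echo "BIND plugin not enabled"
```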
     
  18. BobGeorge

    BobGeorge Member

    The DNS module is in mods-enabled and the BIND plugin is in plugins-enabled.

    I'm noticing, in the system log, that "ns2" is still periodically reporting "found 8 changes, starting update process" once every minute since I re-enabled the server cron job (it's now become 98 pages long).

    But there is no red pulsing indicator in the interface and "show jobqueue" is empty.

    This, by the way, is the server that had the "Stuck Jobqueue" problem before. And I think that the reason why the mysql password in "config.inc.php" was wrong to cause that stuck jobqueue problem is that, when I was copying my interface modifications over to the server, I think I must have copied too much - overwrote "config.inc.php" from my development laptop to the server and, thus, wrongly overwrote the mysql password.

    If that suspicion of mine is correct - as it's the only thing I can think of to explain how the password in "config.inc.php" became wrong to cause that problem - then are there any other settings in "config.inc.php" that could be affected by an accidental overwrite of that file from a different (single server) installation?

    Basically, I copied the files over from a directory higher than I should have. It should have just been a copy of "/usr/local/ispconfig/interface" to copy over my interface modifications but I might have copied over "/usr/local/ispconfig" in a hurry without thinking.
     
  19. till

    till Super Moderator Staff Member ISPConfig Developer

    The file also contains the server_id, this ID identifies the node in the cluster. If that ID is wrong, then the server will pick up changes for a wrong node. Compare the server_id in the server config.inc.php file on that node with the server_id of the server database table in the ispconfig master mysql database for the 'ns2' server record, they must match.
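
To read the ID out of the file without opening it by hand, something like the following sketch may help. It assumes the setting is written on one line in the form $conf['server_id'] = '2'; and the function name read_server_id is hypothetical. The config path, database name and column names in the usage lines are likewise assumptions and should be checked against the actual installation.

```shell
#!/bin/sh
# read_server_id FILE: print the digits assigned to $conf['server_id'] in a
# PHP config file. Assumes a single-line assignment with nothing but the
# value after the "=" sign.
read_server_id() {
    awk -F= '/^[ \t]*\$conf\[.server_id.] *=/ { gsub(/[^0-9]/, "", $2); print $2 }' "$1"
}

# Example usage (commented out; path, database and columns are assumptions):
# read_server_id /usr/local/ispconfig/server/lib/config.inc.php
# mysql -u root -p dbispconfig -e 'SELECT server_id, server_name FROM server;'
```

The number printed on the node should equal the server_id shown for the 'ns2' row on the master.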
     
  20. BobGeorge

    BobGeorge Member

    Ouch.

    My stupid file copy - that overwrote the "config.inc.php" - caused quite a nasty issue.

    I went through each "config.inc.php" on each server and wrote down the usernames and passwords - a task in itself, as those passwords are long - and then used PhpMyAdmin to fix the users on the master so that the other servers could access it.

    But I must have typed one of the passwords incorrectly - I did mention they're very long, didn't I? - as one of the servers (the storage server) couldn't connect.

    But that didn't stop it trying and that's where the nasty issue comes in. With the cron jobs, it keeps trying to repeatedly access the master to do its work.

    Now, interestingly, if the master's turned off, this is not an issue. I guess it tries, it immediately fails because it's not possible to reach the machine when it's off and gives up.

    But once I turned on the master, something rather nasty happens.

    I think the underlying problem is that each run tries to access the master many times and fails, and then the system sends "root" an email to inform me of the problem - and I guess there are timeouts involved along this chain of errors, which extends how long this all takes to process - so a single run takes longer than the gap between successive cron jobs.

    And I'm sure you can work out what that means.

    At first, everything seems fine. But then, after a while, on the storage server, things start slowing down. Typing at the prompt becomes more and more unresponsive. Then, at a certain point, it goes haywire with the kernel printing out a stream of errors referring to write errors with the swap file.

    Wait a minute. The swap file? The server has got 24GB of RAM. Hmm, suddenly getting increasingly slow and unresponsive over time?

    I tested my hunch by repeatedly running "free" as things slowed down. Yeah, my hunch was right. The memory was filling up at a rather alarming rate. So the reason everything was fine at first is that, well, I have 24GB of RAM on that machine. It takes a while to fill that. But then, once filled, it spills over into the swap file. And when the swap file is thrashing, everything grinds down to a glacial pace (as we're basically now running at the speed of the disk, not the speed of the CPU). Not least because the memory is still being filled up.

    Eventually, all of RAM and all of the swap file are filled to the brim and that's when the kernel panics with those write errors on the swap file. Well, yes, it's trying to write to a swap file that no longer has any space.

    I re-typed the password for that server into PhpMyAdmin - and must have gotten it right this time around - and then the issue just instantly went away.

    I won't call this a bug report because, well, I never should have ever overwritten my "config.inc.php" on the master in the first place. That was - mea culpa! - my own damned stupid fault there. So the "bug" is on me, not ISPConfig, which would have been just fine if I'd left everything alone.

    But I thought you ought to know about this because the issue rapidly blows up to ridiculous proportions - happily eating up 24GB of RAM and then all of my swap file - and, by killing off my storage server, brings the entire cluster down.

    Just because of one wrong password.

    Okay, I should never have touched that password. But it has highlighted a nasty issue that you might want to review in the design. As one wrong password shouldn't have such drastic consequences that can cause a whole cluster to die.

    Perhaps some sort of a lock or something, so that a new cron job is just dropped if the previous one hasn't completed, so that they can't start a new job faster than the jobs complete - or you get this "runaway train" problem that ends up eating every last byte of memory and virtual memory available, as it all spirals out of control.
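
The "drop the new job if the previous one hasn't completed" idea suggested here can be sketched with an atomic "mkdir" used as a lock. This is an illustration only, not taken from ISPConfig; the function name run_exclusive, the exit code 75 and the lock path in the usage line are all made up.

```shell
#!/bin/sh
# run_exclusive LOCKDIR CMD [ARGS...]: run CMD only if LOCKDIR can be created.
# mkdir is atomic, so exactly one caller at a time can take the lock; any
# other caller is dropped immediately instead of queueing up behind it.
run_exclusive() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping this one" >&2
        return 75
    fi
    cmd_status=0
    "$@" || cmd_status=$?
    rmdir "$lockdir"      # release the lock whether CMD succeeded or not
    return "$cmd_status"
}

# Example cron usage (commented out; the lock path is hypothetical):
# run_exclusive /var/lock/nightly-sync.lock /path/to/job.sh
```

One limitation of this sketch is that a job killed before it reaches "rmdir" leaves the lock behind. On Linux, "flock -n" from util-linux avoids that, because the kernel releases the lock when the process exits.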
     
