YIKES! One of my websites is down on the server, but the others are fine!

Discussion in 'ISPConfig 3 Priority Support' started by craig baker, Oct 7, 2023.

  1. craig baker

    craig baker Member HowtoForge Supporter

    VERY strange -- and I've rebooted server and the dualwan router.
    one of my most important websites is not responding - says host unknown - www.cdbsystems.com
    dig says:
    Code:
     <<>> DiG 9.11.20-RedHat-9.11.20-5.el8_3.1 <<>> www.cdbsystems.com
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 61935
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
    
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 512
    ; EDE: 23 (Network Error): ([35.169.39.140] rcode=SERVFAIL for www.cdbsystems.com/a)
    ; EDE: 23 (Network Error): ([70.184.247.92] rcode=REFUSED for www.cdbsystems.com/a)
    ; EDE: 22 (No Reachable Authority): (At delegation cdbsystems.com for www.cdbsystems.com/a)
    ;; QUESTION SECTION:
    ;www.cdbsystems.com.            IN      A
    
    ;; Query time: 40 msec
    ;; SERVER: 8.8.8.8#53(8.8.8.8)
    ;; WHEN: Sat Oct 07 12:18:39 EDT 2023
    ;; MSG SIZE  rcvd: 227
    
    but another site on the same server says:
    Code:
    [root@ns10 ~]# dig www.mandalaresearch.com
    
    ; <<>> DiG 9.11.20-RedHat-9.11.20-5.el8_3.1 <<>> www.mandalaresearch.com
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42515
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
    
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 512
    ;; QUESTION SECTION:
    ;www.mandalaresearch.com.       IN      A
    
    ;; ANSWER SECTION:
    www.mandalaresearch.com. 3600   IN      A       70.184.247.92
    
    ;; Query time: 12 msec
    ;; SERVER: 8.8.8.8#53(8.8.8.8)
    ;; WHEN: Sat Oct 07 12:25:01 EDT 2023
    ;; MSG SIZE  rcvd: 68
    
    [root@ns10 ~]#
    
    what on earth is going on - just happened out of the blue. did systemctl restart named and httpd, and in fact restarted the server. and the router. behavior is the same... what on EARTH??
    when I look at messages, the other hosts are being loaded but NOT cdbsystems.com:

    Oct 7 11:59:46 ns10 named[2216]: /var/named/pri.mandalaresearchinc.com.signed:10: signature has expired
    Oct 7 11:59:46 ns10 named[2216]: /var/named/pri.mandalaresearch.com.signed:10: signature has expired
    from messages today after reload. but /var/named/pri.cdbsystems.com is not listed even though it exists and I've not changed anything???
    what on EARTH is going on?? I've not even criticized our president recently! <grin>
    no errors in system messages about any zone NOT loading. but cdbsystems.com is simply never listed.
    just happened one moment to the next! site stopped responding and domain is not known.

    when I do an rndc dumpdb -zones
    in the cache file there are my other zones but NOT cdbsystems.com:

    for example:
    ;
    ; Zone dump of 'mandalaresearch.com/IN'
    ;

    so for some reason bind is simply ignoring one of my most important sites!
    in named.run I see these lines:
    Code:
    client @0x7f5ae49e6570 35.169.39.140#35788 (cdbsystems.com): query (cache) 'cdbsystems.com/SOA/IN' denied
    client @0x7f5a200010d0 35.169.39.140#49441 (cdbsystems.com): bad zone transfer request: 'cdbsystems.com/IN': non-authoritative zone (NOTAUTH)
    client @0x7f5a78016980 35.169.39.140#41533 (cdbsystems.com): query (cache) 'cdbsystems.com/SOA/IN' denied
    client @0x7f5ae48b0d60 208.69.37.72#58388 (cdbsystems.com): query (cache) 'cdbsystems.com/MX/IN' denied
    client @0x7f5ae4b74460 121.4.139.17#20009 (cdbsystems.com): query (cache) 'cdbsystems.com/MX/IN' denied
    client @0x7f5ae4867250 121.4.139.17#50058 (mail.cdbsystems.com): query (cache) 'mail.cdbsystems.com/A/IN' denied
    client @0x7f5ae4a79b90 208.69.37.72#55952 (cdbsystems.com): query (cache) 'cdbsystems.com/MX/IN' denied
    client @0x7f5ae4afe5e0 49.234.123.177#64735 (mail.cdbsystems.com): query (cache) 'mail.cdbsystems.com/A/IN' denied
    client @0x7f5ae4b480f0 121.4.139.17#30744 (cdbsystems.com): query (cache) 'cdbsystems.com/MX/IN' denied
    client @0x7f5ae4961b20 81.69.189.116#39900 (cdbsystems.com): query (cache) 'cdbsystems.com/MX/IN' denied
    client @0x7f5ae4961b20 106.54.160.72#38525 (mail.cdbsystems.com): query (cache) 'mail.cdbsystems.com/A/IN' denied
    client @0x7f5ae483aee0 145.102.6.73#37785 (ns9.cdbsystems.com): query (cache) 'ns9.cdbsystems.com/AAAA/IN' denied
    client @0x7f5ae4ac36a0 74.125.18.68#56540 (wWW.thaiheRbSFooDtrUCK.COm.CdbSYstEMS.cOM): query (cache) 'wWW.thaiheRbSFooDtrUCK.COm.CdbSYstEMS.cOM/NAPTR/IN' denied
    client @0x7f5ae49e6570 74.125.18.66#51608 (wWW.tHaIHErbsFoOdTrUCk.cOm.cDBsyStEmS.coM): query (cache) 'wWW.tHaIHErbsFoOdTrU
    
    there are plenty of the bogified domain messages (with the domain in mixed upper/lowercase). have I somehow had my cache poisoned?
    wouldn't systemctl restart named take care of that? (it did nothing).

    YIKES!




    please save me Till!
    cdb <grin>
     
    Last edited: Oct 7, 2023
  2. till

    till Super Moderator Staff Member ISPConfig Developer

    Check your DNS zone with intodns.com to see if your DNS server works correctly. You might also want to check your servers using dig like this:

    dig @ns1.yourdomain.tld www.yourdomain.tld
    dig @ns2.yourdomain.tld www.yourdomain.tld

    to see if your servers return the correct data. If they do not return the right data, check the data in the zone file; you can also try restarting BIND on all DNS servers.
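
For anyone following along, the per-nameserver check above can be sketched as a small shell helper. This is only a minimal sketch: `dig_status` is a hypothetical function name (not part of dig or ISPConfig) that pulls the `status:` field out of dig's header line, and the commented usage assumes the two nameservers and the zone from this thread.

```shell
# dig_status: read dig output on stdin and print the response code from the
# header line (e.g. NOERROR, SERVFAIL, REFUSED). Hypothetical helper name.
dig_status() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p'
}

# Example usage (needs network access, so it is left commented out).
# Hostnames and zone are the ones from this thread; substitute your own.
#   for ns in ns10.cdbsystems.com ns4.cdbsystems.com; do
#       printf '%s: ' "$ns"
#       dig +norecurse @"$ns" cdbsystems.com SOA | dig_status
#   done
```

A nameserver that has actually loaded the zone should print NOERROR here. REFUSED, as in the dig output from ns10 in this thread, suggests that server is not serving the zone at all.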
     
  3. craig baker

    craig baker Member HowtoForge Supporter

    intodns reports problems. but I have done NOTHING and in fact rebooted server and router. and it happened out of the blue one minute to the next!
    Code:
    Parent
    [INFO] Domain NS records: Nameserver records returned by the parent servers are:
    
    ns4.cdbsystems.com.  ['35.169.39.140']   [TTL=172800]
    ns10.cdbsystems.com.  ['70.184.247.92']   [TTL=172800]
    
    d.gtld-servers.net was kind enough to give us that information.
    
    [PASS] TLD Parent Check: Good. d.gtld-servers.net, the parent server I interrogated, has information for your TLD. This is a good thing as there are some other domain extensions like "co.us" for example that are missing a direct check.
    [PASS] Your nameservers are listed: Good. The parent server d.gtld-servers.net has your nameservers listed. This is a must if you want to be found as anyone that does not know your DNS servers will first ask the parent nameservers.
    [PASS] DNS Parent sent Glue: Good. The parent nameserver sent GLUE, meaning he sent your nameservers as well as the IPs of your nameservers. Glue records are A records that are associated with NS records to provide "bootstrapping" information to the nameserver. (see RFC 1912 section 2.3)
    [PASS] Nameservers A records: Good. Every nameserver listed has A records. This is a must if you want to be found.
    
    NS
    [INFO] NS records from your nameservers: NS records got from your nameservers listed at the parent NS are:
    Oups! I could not get any nameservers from your nameservers (the ones listed at the parent server). Please verify that they are not lame nameservers and are configured properly.
    
    [PASS] Recursive Queries: Good. Your nameservers (the ones reported by the parent server) do not report that they allow recursive queries for anyone.
    [PASS] Same Glue: Hmm, I do not consider this to be an error yet, since I did not detect any nameservers at your nameservers.
    [PASS] Glue for NS records: OK. Your nameservers (the ones reported by the parent server) have no ideea who your nameservers are so this will be a pass since you already have a lot of errors!
    [ERROR] Mismatched NS records: WARNING: One or more of your nameservers did not return any of your NS records.
    [ERROR] DNS servers responded: ERROR: One or more of your nameservers did not respond:
    The ones that did not respond are:
    35.169.39.140 70.184.247.92
    
    
    and I restarted bind and flushed cache.
    named simply does not LOAD cdbsystems.com though the pri file is there:
    Code:
    pri.cdbsystems.com contains:
    $TTL        3600
    @       IN      SOA     ns10.cdbsystems.com. cdb.craigscomputers.net. (
                            2023100705       ; serial, todays date + todays serial #
                            7200              ; refresh, seconds
                            540              ; retry, seconds
                            2419200              ; expire, seconds
                            86400 )            ; minimum, seconds
    ;
    
    cdbsystems.com. 3600      A          70.184.247.92
    mail 3600      A          70.184.247.92
    nextcloud 3600      A          70.184.247.92
    ns10 3600      A          70.184.247.92
    ns11 3600      A          108.18.202.58
    ns4 86400      A          35.169.39.140
    ns5 86400      A          70.184.247.92
    ns6 86400      A          70.184.247.92
    ns9 86400      A          204.111.190.136
    nss1 3600      A          70.184.247.92
    nss2 3600      A          35.169.39.140
    nsz 3600      A          70.184.247.92
    www 3600      A          70.184.247.92
    cdbsystems.com. 3600      CAA        0 issue "letsencrypt.org"
    cdbsystems.com. 3600      MX     10  mail.cdbsystems.com.
    cdbsystems.com. 3600      NS         ns10.cdbsystems.com.
    cdbsystems.com. 3600      NS         ns4.cdbsystems.com.
    cdbsystems.com. 3600      TXT        "google-site-verification=hbF_hT7raFZjKaJuB_iuZ4jF4QzwIDRY663CimhESVk"
    cdbsystems.com. 86400      TXT        "v=spf1 mx a ip4:70.184.247.92/32 a:mail.cdbsystems.com ~all"
    cdbsystems.com. 3600      TXT        "google-site-verification=hbF_hT7raFZjKaJuB_iuZ4jF4QzwIDRY663CimhESVk"
    default._domainkey.cdbsystems.com. 3600      TXT        "v=DKIM1; t=s; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDMcZw5tuOWtGQzff8vkO5fSlxiKW1MwqK8tnhaRQwqRMwvBjF0oMlU19o5dOkXkzqiXUNes12kwMZd8EKKaXHPIOIGDuLKx/Bm6kKNi/YzAeWtr+5c+gIJWsK8fDwvE4JkIwFMHWrroWl9hznxlzto1VuM8l24LLl4RtNoU5johwIDAQAB"
    
    though there is NO pri.cdbsystems.com.signed file
    why does named not load it? it's NOT in the output of rndc dumpdb -zones.
    I know ns10 is not returning correct info -- named never loads the zone!
    dig returns:
    Code:
    [root@ns10 named]# dig @ns10.cdbsystems.com www.cdbsystems.com
    
    ; <<>> DiG 9.11.20-RedHat-9.11.20-5.el8_3.1 <<>> @ns10.cdbsystems.com www.cdbsystems.com
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 60040
    ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
    ;; WARNING: recursion requested but not available
    
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ; COOKIE: eb0341550ccaf22217b5ac9f6522bec2d43287a65ff628ae (good)
    ;; QUESTION SECTION:
    ;www.cdbsystems.com.            IN      A
    
    ;; Query time: 0 msec
    ;; SERVER: 192.168.2.20#53(192.168.2.20)
    ;; WHEN: Sun Oct 08 10:37:54 EDT 2023
    ;; MSG SIZE  rcvd: 75
    
    
    but another domain ns10 serves is returned properly!
    Code:
    [root@ns10 named]# dig @ns10.cdbsystems.com www.mandalaresearch.com
    
    ; <<>> DiG 9.11.20-RedHat-9.11.20-5.el8_3.1 <<>> @ns10.cdbsystems.com www.mandalaresearch.com
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44262
    ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 1
    ;; WARNING: recursion requested but not available
    
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ; COOKIE: 3a0efbb04be78c7037d5d5416522bef446675d0b03dd0018 (good)
    ;; QUESTION SECTION:
    ;www.mandalaresearch.com.       IN      A
    
    ;; ANSWER SECTION:
    www.mandalaresearch.com. 3600   IN      A       70.184.247.92
    
    ;; AUTHORITY SECTION:
    mandalaresearch.com.    3600    IN      NS      ns10.cdbsystems.com.
    mandalaresearch.com.    3600    IN      NS      ns4.cdbsystems.com.
    
    ;; Query time: 0 msec
    ;; SERVER: 192.168.2.20#53(192.168.2.20)
    ;; WHEN: Sun Oct 08 10:38:44 EDT 2023
    ;; MSG SIZE  rcvd: 144
    
    [root@ns10 named]#
    
    nameserver ns10.cdbsystems.com works for all zones EXCEPT cdbsystems.com!
    again this just happened out of the blue! server up for months! nothing changed, no updates (it's running CentOS 8, so no updates possible atm!)

    cdb.
     
  4. till

    till Super Moderator Staff Member ISPConfig Developer

    Is the zone file mentioned in the named.conf.local file like the other zones? If not, it will not get loaded. In this case, check if there are any errors in the zone file using the named-checkzone command. You can also try changing something in the zone, such as lowering or increasing the TTL, and saving it to see if it works then.
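
To run that named-checkzone check across every zone at once, something like the following could work. It is a rough sketch under stated assumptions: `check_zone_files` is a hypothetical helper name, and it assumes the `pri.<zone>` file naming in `/var/named` that appears in this thread. It calls `named-checkzone <zone> <file>` and looks only at the exit status.

```shell
# check_zone_files DIR: run named-checkzone on every pri.<zone> file in DIR
# and print one OK/FAIL line per zone. Hypothetical helper; assumes the
# pri.<zone> naming scheme seen in this thread.
check_zone_files() {
    dir=$1
    for f in "$dir"/pri.*; do
        [ -f "$f" ] || continue
        case $f in *.signed) continue ;; esac   # skip DNSSEC-signed copies
        zone=${f##*/pri.}                       # strip the path and "pri." prefix
        if named-checkzone "$zone" "$f" >/dev/null 2>&1; then
            echo "OK   $zone"
        else
            echo "FAIL $zone"
        fi
    done
}

# Example: check_zone_files /var/named
```

A zone that prints OK here has a syntactically valid zone file. It may still be missing from the BIND configuration, so this covers only the second half of the advice above.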
     
  5. craig baker

    craig baker Member HowtoForge Supporter

    don't see a named.conf.local. but it's in the /var/named folder as pri.cdbsystems.com like all the others.
    named-checkzone says:
    [root@ns10 named]# named-checkzone mandalaresearch.com pri.mandalaresearch.com
    pri.mandalaresearch.com:21: TTL set to prior TTL (86400)
    zone mandalaresearch.com/IN: loaded serial 2023100701
    OK
    [root@ns10 named]# named-checkzone cdbsystems.com pri.cdbsystems.com
    pri.cdbsystems.com:28: TTL set to prior TTL (3600)
    zone cdbsystems.com/IN: loaded serial 2023100705
    OK
    [root@ns10 named]#
    mandalaresearch.com is fine and up. cdbsystems.com is unknown!
     
  6. craig baker

    craig baker Member HowtoForge Supporter

    and I altered some things in the zone file (note the serial is now 1007xxx) and saved, but nothing helped.
    big question: why would named in /var/log/messages NOT list cdbsystems.com as loaded? where do I see why THAT failed?
    I've grepped the named logs and can't find anything.
    just noted that another nameserver, ns9.cdbsystems.com now no longer is found.
    but verisign nameserver check shows it properly!!
    and yes my godaddy account is paid up!

    from a dig response:
    ; EDE: 23 (Network Error): ([35.169.39.140] rcode=SERVFAIL for ns9.cdbsystems.com/a)
    ; EDE: 23 (Network Error): ([70.184.247.92] rcode=REFUSED for ns9.cdbsystems.com/a)
    why would it be seeing a REFUSED code and SERVFAIL codes for ns9?
    ns9's ip is
    # ping 204.111.190.136
    PING 204.111.190.136 (204.111.190.136) 56(84) bytes of data.
    and again it has not been changed in YEARS. why would it suddenly not be seen?

    cdbsystems.com is registered at godaddy and ns10, ns4, and ns9 are all hostnames associated. but the nameservers are ns10.cdbsystems.com and ns4.cdbsystems.com
    very perplexing!
     
    Last edited: Oct 8, 2023
  7. till

    till Super Moderator Staff Member ISPConfig Developer

    The file might be named differently on CentOS. There must be one file which contains all zones, you can see the name of this file in ISPConfig under System > server config > DNS in the field "BIND named.conf.local path". Look into that file and see if it includes the zone file for that zone. Also, have you checked the nameserver directly using dig like this:

    dig @localhost cdbsystems.com
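
Checking whether that file actually declares the zone can be scripted. The snippet below is a minimal sketch, assuming the file uses standard BIND `zone "name" { ... };` stanzas. `zone_declared` is a hypothetical helper name, and the path in the commented usage is a placeholder for whatever ISPConfig shows in the "BIND named.conf.local path" field.

```shell
# zone_declared CONF ZONE: succeed (exit 0) if CONF contains a
# 'zone "ZONE"' stanza, fail otherwise. Hypothetical helper name.
zone_declared() {
    # Extract every declared zone name, then look for an exact match.
    sed -n 's/^[[:space:]]*zone[[:space:]]*"\([^"]*\)".*/\1/p' "$1" |
        grep -Fxq "$2"
}

# Example usage (the path is an assumption; use the one ISPConfig reports):
#   conf=/etc/named.conf.local
#   zone_declared "$conf" cdbsystems.com || echo "cdbsystems.com is not declared in $conf"
```

The exact-match `grep -Fx` is deliberate: it keeps `cdbsystems.com` from matching a longer zone name that merely contains it.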
     
  8. craig baker

    craig baker Member HowtoForge Supporter

    some further information: the files in /var/named (when did we quit chrooting named in perfect server?):

    [root@ns10 named]# ls *cdbsystems*
    dsset-cdbsystems.com. Kcdbsystems.com.+007+07510.private Kcdbsystems.com.+007+13560.private
    Kcdbsystems.com.+007+07510.key Kcdbsystems.com.+007+13560.key pri.cdbsystems.com
    [root@ns10 named]# ls *mandalaresearch.com*
    dsset-mandalaresearch.com. Kmandalaresearch.com.+007+57245.private Kmandalaresearch.com.+013+61968.private
    Kmandalaresearch.com.+007+46641.key Kmandalaresearch.com.+013+24602.key pri.mandalaresearch.com
    Kmandalaresearch.com.+007+46641.private Kmandalaresearch.com.+013+24602.private pri.mandalaresearch.com.signed
    Kmandalaresearch.com.+007+57245.key Kmandalaresearch.com.+013+61968.key
    [root@ns10 named]#

    now cdbsystems.com says it's signed, but only a few .key files are there?? and no pri.cdbsystems.com.signed
     
  9. craig baker

    craig baker Member HowtoForge Supporter

    ENLIGHTENMENT??? I went back. DELETED the DNSSEC signing. saved it. then re did signing. saved it.
    now cdbsystems.com reports and is back up. HOWEVER ns9.cdbsystems.com is still unknown. just percolation maybe??
    /var/log/messages now contains:
    Oct 8 14:28:02 ns10 named[21697]: /var/named/pri.cdbsystems.com:28: TTL set to prior TTL (3600)

    but still need to know why ns9 is not showing up?
    and since I did nothing to CAUSE this problem wtf is really going on??

    cdb.
     
  10. till

    till Super Moderator Staff Member ISPConfig Developer

  11. craig baker

    craig baker Member HowtoForge Supporter

    it's /etc/named.conf.local and cdbsystems is NOT in this file!!! isn't this file created by ISPConfig?
    do I add it manually?

    edited it, added the zone manually, and ran systemctl restart named. all seems happier?
    but WHY DID THIS HAPPEN?
     
    Last edited: Oct 8, 2023
  12. till

    till Super Moderator Staff Member ISPConfig Developer

    By adding it manually, you will not find out why it could not be added. See post #10 about debug mode, which you should have used if you wanted to know why all this happened. You might also find details in the ISPConfig system log.

    A zone is not added if named-checkzone reported an issue, or named failed, at the time the zone should have been added. You can see all the details in debug mode; that's why I recommended using it.
     
  13. craig baker

    craig baker Member HowtoForge Supporter

    ok, I'm sure I can recreate the problem - by removing what I added at the end of named.conf.local.
    but when is this file created? any change to any zone? and why would it change from one minute to the next?
    I was not adding any new zones yesterday when this happened to me!
    I'm also seeing emails saying LE certs are expiring soon. I assume the auto-renew has failed for me to get these messages?
    this server is using certbot so I will see how to get certbot to renew that site then see what happens :)
     
  14. till

    till Super Moderator Staff Member ISPConfig Developer

    When a zone gets added or changed.

    The file gets changed when you change a zone on your system, this can be a manual change made by you or one of your customers on any zone (as it contains all zones, it gets rewritten for any zone change) or when dnssec keys were renewed.

    This is an unrelated topic. If you want to see if auto-renew failed, see the certbot log file.
     
  15. craig baker

    craig baker Member HowtoForge Supporter

    I'll try to pin this down - for the edification of myself but also in case others run into this. if ispconfig is generating the zone list missing a pri entry, that's not a small thing and should have various alarm bells going off. and my customers do NOT access ispconfig and I know *I* didn't make a change that morning. was busy programming :)
    might the zone listing (which included my zone that was missing from /etc/named.conf.local) not have a flashing red symbol by a zone that was NOT being written out to named.conf.local? or a pri.cdbsystems.com.err file I could look at? there was nothing visible except my site quit responding! either would have been helpful!
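
Until something like that exists, a home-made alarm bell is possible. The following is only a sketch of the idea, not an ISPConfig feature: `missing_zones` is a hypothetical helper that lists every `pri.<zone>` file with no matching `zone "<zone>"` stanza in the BIND zone list. The directory and config path in the commented usage are the ones mentioned in this thread and may differ on other systems.

```shell
# missing_zones DIR CONF: print every zone that has a pri.<zone> file in DIR
# but no 'zone "<zone>"' stanza in CONF. Hypothetical helper name.
missing_zones() {
    dir=$1 conf=$2
    for f in "$dir"/pri.*; do
        [ -f "$f" ] || continue
        case $f in *.signed) continue ;; esac   # skip DNSSEC-signed copies
        zone=${f##*/pri.}                       # strip the path and "pri." prefix
        sed -n 's/^[[:space:]]*zone[[:space:]]*"\([^"]*\)".*/\1/p' "$conf" |
            grep -Fxq "$zone" || echo "$zone"
    done
}

# Example usage (paths are the ones from this thread; adjust as needed):
#   missing_zones /var/named /etc/named.conf.local
```

Run from cron with the output mailed to the admin, this would have flagged cdbsystems.com as soon as it dropped out of the zone list, although it would not explain why it dropped out.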

    I'll turn on debug and try and recreate the issue. at the time I was just trying to get things back up asap!
     
