[solved] HTTP to HTTPS redirects under ISPConfig 3.1 (beta) for nginx — causes strange 404 errors?

Discussion in 'Installation/Configuration' started by Gwyneth Llewelyn, Jun 8, 2016.

  1. Bear with me just for a minute, because this issue might not be related to ISPConfig at all... however, since this might be related to the way ISPConfig writes the vhosts files for nginx, there might be an explanation for this behaviour.
    So I have migrated most of my sites from an 'old' Ubuntu 14.04 server (nginx 1.10 + PHP 5.5.9, ISPConfig 3.0) to a 'new' server, this time with the shiny new Ubuntu 16.04, which comes... with PHP 7 (7.0.4 at the time I write this). This, in turn, meant upgrading to ISPConfig 3.1 Beta (love the new look, btw!), even if it might still be a bit unpolished and rough around the edges... but, alas, I don't want to switch back to PHP 5.
    After painstakingly getting everything configured again (yay for WordPress and Pydio, both of which have no issues with PHP 7), I also went further with the configuration and activated https on pretty much all the sites I had (previously I had just two). Note that I'm actually cheating and not using Let's Encrypt. Instead, I use CloudFlare in front of everything. CloudFlare deals with the HTTPS connection to the client browsers, presenting their certificate. All I need to worry about is creating an encrypted HTTPS connection to CloudFlare, and for that, CloudFlare is kind enough to sign a few certificates for free (they will never be presented to a browser anyway).
    As said, I had this previously working fine under ISPConfig 3.0, nginx 1.10, and PHP 5.5.9. As far as I can see, the vhost configuration created by ISPConfig 3.1 has not changed at all. I redirected everything from HTTP to HTTPS (i.e. checking the appropriate boxes... and checking again to make sure that the vhost files did, indeed, contain the correct statements!), and, as far as I could see, things were working flawlessly.
    My first hint that something was not quite right came from Google Webmaster Tools, which complained about an unusual number of 404 errors. I didn't care much about that: after all, the old server was failing constantly (that was the reason for the move!) and I supposed that Googlebot had a seriously hard time crawling the sites — so I basically thought this was just 'old data', and I would have to wait a few more weeks for Google to crawl everything again. After all, I didn't get any errors whatsoever in any browser I could get my hands on, testing from different locations and networks...
    Then, one day, I set up New Relic to try it out. I'm still a bit confused by the massive amount of data being generated, and the insane level of alerts that New Relic spews out... but at some point it was clear that it was also complaining about an unusual number of errors, namely 404 errors. This didn't make any sense, since New Relic analyses data in pretty much real time, and I didn't have it running on the old server anyway — so the errors must be on the new server.
    And, indeed, there are very suspicious log entries on error.log for one of the sites:

    [crit] 417#417: *109763 open() "/var/www/my.web.site/web/" failed (13: Permission denied), client: AAA.BBB.CCC.DDD, server: my.web.site, request: "GET / HTTP/1.1", host: "my.web.site"

    There really are a lot of those entries — in fact, an astonishing amount of them! And this completely baffled me. Why would nginx pass an HTTP request through to the docroot directory and log an error? That didn't make sense: it should get redirected (and that means a 301). Trying to access the docroot directory itself is also definitely something I do not want to see (and I'm glad to see that the permission was denied...).
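    As a quick way to gauge how many of those entries there are, and which clients trigger them, the standard nginx error-log line can be counted and grouped with a small pipeline. This is just a sketch; the sample file path and the IP addresses below are placeholders, and on a real server you would point the commands at the site's actual error.log:

    ```shell
    # Two sample lines in the same format as the [crit] entry quoted above
    # (file path and client IPs are placeholders for this sketch).
    cat > /tmp/sample-error.log <<'EOF'
    2016/06/08 12:00:01 [crit] 417#417: *109763 open() "/var/www/my.web.site/web/" failed (13: Permission denied), client: 203.0.113.7, server: my.web.site, request: "GET / HTTP/1.1", host: "my.web.site"
    2016/06/08 12:00:05 [crit] 417#417: *109770 open() "/var/www/my.web.site/web/" failed (13: Permission denied), client: 203.0.113.9, server: my.web.site, request: "GET / HTTP/1.1", host: "my.web.site"
    EOF

    # How many denied open() calls were logged?
    grep -c 'Permission denied' /tmp/sample-error.log    # → 2

    # Break the hits down by client address, most frequent first:
    sed -n 's/.*client: \([0-9a-fA-F.:]*\),.*/\1/p' /tmp/sample-error.log \
      | sort | uniq -c | sort -rn
    ```

    Grouping by client IP makes it easy to tell crawlers (Googlebot, New Relic probes) apart from real visitors.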
    First, of course, not recognising any of the IP addresses, I thought I was the victim of some sort of attack. But then I started recognising my own IP address! And, indeed, every time I accessed that server over HTTP, it would redirect to HTTPS as expected — but also log the error.
    From the browser's perspective, everything is fine: the user is redirected, and no error is actually being displayed on the browser. It shows exactly the page I was expecting to see, under an https:// URL. That's exactly the expected behaviour.
    Nevertheless, those errors keep popping up. I tried with different browsers, all of them with the same result. Then I went further, turning CloudFlare off, clearing caches, and so forth. The error persists.
    The vhost file is precisely as expected:
    server {
        listen *:80;
        listen *:443 ssl;
        ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
        ssl_certificate /var/www/clients/clientX/webY/ssl/my.web.site.crt;
        ssl_certificate_key /var/www/clients/clientX/webY/ssl/my.web.site.key;
        server_name my.web.site www.my.web.site;

        root /var/www/my.web.site/web/;

        if ($http_host = "www.my.web.site") {
            rewrite ^ $scheme://my.web.site$request_uri? permanent;
        }

        if ($scheme != "https") {
            rewrite ^ https://$http_host$request_uri? permanent;
        }

        index index.html index.htm index.php index.cgi index.pl index.xhtml;

        [... etc ...]
    I don't really see any difference here from what I used to have under ISPConfig 3.0. Note that the strange error happens on all vhosts on that server, not just a particular one. Because most of those websites were http:// — some of them for over a decade! — there are still gazillions of links all over the 'net with http:// which require a redirect. And every time this happens, I get another error in the logs. This is what Googlebot is apparently picking up, as well as New Relic (which retrieves data in quite different ways, but nevertheless catches the 404 errors).

    Now, my question is whether this is somehow a 'new' behaviour introduced by nginx 1.10 + PHP 7, or whether it is something else, somehow related to the way ISPConfig 3.1 handles directories and/or other settings compared to 3.0? (Remember, I had to move things from one server to the other, but I did it manually, not with the 'official' migration tool.)
    Because what seems to be happening is that the request to http://... is somehow honoured by nginx before the rewriting rules are applied, and, in that case, nginx tries to access the docroot directory — which it cannot! — thus logging the error; in the meantime, however, the rewriting takes place, and what the browser gets is a 301 redirect, which happens invisibly in the background, and the (correct) page is instantly retrieved via https — so the user never sees any 'error'.
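    For context, the usual way to rule out any interaction between request processing and the `if`-based rewrites is to split the vhost into a dedicated port-80 server whose only job is the redirect, so no plain-HTTP request ever reaches the docroot. This is a hand-written sketch, not the configuration ISPConfig generates, and the hostnames/paths are the placeholders from this thread:

    ```nginx
    # Hypothetical hand-written equivalent, NOT the ISPConfig template:
    # a bare port-80 server that only issues the 301, so the docroot is
    # never touched for plain HTTP requests.
    server {
        listen *:80;
        server_name my.web.site www.my.web.site;
        return 301 https://my.web.site$request_uri;
    }

    server {
        listen *:443 ssl;
        server_name my.web.site;
        ssl_certificate     /var/www/clients/clientX/webY/ssl/my.web.site.crt;
        ssl_certificate_key /var/www/clients/clientX/webY/ssl/my.web.site.key;
        root /var/www/my.web.site/web/;
        # [... rest of the site configuration ...]
    }
    ```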
    Googlebot (and other tools, like apparently New Relic), however, are not that easily fooled. How exactly they manage to 'see' an error is beyond me — retrieving the homepage via curl (i.e. curl -i http://my.web.site) correctly shows a 301, and a Location: https://my.web.site/ header. That's the expected behaviour!
    So what is going on here? Where exactly should I be looking for problems/misconfigurations?

    Thanks in advance!
     
  2. till (Super Moderator, Staff Member, ISPConfig Developer)

    We changed a lot of things in ISPConfig 3.1, but there was no change in the folder permission scheme and also no major changes in the nginx vhost template, so it should be pretty much the same as in 3.0.5, as you already noticed. This issue is really strange and I have not seen it yet.

    The web site root folder should exist, even if you redirect. So the folder /var/www/my.web.site/web/ is there when you check with "ls -la" on the shell? And it is owned by the web user and client group? Does the folder have 660 or 661 permissions?
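    The check till describes can be scripted in one line with GNU stat. The scratch path below is just an illustration; on the real server you would point stat at the actual docroot, e.g. /var/www/my.web.site/web:

    ```shell
    # Sketch on a scratch directory; substitute the real docroot path.
    mkdir -p /tmp/webtest/web
    chmod 710 /tmp/webtest/web

    # %a = octal mode, %U = owner, %G = group (GNU coreutils stat).
    stat -c '%a' /tmp/webtest/web    # → 710
    stat -c '%U %G' /tmp/webtest/web
    ```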
     
  3. Hi @till, thanks so much for looking into this!
    And thanks for confirming that there were no changes in the folder permission scheme, as well as in the vhost templates; it looked like there weren't any substantial changes, but it's always nice to have a confirmation! :)

    Ok, I think that you might actually have hit it! That folder is definitely there, owned by the correct user and group, but its permissions are 710 (drwx--x---), like on all other web folders; the content inside is 644 (folders are 755). This is the same on every site — I 'inherited' those permissions from 3.0.X and left them that way!

    (And, on an impulse, I remembered to check ACLs for that folder — I use unison to synchronize it with a cold-standby backup server, and unison is known to sometimes toggle the archive bits while working. Here is the output (exactly as expected):

    $ getfacl web
    # file: web
    # owner: web12
    # group: client6
    user::rwx
    group::--x
    other::---
    )

    Changing the 'web' folder to 755 (just to make some tests) definitely made the error disappear!
    Good catch! That was awesome, and it reminded me of my Unix guru, who often told me: '90% of all Unix issues are permission problems'. Again, he was right. Ironically, I have been so used to the permissions being 710 on the /web folder that I never thought to recheck them...
    I've been looking at /usr/local/ispconfig/server/plugins-available/nginx_plugin.inc.php. At around line 770, I most definitely have:
    $app->system->chmod($data['new']['document_root'].'/web', 0751);
    and at line 836 (for medium security settings):
    $app->system->chmod($data['new']['document_root'].'/web', 0755);
    However, around lines 880+, there seem to be some settings for 'higher security', and it seems that one of those settings is to chmod the /web directory with 0710:

    if($web_config['security_level'] == 20) {
        $app->system->chmod($data['new']['document_root'].'/' . $web_folder, 0710);
        ...

    Now, I don't really remember exactly how I configured my server the first time. I don't even know where the 'security settings' are — these are a complete novelty to me :) But it's likely that I inadvertently clicked on some option to raise the security level, ISPConfig changed the web folder to 0710, and I never bothered to check it again; all worked fine under HTTP, but as I switched to HTTPS, apparently the permissions have to be at least 0751...
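    The underlying rule can be demonstrated on any scratch directory: with 0710, a group member (and on an ISPConfig system the client group typically includes the nginx worker user) can traverse into the directory but an open()/readdir on the directory itself fails with "Permission denied", because that needs the group read bit, which 0751 adds. A minimal sketch:

    ```shell
    # 0710: owner rwx, group --x, other ---. The lone 'x' lets group
    # members enter the directory, but open() on it is denied.
    mkdir -p /tmp/permdemo
    chmod 0710 /tmp/permdemo
    stat -c '%A' /tmp/permdemo    # → drwx--x---

    # 0751 adds the group read bit, which is what the nginx worker
    # needs to open the docroot directory.
    chmod 0751 /tmp/permdemo
    stat -c '%A' /tmp/permdemo    # → drwxr-x--x
    ```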

    My fault for being lazy! I never thought this would turn out to be a simple permissions problem! :) I'm almost ashamed to have wasted your time...

    You can most certainly lock this thread and mark it as solved!
     
