Does Googlebot create a problem? (access.log) Why does it request a .pdf?

Discussion in 'Server Operation' started by Milly, Jan 30, 2020.

  1. Milly

    Milly Member

    I am trying to understand these log entries, but it is not clear to me why they appear, what they mean, and why a PDF is being requested. Do they cause a problem?

    Or is this normal, day-to-day activity?

    "GET /the_tale_of_ginger_and_pickles.pdf HTTP/1.1"

    66.249.64.158 - - [29/Jan/2020:13:19:05 -0600] "GET /manual/tr/mod/mod_dbd.html HTTP/1.1" 200 6876 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    66.249.64.158 - - [29/Jan/2020:13:26:17 -0600] "GET /manual/de/programs/logresolve.html HTTP/1.1" 200 2733 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    66.249.64.158 - - [29/Jan/2020:13:26:32 -0600] "GET /the_tale_of_ginger_and_pickles.pdf HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    66.249.64.158 - - [30/Jan/2020:11:15:43 -0600] "GET /manual/de/platform/index.html HTTP/1.1" 200 2250 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    45.136.108.42 - - [30/Jan/2020:11:16:10 -0600] "GET / HTTP/1.1" 200 3324 "-" "Mozilla/5.0 (Windows NT 5.1; rv:9.0.1) Gecko/20100101 Firefox/9.0.1"

    45.136.108.42 - - [30/Jan/2020:11:16:10 -0600] "GET /HNAP1/ HTTP/1.1" 404 434 "http://159.168.87.40/" "Mozilla/5.0 (Windows NT 5.1; rv:9.0.1) Gecko/20100101 Firefox/9.0.1"

    66.249.64.158 Google LLC (GOGL)
    45.136.108.42 ORG-CL547-RIPE RUSSIAN FEDERATION
    159.168.87.40 Example ip

    Debian 10

    Thanks
     
    Last edited: Jan 30, 2020
  2. Steini86

    Steini86 Active Member

    Yes, this is normal.
    More information on Googlebot and how it works: http://www.google.com/bot.html
    What the parts in the logfile mean is described here: https://httpd.apache.org/docs/2.4/logs.html
    Taking this line as an example:
    Code:
    66.249.64.158 - - [29/Jan/2020:13:26:32 -0600] "GET /the_tale_of_ginger_and_pickles.pdf HTTP/1.1" 404 490 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.64.158 - IP address of your visitor
    first "-" - The identity of the client as reported by identd (RFC 1413). "-" means the information is not available.
    second "-" - The user ID, if the request was authenticated via HTTP authentication. "-" means it was not.
    [29/Jan/2020:13:26:32 -0600] - Date and time your server received the request, including its offset from UTC
    "GET /the_tale_of_ginger_and_pickles.pdf HTTP/1.1" - The request line from the visitor: a GET request for this .pdf file using HTTP version 1.1. This happens whenever someone (in this case Googlebot) requests www.yourdomain.com/the_tale_of_ginger_and_pickles.pdf. You can try it yourself and watch the new entry appear in the log file (to follow the log file live, use "tail -f /path/to/logfile").
    404 - Status code returned by the web server. 404 means "Not Found": the file does not exist on your server.
    490 - Size of the response body sent to the client, in bytes, not counting headers (the error page is quite small)
    "-" - The "referrer" (Referer header), i.e. the page the client says it came from. "-" means none was sent.
    The rest is the user agent string reported by the client. It usually contains the browser version, operating system, etc. In this case it also identifies the client as Googlebot.
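    The field breakdown above can also be expressed in code. Below is a minimal Python sketch, assuming the access.log uses Apache's "combined" format (the lines in this thread appear to match it). The names COMBINED_RE, LogEntry and parse_line are hypothetical helpers written for illustration, not part of Apache or any library. The regex is deliberately simple and does not handle escaped quotes inside fields.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Hypothetical regex for Apache's "combined" format:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# Assumption: no escaped double quotes appear inside the quoted fields.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class LogEntry(NamedTuple):
    host: str        # client IP address (%h)
    ident: str       # identd identity (%l), usually "-"
    user: str        # HTTP-authenticated user (%u), "-" if none
    time: datetime   # time the request was received (%t)
    method: str      # e.g. "GET"
    path: str        # requested URL path
    protocol: str    # e.g. "HTTP/1.1"
    status: int      # final status code (%>s), e.g. 404
    size: int        # response body bytes (%b); "-" (nothing sent) becomes 0
    referer: str     # Referer header, "-" if the client sent none
    agent: str       # User-Agent header exactly as the client reported it


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one combined-format line; return None if it does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    parts = m["request"].split()
    if len(parts) != 3:  # e.g. a malformed request logged as "-"
        return None
    method, path, protocol = parts
    return LogEntry(
        host=m["host"],
        ident=m["ident"],
        user=m["user"],
        # %b expects English month names such as "Jan" (default C locale)
        time=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        method=method,
        path=path,
        protocol=protocol,
        status=int(m["status"]),
        size=0 if m["size"] == "-" else int(m["size"]),
        referer=m["referer"],
        agent=m["agent"],
    )


# The 404 line from the question.
SAMPLE = (
    '66.249.64.158 - - [29/Jan/2020:13:26:32 -0600] '
    '"GET /the_tale_of_ginger_and_pickles.pdf HTTP/1.1" 404 490 "-" '
    '"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile '
    'Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

if __name__ == "__main__":
    entry = parse_line(SAMPLE)
    print(entry.host, entry.path, entry.status, entry.size)
    # prints: 66.249.64.158 /the_tale_of_ginger_and_pickles.pdf 404 490
```

    Keep in mind that the user agent is whatever the client chooses to send, so a "Googlebot" string in the agent field only shows what the client claims to be.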
     
  3. Milly

    Milly Member

    Excellent answer; now I understand it clearly.

    Thank you
     
