robots.txt

Discussion in 'Server Operation' started by latcarf, Aug 24, 2006.

  1. latcarf

    latcarf New Member

    I have a vhost site that I cannot seem to get into the search engines. I have been reading up on the "robots.txt" file and thought it might help get the site out there. My logs show that search engine robots (spiders, whatever) have been crawling around (pardon the pun!) looking for this file. I have some example configurations, and I know there are pros and cons to its use.

    One bit of info I have not found is where to put the file in your server docs. Does anyone know, or have experience using the file?
     
  2. neil6179

    neil6179 New Member

    Hi,

    I wrote a spider last year that obeyed the rules set in the robots.txt file so I've had some experience.

    Firstly, the file should be located in the root of the site, e.g.
    http://www.example.com/robots.txt

    I'm not sure, however, that having a robots.txt will achieve your goal of getting into the search engines. The file is designed purely to tell spiders which URLs they should not access because you do not wish them to be indexed. It is therefore only useful for keeping pages out of search engines, not for getting them in.
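
    To make that concrete, here is a minimal sketch of how a rule-obeying spider might check a URL against robots.txt, using Python's standard urllib.robotparser module. The rules, the "ExampleBot" user agent and the /private/ path are hypothetical, made up for illustration; this is not the spider mentioned above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every robot is asked to stay out of /private/.
RULES = """\
User-agent: *
Disallow: /private/
"""


def build_parser(rules_text):
    """Parse robots.txt text that has already been fetched from the site root."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


parser = build_parser(RULES)

# "ExampleBot" is an assumed user-agent name, not a real crawler.
allowed_home = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
allowed_private = parser.can_fetch("ExampleBot", "http://www.example.com/private/notes.html")

print(allowed_home)
print(allowed_private)
```

    Note that the file can only ask a spider to skip URLs. Nothing in it requests that a page be crawled or ranked.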

    I'd recommend you look up some other Search Engine Optimisation (SEO) techniques. From some experience, and from building my own indexing algorithms, I've found the following things important:

    • Ensure all the HTML is valid.
    • Avoid table-based layouts; stick to CSS.
    • Supply plenty of relevant content. This is what they are looking for after all!
    • Ensure there is a full site map using text links so the spider can easily follow them.
    • Never try to trick the spider. They will always find out eventually and blacklist your site.
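
    On the site map point, a rough sketch of why plain text links matter: a simple spider might collect the URLs to follow from the href attributes of anchor tags, roughly as below. This uses Python's standard html.parser module; the LinkCollector class, the extract_links helper and the sample markup are assumptions for illustration, not how any particular search engine works.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, roughly as a simple spider might."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Only anchors with a real href give the spider something to follow.
            if name == "href" and value:
                self.links.append(value)


def extract_links(html):
    """Return the list of href targets found in the given HTML text."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical site map fragment: two text links and one script-only link.
SAMPLE = (
    '<ul><li><a href="/about.html">About</a></li>'
    '<li><a href="/contact.html">Contact</a></li></ul>'
    '<a onclick="go()">Script only</a>'
)

links = extract_links(SAMPLE)
print(links)
```

    In this sketch the script-only link contributes nothing, which is the reason for keeping the site map as ordinary text links.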

    I hope this helps a little.

    Neil
     
  3. latcarf

    latcarf New Member

    Hey Neil

    Thanks for the info. I thought the robots.txt worked that way but hey, I am a novice at this.

    The HTML is good but table-based. I'll have to read up on the site map. The other site I have seems to have gotten out there and has been for some time; this new one is the problem.

    I'll go through everything again though.

    Thanks again!
     
