[Addon] Masking/Protecting robots.txt file from view

For Articles relating to more than one ISC version
Snooper
Posts: 264
Joined: Sat Jun 26, 2010 9:22 pm

Post by Snooper »

I was making amendments to my Google Analytics ‘stuff’ and, while doing so, ran a few external tests. To my horror, I discovered that my ‘robots.txt’ was also viewable, made public via a simple Google search! :o

I kid you not...

So it seems that not only is Google helping promote my site, it is also exposing my site to abuse (yours too, I assume?) and has provided the route to do so. Clearly I am not willing to expose my site any further.

My approach may be old school: it uses cloaking as the solution. Anyone who does not claim to be a search engine bot is redirected to the main domain by .htaccess. If somebody or some third party masquerades as Googlebot, they will fail the script's robot validation (a reverse DNS check on their IP address) and be redirected to the main domain via PHP. This should also stop Google from exposing my robots file.

1) As a first step, add these lines to your .htaccess file; if you don't have one, create it and upload it to the root folder of your domain.

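# Redirect anyone who does not claim to be one of these crawlers to the home page
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !(googlebot|msnbot|slurp) [NC]
RewriteRule ^robots\.txt$ http://www.YOURSITE.com/ [R,NE,L]
# Have PHP parse .txt files so robots.txt can run the include from step 3
# (note: this applies to every .txt file in this folder and below)
AddHandler application/x-httpd-php .txt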


2) Open a text editor or your favourite web editor, paste the code below into a new file, save it as reversedns.php and upload it to your root folder.

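<?php
// Verify that a visitor claiming to be a search engine crawler really is one.
$ua = $_SERVER['HTTP_USER_AGENT'];
// Match on 'Slurp' alone: Yahoo's user agent reads "Yahoo! Slurp", so 'Yahoo Slurp' never matches.
if (stristr($ua, 'msnbot') || stristr($ua, 'Googlebot') || stristr($ua, 'Slurp')) {
    // Reverse DNS: look up the hostname for the visitor's IP address.
    $ip = $_SERVER['REMOTE_ADDR'];
    $hostname = gethostbyaddr($ip);
    if (!preg_match("/\.googlebot\.com$/", $hostname) && !preg_match("/search\.live\.com$/", $hostname) && !preg_match("/crawl\.yahoo\.net$/", $hostname)) {
        // Hostname is not in a known crawler domain: redirect to the home page.
        $block = TRUE;
        $URL = "/";
        header("Location: $URL");
        exit;
    } else {
        // Forward DNS: the hostname must resolve back to the same IP address.
        $real_ip = gethostbyname($hostname);
        if ($ip != $real_ip) {
            $block = TRUE;
            $URL = "/";
            header("Location: $URL");
            exit;
        } else {
            $block = FALSE;
        }
    }
}
?>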


3) As the last step, open the robots.txt file you would like to protect and insert the code below as the first line.

<?php include("reversedns.php"); ?>
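
If you want to try the validation logic without waiting for a real crawler to visit, the check in reversedns.php can be sketched as a standalone function. This is only a rough sketch, not part of the steps above: the function name is_verified_crawler() and the fake lookup closures are made up for illustration, and the IP address and hostname are example values. The two DNS lookups are passed in as callables so the logic can be exercised without real DNS.

```php
<?php
// Hypothetical helper (not part of the original script): the same
// reverse-then-forward DNS check, with the lookups passed in as callables.
function is_verified_crawler($ip, $reverse, $forward)
{
    // Reverse lookup. gethostbyaddr() returns the unchanged IP on failure
    // and false on malformed input, so treat both as "not verified".
    $hostname = call_user_func($reverse, $ip);
    if (!is_string($hostname) || $hostname === $ip) {
        return false;
    }
    // Same hostname patterns as reversedns.php, combined into one expression.
    if (!preg_match('/(\.googlebot\.com|search\.live\.com|crawl\.yahoo\.net)$/i', $hostname)) {
        return false;
    }
    // Forward lookup must resolve back to the original IP address.
    return call_user_func($forward, $hostname) === $ip;
}

// Fake lookups (assumed example data) standing in for real DNS.
$fakeReverse = function ($ip) {
    return $ip === '66.249.66.1' ? 'crawl-66-249-66-1.googlebot.com' : $ip;
};
$fakeForward = function ($host) {
    return $host === 'crawl-66-249-66-1.googlebot.com' ? '66.249.66.1' : $host;
};

var_dump(is_verified_crawler('66.249.66.1', $fakeReverse, $fakeForward)); // bool(true)
var_dump(is_verified_crawler('203.0.113.5', $fakeReverse, $fakeForward)); // bool(false)
```

On a live server you would pass the real lookups instead, for example is_verified_crawler($_SERVER['REMOTE_ADDR'], 'gethostbyaddr', 'gethostbyname').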

I shall monitor the result of this change and feed back. If, however, YOU have an alternative version that may also achieve what I'm trying to do here, do share...
Last edited by Snooper on Sun Apr 22, 2012 12:43 pm, edited 1 time in total.
ISC 5.5.4 Ultimate : Being used here -- http://www.kdklondon.com
Martin
Site Admin
Posts: 1854
Joined: Wed Jun 17, 2009 6:30 pm
Location: South Yorkshire UK

Re: [Addon] Masking/Protecting robots.txt file from view

Post by Martin »

Only one problem with that: it limits its masking to anyone spoofing specific spiders. It's still a good way to block any attempts to circumvent access you might have allowed spider bots from Google, etc. to limited content/product info.