
[Addon] Masking/Protecting robots.txt file from view

Posted: Sat Apr 21, 2012 1:35 am
by Snooper
I was making amendments to my Google Analytics ‘stuff’ and, while doing so, ran a few external tests. To my horror, I discovered that my ‘robots.txt’ was also viewable, exposed to the public via a simple Google search !!! :o

I kid you not...

Image

So it seems that not only is Google helping promote my site, it is also offering my site up to be abused (yours too, I assume?) and providing the route to do so. Clearly I am not willing to expose my site any further.

My approach may be old school and uses cloaking as an option-cum-solution. Now if somebody or a 3rd party masks themselves as Googlebot, they will fail the script's robot validation and be redirected to the main domain via PHP. This also stops Google from exposing my robots file.

1) As a first step, add these lines to your .htaccess file. If you don't have one, create it and upload it to the root domain folder.

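RewriteEngine On
# Redirect any visitor whose user agent does not claim to be one of these bots
RewriteCond %{HTTP_USER_AGENT} !(googlebot|Msnbot|Slurp) [NC]
RewriteRule ^robots\.txt$ http://www.YOURSITE.com/ [R,NE,L]
# Let PHP process .txt files so the include in robots.txt runs
# (the handler name may differ on your host)
AddHandler application/x-httpd-php .txt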


2) Open a text editor or your favourite web editor application, insert the code below into a new file, save it as reversedns.php and upload it to your root folder.

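<?php
// reversedns.php: check that a visitor claiming to be a search bot really is one.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$block = FALSE;
// Only run the DNS checks for user agents that claim to be an allowed bot.
// 'Slurp' matches the .htaccess rule and the real "Yahoo! Slurp" user agent.
if (stristr($ua, 'msnbot') || stristr($ua, 'Googlebot') || stristr($ua, 'Slurp')) {
    $ip = $_SERVER['REMOTE_ADDR'];
    // Reverse lookup: IP address to hostname.
    $hostname = gethostbyaddr($ip);
    if (!preg_match("/\.googlebot\.com$/", $hostname) && !preg_match("/search\.live\.com$/", $hostname) && !preg_match("/crawl\.yahoo\.net$/", $hostname)) {
        // Hostname is not in a known crawler domain: redirect to the home page.
        $block = TRUE;
        header("Location: /");
        exit;
    }
    // Forward lookup: the hostname must resolve back to the same IP.
    $real_ip = gethostbyname($hostname);
    if ($ip != $real_ip) {
        $block = TRUE;
        header("Location: /");
        exit;
    }
}
?>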


3) As the last step, open the robots.txt file you would like to protect and insert the code below as the first line.

<?php include("reversedns.php"); ?>
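
For anyone who wants to try the reverse DNS check without waiting for a real crawler, here is a rough sketch of the same logic wrapped in a function. The function name is_verified_bot and the two stub lookups are hypothetical and not part of the script above; the IP addresses come from the reserved documentation range. It assumes PHP 5.3 or later for the anonymous functions.

```php
<?php
// Hypothetical helper (not part of reversedns.php): forward-confirmed
// reverse DNS with injectable lookups, so it can be tried offline.
function is_verified_bot($ip, $reverse = 'gethostbyaddr', $forward = 'gethostbyname')
{
    // Reverse lookup: IP address to hostname. gethostbyaddr() returns the
    // unmodified IP on failure and false on malformed input.
    $hostname = call_user_func($reverse, $ip);
    if (!is_string($hostname) || $hostname === $ip) {
        return false;
    }
    // Same hostname patterns as reversedns.php.
    if (!preg_match('/(\.googlebot\.com|search\.live\.com|crawl\.yahoo\.net)$/', $hostname)) {
        return false;
    }
    // Forward lookup: the hostname must resolve back to the original IP.
    return call_user_func($forward, $hostname) === $ip;
}

// Offline usage with stubbed lookups instead of real DNS queries.
$rev = function ($ip) { return 'crawl-1.googlebot.com'; };
$fwd = function ($host) { return '192.0.2.10'; };
var_dump(is_verified_bot('192.0.2.10', $rev, $fwd)); // lookups agree
var_dump(is_verified_bot('192.0.2.99', $rev, $fwd)); // forward lookup disagrees
```

Called with only the IP, the function falls back to the real gethostbyaddr() and gethostbyname() lookups, which is what the script in step 2 does inline.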

I shall monitor the result of this change and feed back. If, however, YOU have an alternative version that may also achieve what I'm trying here, do share...

Re: [Addon] Masking/Protecting robots.txt file from view

Posted: Sat Apr 21, 2012 11:28 pm
by Martin
Only one problem with that: it limits its masking to anyone spoofing specific spiders. Still, it's a good way to block any attempts to circumvent the access you might have allowed spider bots from Google, etc. to limited content/product info.