[Addon] Masking/Protecting robots.txt file from view

For Articles relating to more than one ISC version
Post Reply
Snooper
Posts: 264
Joined: Sat Jun 26, 2010 9:22 pm

[Addon] Masking/Protecting robots.txt file from view

Post by Snooper » Sat Apr 21, 2012 1:35 am

I was making amendments to my Google Analytics ‘stuff’ and, while doing so, ran a few external tests. To my horror I discovered that my ‘robots.txt’ was publicly viewable via a simple Google search !!! :o

I kid you not...

Image

So it seems Google are not only helping promote my site, they are also offering my site up to be abused (I assume yours too?), and have provided the route to do so. Clearly I am not willing to expose my site any further.

My approach may be old school and uses cloaking as a solution. If somebody or a third party masquerades as a Googlebot, they will fail the script's robot validation and be redirected to the main domain via PHP. This also stops Google from exposing my robots file..

1) As a first step, add these lines to your .htaccess file; if you don't have one, create it and upload it to the root domain folder.

RewriteEngine On
# Redirect requests for robots.txt to the homepage unless the user agent
# claims to be one of the allowed crawlers (case-insensitive match)
RewriteCond %{HTTP_USER_AGENT} !(googlebot|msnbot|slurp) [NC]
RewriteRule ^robots\.txt$ http://www.YOURSITE.com/ [R,NE,L]
# Parse .txt files through PHP so robots.txt can run the validation include
AddHandler application/x-httpd-php .txt
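The RewriteCond above is just a case-insensitive alternation over the user-agent string. For anyone who wants to sanity-check the pattern before touching their live .htaccess, here is a quick Python sketch of the same matching logic (the function name is mine, not part of the addon):

```python
import re

# Case-insensitive allowlist, mirroring the [NC] RewriteCond pattern above.
ALLOWED_BOTS = re.compile(r"(googlebot|msnbot|slurp)", re.IGNORECASE)

def is_allowed_crawler(user_agent: str) -> bool:
    """Return True if the user agent claims to be an allowed crawler."""
    return ALLOWED_BOTS.search(user_agent) is not None

print(is_allowed_crawler("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # True
print(is_allowed_crawler("curl/7.68.0"))                              # False
```

Note this only checks what the client *claims* to be; the reverse-DNS script in step 2 is what catches the liars.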


2) Open a text editor or your favourite web editor application, insert the code below into a new file, save it as reversedns.php, and upload it to your root folder.

<?php
// Only validate requests that claim to come from a known crawler.
$ua = $_SERVER['HTTP_USER_AGENT'];
if (stristr($ua, 'msnbot') || stristr($ua, 'Googlebot') || stristr($ua, 'Slurp')) {
    $ip = $_SERVER['REMOTE_ADDR'];
    // Reverse DNS: the IP must resolve to a hostname owned by the crawler.
    $hostname = gethostbyaddr($ip);
    if (!preg_match("/\.googlebot\.com$/", $hostname) &&
        !preg_match("/search\.live\.com$/", $hostname) &&
        !preg_match("/crawl\.yahoo\.net$/", $hostname)) {
        // Hostname does not belong to Google, MSN or Yahoo: treat as a fake bot
        // and send it back to the homepage.
        header("Location: /");
        exit;
    } else {
        // Forward-confirm: the hostname must resolve back to the original IP,
        // otherwise the reverse DNS record itself has been spoofed.
        $real_ip = gethostbyname($hostname);
        if ($ip != $real_ip) {
            header("Location: /");
            exit;
        }
    }
}
?>
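The core of the script is a double check: reverse DNS on the IP, then a forward lookup on the resulting hostname to confirm it points back at the same IP. A Python sketch of that logic, with the resolvers injected as parameters so it can be tested without live DNS (all names here are mine; in production you would pass socket.gethostbyaddr / socket.gethostbyname):

```python
# Reverse-DNS suffixes the script trusts, mirroring the PHP preg_match calls.
TRUSTED_SUFFIXES = (".googlebot.com", "search.live.com", "crawl.yahoo.net")

def is_verified_crawler(ip, reverse_lookup, forward_lookup):
    """Double-check a claimed crawler IP.

    reverse_lookup(ip) -> hostname and forward_lookup(hostname) -> ip are
    injected so the logic can be exercised offline.
    """
    hostname = reverse_lookup(ip)
    # Step 1: the reverse record must be owned by Google, MSN or Yahoo.
    if not any(hostname.endswith(s) for s in TRUSTED_SUFFIXES):
        return False
    # Step 2: forward-confirm the reverse record resolves back to the same IP.
    return forward_lookup(hostname) == ip

# A spoofer can set their own reverse DNS to a googlebot.com name, but the
# forward lookup of that name will not point back at the spoofer's IP.
fake_reverse = lambda ip: "crawl-1-2-3-4.googlebot.com"
fake_forward = lambda host: "66.249.66.1"
print(is_verified_crawler("66.249.66.1", fake_reverse, fake_forward))  # True
print(is_verified_crawler("10.0.0.5", fake_reverse, fake_forward))     # False
```

The forward-confirmation step is what makes this stronger than a user-agent check alone: only the crawler's real operator controls both DNS directions.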


3) As the last step, open the robots.txt file you would like to protect and insert the line below as its first line.

<?php include("reversedns.php"); ?>

I shall monitor the result of this change and feed back. If, however, YOU have an alternative version that may also achieve what I'm trying here, do share...
Last edited by Snooper on Sun Apr 22, 2012 12:43 pm, edited 1 time in total.
ISC 5.5.4 Ultimate : Being used here -- http://www.kdklondon.com

Martin
Site Admin
Posts: 1854
Joined: Wed Jun 17, 2009 6:30 pm
Location: South Yorkshire UK
Contact:

Re: [Addon] Masking/Protecting robots.txt file from view

Post by Martin » Sat Apr 21, 2012 11:28 pm

Only one problem with that: its masking is limited to anyone spoofing those specific spiders. Still, it's a good way to block any attempts to circumvent access you might have allowed for spider bots from Google, etc. to reach limited content/product info.

