HighDots Forums  

blocking robots.txt from non-robots

Search Engine Optimization Discussion about SEO/Search Engine Optimization (alt.internet.search-engines)


#21
John Bokma
Re: blocking robots.txt from non-robots - 02-21-2008, 05:58 PM






Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

Quote:
Can we simply agree to disagree and save discussion of *why* for
another time and go into some details about *how*?

Welcome to Usenet. Remember people try to help you in *their* spare time,
for *free*.

That being said: there are two ways that might do what you want:

1. IP address based: you have to find out the IP address ranges of
each bot you want to allow.
2. UserAgent string based: you have to find out each UA string for
each bot you want to allow.

In .htaccess you can redirect internally using either 1 or 2 to the right
robots.txt.
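
A minimal sketch of the decision logic behind option 2, written in Python rather than as .htaccess rules so it can be run and checked on its own. The token list and the two file names (`robots-bots.txt`, `robots-public.txt`) are hypothetical placeholders, not anything taken from this thread, and a User-Agent header is trivially spoofed, so treat this as an illustration only.

```python
# Hypothetical substrings of crawler User-Agent headers; verify each one
# against the search engine's own documentation before relying on it.
ALLOWED_UA_TOKENS = ("googlebot", "slurp", "msnbot")

# Hypothetical file names: the real rules for crawlers,
# and a bland file for everybody else.
BOT_FILE = "robots-bots.txt"
PUBLIC_FILE = "robots-public.txt"


def select_robots_file(user_agent):
    """Return the file to serve for a /robots.txt request."""
    # A missing header is treated like an unknown visitor.
    ua = (user_agent or "").lower()
    if any(token in ua for token in ALLOWED_UA_TOKENS):
        return BOT_FILE
    return PUBLIC_FILE
```

In .htaccess the same test would typically become a condition on the User-Agent header followed by an internal rewrite of the robots.txt request, which keeps the switch invisible to the client.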

--
John Bokma http://johnbokma.com/


#22
John Bokma
Re: blocking robots.txt from non-robots - 02-21-2008, 06:02 PM






Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

Quote:
As I said in another post, Can we save discussion of *why* for another
time and talk about *how*?

Like I wrote in reply to that other post: we're trying to help you for
*free*. But even if you were paying me, I would ask you *why*. Too often
people think they have an X problem, and try to find all kinds of
solutions to that, while the real problem is Y. If you have ever
helped others on Usenet, you certainly know what I mean.

It sounds to me like you're afraid to educate others (give away your
secrets), and so want to hide your robots.txt. If that is indeed the case
you *do* have an X -> Y problem.

--
John Bokma http://johnbokma.com/


#23
Paul
Re: blocking robots.txt from non-robots - 02-21-2008, 06:44 PM



On 22 Feb 2008 00:02:25 GMT, John Bokma <john (AT) castleamber (DOT) com> wrote:

Quote:
--
John Bokma http://johnbokma.com/

You have email, John.
plh
Paul



#24
Joe Fox
Re: blocking robots.txt from non-robots - 02-21-2008, 07:23 PM



John Bokma <john (AT) castleamber (DOT) com> wrote in
news:Xns9A4BB6E764926castleamber (AT) 130 (DOT) 133.1.4:

Quote:
Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

Can we simply agree to disagree and save discussion of *why* for
another time and go into some details about *how*?

Welcome to Usenet. Remember people try to help you in *their* spare
time, for *free*.

Yes, that's right, they do, and I have always appreciated the help and
input I find on Usenet and other sources. That's why I didn't see the
need to make a big deal about why. I didn't see a need to waste people's
time with *why*.

Now if I were trying to convince folks to do something like this on their
servers & sites (which I wouldn't... not my business), that would be
another matter entirely and I'd have to come with a truckload of *why*
and it'd better be bloody convincing at that.


Quote:
That being said: there are two ways that might do what you want:

1 IP address based: you have to find out the IP address ranges
each bot you want to allow.
2 UserAgent string based: you have to find out each UA string for
each bot you want to allow.

In .htaccess you can redirect internally using either 1 or 2 to the
right robots.txt.

Thank you very much for a useful answer.

Sorry if I've come off like an ass. Real life is intruding. Not a good
excuse I realize but just when you think you've got enough to deal
with....


#25
Don
Re: blocking robots.txt from non-robots - 02-21-2008, 07:46 PM



Joe Fox <ny152 (AT) none (DOT) invalid> wrote in
news:Xns9A49EC6462EC5891563 (AT) 127 (DOT) 0.0.1:

Quote:
I'm using a robots.txt file to control what is and is not crawled by
search engine bots but I'd like to block anything that isn't a known
search engine bot doesn't get the file I'm feeding to google, yahoo
and the others.

From what I've read this could be done with .htaccess but I've not been
able to make heads or tails out of that.

I'd really be grateful for some help here.

Thanks

Some tutorials:
http://baremetal.com/gadgets/htaccess/
http://evolt.org/node/226
http://www.edginet.org/techie/website/htaccess.html
http://www.dimi.uniud.it/labs/docume...hallenger1.2/User/htaccess/htaccess.html
http://www.webhelpinghand.com/htaccess_deny.htm
http://www.javascriptkit.com/howto/htaccess.shtml
http://www.serverwatch.com/tutorials...0825_1127711_1
http://www.verio.com/support/documen...fm?doc_id=3624


#26
Don
Re: blocking robots.txt from non-robots - 02-21-2008, 07:48 PM



John Bokma <john (AT) castleamber (DOT) com> wrote in
news:Xns9A4A4758D9F83castleamber (AT) 130 (DOT) 133.1.4:

Quote:
Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

I'm using a robots.txt file to control what is and is not crawled by
search engine bots but I'd like to block anything that isn't a known
search engine bot doesn't get the file I'm feeding to google, yahoo
and the others.

Why?

I can imagine that you want to block your entire site for any bot
that's known to be abusive though, but those probably don't check your
robots.txt anyway.

Many bots these days (even legitimate SEs) simply change their UA
to a standard browser footprint.

It seems to be OK for them to cloak, yet when webmasters do the same
it is frowned upon.


#27
Don
Re: blocking robots.txt from non-robots - 02-21-2008, 07:50 PM



Joe Fox <ny152 (AT) none (DOT) invalid> wrote in
news:Xns9A4A870814A3D891563 (AT) 127 (DOT) 0.0.1:

Quote:
John Bokma <john (AT) castleamber (DOT) com> wrote in
news:Xns9A4A4758D9F83castleamber (AT) 130 (DOT) 133.1.4:

Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

I'm using a robots.txt file to control what is and is not crawled by
search engine bots but I'd like to block anything that isn't a known
search engine bot doesn't get the file I'm feeding to google, yahoo
and the others.

Why?

I can imagine that you want to block your entire site for any bot
that's known to be abusive though, but those probably don't check
your robots.txt anyway.


Perhaps I didn't say it right. I'm wanting to block the robots.txt
that I'm feeding search engines from being given to anybody else. I
realize that they *could* spoof the SE's user agent or something, but
my concerns are bright enough to look for robots.txt but not bright
enough to expect to be handed a phoney


Perhaps you might survey two versions of robots.txt that are served
behind the scenes?



#28
Don
Re: blocking robots.txt from non-robots - 02-21-2008, 07:52 PM



Joe Fox <ny152 (AT) none (DOT) invalid> wrote in
news:Xns9A4B60538A9A891563 (AT) 127 (DOT) 0.0.1:

Quote:
John Bokma <john (AT) castleamber (DOT) com> wrote in
news:Xns9A4A915154EB7castleamber (AT) 130 (DOT) 133.1.4:

Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

John Bokma <john (AT) castleamber (DOT) com> wrote in
news:Xns9A4A4758D9F83castleamber (AT) 130 (DOT) 133.1.4:

Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

I'm using a robots.txt file to control what is and is not crawled
by search engine bots but I'd like to block anything that isn't a
known search engine bot doesn't get the file I'm feeding to
google, yahoo and the others.

Why?

I can imagine that you want to block your entire site for any bot
that's known to be abusive though, but those probably don't check
your robots.txt anyway.


Perhaps I didn't say it right. I'm wanting to block the robots.txt
that I'm feeding search engines from being given to anybody else.

Why? If the reason is that you want to "protect" some folders: it's
not secure and bound to fail sooner or later. Remember that not all
bots honor the robots.txt, especially not the ones that you don't
want on your site in the first place.

I want to keep certain humans from reading the robots.txt that I give
to search engines because it's none of their bloody business what
pages I tell SE's not to index and there are a few that might have
mind enough to look at robots.txt They will not however expect to be
handed a tailored version of it.

I
realize that they *could* spoof the SE's user agent or something,
but my concerns are bright enough to look for robots.txt but not
bright enough to expect to be handed a phoney

You want to hide the key under the doormat which has in 5 languages
"The key is hidden nearby" written on top...

Not really, or is it possible that they could also get my .htaccess?
I didn't think that was possible. If they ask for a robots.txt and
get one that's got nothing more than a pointer to a sitemap that will
satisfy 'em.

The most effective way to do this is to deny access to robots.txt
for specific IP ranges in .htaccess.

As for denying robots.txt to the entire general public: it's bad
practice, as the majority of the general public has never even heard of
robots.txt.
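
The IP-range check can be sketched as follows. This is a minimal Python illustration of the test itself, not .htaccess syntax; the ranges are reserved documentation networks standing in for real ones, and the function name is made up for the example.

```python
import ipaddress

# Placeholder ranges (reserved documentation networks), NOT real crawler
# or visitor ranges; substitute the ranges you actually want to deny.
DENIED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def may_view_robots(remote_addr):
    """Return False when the client address falls in a denied range."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Unparseable address: deny by default in this sketch.
        return False
    return not any(addr in net for net in DENIED_NETWORKS)
```

Unlike the User-Agent header, the connecting address cannot be chosen freely by the client, which is presumably why this route is the more dependable of the two; the cost is keeping the list of ranges current.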




#29
Don
Re: blocking robots.txt from non-robots - 02-21-2008, 07:55 PM



John Bokma <john (AT) castleamber (DOT) com> wrote in
news:Xns9A4B64E22EB17castleamber (AT) 130 (DOT) 133.1.4:

Quote:
Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

Not really, or is it possible that they could also get my .htaccess?
I didn't think that was possible. If they ask for a robots.txt and
get one that's got nothing more than a pointer to a sitemap that will
satisfy 'em.

Let's assume for argument's sake that those people *want* to see your
robots.txt. If you feed Google something different than them, they
will notice as soon as they check Google, because if you disallow
Google some directories, while your robots.txt says allow, they will
wonder why all pages in some directory don't show up in Google, but
are available on your site.


You give the majority of the general public too much credit.
Comparing a website's robots.txt to Google results!



#30
Don
Re: blocking robots.txt from non-robots - 02-21-2008, 07:58 PM



Phil Payne <phil (AT) isham-research (DOT) co.uk> wrote in
news:adf1b99f-5c4d-48a8-b348-49a9cacc2453 (AT) n77g2000hse (...oglegroups.com:

Quote:
Perhaps I didn't say it right. I'm wanting to block the robots.txt that
I'm feeding search engines from being given to anybody else.

If Google catch you they will exclude you from the index.

'Don't deceive your users or present different content to search
engines than you display to users, which is commonly referred to as
"cloaking." '

It's done ALL the time.
What matters is that it's done for an appropriate reason and is accomplished
server side.

