HighDots Forums  

blocking robots.txt from non-robots

Search Engine Optimization Discussion about SEO/Search Engine Optimization (alt.internet.search-engines)


#1  Joe Fox, 02-20-2008, 12:14 AM

I'm using a robots.txt file to control what is and is not crawled by search
engine bots, but I'd like to make sure that anything that isn't a known search
engine bot doesn't get the file I'm feeding to Google, Yahoo and the others.

From what I've read this could be done with .htaccess, but I've not been able
to make heads or tails of that.

I'd really be grateful for some help here.

Thanks

#2  John Bokma, 02-20-2008, 08:00 AM

Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

Quote:
I'm using a robots.txt file to control what is and is not crawled by
search engine bots, but I'd like to make sure that anything that isn't a
known search engine bot doesn't get the file I'm feeding to Google, Yahoo
and the others.

Why?

I can imagine wanting to block your entire site from any bot that's known
to be abusive, but those probably don't check your robots.txt anyway.

--
John Bokma http://johnbokma.com/


#3  Joe Fox, 02-20-2008, 02:16 PM

John Bokma <john (AT) castleamber (DOT) com> wrote in
news:Xns9A4A4758D9F83castleamber (AT) 130 (DOT) 133.1.4:

Quote:
Why?

I can imagine that you want to block your entire site for any bot
that's known to be abusive though, but those probably don't check your
robots.txt anyway.

Perhaps I didn't say it right. I'm wanting to block the robots.txt that
I'm feeding search engines from being given to anybody else. I realize
that they *could* spoof the SE's user agent or something, but the people
I'm concerned about are bright enough to look for robots.txt but not
bright enough to expect to be handed a phony one.




#4  John Bokma, 02-20-2008, 03:17 PM

Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

Quote:
Perhaps I didn't say it right. I'm wanting to block the robots.txt
that I'm feeding search engines from being given to anybody else.

Why? If the reason is that you want to "protect" some folders: it's not
secure and bound to fail sooner or later. Remember that not all bots honor
the robots.txt, especially not the ones that you don't want on your site
in the first place.

Quote:
I realize that they *could* spoof the SE's user agent or something, but
the people I'm concerned about are bright enough to look for robots.txt
but not bright enough to expect to be handed a phony one.

You want to hide the key under the doormat which has in 5 languages "The
key is hidden nearby" written on top...

--
John Bokma http://johnbokma.com/


#5  Phil Payne, 02-20-2008, 03:55 PM

Quote:
Perhaps I didn't say it right. I'm wanting to block the robots.txt that
I'm feeding search engines from being given to anybody else.

If Google catch you they will exclude you from the index.

'Don't deceive your users or present different content to search
engines than you display to users, which is commonly referred to as
"cloaking." '


#6  Joe Fox, 02-21-2008, 01:10 AM

Phil Payne <phil (AT) isham-research (DOT) co.uk> wrote in
news:adf1b99f-5c4d-48a8-b348-49a9cacc2453 (AT) n77g2000hse (DOT) googlegroups.com:

Quote:
If Google catch you they will exclude you from the index.

'Don't deceive your users or present different content to search
engines than you display to users, which is commonly referred to as
"cloaking." '


I can't believe this.

I'm not trying to cloak my content or pull anything underhanded.

I have robots.txt set to tell Google and others to disallow certain pages.

I don't want certain humans (only a few hundred in number, but all on
dynamic IPs in several countries) to be able to read the robots.txt that
I'm giving search engines, because I don't want them to know which pages I
am telling SEs to "disallow".

What's so wrong with this?

That robots.txt is not these people's business and I don't want them to
read it. If I knew all of the IP addresses that they connect from I
would block 'em that way, but as I said, they're all dynamic, from a
variety of ISPs in several countries.


#7  Joe Fox, 02-21-2008, 01:35 AM

John Bokma <john (AT) castleamber (DOT) com> wrote in
news:Xns9A4A915154EB7castleamber (AT) 130 (DOT) 133.1.4:

Quote:
Why? If the reason is that you want to "protect" some folders: it's
not secure and bound to fail sooner or later. Remember that not all
bots honor the robots.txt, especially not the ones that you don't want
on your site in the first place.

I want to keep certain humans from reading the robots.txt that I give to
search engines because it's none of their bloody business what pages I
tell SEs not to index, and there are a few that might have mind enough to
look at robots.txt. They will not, however, expect to be handed a tailored
version of it.

Quote:
You want to hide the key under the doormat which has in 5 languages
"The key is hidden nearby" written on top...

Not really, or is it possible that they could also get my .htaccess? I
didn't think that was possible. If they ask for a robots.txt and get one
that's got nothing more than a pointer to a sitemap, that will satisfy
'em.


#8  Big Bill, 02-21-2008, 06:54 AM

On Wed, 20 Feb 2008 13:16:26 -0600, Joe Fox <ny152 (AT) none (DOT) invalid>
wrote:

Quote:
Perhaps I didn't say it right. I'm wanting to block the robots.txt that
I'm feeding search engines from being given to anybody else. I realize
that they *could* spoof the SE's user agent or something, but the people
I'm concerned about are bright enough to look for robots.txt but not
bright enough to expect to be handed a phony one.

So you'll have two robots.txt files, one that's actually working and one
that's just there for cosmetic purposes. You're weird, sir!

BB
--

http://www.kruse.co.uk/
http://www.fat-odin.com/
http://www.here-be-posters.co.uk/


#9  Big Bill, 02-21-2008, 06:54 AM

On Thu, 21 Feb 2008 00:35:30 -0600, Joe Fox <ny152 (AT) none (DOT) invalid>
wrote:

Quote:
I want to keep certain humans from reading the robots.txt that I give to
search engines because it's none of their bloody business what pages I
tell SEs not to index, and there are a few that might have mind enough to
look at robots.txt. They will not, however, expect to be handed a tailored
version of it.

I realize that they *could* spoof the SE's user agent or something, but
the people I'm concerned about are bright enough to look for robots.txt
but not bright enough to expect to be handed a phony one.

You want to hide the key under the doormat which has in 5 languages
"The key is hidden nearby" written on top...

Not really, or is it possible that they could also get my .htaccess? I
didn't think that was possible. If they ask for a robots.txt and get one
that's got nothing more than a pointer to a sitemap, that will satisfy
'em.

Essentially you'd need to cloak: feed different content to different
requests. Ask Fantomaster.

BB
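
For anyone following along, "feed different content to different requests" comes down to picking a response based on the User-Agent header of the request. Here is a minimal sketch of that idea in Python, not a definitive implementation: the crawler tokens, both file bodies, the example.com sitemap URL and the function names are assumptions made up for illustration, and the same match could presumably be expressed as a rewrite rule in .htaccess. Note that the header is supplied by the client, so anyone who sends a crawler's user agent string gets the real file.

```python
# Hypothetical sketch: choose which robots.txt body to serve by User-Agent.
# The tokens, paths and file contents below are illustrative assumptions.

# Substrings assumed to identify the crawlers of Google, Yahoo and MSN.
KNOWN_BOT_TOKENS = ("googlebot", "slurp", "msnbot")

# The file meant for search engines (lists the disallowed pages).
REAL_ROBOTS = "User-agent: *\nDisallow: /private/\n"

# The file everyone else gets: nothing more than a pointer to a sitemap.
DECOY_ROBOTS = (
    "User-agent: *\n"
    "Disallow:\n"
    "Sitemap: http://www.example.com/sitemap.xml\n"
)


def looks_like_known_bot(user_agent: str) -> bool:
    """Case-insensitive substring match against the known crawler tokens."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)


def choose_robots_txt(user_agent: str) -> str:
    """Return the real file for known crawlers, the decoy for everyone else."""
    return REAL_ROBOTS if looks_like_known_bot(user_agent) else DECOY_ROBOTS


if __name__ == "__main__":
    samples = (
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0",
        "Googlebot",  # any client can claim to be a crawler
    )
    for ua in samples:
        kind = "real" if choose_robots_txt(ua) == REAL_ROBOTS else "decoy"
        print(kind, "<-", ua)
```

The third sample is the weak spot raised earlier in the thread: a plain substring match cannot tell a genuine crawler from a visitor who spoofs its user agent, and serving crawlers a different file is exactly the cloaking that Phil warned about.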


#10  Big Bill, 02-21-2008, 06:54 AM

On Thu, 21 Feb 2008 00:10:25 -0600, Joe Fox <ny152 (AT) none (DOT) invalid>
wrote:

Quote:
I can't believe this.

I'm not trying to cloak my content or pull anything underhanded.

You are, though: you're trying to cloak your robots.txt.

BB

