#21
Can we simply agree to disagree and save discussion of *why* for another time and go into some details about *how*?
#22
As I said in another post, can we save discussion of *why* for another time and talk about *how*?
#23
-- John Bokma http://johnbokma.com/
#24
Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

> Can we simply agree to disagree and save discussion of *why* for another time and go into some details about *how*?

Welcome to Usenet. Remember people try to help you in *their* spare time, for *free*.

That being said, there are two ways that might do what you want:

1. IP address based: you have to find out the IP address ranges of each bot you want to allow.
2. User-Agent string based: you have to find out the UA string of each bot you want to allow.

In .htaccess you can redirect internally, using either 1 or 2, to the right robots.txt.
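A minimal sketch of the User-Agent approach with mod_rewrite, assuming mod_rewrite is enabled and .htaccess overrides are allowed for the document root. The file name `robots-public.txt` and the User-Agent substrings are hypothetical placeholders, not taken from this thread; check each crawler's own documentation for the strings it currently sends. The commented-out line shows how the IP-based variant might look, using a documentation-only address range as a stand-in.

```apache
# .htaccess in the document root (sketch, not a tested production config)
RewriteEngine On

# Option 2: User-Agent based.
# If the requester does NOT look like one of the listed crawlers
# (assumed substrings, matched case-insensitively) ...
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|Slurp|msnbot) [NC]

# Option 1: IP address based (placeholder range 192.0.2.0/24).
# Uncomment to also require that the request does not come from that range.
# RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.

# ... then internally serve the generic file instead of the real robots.txt.
# No external redirect is sent, so the URL the client sees stays /robots.txt.
RewriteRule ^robots\.txt$ /robots-public.txt [L]
```

Requests that match a listed crawler fall through the rule and receive the real `robots.txt`. Everyone else gets the contents of the generic file. User-Agent strings are trivial to spoof, so this only hides the file from casual visitors.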
#25
I'm using a robots.txt file to control what is and is not crawled by search engine bots, but I'd like to make sure that anything that isn't a known search engine bot doesn't get the file I'm feeding to Google, Yahoo and the others. From what I've read this could be done with .htaccess, but I've not been able to make heads or tails of that. I'd really be grateful for some help here. Thanks
#26
Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

> I'm using a robots.txt file to control what is and is not crawled by search engine bots but I'd like to block anything that isn't a known search engine bot doesn't get the file I'm feeding to google, yahoo and the others.

Why? I can imagine that you want to block your entire site for any bot that's known to be abusive, but those probably don't check your robots.txt anyway.
#27
John Bokma <john (AT) castleamber (DOT) com> wrote in news:Xns9A4A4758D9F83castleamber (AT) 130 (DOT) 133.1.4:

> Joe Fox <ny152 (AT) none (DOT) invalid> wrote:
>
>> I'm using a robots.txt file to control what is and is not crawled by search engine bots but I'd like to block anything that isn't a known search engine bot doesn't get the file I'm feeding to google, yahoo and the others.
>
> Why? I can imagine that you want to block your entire site for any bot that's known to be abusive though, but those probably don't check your robots.txt anyway.

Perhaps I didn't say it right. I want to keep the robots.txt that I'm feeding search engines from being given to anybody else. I realize that they *could* spoof the SE's user agent or something, but the people I'm concerned about are bright enough to look for robots.txt but not bright enough to expect to be handed a phoney.
#28
John Bokma <john (AT) castleamber (DOT) com> wrote in news:Xns9A4A915154EB7castleamber (AT) 130 (DOT) 133.1.4:

> Joe Fox <ny152 (AT) none (DOT) invalid> wrote:
>
>> John Bokma <john (AT) castleamber (DOT) com> wrote in news:Xns9A4A4758D9F83castleamber (AT) 130 (DOT) 133.1.4:
>>
>>> Joe Fox <ny152 (AT) none (DOT) invalid> wrote:
>>>
>>>> I'm using a robots.txt file to control what is and is not crawled by search engine bots but I'd like to block anything that isn't a known search engine bot doesn't get the file I'm feeding to google, yahoo and the others.
>>>
>>> Why? I can imagine that you want to block your entire site for any bot that's known to be abusive though, but those probably don't check your robots.txt anyway.
>>
>> Perhaps I didn't say it right. I'm wanting to block the robots.txt that I'm feeding search engines from being given to anybody else.
>
> Why? If the reason is that you want to "protect" some folders: it's not secure and bound to fail sooner or later. Remember that not all bots honor the robots.txt, especially not the ones that you don't want on your site in the first place.

I want to keep certain humans from reading the robots.txt that I give to search engines, because it's none of their bloody business what pages I tell SE's not to index, and there are a few that might have mind enough to look at robots.txt. They will not, however, expect to be handed a tailored version of it.

>> I realize that they *could* spoof the SE's user agent or something, but my concerns are bright enough to look for robots.txt but not bright enough to expect to be handed a phoney
>
> You want to hide the key under the doormat which has in 5 languages "The key is hidden nearby" written on top...

Not really, or is it possible that they could also get my .htaccess? I didn't think that was possible. If they ask for a robots.txt and get one that's got nothing more than a pointer to a sitemap, that will satisfy 'em.
#29
Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

> Not really, or is it possible that they could also get my .htaccess? I didn't think that was possible. If they ask for a robots.txt and get one that's got nothing more than a pointer to a sitemap that will satisfy 'em.

Let's assume for argument's sake that those people *want* to see your robots.txt. If you feed Google something different from what you feed them, they will notice as soon as they check Google: if you disallow some directories for Google while the robots.txt they see says allow, they will wonder why none of the pages in those directories show up in Google even though they are available on your site.

#30
> Perhaps I didn't say it right. I'm wanting to block the robots.txt that I'm feeding search engines from being given to anybody else.

If Google catches you, they will exclude you from the index.

'Don't deceive your users or present different content to search engines than you display to users, which is commonly referred to as "cloaking."'