#11
Philip Ronan wrote: the robots.txt protocol is ineffective on (probably) most servers because it can be circumvented without your knowledge by a third party.

It always has been, anyway. For numerous reasons. Your multiple slash example is just one of them. Some robots will ignore them altogether, others will deliberately look at what you tell them to ignore.
#13
It always has been, anyway. For numerous reasons. Your multiple slash example is just one of them. Some robots will ignore them altogether, others will deliberately look at what you tell them to ignore.

The robots.txt protocol has always been ineffective on bad robots, but this is, as far as I know, the first example of it being ineffective on good robots.
#14
On Sun, 30 Oct 2005 21:45:32 +0100, Dave0x1 <ask (AT) example (DOT) com> wrote: It's not clear exactly what the problem *is*. I've never seen a URL with multiple adjacent forward slashes in my search results. Does someone have an example?

All of these generated 404 in the last few weeks on my site. No additional slashes inside the URL, although several times they were added at the end. &amp;amp; vs. &amp; and wrong capitalization (bate, casc instead of BATE, CASC) are the most prominent sources of errors. But it seems every error is possible.
#15
Dave0x1 wrote: It's not clear exactly what the problem *is*. I've never seen a URL with multiple adjacent forward slashes in my search results.

If there exists a way for someone else on the Internet to override your spidering decisions as defined in robots.txt, there will be those who use that ability to inconvenience, harass or hurt others.
#16
"Dave0x1" wrote: I don't understand why this is a big deal. The issue can be addressed by numerous methods, including patching of the Apache web server source code.

OK, so as long as the robots.txt documentation includes a note saying that you have to patch your server software to get reliable results, then we'll all be fine.

"Dave0x1" wrote: It's not clear exactly what the problem *is*. I've never seen a URL with multiple adjacent forward slashes in my search results. Does someone have an example?

Which bit didn't I explain properly? I'm not going to post a link for you to check, but here's the response I got from Google on the issue:

Thank you for your note. We apologize for our delayed response. We understand you're concerned about the inclusion of http://###.####.###//contact/ in our index.
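
For anyone unsure what the complaint amounts to, here is a minimal sketch of the mismatch. It assumes the crawler compares the requested path against each Disallow value as a literal prefix, and that the server answers //contact/ with the same page as /contact/. The function name and the rule list are made up for illustration; this is not any particular crawler's code.

```python
from typing import List


def is_disallowed(path: str, disallow_rules: List[str]) -> bool:
    """Literal prefix match: blocked if the path starts with any non-empty rule."""
    return any(rule and path.startswith(rule) for rule in disallow_rules)


# Hypothetical rule set, equivalent to a robots.txt line "Disallow: /contact/"
rules = ["/contact/"]

print(is_disallowed("/contact/", rules))   # the spelling the site owner had in mind
print(is_disallowed("//contact/", rules))  # the doubled-slash spelling a third party links to
```

Under those assumptions the first call reports the path as blocked and the second does not, so a well-behaved robot following a third-party link to the doubled-slash URL would see no rule telling it to stay away.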
#17
Dave0x1 wrote: It's not clear exactly what the problem *is*. I've never seen a URL with multiple adjacent forward slashes in my search results.

Guy Macon wrote: If there exists a way for someone else on the Internet to override your spidering decisions as defined in robots.txt, there will be those who use that ability to inconvenience, harass or hurt others.

A robots.txt file doesn't make any decisions about which parts of a site are indexed; it merely offers suggestions.

Dave
#19
"Dave0x1" wrote: I don't understand why this is a big deal. The issue can be addressed by numerous methods, including patching of the Apache web server source code.

Philip Ronan wrote: OK, so as long as the robots.txt documentation includes a note saying that you have to patch your server software to get reliable results, then we'll all be fine.

I wouldn't consider patching of the Apache source code either necessary or desirable in this situation.

Does the URL in question appear in the index as <http://###.####.###//contact/>, or as <http://###.####.###/contact/>? My assumption is the latter.
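
One way to deal with this without touching the server source, sketched under assumptions: have the application or a rewrite layer answer any request whose path contains repeated slashes with a permanent redirect to the collapsed form, so only one spelling of each URL is left to crawl. `redirect_target` is a hypothetical helper written for this thread, not part of Apache or any framework, and it assumes the request target arrives as a path with an optional query string.

```python
import re
from typing import Optional


def redirect_target(request_target: str) -> Optional[str]:
    """Return the collapsed-slash target to redirect to, or None if no redirect is needed."""
    # Split off the query string so slashes inside it are left alone.
    path, sep, query = request_target.partition("?")
    collapsed = re.sub(r"/{2,}", "/", path)
    if collapsed == path:
        return None
    return collapsed + sep + query


# Hypothetical requests: only the doubled-slash one yields a redirect target.
for target in ["/contact/", "//contact/"]:
    print(target, "->", redirect_target(target))
```

A handler would send a 301 with the returned value in the Location header whenever the result is not None, and serve the page normally otherwise.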
#20
Philip Ronan wrote: OK, so as long as the robots.txt documentation includes a note saying that you have to patch your server software to get reliable results, then we'll all be fine.

"Dave0x01" wrote: I wouldn't consider patching of the Apache source code either necessary or desirable in this situation.

I was being sarcastic. (You're American, right?)

"Dave0x01" wrote: Does the URL in question appear in the index as <http://###.####.###//contact/>, or as <http://###.####.###/contact/>? My assumption is the latter.

Then what the hell do you think this thread is all about??