#2
I am seeing two different sets of "browser" strings from Google's crawler in my Apache logs. The majority of the traffic comes from "Googlebot/2.1 (+http://www.google.com/bot.html)", which visits me just under 2000 times a day on average. Rather less comes from "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", which visits about 300 times per day on average but is much more "lumpy": it will leave me alone for weeks on end and then suddenly hit me hard.

I have noticed an odd pattern in the requests from this second crawler: it seems to crawl my site in an order that depends on the length of the URLs. For example, it'll take all my 20-character URLs, then all my 21-character ones, etc.

Another oddity is that it is requesting pages that don't exist on my site and never have, and I am sure they are not pages that I have wrongly linked to from elsewhere on my site. The first robot doesn't exhibit this behaviour.

The IP addresses of this second robot confirm that it is the real Google, not someone else pretending to be. This robot hit me really hard about 9 months ago, and a week later the traffic to my site dropped like a stone. I'm still trying to recover.

Any clues as to what this second robot type is doing?
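For anyone wanting to check the same pattern in their own logs, here is a rough Python sketch. It assumes the standard Apache "combined" log format, and the names `classify` and `summarize` and the labels `plain-googlebot` and `mozilla-googlebot` are made up for illustration. It tallies daily hits for each of the two Googlebot strings, records the length of each requested URL in crawl order, and counts 404 responses per crawler.

```python
import re
from collections import Counter, defaultdict

# Assumes the Apache "combined" LogFormat; adjust the pattern if yours differs.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def classify(agent):
    """Return a label for the two Googlebot strings, or None for anything else."""
    if "Googlebot" not in agent:
        return None
    if agent.startswith("Mozilla/5.0"):
        return "mozilla-googlebot"
    return "plain-googlebot"


def summarize(lines):
    """Tally hits per day, URL lengths in request order, and 404s per crawler."""
    per_day = defaultdict(Counter)   # day -> crawler label -> hits
    url_lengths = defaultdict(list)  # crawler label -> URL lengths in log order
    not_found = Counter()            # crawler label -> number of 404 responses
    for line in lines:
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        kind = classify(match.group("agent"))
        if kind is None:
            continue
        per_day[match.group("day")][kind] += 1
        url_lengths[kind].append(len(match.group("path")))
        if match.group("status") == "404":
            not_found[kind] += 1
    return per_day, url_lengths, not_found
```

Feeding it an open log file (for example `summarize(open("access.log"))`, where the filename is a placeholder) and looking at `url_lengths["mozilla-googlebot"]` should show whether the lengths really do increase over the course of a crawl. Note that matching on the user-agent string alone says nothing about whether the request truly came from Google.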
#3
Wolfman's Brother wrote: [...] Any clues as to what this second robot type is doing?

Update: I have just discovered that this second crawler is also downloading my JavaScript sources, as linked to with "<script src=..>" tags (no other crawler seems to do this). Is it possible that Google is now experimenting with actually executing or checking JavaScript? [Not that I do anything "undesirable" with it, I hasten to add!]

Chris
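A quick way to check the claim that only one crawler fetches the script files is to count `.js` requests per user-agent string. This is only a sketch, again assuming the Apache "combined" log format; the name `js_fetches_by_agent` is invented for the example.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the request, status, size, referer and user-agent fields of a
# "combined" format line (an assumption about the log layout).
REQUEST_AND_AGENT = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def js_fetches_by_agent(lines):
    """Count requests for .js files, grouped by the full user-agent string."""
    counts = Counter()
    for line in lines:
        match = REQUEST_AND_AGENT.search(line)
        if match is None:
            continue
        # Ignore any query string when checking the file extension.
        path = urlsplit(match.group("path")).path
        if path.lower().endswith(".js"):
            counts[match.group("agent")] += 1
    return counts
```

Calling `js_fetches_by_agent(open("access.log")).most_common()` (the filename is a placeholder) lists the user agents that request scripts, most frequent first. Ordinary browsers will show up in that list as well, so it is the crawler entries that are of interest.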
#5
Another oddity is that it is requesting pages that don't exist on my site and never have, and I am sure they are not pages that I have wrongly linked to from elsewhere on my site. The first robot doesn't exhibit this behaviour.

Any ideas about it trying to access: