HighDots Forums  

2 different google crawlers

Search Engine Optimization Discussion about SEO/Search Engine Optimization (alt.internet.search-engines)


#1
Wolfman's Brother
2 different google crawlers - 07-13-2005, 08:02 AM

I am seeing two different sets of "browser" strings from google's crawler in
my apache logs.

The majority of the traffic comes from
"Googlebot/2.1 (+http://www.google.com/bot.html)"
which visits me just under 2000 times a day on average.

And rather less from
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Which on average visits me 300 times per day, but is much more "lumpy"
in that it will leave me alone for weeks on end, and then suddenly hits
me hard.

I have noticed an odd pattern in the requests from this second crawler
in that it seems to crawl my site in an order that depends on the length
of the URLs. For example: It'll take all my 20-character URLs, then all
my 21-character ones -- etc.
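
One way to check the length-ordering impression against the logs is sketched below. It assumes Apache's "combined" log format and simply measures how often each request from the second crawler is for a URL at least as long as the previous one. The function names and the regular expression are illustrative, not taken from any tool.

```python
import re

# Pulls the request path and the user agent out of an Apache "combined" log line.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def crawler_paths(lines, agent_prefix="Mozilla/5.0 (compatible; Googlebot"):
    """Return requested paths, in log order, for agents starting with agent_prefix."""
    paths = []
    for line in lines:
        m = LOG_RE.search(line)
        if m and m.group("agent").startswith(agent_prefix):
            paths.append(m.group("path"))
    return paths

def fraction_in_length_order(paths):
    """Fraction of consecutive request pairs whose URL length does not decrease."""
    if len(paths) < 2:
        return 1.0
    ok = sum(1 for a, b in zip(paths, paths[1:]) if len(a) <= len(b))
    return ok / (len(paths) - 1)
```

A value close to 1.0 over a long crawl would support the pattern; a value near 0.5 would suggest the order is unrelated to URL length.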

Another oddity is that it is calling for pages that don't exist on my
site, and never have - and I am sure they are not pages that I have
wrongly linked to from others on my site. The first robot doesn't exhibit
this behaviour.

The IP addresses of this second robot confirm that it is the real
google, not someone else pretending to be.
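
The post does not say how the addresses were checked. A common approach, sketched here as an assumption rather than the poster's method, is a reverse DNS lookup followed by a forward lookup that must return the same address. The function name and the injectable lookup parameters are hypothetical, and the default forward lookup only handles IPv4.

```python
import socket

def is_google_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-then-forward DNS check for an address claiming to be Googlebot."""
    # Default lookups use the standard library; both can be replaced for testing.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False
    # The reverse name must sit under one of Google's crawler domains.
    if not host.lower().rstrip(".").endswith((".googlebot.com", ".google.com")):
        return False
    try:
        # The forward lookup must map the name back to the original address.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

The forward step matters because anyone who controls the reverse zone for their own address range can make it claim any host name.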

This robot hit me really hard about 9 months ago, and a week later the
traffic to my site dropped like a stone and I'm still trying to recover.

Any clues as to what this second robot type is doing?

Chris

--
http://www.lowth.com/rope - Linux + IpTables + ROPE = effective control
of complex protocols, including many popular
Peer-to-peer ones.

#2
Wolfman's Brother
Re: 2 different google crawlers - 07-13-2005, 08:15 AM

Update: Just discovered that this second crawler is also downloading my
javascript sources as linked to with "<script src=..>" tags (no other
crawler seems to do this) - is it possible that google is experimenting
with actually executing or checking javascript now [not that I do
anything "undesirable" with it, I hasten to add!].

Chris
--
http://www.lowth.com/rope - Linux + IpTables + ROPE = effective control
of complex protocols, including many popular
Peer-to-peer ones.


#3
Big Bill
Re: 2 different google crawlers - 07-13-2005, 09:50 AM

On Wed, 13 Jul 2005 12:15:31 GMT, Wolfman's Brother
<my.address (AT) is (DOT) chris.at.lowth.dot.com> wrote:

Quote:
Update: Just discovered that this second crawler is also downloading my
javascript sources as linked to with "<script src=..>" tags (no other
crawler seems to do this) - is it possible that google is experimenting
with actually executing or checking javascript now [not that I do
anything "undesirable" with it, I hasten to add!].

Chris

Rumour has it that they're trying to do this, yes.

BB
--
www.kruse.co.uk/ seo (AT) kruse (DOT) demon.co.uk
seo that watches the river flow...
--


#4
John Bokma
Re: 2 different google crawlers - 07-13-2005, 12:36 PM

Wolfman's Brother <my.address (AT) is (DOT) chris.at.lowth.dot.com> wrote:

Quote:
Update: Just discovered that this second crawler is also downloading
my javascript sources as linked to with "<script src=..>" tags (no
other crawler seems to do this) - is it possible that google is
experimenting with actually executing or checking javascript now [not
that I do anything "undesirable" with it, I hasten to add!].

Very likely not executing it, at least not in full, since that would be a
resource hog. They probably do what they are already understood to do:
look for common URL patterns in the script source and extract those.
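
To illustrate what extracting URL patterns without executing the script could look like, here is a minimal sketch. The regular expression, the function name and the list of file extensions are assumptions made for the example; nothing here is known to match what Google actually does.

```python
import re
from urllib.parse import urljoin

# Quoted strings that look like absolute URLs, root-relative paths,
# or relative file names ending in a common web extension.
URL_LIKE = re.compile(
    r"""["']((?:https?://|/)[^"'\s<>]+|[\w./-]+\.(?:html?|php|js))["']"""
)

def extract_urls(js_source, base_url):
    """Return URL-like string literals from js_source, resolved against base_url."""
    found = []
    for match in URL_LIKE.finditer(js_source):
        url = urljoin(base_url, match.group(1))
        if url not in found:  # keep first-seen order, drop repeats
            found.append(url)
    return found
```

A scan like this is cheap compared with running a script engine, but it also picks up strings that merely look like URLs and are never requested by a browser.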

--
John Perl SEO tools: http://johnbokma.com/perl/
Experienced (web) developer: http://castleamber.com/
Get a SEO report of your site for just 100 USD:
http://johnbokma.com/websitedesign/seo-expert-help.html


#5
Eric Johnston
Re: 2 different google crawlers - 07-13-2005, 06:04 PM

Quote:
Another oddity is that it is calling for pages that don't exist on my
site, and never have - and I am sure they are not pages that I have
wrongly linked to from others on my site. The first robot doesn't
exhibit this behaviour.

Any ideas about it trying to access:

amptqgesaz.html
qzjkqiszq.html
ucrczhwi.html
jnxyozbhan.html
hpvregay.html

Why would a reputable robot ask for 5 almost certainly absent files in quick
succession? Is it probing for some abnormal or intermittent 404 response?
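
If the guess about probing is right, the check could look something like the sketch below: ask for a handful of random names that almost certainly do not exist and see whether the server answers with a real 404. This only illustrates the idea; the function names are hypothetical, and `fetch_status` stands for any callable that returns the HTTP status code for a page name.

```python
import random
import string

def random_page_names(count=5, rng=None):
    """Generate page names of 8 to 10 random lowercase letters plus '.html'."""
    rng = rng or random.Random()
    names = []
    for _ in range(count):
        length = rng.randint(8, 10)
        stem = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        names.append(stem + ".html")
    return names

def looks_like_soft_404(fetch_status, names):
    """True if any almost-certainly-absent page gets a status other than 404 or 410."""
    return any(fetch_status(name) not in (404, 410) for name in names)
```

A server that answers such requests with 200 or a redirect makes it hard for a crawler to tell real pages from missing ones, which would be one reason to test for it.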

There is also an occasional phenomenon where something (I don't know what)
makes repeated calls for:
goodfilename1.htm&s=svsxahafavfp
goodfilename2.htm&s=svsxahafavfp
goodfilename3.htm&e=9707 (other popular numbers are 9888 and 10313)
This is conceptually similar, and is possibly probing the server for an
abnormal response to invalid extensions.

Best regards, Eric



