HighDots Forums  

Re: Google causing excessive bandwidth usage.

Search Engine Optimization Discussion about SEO/Search Engine Optimization (alt.internet.search-engines)


  #1  
Stan Brown

Re: Google causing excessive bandwidth usage. - 11-19-2005, 10:21 AM






Sat, 19 Nov 2005 10:32:26 +0000 from Philip Ronan
<invalid (AT) invalid (DOT) invalid>:

Quote:
"Doug Laidlaw" wrote:
Google has been around to my site twice this month and downloaded almost a
GB, putting me over my bandwidth limit both times. I imagine that if I
wasn't paying a flat fee, that would be costing me money.
Is there a way of limiting this while at the same time allowing Google
reasonable indexing?

alt.internet.search-engines might have been a better place to ask.
(follow-ups redirected accordingly)

Quote:
Then I don't really see what the problem is. You've got all this content on
your website, and presumably you want it indexed by Google. So you can't
complain when the googlebot comes along and looks at the stuff.

I'm _not_ paying a flat fee, unlike the OP, and I'd like to know the
answer to this also.

Quote:
I think you would be better off reading this:
http://www.google.com/intl/en/webmasters/bot.html

Good heavens! That page says Google trawls my site every few
_seconds_. Not long ago I remember it used to be every few _days_. I
noticed activity on my site grew quite a bit a little less than a
year ago; I wonder if this was the reason?

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/
Why We Won't Help You:
http://diveintomark.org/archives/200..._wont_help_you


  #2  
Alan J. Flavell

Re: Google causing excessive bandwidth usage. - 11-19-2005, 11:52 AM






On Sat, 19 Nov 2005, Stan Brown wrote:

Quote:
http://www.google.com/intl/en/webmasters/bot.html

Good heavens! That page says Google trawls my site every few
_seconds_.

I don't think so! It says the server shouldn't get *an* access from
Googlebot more often than once every few seconds. That's a rate control
mechanism, not a frequency of revisiting.

Though I'm a bit surprised to see that when I count up the log entries
for Googlebot on our server, I count some 68K accesses in the current
log, 13th November onwards, out of the total of some 400K accesses
over that period.

But the accesses are clustered by date, implying that they trawl no
more than twice a week (as in this week) or once a week (30K hits in
the previous week, all in a single cluster), with only a few hundred
hits per day on the intermediate days.

I see most of the Google accesses here are returning status 200,
although the references to my own personal space are mostly returning
status 304. But I see a few cases where my "xbithack full" pages
are missing the g+x bit, which I need to rectify.

Hmmm, I have to look into those status 200 responses elsewhere, and
probably do something about it. I have a theory.



  #3  
Philip Ronan

Re: Google causing excessive bandwidth usage. - 11-19-2005, 12:41 PM



"Stan Brown" wrote:

Quote:
Sat, 19 Nov 2005 10:32:26 +0000 from Philip Ronan
<invalid (AT) invalid (DOT) invalid>:

I think you would be better off reading this:
http://www.google.com/intl/en/webmasters/bot.html

Good heavens! That page says Google trawls my site every few
_seconds_. Not long ago I remember it used to be every few _days_. I
noticed activity on my site grew quite a bit a little less than a
year ago; I wonder if this was the reason?

What it actually says is "For most sites, Googlebot shouldn't access your
site more than once every few seconds on average." Think of that as a hit
rate. It would be pointless trawling through your *entire site* every few
seconds. In my experience the Googlebot generates no more traffic than an
ordinary visitor to the site.

It just occurred to me that the problems you and the OP are experiencing
might be caused by things like poor cacheability. You're both generating
pages dynamically, aren't you? Are they cacheable? Can they handle
conditional requests? If not, you're creating extra traffic for your site,
and not just from the search engine robots.

Here's your homework:

1. Read RFC 2616, especially the bits about conditional requests
2. Check your content for cacheability
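
For anyone generating pages dynamically, here is a rough sketch of the decision a conditional request asks the server to make. It assumes only the Last-Modified / If-Modified-Since pair is in play (entity tags are left out for brevity), and the function name is hypothetical rather than part of any particular server or framework.

```python
from email.utils import parsedate_to_datetime


def respond_to_conditional_get(if_modified_since, last_modified):
    """Return 304 if the client's copy is still current, otherwise 200.

    Both arguments are HTTP date strings such as
    'Sat, 19 Nov 2005 10:32:26 GMT'; if_modified_since may be None
    when the client sent an unconditional request.
    """
    if not if_modified_since:
        return 200  # unconditional request: send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
        modified = parsedate_to_datetime(last_modified)
        unchanged = modified <= since
    except (TypeError, ValueError):
        return 200  # unparseable or incomparable dates: play safe, send the page
    return 304 if unchanged else 200


print(respond_to_conditional_get("Sat, 19 Nov 2005 10:32:26 GMT",
                                 "Fri, 18 Nov 2005 09:00:00 GMT"))
print(respond_to_conditional_get("Sat, 19 Nov 2005 10:32:26 GMT",
                                 "Sun, 20 Nov 2005 09:00:00 GMT"))
```

A 304 response carries no body, so a script that never answers with one resends the whole page on every revisit, to robots and browsers alike.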

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/



  #4  
Nick Kew

Re: Google causing excessive bandwidth usage. - 11-19-2005, 05:56 PM



Stan Brown wrote:
Quote:
(follow-ups redirected accordingly)

And ignored. I'm not posting *only* to a group I don't read.

Quote:
Good heavens! That page says Google trawls my site every few
_seconds_. Not long ago I remember it used to be every few _days_. I

Erm, that'll be URLs that get visited at a high rate while it's
spidering. So if it visits one per minute and you have 1440 pages,
it'll take one day to spider the site from scratch.

It'll then revisit in [???] days/weeks to check for changes.
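
The arithmetic above can be checked with a few lines. This is only an illustration of the rate-versus-duration point, assuming evenly spaced requests; the function name is made up.

```python
def full_crawl_days(page_count, seconds_between_requests):
    """Days needed to fetch every page once at a steady request interval."""
    seconds_per_day = 24 * 60 * 60
    return page_count * seconds_between_requests / seconds_per_day


# One page per minute over a 1440-page site takes one day.
print(full_crawl_days(1440, 60))  # 1.0
```

Shortening the interval shortens the crawl but does not change the total number of pages fetched, so the request rate says nothing about how often the whole site is revisited.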

--
Nick Kew


  #5  
Alan J. Flavell

Re: Google causing excessive bandwidth usage. - 11-19-2005, 06:18 PM



On Sat, 19 Nov 2005, Stan Brown wrote:

Quote:
alt.internet.search-engines might have been a better place to ask.
(follow-ups redirected accordingly)

Urgl. I missed that, first time, but this server doesn't do alt
groups. So here goes again, including a group that I not only read
but can post to...

Quote:
http://www.google.com/intl/en/webmasters/bot.html

Good heavens! That page says Google trawls my site every few
_seconds_.

I don't think so! It says the server shouldn't get *an* access from
Googlebot more often than once every few seconds. That's a rate control
mechanism, not a frequency of revisiting.

Though I'm a bit surprised to see that when I count up the log entries
for Googlebot on our server, I count some 68K accesses in the current
log, 13th November onwards, out of the total of some 400K accesses
over that period.

But the accesses are clustered by date, implying that they did a trawl
twice this week, or once in the previous week (30K hits, in just a
single cluster), with only a few hundred Googlebot hits per day on
the intermediate days (presumably to re-check pages which were
recently active?).

I see most of the Googlebot accesses here are returning status 200.
The references to my own personal space are mostly returning status
304, but I see a few cases where my "xbithack full" pages are missing
the g+x bit, and so they always return status 200, which I need to
rectify.
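
A sketch of how the 200 versus 304 split might be tallied, again assuming Apache-style Common or Combined Log Format, where the status code and the response size follow the quoted request line. The function name and sample lines are hypothetical.

```python
import re
from collections import defaultdict

# In Common/Combined Log Format the quoted request is followed by the
# status code and the body size in bytes ("-" when no body was sent).
STATUS_RE = re.compile(r'" (\d{3}) (\d+|-)')

GOOGLEBOT_UA = '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'


def googlebot_status_summary(lines):
    """Map each status code to (hit count, total bytes) for Googlebot lines."""
    summary = defaultdict(lambda: [0, 0])
    for line in lines:
        if "Googlebot" not in line:
            continue
        match = STATUS_RE.search(line)
        if not match:
            continue  # line is not in the expected format
        status, size = match.group(1), match.group(2)
        summary[status][0] += 1
        summary[status][1] += 0 if size == "-" else int(size)
    return {status: tuple(totals) for status, totals in summary.items()}


# Hypothetical sample lines (documentation IP range, invented paths).
sample_log = [
    '192.0.2.10 - - [13/Nov/2005:06:25:14 +0000] "GET /a.html HTTP/1.1" 200 5120 "-" ' + GOOGLEBOT_UA,
    '192.0.2.10 - - [13/Nov/2005:06:25:19 +0000] "GET /b.html HTTP/1.1" 304 - "-" ' + GOOGLEBOT_UA,
    '192.0.2.10 - - [13/Nov/2005:06:25:24 +0000] "GET /c.html HTTP/1.1" 200 2048 "-" ' + GOOGLEBOT_UA,
    '192.0.2.77 - - [13/Nov/2005:07:00:00 +0000] "GET /a.html HTTP/1.1" 200 5120 "-" "Lynx/2.8.5"',
]

summary = googlebot_status_summary(sample_log)
for status, (count, total_bytes) in sorted(summary.items()):
    print(status, count, total_bytes)
```

Since 304 responses have no body, the byte totals give a rough idea of how much bandwidth the 200 responses cost on repeat visits.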

Hmmm, and I have to look into those status 200 responses elsewhere on
the server, and probably do something about it. I have a theory.



  #6  
David

Re: Google causing excessive bandwidth usage. - 11-19-2005, 07:47 PM



On Sat, 19 Nov 2005 17:41:21 GMT, Philip Ronan
<invalid (AT) invalid (DOT) invalid> wrote:

Quote:
In my experience the Googlebot generates no more traffic than an
ordinary visitor to the site.

It depends a lot on the size of the site. For a small site, then yes,
it's a lot like a very interested visitor (most real visitors view a
small number of pages, unlike the bots), but with a large site you
feel like you've been mugged on some visits :-))

It also depends a lot on the number and quality (PR) of the links to a
site, though.

David
--
Free Search Engine Optimization Tutorial
http://www.seo-gold.com/tutorial/


  #7  
Stan Brown

Re: Google causing excessive bandwidth usage. - 11-20-2005, 07:59 AM



Sat, 19 Nov 2005 17:41:21 GMT from Philip Ronan
<invalid (AT) invalid (DOT) invalid>:
Quote:
"Stan Brown" wrote:

Sat, 19 Nov 2005 10:32:26 +0000 from Philip Ronan
<invalid (AT) invalid (DOT) invalid>:

I think you would be better off reading this:
http://www.google.com/intl/en/webmasters/bot.html

Good heavens! That page says Google trawls my site every few
_seconds_. Not long ago I remember it used to be every few _days_. I
noticed activity on my site grew quite a bit a little less than a
year ago; I wonder if this was the reason?

What it actually says is "For most sites, Googlebot shouldn't access your
site more than once every few seconds on average." Think of that as a hit
rate. It would be pointless trawling through your *entire site* every few
seconds. In my experience the Googlebot generates no more traffic than an
ordinary visitor to the site.

Thanks, that makes more sense.

Quote:
It just occurred to me that the problems you and the OP are experiencing
might be caused by things like poor cacheability. You're both generating
pages dynamically, aren't you? Are they cacheable? Can they handle
conditional requests? If not, you're creating extra traffic for your site,
and not just from the search engine robots.

No, my pages are all static, and (just checked with lynx -dump -head)
the server does return last-modified dates. So, unless I'm
misunderstanding you, they're pretty darn cacheable. :-)
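
The same header check can be scripted. This is a minimal sketch using only the Python standard library; both function names are hypothetical, and it looks only for the two validator headers that make conditional requests possible (Last-Modified and ETag), not for Cache-Control or Expires.

```python
import urllib.request


def cache_validators(headers):
    """Return the validator headers (Last-Modified, ETag) present in a response.

    `headers` is any mapping of header name to value; names are compared
    case-insensitively because HTTP header names are case-insensitive.
    """
    wanted = ("last-modified", "etag")
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in wanted if name in lowered}


def head_request_headers(url):
    """Fetch only the response headers with a HEAD request (needs network)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return dict(response.headers.items())


# Offline example with made-up header values:
example_headers = {
    "Content-Type": "text/html",
    "Last-Modified": "Fri, 18 Nov 2005 09:00:00 GMT",
}
print(cache_validators(example_headers))
```

For a live page, something like `cache_validators(head_request_headers(url))` would do the lookup; an empty result suggests clients have nothing to revalidate against.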

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
"If there's one thing I know, it's men. I ought to: it's
been my life work." -- Marie Dressler, in /Dinner at Eight/


  #8  
Philip Ronan

Re: Google causing excessive bandwidth usage. - 11-20-2005, 06:27 PM



"Stan Brown" wrote:

Quote:
No, my pages are all static, and (just checked with lynx -dump -head)
the server does return last-modified dates. So, unless I'm
misunderstanding you, they're pretty darn cacheable. :-)

But you're still having your bandwidth eaten up by Googlebot? That's odd.
Only 1% of my traffic comes from Googlebots (12,400 hits last month). The
site has about 3000 pages indexed, IIRC.

What sort of traffic are you getting?

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/



  #9  
Stan Brown

Re: Google causing excessive bandwidth usage. - 11-20-2005, 07:55 PM



Sun, 20 Nov 2005 23:27:22 GMT from Philip Ronan
<invalid (AT) invalid (DOT) invalid>:
Quote:
"Stan Brown" wrote:

No, my pages are all static, and (just checked with lynx -dump -head)
the server does return last-modified dates. So, unless I'm
misunderstanding you, they're pretty darn cacheable. :-)

But you're still having your bandwidth eaten up by Googlebot? That's odd.

That is not what I said. I said I'd noticed a sudden upsurge in usage
about a year ago (never going back down) and wondered whether it was
Google starting to recheck the site every few seconds. But someone
pointed out that I'd misread the Google page, so that's not the
explanation.

(I should have known anyway that it wasn't, because I examined the
logs and couldn't see any one domain taking the lion's share of the
accesses. Maybe my site just got popular. :-)

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
"If there's one thing I know, it's men. I ought to: it's
been my life work." -- Marie Dressler, in /Dinner at Eight/

