HighDots Forums  

bots

Search Engine Optimization Discussion about SEO/Search Engine Optimization (alt.internet.search-engines)


#1
Douglas Clark

bots - 05-09-2006, 04:12 PM






After three fairly constant years the number of hits I am getting from bots
has been double the usual for the past two months and there is no slackening
off this month. Is anything up? My site has had only tiny alterations.



--
Douglas Clark ..................... Bath, Somerset, UK ......
http://usergroup.plus.net .......... http://www.dgdclynx.plus.com




#2
www.1-script.com

Re: bots - 05-09-2006, 05:35 PM






Douglas Clark wrote:


Quote:
After three fairly constant years the number of hits I am getting from
bots has been double the usual for the past two months and there is no
slackening off this month. Is anything up? My site has had only tiny
alterations.


You are not alone. There has been a serious increase in overall bot
activity. Each search engine has its own reason to send its bots out more
often, though:

Google messed up their index cache during the last big update and now
needs to catch up with Y! and Ask before users notice something bad
happened. That’s my Google theory. Due to Google’s secrecy, the number of
theories out there almost equals the number of people trying to crack
that problem, Googlers themselves included.

Y! wants to beat Google to the largest index size, so it needs to crawl
more pages. The more pages they get, the more links to your pages they
discover, which sets a flag for their bot to visit the site again.

Ask is re-branding itself as a mainstream search engine and is pumping
some serious money into both infrastructure and marketing. I guess Teoma
crawls more (since last year, actually) simply because they have more
machines to run it from.

MSN is still toying with their algorithms, and it looks like from time to
time they dump large chunks of data from their database and need to
re-crawl the sites to restore it.

A bunch of smaller guys are full of ambition to become the next Google,
and so they need their own cache of your site so they can analyze it to
death.

I hope that pretty much covers most of it. Oh yeah, and there are always
rogue bots out there, of course, trying your site for all kinds of
exploits, so keep your shields up!


--
Cheers,
Dmitri
See Site Sig Below



--
##-----------------------------------------------##
Article posted with Web Developer's USENET Archive
http://www.1-script.com/forums/
Web and RSS gateway to your favorite newsgroup -
alt.internet.search-engines - 30817 messages and counting!
##-----------------------------------------------##


#3
Big Bill

Re: bots - 05-09-2006, 07:20 PM



On Tue, 09 May 2006 21:35:55 GMT, info_at_1-script_dot_com (AT) foo (DOT) com
(www.1-script.com) wrote:

Quote:
Douglas Clark wrote:


After three fairly constant years the number of hits I am getting from
bots has been double the usual for the past two months and there is no
slackening off this month. Is anything up? My site has had only tiny
alterations.


You are not alone. There has been a serious increase of overall bots
activity. Each search engine has its own reason to send the bots out more
often though:

Google messed up their index cache during the last big update and now
needs to catch up with Y! and Ask before users notice something bad
happened. That’s my Google theory. Due to Google’s secrecy the number of
theories out there almost equals to the number of people trying to crack
that problem, googlers themselves included.

Y! wants to beat Google to the largest index size and so it needs to crawl
more pages. The more pages they get the more links to your pages they
discover, which sets a flag for their bot to visit the site again.

Ask is re-branding itself into a mainstream search engine and is pumping
some serious money into both infrastructure and marketing. I guess, Teoma
crawls more (since last year, actually) simply because they got more
machines to run it from.

Teoma no longer exists as a separate entity. A shame.

BB
--

http://www.kruse.co.uk/sandbox.htm
http://www.here-be-posters.co.uk/jim...ix-posters.htm
http://www.crystal-liaison.com/armani/index.html



#4
Roy Schestowitz

Re: bots - 05-10-2006, 12:31 AM



__/ [ www.1-script.com ] on Tuesday 09 May 2006 22:35 \__

Quote:
Douglas Clark wrote:

After three fairly constant years the number of hits I am getting from
bots has been double the usual for the past two months and there is no
slackening off this month. Is anything up? My site has had only tiny
alterations.

You are not alone. There has been a serious increase of overall bots
activity. Each search engine has its own reason to send the bots out more
often though:

Ditto. You are not alone in this.


Quote:
Google messed up their index cache during the last big update and now
needs to catch up with Y! and Ask before users notice something bad
happened. That's my Google theory. Due to Google's secrecy the number of
theories out there almost equals to the number of people trying to crack
that problem, googlers themselves included.

I have not heard this theory before. I haven't noticed any degradation in
terms of search results either. Suggesting that Google have fallen behind is
something that would make big headlines (same with studies that argue Google
lost a top position), so it's probably just wishful thinking. In operation,
I am sure that they take into consideration all such risks and replicate the
data as required. Even "Big Daddy" seems to have been corrected/re-aligned.


Quote:
Y! wants to beat Google to the largest index size and so it needs to crawl
more pages. The more pages they get the more links to your pages they
discover, which sets a flag for their bot to visit the site again.

Yes, that's true.


Quote:
Ask is re-branding itself into a mainstream search engine and is pumping
some serious money into both infrastructure and marketing. I guess, Teoma
crawls more (since last year, actually) simply because they got more
machines to run it from.

They should probably aim for a niche if they haven't the required capacity.


Quote:
MSN is still toying with their algorithms and they look like from time to
time they dump large chunks of data from their database and need to
re-crawl the sites again to restore it

Yesterday I discovered that MSN put my site at number 5 for 'othello'. That
should be a real embarrassment for them. I wasn't bombing that site _at all_.
It must have been a fluke on their part, or else their algorithms remain as
terrible as ever. But then again, what else is new? *smile*


Quote:
A bunch of smaller guys are full of ambition to become the next Google,
and so they need their own cache of your site so they can analyze it to
death.

I hope that pretty much covers most of it. Oh yeah, and there is always
rogue bots out there, of course, trying your site for all kinds of
exploits, so keep your shields up!

Shields up? I am not sure about exclusions. However, it is good to keep an
eye on the logs/stats. Some ratbots can crawl an entire site within hours,
depending on its size and available bandwidth. This slows down real
visitors and legitimate crawlers, and it can cost you money as well.
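
For anyone who wants to watch the logs without a stats package, here is a
minimal Python sketch. It assumes the server writes the common "combined"
log format, where the user agent is the last quoted field. The marker list
and the sample lines are made up for illustration, so adjust them to what
actually shows up in your own logs.

```python
import re
from collections import Counter

# Hypothetical substrings that identify crawlers in the User-Agent field;
# extend the tuple to match what appears in your own logs.
BOT_MARKERS = ("googlebot", "slurp", "msnbot", "teoma")

# In the combined log format the user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_bot_hits(log_lines):
    """Return a Counter mapping each bot marker to its number of requests."""
    hits = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # line is not in the expected format
        user_agent = match.group(1).lower()
        for marker in BOT_MARKERS:
            if marker in user_agent:
                hits[marker] += 1
                break  # count each request once
    return hits


# Made-up sample lines, for illustration only.
sample = [
    '1.2.3.4 - - [09/May/2006:16:12:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '1.2.3.5 - - [09/May/2006:16:12:05 +0000] "GET /a HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '1.2.3.6 - - [09/May/2006:16:12:09 +0000] "GET /b HTTP/1.1" 200 700 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

print(count_bot_hits(sample))
```

Keep in mind that a user agent string is trivially faked, so a ratbot may
well show up here under a respectable name or not at all.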

Best wishes,

Roy

--
Roy S. Schestowitz | "World ends in five minutes - please log out"
http://Schestowitz.com | SuSE Linux ¦ PGP-Key: 0x74572E8E
5:20am up 12 days 12:17, 7 users, load average: 0.89, 0.49, 0.44
http://iuron.com - Open Source knowledge engine project


#5
www.1-script.com

Re: bots - 05-10-2006, 11:47 AM



Roy Schestowitz wrote:


Quote:
Google messed up their index cache during the last big update and
now needs to catch up with Y! and Ask before users notice something
bad happened. That's my Google theory. Due to Google's secrecy the
number of theories out there almost equals to the number of people trying to
crack that problem, googlers themselves included.

Quote:
I have not heard this theory before. I haven't noticed any degradation
in terms of search results either. Suggesting that Google have fallen
behind is something that would make big headlines (same with studies that
argue

Well, lucky you, my friend, lucky you! As for the rest of us (see threads
on Webmasterworld such as this:
http://www.webmasterworld.com/forum30/34228.htm or this:
http://www.webmasterworld.com/forum30/34061.htm ) funny things are
happening indeed. People (myself included) report massive page drop-outs,
on a scale of 90-99% of the site being gone. Like I said, there are plenty
of theories why but there is no question about the fact that something
(bad) is going on.

Quote:
Google lost a top position), so it's probably just wishful thinking.

Well, for better or worse, they have ALREADY lost their top position on
my sites! Yahoo has almost replaced the traffic that I lost from Google,
which is the only reason I tolerate the exorbitant Yahoo! Slurp slurping
rate ;-) 9.70GB on a single site since May 1st, 2006.

Quote:
In operation, I am sure that they take into consideration all such risks and
replicate the data as required. Even "Big Daddy" seems to have been
corrected/re-aligned.

Replicate? Maybe, maybe not. It depends on whether we can trust their own
words about running out of capacity (earlier thread here). For reliable
replication you need twice the amount of storage, and they don't seem to
have enough even for the original data. Besides, have you ever had a
database with the index(es) messed up? It can be pretty frustrating
indeed. You know that the data is there but you cannot get to it (fast
enough), which to the outside looks just like you simply lost that
data.


Quote:
I hope that pretty much covers most of it. Oh yeah, and there are always
rogue bots out there, of course, trying your site for all kinds of
exploits, so keep your shields up!

Quote:
Shields up? I am not sure about exclusions.

Well, not in the sense of putting everything into the robots.txt file, of
course. As a matter of fact, to keep ratbots (good term, I'll use it) on a
leash, DO NOT put anything sensitive into robots.txt. This is the first
thing they check: what they are not supposed to access. Then they
surely try to access it to see what kind of informative errors they can
generate.
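
To show how little effort that takes, here is a rough Python sketch of what
a rogue bot could do with a robots.txt file it has already downloaded. The
file contents and paths below are invented for the example, and the parser
is deliberately simplistic (it ignores User-agent grouping).

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow value from a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt; the paths are assumptions for illustration.
sample = """\
User-agent: *
Disallow: /admin/      # exactly the kind of hint a ratbot wants
Disallow: /backup.zip
Disallow:
"""

print(disallowed_paths(sample))
```

Anything listed there is public, so the file works as a to-do list for a
bot that does not care about the rules. Protect sensitive paths with real
access control instead.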




--
Cheers,
Dmitri
See Site Sig Below





#6
Roy Schestowitz

Re: bots - 05-10-2006, 12:05 PM



__/ [ www.1-script.com ] on Wednesday 10 May 2006 16:47 \__

Quote:
Roy Schestowitz wrote:


Google messed up their index cache during the last big update and
now needs to catch up with Y! and Ask before users notice something
bad happened. That's my Google theory. Due to Google's secrecy the
number of theories out there almost equals to the number of people trying
to crack that problem, googlers themselves included.


I have not heard this theory before. I haven't noticed any degradation
in terms of search results either. Suggesting that Google have fallen
behind is something that would make big headlines (same with studies that
argue

Well, lucky you, my friend, lucky you! As for the rest of us (see threads
on Webmasterworld such as this:
http://www.webmasterworld.com/forum30/34228.htm or this:
http://www.webmasterworld.com/forum30/34061.htm ) funny things are
happening indeed. People (myself included) report massive page drop-outs,
on a scale of 90-99% of the site being gone. Like I said, there are plenty
of theories why but there is no question about the fact that something
(bad) is going on.

That's quite a shocker. I noticed a large-scale change on my site around the
19/20th of April (positive change if that matters), but it reached an end
last week, for no apparent reason.


Quote:
Google lost a top position), so it's probably just wishful thinking.

Well, for the better or worse they have ALREADY lost their top position on
my sites! Yahoo had almost replaced the traffic that I lost from Google
which is the only reason I tolerate exorbitant Yahoo Slurp! slurping rate
;-) 9.70GB on a single site since May 1st, 2006

Ouch! I believe you have your own dedicated server, fortunately. Maybe you
should get another one and spray red "Y!" over it. Sorry, I know it's no
place for sarcasm...


Quote:
In operation, I am sure that they take into consideration all such risks
and replicate the data as required. Even "Big Daddy" seems to have been
corrected/re-aligned.

Replicate? Maybe, maybe not. It depends on whether we can trust their own
words about running out of capacity (earlier thread here). For reliable
replication you need twice the amount of storage which they don't seem to
have enough even for the original data. Besides, have you ever had a
database with the index(es) messed up? It could be pretty frustrating
indeed. You know that the data is there but you cannot get to it (fast
enough), which to the outside would look just like you simply lost that
data.

Replication can be done more efficiently than that. Since much of the content
(that you care about) is textual, one could compress the content and set it
aside. Compression algorithms can reduce natural text to about 10-20% of its
original size. I don't know how large their indices are (compared with full
text, i.e. Google Cache), but dumping of that data certainly does not depend
on the way it's stored/structured. If they don't back up their data and send
it to a remote location, they play a very risky game. I'm assuming that the
datacentres serve them as some arrays of redundancy /already/.
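
If you want to check the compression figure on your own pages, a minimal
Python sketch using the standard zlib module follows. The sample text is
only a stand-in. Short inputs compress much less well than large bodies of
text, so treat the number it prints as a rough indication and not as
confirmation of the 10-20% estimate.

```python
import zlib


def compression_ratio(text: str, level: int = 9) -> float:
    """Return compressed size divided by original size for UTF-8 text."""
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, level)
    # Round-trip check: decompression must give back the exact bytes.
    assert zlib.decompress(packed) == raw
    return len(packed) / len(raw)


# Stand-in sample; substitute the text of real pages for a fair test.
sample = (
    "After three fairly constant years the number of hits I am getting "
    "from bots has been double the usual for the past two months and "
    "there is no slackening off this month. Is anything up? My site has "
    "had only tiny alterations. There has been a serious increase in "
    "overall bot activity, and each search engine has its own reason to "
    "send the bots out more often. Some ratbots can crawl an entire site "
    "within hours, depending on its size and available bandwidth."
)

print(f"compressed to {compression_ratio(sample):.0%} of original size")
```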


Quote:
I hope that pretty much covers most of it. Oh yeah, and there is
always
rogue bots out there, of course, trying your site for all kinds of
exploits, so keep your shields up!


Shields up? I am not sure about exclusions.

Well, not in a way of putting everything into the robot.txt file of
course. As a matter of fact, to keep ratbots (good term, I'll use it) on a
leash DO NOT put anything sensitive into the robots.txt This is the first
thing they check: what they are not supposed to access and then they
surely try to access it to see what kind of informative errors they can
generate.

I fully agree.

Best wishes,

Roy

--
Roy S. Schestowitz | "Have you compiled your kernel today?"
http://Schestowitz.com | Open Prospects ¦ PGP-Key: 0x74572E8E
4:55pm up 12 days 23:52, 8 users, load average: 0.73, 0.49, 0.52
http://iuron.com - knowledge engine, not a search engine


#7
www.1-script.com

Re: bots - 05-10-2006, 03:22 PM



Roy Schestowitz wrote:


Quote:
Suggesting that Google have fallen behind is something that would make
big headlines (same with studies that argue Google lost a top position)

It just did:

http://www.eweek.com/article2/0,1895,1959865,00.asp

Not exactly CNN, but a large enough publication for a subject that's so
technical in nature.

--
Cheers,
Dmitri
See Site Sig Below




#8
www.1-script.com

Re: bots - 05-10-2006, 03:42 PM



Roy Schestowitz wrote:


Quote:
Ouch! I believe you have your own dedicated server, fortunately. Maybe
you should get another one and spray red "Y!" over it. Sorry, I know
it's no place for sarcasm...

Well, I do, but the throughput is not unmetered. So, if they keep at that
rate (and they are not alone!) I'm going to start excluding bots based on
their respective engine's ROI (not my idea:
http://www.shoemoney.com/2006/05/03/...searchbot-roi/ )
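
A rough Python sketch of how that ROI comparison could be worked out is
below. It is my own simplification, not the method from the linked post:
ROI here is just referred visits per megabyte crawled. Only the 9.70GB
Slurp figure comes from this thread; the visit counts, the second bot, and
the threshold are made-up numbers for illustration.

```python
from dataclasses import dataclass


@dataclass
class BotStats:
    name: str
    megabytes_crawled: float  # bandwidth consumed by the crawler
    referred_visits: int      # visitors the engine sent in the same period


def visits_per_megabyte(stats: BotStats) -> float:
    """Crude ROI: visitors received per megabyte of crawl traffic."""
    if stats.megabytes_crawled == 0:
        return float("inf")  # costs nothing, so never a candidate
    return stats.referred_visits / stats.megabytes_crawled


def bots_to_exclude(all_stats, min_visits_per_mb: float):
    """Return the names of bots whose ROI falls below the threshold."""
    return [s.name for s in all_stats
            if visits_per_megabyte(s) < min_visits_per_mb]


stats = [
    # 9.70GB is from the thread; the visit count is hypothetical.
    BotStats("Yahoo! Slurp", 9.70 * 1024, 20000),
    # Entirely hypothetical small crawler.
    BotStats("ExampleBot", 500.0, 10),
]

print(bots_to_exclude(stats, min_visits_per_mb=0.1))
```

With these invented numbers only ExampleBot falls under the threshold, so
Slurp would stay for as long as it keeps sending visitors.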

Quote:
Replication can be done more efficiently than that. Since much of the
content (that you care about) is textual, one could compress the content
and set it aside. Compression algorithms can reduce natural text to about
10-20% of its original size. I don't know how large their indices are
(compared with full text, i.e. Google Cache), but dumping of that data
certainly does not depend on the way it's stored/structured. If they
don't back up their data and send it to a remote location, they play a
very risky game. I'm assuming that the datacentres serve them as some
arrays of redundancy /already/.

There is no reason to believe that they are not ALREADY storing data in a
compressed format. Google is big, but they have to play by the same rules
as everybody else that uses databases. They are (were?) backing data up
until the point where the original data reaches half the size of their
storage capacity. Beyond that you have to decide whether to freeze the
size of your database (and Yahoo! is pushing theirs up, so there is no
stopping here) and selectively drop some old data. That's hard for Google
because for Google old==good. They can also buy more hard drives, up to the
point where actually powering them may become a HUGE expense in itself,
not even talking about maintenance and other associated expenses.

So, there is really no easy solution to your problems if you are the
world's #1 search engine!


--
Cheers,
Dmitri
See Site Sig Below





#9
www.1-script.com

Re: bots - 05-10-2006, 03:50 PM



Roy Schestowitz wrote:


Quote:
Replication can be done more efficiently than that. Since much of the
content (that you care about) is textual, one could compress content as
set it aside. Compression algorithms can reduce natural text to about
10-20% of its original size.

I forgot to mention one thing: Googlebot is already accepting gzipped data
if your server can send it. So, in my mind, that would definitely mean
that this is how they store the data as well.
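
For the server side of that, here is a minimal Python sketch of how a
script could decide whether to gzip a response based on the client's
Accept-Encoding header. The function names are my own, and it only covers
the transfer side; it says nothing about how any search engine stores the
pages afterwards.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding header value allows gzip."""
    for item in accept_encoding.split(","):
        parts = item.strip().split(";")
        if parts[0].strip().lower() != "gzip":
            continue
        # An explicit q=0 means the client refuses this coding.
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    return float(value) > 0
                except ValueError:
                    return False
        return True
    return False


def encode_body(body: bytes, accept_encoding: str):
    """Return (payload, extra_headers), gzipped only when the client allows it."""
    if accepts_gzip(accept_encoding):
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}


page = b"<html><body>" + b"hello bots " * 200 + b"</body></html>"
payload, headers = encode_body(page, "gzip, deflate")
print(len(page), len(payload), headers)
```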

--
Cheers,
Dmitri
See Site Sig Below





#10
Roy Schestowitz

Re: bots - 05-11-2006, 12:48 AM



__/ [ www.1-script.com ] on Wednesday 10 May 2006 20:22 \__

Quote:
Roy Schestowitz wrote:


Suggesting that Google have fallen
behind is
something that would make big headlines (same with studies that argue
Google
lost a top position)

It just did:

http://www.eweek.com/article2/0,1895,1959865,00.asp

Not exactly CNN but large enough publication for the subject that's so
technical in nature.

Ahhh... I see...

,----[ Snippet ]
Is 'Big Daddy' Choking Google?

Web site operators are clamoring to understand what can best be described
as an ongoing disturbance in the Google Force.

Google's search engine, once a clean, lean indexing machine, from
a Webmaster's perspective has been slipping badly lately.
`----

Coincidentally, I came across the following in The Register. I was going to
pass it on to you, but I held back. FWIW:


http://www.theregister.co.uk/2006/05...crosoft_redux/

The worse Google gets, the more money it makes?

Microsoft once tragedy, twice farce

,----[ Quote ]
Comment It's hard to imagine now, but there was a time when the
mainstream press was barely acquainted with the genius and foresight
of today's technology leaders.
`----

As usual, The Register correspondents are unnecessarily outspoken.

Best wishes,

Roy

--
Roy S. Schestowitz | "Error, no keyboard - press F1 to continue"
http://Schestowitz.com | SuSE GNU/Linux ¦ PGP-Key: 0x74572E8E
5:45am up 13 days 12:42, 8 users, load average: 1.69, 1.04, 0.82
http://iuron.com - help build a non-profit search engine

