#2
After three fairly constant years the number of hits I am getting from bots has been double the usual for the past two months and there is no slackening off this month. Is anything up? My site has had only tiny alterations. |
#4
Douglas Clark wrote: After three fairly constant years the number of hits I am getting from bots has been double the usual for the past two months and there is no slackening off this month. Is anything up? My site has had only tiny alterations.

You are not alone. There has been a serious increase in overall bot activity. Each search engine has its own reason to send its bots out more often, though:

Google messed up their index cache during the last big update and now needs to catch up with Y! and Ask before users notice that something bad happened. That's my Google theory. Due to Google's secrecy, the number of theories out there almost equals the number of people trying to crack that problem, Googlers themselves included.

Y! wants to beat Google to the largest index size, so it needs to crawl more pages. The more pages they get, the more links to your pages they discover, which sets a flag for their bot to visit the site again.

Ask is re-branding itself as a mainstream search engine and is pumping some serious money into both infrastructure and marketing. I guess Teoma crawls more (since last year, actually) simply because they have more machines to run it from.

MSN is still toying with their algorithms, and it looks like from time to time they dump large chunks of data from their database and need to re-crawl the sites to restore it.

A bunch of smaller guys are full of ambition to become the next Google, so they need their own cache of your site so they can analyze it to death.

I hope that pretty much covers most of it. Oh yeah, and there are always rogue bots out there, of course, trying your site for all kinds of exploits, so keep your shields up!
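To check which crawlers account for a jump like the one described at the top of the thread, one option is to count bot hits per month straight from the server access log. The sketch below is only an illustration: it assumes the log is in the common "combined" format with the user agent as the last quoted field, and the `BOT_MARKERS` substrings and the `bot_hits_by_month` helper are names made up for this example, not anything from the posts.

```python
import re
from collections import Counter

# Hypothetical user-agent substrings mapped to the engines discussed in this thread;
# adjust them to whatever actually shows up in your own logs.
BOT_MARKERS = {
    "googlebot": "Google",
    "slurp": "Yahoo",
    "teoma": "Ask",
    "msnbot": "MSN",
}

# Captures the month and year from the timestamp and the last quoted field
# (the user agent in the combined log format).
LINE_RE = re.compile(r'\[\d{2}/(\w{3})/(\d{4}):[^\]]*\].*"([^"]*)"\s*$')


def bot_hits_by_month(lines):
    """Return a Counter keyed by (year, month, engine) for recognised bots."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that do not look like combined-format entries
        month, year, agent = match.groups()
        agent = agent.lower()
        for marker, engine in BOT_MARKERS.items():
            if marker in agent:
                counts[(year, month, engine)] += 1
                break
    return counts
```

Passing an open log file (for example `bot_hits_by_month(open("access.log"))`, where the path is a placeholder) gives per-month totals that can be compared against earlier months to see which engine's crawler is responsible for the increase.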
#5
Google messed up their index cache during the last big update and now needs to catch up with Y! and Ask before users notice something bad happened. That's my Google theory. Due to Google's secrecy the number of theories out there almost equals to the number of people trying to crack that problem, googlers themselves included. |
|
I have not heard this theory before. I haven't noticed any degradation in terms of search results either. Suggesting that Google have fallen behind is something that would make big headlines (same with studies that argue Google lost a top position), so it's probably just wishful thinking.
|
In operation, I am sure that they take into consideration all such risks and replicate the data as required. Even "Big Daddy" seems to have been corrected/re-aligned. |
|
I hope that pretty much covers most of it. Oh yeah, and there is always rogue bots out there, of course, trying your site for all kinds of exploits, so keep your shields up! |
|
Shields up? I am not sure about exclusions. |
#6
Roy Schestowitz wrote: Google messed up their index cache during the last big update and now needs to catch up with Y! and Ask before users notice something bad happened. That's my Google theory. Due to Google's secrecy the number of theories out there almost equals to the number of people trying to crack that problem, googlers themselves included. I have not heard this theory before. I haven't noticed any degradation in terms of search results either.

Well, lucky you, my friend, lucky you! As for the rest of us (see threads on Webmasterworld such as this: http://www.webmasterworld.com/forum30/34228.htm or this: http://www.webmasterworld.com/forum30/34061.htm), funny things are happening indeed. People (myself included) report massive page drop-outs, on a scale of 90-99% of the site being gone. Like I said, there are plenty of theories as to why, but there is no question that something (bad) is going on.

Roy Schestowitz wrote: Suggesting that Google have fallen behind is something that would make big headlines (same with studies that argue Google lost a top position), so it's probably just wishful thinking.

Well, for better or worse, they have ALREADY lost their top position on my sites! Yahoo has almost replaced the traffic that I lost from Google, which is the only reason I tolerate the exorbitant Yahoo! Slurp slurping rate ;-) 9.70GB on a single site since May 1st, 2006.

Roy Schestowitz wrote: In operation, I am sure that they take into consideration all such risks and replicate the data as required. Even "Big Daddy" seems to have been corrected/re-aligned.

Replicate? Maybe, maybe not. It depends on whether we can trust their own words about running out of capacity (earlier thread here). For reliable replication you need twice the amount of storage, and they don't seem to have enough even for the original data. Besides, have you ever had a database with the index(es) messed up? It can be pretty frustrating indeed. You know that the data is there but you cannot get to it (fast enough), which from the outside looks just as if you had simply lost that data.

Roy Schestowitz wrote: I hope that pretty much covers most of it. Oh yeah, and there is always rogue bots out there, of course, trying your site for all kinds of exploits, so keep your shields up! Shields up? I am not sure about exclusions.

Well, not in the sense of putting everything into the robots.txt file, of course. As a matter of fact, to keep ratbots (good term, I'll use it) on a leash, DO NOT put anything sensitive into robots.txt. This is the first thing they check: what they are not supposed to access. Then they surely try to access it to see what kind of informative errors they can generate.
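The point about robots.txt can be made concrete: the file is publicly readable, so every Disallow line doubles as a hint about where to look. As a rough sketch, the snippet below scans the text of a robots.txt file for Disallow paths that look sensitive. The keyword list and the `risky_disallow_paths` name are assumptions made up for this illustration; anything it flags is better protected by authentication on the server than by a robots.txt entry.

```python
# Hypothetical keywords suggesting a path should be protected by the server
# itself rather than advertised in a world-readable robots.txt.
SENSITIVE_HINTS = ("admin", "backup", "private", "secret", "passwd")


def risky_disallow_paths(robots_txt):
    """Return Disallow paths whose names contain one of the sensitive hints."""
    risky = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "disallow":
            continue  # only Disallow lines reveal "forbidden" paths
        path = value.strip()
        if any(hint in path.lower() for hint in SENSITIVE_HINTS):
            risky.append(path)
    return risky
```

A keyword match is only a crude heuristic; the underlying design choice is that robots.txt is a courtesy notice for well-behaved crawlers, not an access control mechanism.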
#8
Ouch! I believe you have your own dedicated server, fortunately. Maybe you should get another one and spray red "Y!" over it. Sorry, I know it's no place for sarcasm... |
|
Replication can be done more efficiently than that. Since much of the content (that you care about) is textual, one could compress the content and set it aside. Compression algorithms can reduce natural text to about 10-20% of its original size. I don't know how large their indices are (compared with the full text, i.e. Google Cache), but dumping that data certainly does not depend on the way it's stored/structured. If they don't back up their data and send it to a remote location, they are playing a very risky game. I'm assuming that the datacentres /already/ serve them as arrays of redundancy.
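The compression claim is easy to test on your own pages. Here is a minimal sketch using Python's standard `zlib` module; the `compression_ratio` helper is a name invented for this example, and the ratio you get depends heavily on the input, so very repetitive text will compress far better than typical prose.

```python
import zlib


def compression_ratio(text):
    """Return compressed size divided by original size for UTF-8 encoded text."""
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, 9)  # level 9 = best compression zlib offers
    # Round-trip check: the archived copy must restore to the original bytes.
    if zlib.decompress(packed) != raw:
        raise ValueError("round trip failed")
    return len(packed) / len(raw)
```

Running it over the saved text of a few real pages gives a rough idea of how much storage a compressed backup copy would need relative to the originals.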
#10
Roy Schestowitz wrote: Suggesting that Google have fallen behind is something that would make big headlines (same with studies that argue Google lost a top position)

It just did: http://www.eweek.com/article2/0,1895,1959865,00.asp

Not exactly CNN, but a large enough publication for a subject that's so technical in nature.
|
Is 'Big Daddy' Choking Google?

Web site operators are clamoring to understand what can best be described as an ongoing disturbance in the Google Force. Google's search engine, once a clean, lean indexing machine, from a Webmaster's perspective has been slipping badly lately.
|
Comment It's hard to imagine now, but there was a time when the mainstream press was barely acquainted with the genius and foresight of today's technology leaders. `---- |