HighDots Forums  

Is Google's database full?

Search Engine Optimization Discussion about SEO/Search Engine Optimization (alt.internet.search-engines)


#1
Big Bill

Is Google's database full? - 09-07-2004, 05:43 AM

might explain a few things. There's an article on the subject here.

http://www.w3reports.com/index.php?itemid=549

BB

#2
David Off

Re: Is Google's database full? - 09-07-2004, 07:31 AM

Big Bill wrote:

Quote:
might explain a few things. There's an article on the subject here.

http://www.w3reports.com/index.php?itemid=549

BB

Can this problem be corrected? Sure it can, but Google has 15,000+ Linux
servers and 4.2 billion document_IDs to convert. This is not going to be an
easy task at this point.


1. long long doc_id

2. gcc google.cc -o google.exe

3. rcp google.exe all_servers

4. kill -HUP all_servers

5. erm... er, that's it!
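
The data-type half of that recipe is easy to illustrate. What follows is a minimal sketch, assuming document IDs are stored as fixed-width unsigned integers (that is the article's claim, not anything Google has confirmed). The function names are made up for the illustration.

```python
import struct

def pack_id32(doc_id: int) -> bytes:
    """Pack a document ID into 4 bytes (unsigned 32-bit, little-endian)."""
    return struct.pack("<I", doc_id)

def pack_id64(doc_id: int) -> bytes:
    """Pack a document ID into 8 bytes (unsigned 64-bit, little-endian)."""
    return struct.pack("<Q", doc_id)

def fits_in_32_bits(doc_id: int) -> bool:
    """Return True if the ID can be stored in the 4-byte layout."""
    try:
        pack_id32(doc_id)
        return True
    except struct.error:
        # struct refuses values outside 0 .. 2**32 - 1 for the "I" format.
        return False

def widen(packed32: bytes) -> bytes:
    """Re-encode an existing 4-byte ID in the 8-byte layout, value unchanged."""
    (doc_id,) = struct.unpack("<I", packed32)
    return pack_id64(doc_id)
```

Every old ID survives `widen` unchanged, so the type change itself is trivial. The expensive part would be rewriting every stored record on every server.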

hmmm sounds like Chicken Licken is doing the rounds to me but here is
another article on the same theme:

http://www.google-watch.org/dying.html

There has always been a lot of bleating going on from people who have
their sites dropped or don't think Google gives them the recognition
they deserve. But does Google have some kind of contract to index every
miserable little page on the web? I don't think so. Personally I think
they could drop some 75% of the stuff they index and still produce good
quality results.


#3
Sam

Re: Is Google's database full? - 09-07-2004, 09:40 AM

Big Bill wrote:
Quote:
might explain a few things. There's an article on the subject here.

http://www.w3reports.com/index.php?itemid=549

BB

Yes it's definitely true their database is completely full thanks to
seo dave and all his thousands of filler pages. Some people think that
they are the only ones that have to make a living. But does dave care
that you have to make a living too? NOOOOOOO! he don't, and he'll just
go on and on and on making thousands and thousands of filler pages and
say screw everyone else I'm taking google over. Today classic
literature... Tomorrow..... (you know that's how Hitler got started).
naughty naughty Dave naughty!


#4
David Off

Re: Is Google's database full? - 09-07-2004, 10:32 AM

Sam wrote:

Quote:
Big Bill wrote:

might explain a few things. There's an article on the subject here.

http://www.w3reports.com/index.php?itemid=549

BB



Yes it's definitely true their database is completely full thanks to
seo dave and all his thousands of filler pages. Some people think that
they are the only ones that have to make a living. But does dave care
that you have to make a living too? NOOOOOOO! he don't

Ha Sam, didn't you say that there are winners and whingers in life?


Quote:
and he'll just
go on and on and on making thousands and thousands of filler pages

Now if the Google PhDs knew their shit, their duplicate pages algorithm
would spot that they were all a copy of Project Gutenberg and drop them.
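
As a rough illustration of how such a duplicate check could work, here is a minimal sketch using word shingles and Jaccard similarity. This is a common textbook technique, not Google's actual algorithm, and the function names, the shingle size of 5 and the 0.8 threshold are all assumptions picked for the example.

```python
import re

def shingles(text: str, k: int = 5) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_near_duplicate(doc_a: str, doc_b: str, threshold: float = 0.8) -> bool:
    """Flag two documents as near-duplicates if their shingle sets overlap enough."""
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold
```

A mirrored Gutenberg page that differs only in case or punctuation would score close to 1.0 against the original, while two different novels would score close to 0.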


#5
David Off

Re: Is Google's database full? and is the sky falling in? - 09-07-2004, 10:41 AM

Has Google run out?

June 13, 2003

Marta Peirano

A rumour is spreading like the plague over the infinite paths of the
Internet: Google has reached overbooking. The most popular creature in
the virtual universe has space problems, or, more precisely, it could
have them in the future. In fact, it seems that the most popular search
engine in the world is about to reach the limit of its capacity of
listed pages: 4,294,967,296. A numeric problem that is mainly due to a
calculation error.

In Google’s giant database, each link can occupy the space of only 4
bytes. In other words, with the home page of the system already
counting something like three billion web pages, it is probable that the
engine is about to run out. If they do not change their system of
listing links, the Internet will continue to grow and expand behind
Google’s back, but the new sites that will be created will be left out
while its database will be full of an enormous quantity of obsolete
pages that will crowd the pages of search results.
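
The 4,294,967,296 figure is simply the number of distinct values that fit in 4 bytes. A quick sketch of the arithmetic, assuming IDs are plain unsigned integers (the helper name is made up for the illustration):

```python
def id_capacity(num_bytes: int) -> int:
    """Number of distinct values an unsigned integer of num_bytes bytes can hold."""
    return 2 ** (8 * num_bytes)

# Compare the 4-byte limit from the article with two wider layouts.
for width in (4, 5, 8):
    print(f"{width} bytes -> {id_capacity(width):,} distinct IDs")
```

Adding a single extra byte multiplies the capacity by 256.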

The analysts arrived at this conclusion by studying the recent “strange
behaviour of the system” during the last renewal of its “contents.” In
fact, more or less once a month, Google reorders its listed pages.
During this process, the system calculates the so-called PageRank of
each page, based on the links pointing to it and therefore on its
relative importance. It then incorporates the new pages found into
its list of available web sites, periodically modifying the search
results. This process of updating is called Google Dance in jargon and
lasts approximately four days. However, during the last Google Dance
many pages changed their position in the classification in an
inexplicable way. This, together with other anomalous events, has
created unrest among users.
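
For reference, PageRank as published by Brin and Page scores a page from the links pointing to it. The sketch below is a simplified textbook power iteration on a toy link graph, not Google's production system. The damping factor of 0.85, the iteration count and the handling of pages with no outgoing links are assumptions for the example.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Compute PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank
```

For example, `pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})` ranks `c` highest because two pages link to it, and `b` lowest because nothing links to it.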

Since Google became a reality on the Internet in 1997, this search system
has evolved and become, without a doubt, the most powerful instrument on the
Internet. Not only have its creators, Sergey Brin and Larry Page, become
multimillionaires, but they have also become two of the greatest heroes
of the virtual community whose dominion extends all around the world,
thanks in particular to their list of links. In a certain sense, Google
is the spine of the Internet. And whatever problem afflicts its working
can influence the growth of the very same Internet. But what can they do
to continue to efficiently list a sea of pages that continue to multiply
exorbitantly?

If the problem really exists, and is grave, it must certainly be a
calculation error. Though the Internet is infinite, and can grow in an
unlimited way, its system of protocols (the map of letters and numbers
which allow us to navigate) is not. The greatest error was to design a
system of limited growth, as has already happened in the case of IP
technology. But similar almost panicky situations have already come up:
at the end of the year 1999, the entire world was apprehensive about the
possible consequences of passing from 99 to 00. Would all of the files
go crazy, with computers convinced that they were working in 1900, and
create an infinity of problems? This psychophilosophical problem of
nervous media, which manifested itself all over the world, was called the
year 2000 effect. But everything went well, and we have not seen the
smallest trace of this problem.




#6
C.W.

Re: Is Google's database full? - 09-07-2004, 10:53 AM

On Tue, 07 Sep 2004 16:32:08 +0200, David Off
<david.off_dumpthisbit_ (AT) voila (DOT) fr> wrote:

Quote:
Sam wrote:
[snip]
and he'll just
go on and on and on making thousands and thousands of filler pages

Now if the Google PhDs knew their shit, their duplicate pages algorithm
would spot that they were all a copy of Project Gutenberg and drop them.

Same could possibly apply for sites that mirror usenet groups [and
those copies of posts get indexed]. Places like SearchGuild and such
wouldn't be affected if those mirrored pages weren't indexed since
they have other forums on their site.

Carol



#7
stoma

Re: Is Google's database full? - 09-07-2004, 02:00 PM

On Tue, 07 Sep 2004 13:40:50 GMT, Sam <. (AT) mail (DOT) com> wrote:

Quote:
Yes it's definitely true their database is completely full thanks to
seo dave and all his thousands of filler pages. Some people think that
they are the only ones that have to make a living. But does dave care
that you have to make a living too? NOOOOOOO! he don't, and he'll just
go on and on and on making thousands and thousands of filler pages and
say screw everyone else I'm taking google over. Today classic
literature... Tomorrow..... (you know that's how Hitler got started).
naughty naughty Dave naughty!

Too right! As someone who is also guilty of wasting space in the index
just to have somewhere to post his text links, I've got to say that
Dave takes this tactic beyond the bounds of reason.

Why is he allowed to waste 217 precious DocIDs to post just one novel
by Charles Dickens, "Oliver Twist"? And split into such tiny fragments
that you only get through about five paragraphs before having to click
for the next page? So you can't even read it offline! And he doesn't
even get any SEO benefit from this as all the <titles> of the pages
are the same. Saying 'filler pages' isn't just a joke here, it's
simply a bald statement of the truth.

And the saddest part of all? Try searching for:

charles dickens oliver twist

Dave's site is not found in the top 500, and only ranks #200 even if
you use the phrase in quotes.


-stoma




#8
Victoria Clare

Re: Is Google's database full? - 09-08-2004, 03:42 PM

stoma <stoma (AT) bee-tee-internet (DOT) com> wrote in
news:sjtrj09k1vs28e36ndbhqv6tesbrtka3q1 (AT) 4ax (DOT) com:

Quote:
Why is he allowed to waste 217 precious DocIDs to post just one novel
by Charles Dickens, "Oliver Twist".

Um, it seems a bit early to be taking the 'Google is full' theories as
fact.

I dunno about you, but I've seen no evidence that Google is reluctant to
index new content, or that it is binning content it used to list.

(I have seen evidence it's got better at detecting duplication, and you
would expect that to free up some space, if it was in fact limited.)

Quote:
And split into such tiny fragments
that you only get through about five paragraphs before having to click
for the next page. So you can't even read it offline!

An offline browser is what is usually used. I use WinHTTrack Website
Copier for downloading multipage content to read locally, but there are
lots of others.

It would be a long single page that contained the whole of 'Oliver Twist'!

I agree that there is no great skill involved in reproducing out-of-
copyright novels online, but I see no real harm in Project Gutenberg
content being mirrored elsewhere, do you?

Victoria
--
Clare Associates Ltd
http://www.clareassoc.co.uk/
--


#9
stoma

Re: Is Google's database full? - 09-08-2004, 05:41 PM

On Wed, 08 Sep 2004 20:42:29 +0100, Victoria Clare
<victoria (AT) markpoles (DOT) org.uk> wrote:

Quote:
I dunno about you, but I've seen no evidence that Google is reluctant to
index new content, or that it is binning content it used to list.

It does love to index new pages, but is still happy to bin them. I've
seen it spider 400 new pages and add 90% of them to the index in 12
hours, spider them all again the next day, but still drop all but six
within a week. I've also seen the problem mentioned in the article of
page caches being slowly dropped over time, so the number of pages
indexed on your site continues to climb but your hits go down as the
older pages are in as URLs only. It's certainly a very inefficient way
to operate. The way to keep your pages in seems to be to keep your
site supplied with decent PR links, and don't just link to your index,
deep link into your content too.

Quote:
An offline browser is what is usually used. I use WinHTTrack Website
Copier for downloading multipage content to read locally, but there are
lots of others.

It's still a bit inconvenient to have 200+ files on your HD just to
read a book! Plus the way Dave does his indexes it might be hard to
download the book all at once.

Quote:
I agree that there is no great skill involved in reproducing out-of-
copyright novels online, but I see no real harm in Project Gutenberg
content being mirrored elsewhere, do you?

Nah, nothing wrong with that, it's just the way Dave does it I don't
like - I'd never seriously use his site. I have a few book sites
myself, but post them by chapter. It's not just better for the casual
reader, it helps with SEO as you can use the chapter title for the
page title and land a bundle of 'accidental' serps with it. For
example I picked up #1/3,370,000 for 'evil looking man', which was
nice.

-stoma



#10
SEO Dave

Re: Is Google's database full? - 09-08-2004, 08:24 PM

On Wed, 8 Sep 2004 21:41:26 +0000 (UTC), stoma
<stoma (AT) bee-tee-internet (DOT) com> wrote:

Quote:
An offline browser is what is usually used. I use WinHTTrack Website
Copier for downloading multipage content to read locally, but there are
lots of others.

It's still a bit inconvenient to have 200+ files on your HD just to
read a book! Plus the way Dave does his indexes it might be hard to
download the book all at once.

My first consideration when designing the template was visitors and
there should be no reason why popular offline readers shouldn't be
able to access an entire book.

Quote:
I agree that there is no great skill involved in reproducing out-of-
copyright novels online, but I see no real harm in Project Gutenberg
content being mirrored elsewhere, do you?

Nah, nothing wrong with that, it's just the way Dave does it I don't
like - I'd never seriously use his site.

Each to their own, I know thousands of visitors a day visit my sites,
so someone is happy with the format.

Quote:
I have a few book sites
myself,

Wonder where you got that idea from.

Quote:
but post them by chapter. It's not just better for the casual
reader,

Why is it better for the reader?

You have a choice of bookmarking the chapter or bookmarking the page
you are on; I know which I prefer. I estimate each page holds the
equivalent of two pages of a paper book. Whilst a chapter could be 50
or 100 pages. For the user my site is much easier to use and come back
to where you left off after bookmarking. It's similar to adding a
bookmark to a real book whilst what most do (breaking up by chapter)
is like remembering you are on chapter 10, but not quite sure which
page.

Takes a lot more effort on my part to do it this way as well (or did
before I automated most of the process).

Quote:
it helps with SEO as you can use the chapter title for the
page title and land a bundle of 'accidental' serps with it. For
example I picked up #1/3,370,000 for 'evil looking man', which was
nice.

This is true, but I'm again thinking of the visitor first and search
engines second. I'm not interested in the odd irrelevant SERP that
will get traffic up, but not result in a visitor who is interested in
the pages I'm offering.

My pages are titled to make it easier for those who bookmark a
specific page to find it later. Someone can read 10 books at my site
at the same time and not get lost since the pages are titled to
prevent this.

If you read to page 50 and bookmark the page to come back to later the
bookmark will have a title like this-

Charles Dickens - Oliver Twist Page 50

for this page-
http://www.charles-dickens.org/olive...ok-page-50.asp

The visitor who made this bookmark will instantly recognise this as
page 50 of the book Oliver Twist by Charles Dickens.

Whilst this one of yours-

http://www.gbs.pi8.com/methuselah/

You wouldn't have a clue what you had bookmarked with half of the
titles!

Someone is making an effort to go after the Classic Literature SERP I
see :-))

BTW do you add the pages/links by hand for the above pages? Must take
you forever! Learn to automate most of the process and you can add
thousands of pages a day. It's easy to automate chapter books as well,
maybe I could do both, give my visitors the choice.
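
The page-per-bookmark scheme described above could be automated along these lines. This is only a sketch of the general idea, not the script actually used on the site. The function names and the five-paragraphs-per-page default are assumptions, and the title format copies the "Charles Dickens - Oliver Twist Page 50" example.

```python
def page_title(author: str, book: str, page: int) -> str:
    """Build a bookmark-friendly title such as 'Charles Dickens - Oliver Twist Page 50'."""
    return f"{author} - {book} Page {page}"

def paginate(paragraphs: list, per_page: int = 5) -> list:
    """Split a list of paragraphs into consecutive pages of per_page paragraphs."""
    return [paragraphs[i:i + per_page] for i in range(0, len(paragraphs), per_page)]

def titled_pages(author: str, book: str, paragraphs: list, per_page: int = 5) -> list:
    """Return (title, paragraphs) pairs, one per page, numbered from 1."""
    pages = paginate(paragraphs, per_page)
    return [(page_title(author, book, n), chunk)
            for n, chunk in enumerate(pages, start=1)]
```

Passing chapter boundaries instead of a fixed `per_page` would give the by-chapter layout, so both formats could come from the same source text.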

Quote:
-stoma
David
--
http://www.search-engine-optimization-services.co.uk/

