HighDots Forums  

Allow robot access to protected content

Search Engine Optimization Discussion about SEO/Search Engine Optimization (alt.internet.search-engines)


#1 Sholom - Allow robot access to protected content - 06-07-2006, 02:54 PM

Does anyone know how to allow Google's robots to index protected content?

My company has a site that requires a subscription to access the info,
but we'd like to have Google index those pages. I see there are many
sites that have managed this.

I can't allow access by user-agent, since my authentication software
doesn't support that. Is there any way to give Google a username and
password? Or is there an IP, or range of IPs, that Google uses?


Thanks.


#2 John Bokma - Re: Allow robot access to protected content - 06-07-2006, 02:58 PM

"Sholom" <sdeen (AT) diamonds (DOT) net> wrote:

Quote:
Anyone know how to allow Google's robots to index protected content?

My company has a site that requires a subscription to access the info,
but we'd like to have google index those pages. I see there are many
sites who've managed this.
Yup, it's called cloaking. I'll report it when I see it.

Quote:
I can't allow by user-agent since my authentication software doesn't
allow that. Is there any way to give Google a username and password? Or
is there an IP, or range of IPs, that google uses?
Yes, and this might get you banned.

--
John Freelance Perl programmer: http://castleamber.com/

A better start menu with Quick Launch:
http://johnbokma.com/windows/quick-launch.html


#3 Borek - Re: Allow robot access to protected content - 06-07-2006, 03:03 PM

On Wed, 07 Jun 2006 20:54:28 +0200, Sholom <sdeen (AT) diamonds (DOT) net> wrote:

Quote:
Anyone know how to allow Google's robots to index protected content?

My company has a site that requires a subscription to access the info,
but we'd like to have google index those pages. I see there are many
sites who've managed this.
Easy way to get banned.

I hate sites that are indexed but not accessible. Usually I do two things
at once: first, I read the cached content; second, I report the site
to Google.

Best,
Borek
--
http://www.chembuddy.com
http://www.ph-meter.info/pH-Nernst-equation
http://www.terapia-kregoslupa.waw.pl


#4 Sholom - Re: Allow robot access to protected content - 06-07-2006, 03:53 PM

Thanks to all for the replies. I had no idea this was such a sensitive
issue. If our publication has information that could be helpful to
someone, I figured they should know about it. I guess that upsets some.

However, I'm pretty sure I've come across dozens of legitimate sites on
SE's, particularly on Google News, that require registration. Wall
Street Journal is one example that comes to mind. Their link shows up
with a "(subscription)" tag next to it, and I was just wondering how
they get that done.

I may be misunderstanding the terms and concepts here; perhaps Google
News is not strictly a search engine, and it is only there that it's
allowed.

(As an aside, re the cache issue, I was under the impression that a
"robots=nocache" meta tag prevents the search engine from showing a
cached page.)
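
For reference, the directive that suppresses the cached copy is normally spelled "noarchive" (as in <meta name="robots" content="noarchive">) rather than "nocache". Below is a minimal Python sketch that checks whether a page carries it. The class name, function name, and sample markup are made up for illustration; it is not a complete robots-directive parser.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise them.
        attr = {key: (value or "") for key, value in attrs}
        # "robots" addresses all crawlers; "googlebot" addresses Google only.
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def blocks_cached_copy(html):
    """Return True if the page asks search engines not to show a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


# Hypothetical sample page, for illustration only.
sample = '<html><head><meta name="robots" content="index, noarchive"></head></html>'
print(blocks_cached_copy(sample))
```

Note that this tag only affects the cached copy. It does not change what the crawler or the visitor is served.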


#5 Big Bill - Re: Allow robot access to protected content - 06-07-2006, 04:46 PM

On 7 Jun 2006 18:58:01 GMT, John Bokma <john (AT) castleamber (DOT) com> wrote:

Quote:
"Sholom" <sdeen (AT) diamonds (DOT) net> wrote:

Anyone know how to allow Google's robots to index protected content?

My company has a site that requires a subscription to access the info,
but we'd like to have google index those pages. I see there are many
sites who've managed this.

Yup, it's called cloaking. I'll report it when I see it.

I can't allow by user-agent since my authentication software doesn't
allow that. Is there any way to give Google a username and password? Or
is there an IP, or range of IPs, that google uses?

Yes, and this might get you banned.
Go talk to fantomaster. www.fantomaster.com

BB
--

http://www.kruse.co.uk/seo-services.htm
http://www.here-be-posters.co.uk/lempicka-prints.htm
http://www.crystal-liaison.com/armani/index.html



#6 Big Bill - Re: Allow robot access to protected content - 06-07-2006, 04:46 PM

On Wed, 07 Jun 2006 21:03:41 +0200, Borek
<m.borkowski (AT) delete (DOT) chembuddy.these.com.parts> wrote:

Quote:
On Wed, 07 Jun 2006 20:54:28 +0200, Sholom <sdeen (AT) diamonds (DOT) net> wrote:

Anyone know how to allow Google's robots to index protected content?

My company has a site that requires a subscription to access the info,
but we'd like to have google index those pages. I see there are many
sites who've managed this.

Easy way to get banned.

I hate sites that are indexed but not accessible. Usually I do two things
at the same time - first, I read cached content. Second, I report such
site to Google.

Best,
Borek
Third, I stroke white cat in my lap; "So, Meester Bond.. you theenk
you are so clever eendexing your pages, huh? We show zem, eh Comrad
Pussy... A-HAH-HAH-HAH-HAH"

BB (tra-la-la)))
--

http://www.kruse.co.uk/seo-services.htm
http://www.here-be-posters.co.uk/lempicka-prints.htm
http://www.crystal-liaison.com/armani/index.html



#7 John Bokma - Re: Allow robot access to protected content - 06-07-2006, 04:59 PM

"Sholom" <sdeen (AT) diamonds (DOT) net> wrote:

Quote:
Thanks to all for the replies. I had no idea this was such a sensitive
issue.
Of course it is. Do you like it when Google's SERP makes it appear that the
content is freely available, and then you're greeted with a registration page?

Quote:
(As an aside, re the cache issue, I was under the impression that a
"robots=nocache" meta tag prevents the search engine from showing a
cached page.)
Yup, that's cloaking, and I report it when I see it.

--
John Freelance Perl programmer: http://castleamber.com/

Creating a customized Command Prompt shortcut:
http://johnbokma.com/windows/command...-shortcut.html


#8 Roy Schestowitz - Re: Allow robot access to protected content - 06-07-2006, 10:34 PM

__/ [ Borek ] on Wednesday 07 June 2006 20:03 \__

Quote:
On Wed, 07 Jun 2006 20:54:28 +0200, Sholom <sdeen (AT) diamonds (DOT) net> wrote:

Anyone know how to allow Google's robots to index protected content?

My company has a site that requires a subscription to access the info,
but we'd like to have google index those pages. I see there are many
sites who've managed this.

Easy way to get banned.

I hate sites that are indexed but not accessible. Usually I do two things
at the same time - first, I read cached content. Second, I report such
site to Google.
There is a way around this. Change your browser's user-agent string to
Googlebot's and you're in. To be honest, I didn't know this trick until
somebody told me last week. And I agree with Borek: it's annoying, and given
that it's a mild form of cloaking (serving different content to search
engines and to people, or hiding information), it is grounds for a ban.

Best wishes,

Roy

--
Roy S. Schestowitz | {Hide sig} {Show sig} >{Close Application}<
http://Schestowitz.com | Free as in Free Beer ¦ PGP-Key: 0x74572E8E
3:30am up 41 days 9:03, 11 users, load average: 2.47, 1.24, 0.78
http://iuron.com - semantic engine to gather information


#9 John Bokma - Re: Allow robot access to protected content - 06-08-2006, 12:23 AM

Roy Schestowitz <newsgroups (AT) schestowitz (DOT) com> wrote:

Quote:
__/ [ Borek ] on Wednesday 07 June 2006 20:03 \__

On Wed, 07 Jun 2006 20:54:28 +0200, Sholom <sdeen (AT) diamonds (DOT) net> wrote:

Anyone know how to allow Google's robots to index protected content?

My company has a site that requires a subscription to access the
info, but we'd like to have google index those pages. I see there
are many sites who've managed this.

Easy way to get banned.

I hate sites that are indexed but not accessible. Usually I do two
things at the same time - first, I read cached content. Second, I
report such site to Google.

There is a way around this. Change user-agent string to googlebot and
you're in.
If they check for that, yup. Some sites check for the crawlers, based on
IP or name.
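
To make "based on IP or name" concrete, here is a rough Python sketch of the two kinds of check. The name check looks only at the User-Agent string, which any client can set. The IP check does a reverse DNS lookup and then confirms the hostname resolves back to the same address. The function names are made up, the hostname suffixes are an assumption about where Google's crawler hosts resolve, and the lookup parameters are there only so the code can be tried without network access (IPv4 only).

```python
import socket


def claims_googlebot(user_agent):
    """Name-based check: easy to spoof, since clients choose their own UA."""
    return "googlebot" in user_agent.lower()


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """IP-based check: reverse DNS, then forward-confirm the hostname."""
    # Default to real DNS lookups when no stand-in functions are supplied.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        # Assumption: genuine crawler hosts resolve under these domains.
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # The hostname must resolve back to the address that connected.
        return ip in forward_lookup(host)
    except OSError:
        # Lookup failures (no PTR record, unknown host) count as unverified.
        return False


# Made-up DNS tables using a documentation-only address range.
fake_ptr = {"192.0.2.10": "crawl-192-0-2-10.googlebot.com"}
fake_a = {"crawl-192-0-2-10.googlebot.com": ["192.0.2.10"]}
print(is_verified_googlebot("192.0.2.10", fake_ptr.__getitem__, fake_a.__getitem__))
```

A site that relies on the name check alone is the kind that the user-agent trick quoted above gets past. The forward-confirmed lookup is much harder to fake.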


Quote:
To be honest, I didn't know this trick until somebody told
me last week.
Wasn't me, but 2+ years ago:
http://johnbokma.com/mexit/2004/04/2...useragent.html

Funny, I notice that I have a link to report spam with google on my site
:-D My site is getting too big. Or maybe I should say: a site is getting
good when you limit Google to your site when looking for some info (which
I do now and then, I even made a special keymark for it :-D)

--
John isa Perl programmer: http://johnbokma.com/perl/perlprogrammer.html

Fox G Bar: http://johnbokma.com/firefox/google-...stomizing.html


#10 Roy Schestowitz - Re: Allow robot access to protected content - 06-08-2006, 03:22 AM

__/ [ John Bokma ] on Thursday 08 June 2006 05:23 \__

Quote:
Roy Schestowitz <newsgroups (AT) schestowitz (DOT) com> wrote:

__/ [ Borek ] on Wednesday 07 June 2006 20:03 \__

On Wed, 07 Jun 2006 20:54:28 +0200, Sholom <sdeen (AT) diamonds (DOT) net> wrote:

Anyone know how to allow Google's robots to index protected content?

My company has a site that requires a subscription to access the
info, but we'd like to have google index those pages. I see there
are many sites who've managed this.

Easy way to get banned.

I hate sites that are indexed but not accessible. Usually I do two
things at the same time - first, I read cached content. Second, I
report such site to Google.

There is a way around this. Change user-agent string to googlebot and
you're in.

If they check for that, yup. Some sites check for the crawlers, based on
IP or name.

In the worst case, if you have no browser extension for this, wget can be
used to fetch the page in question. There's the "--user-agent" option.
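
The same idea as wget's "--user-agent" option can be sketched in Python with the standard library. The user-agent string below is an assumption (a commonly published Googlebot value; a given site may look for something else), and the function names are made up for illustration.

```python
import urllib.request

# Assumption: a commonly published Googlebot user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Return a Request object that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Fetch the page body as bytes; raises urllib.error.URLError on failure."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

As noted above, this only gets past sites that check the crawler by name. A site that checks by IP will still show the registration page.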


Quote:
To be honest, I didn't know this trick until somebody told
me last week.

Wasn't me, but 2+ years ago:
http://johnbokma.com/mexit/2004/04/2...useragent.html

Funny, I notice that I have a link to report spam with google on my site
:-D My site is getting too big. Or maybe I should say: a site is getting
good when you limit Google to your site when looking for some info (which
I do now and then, I even made a special keymark for it :-D)

*smile* I can remember the time when I ceased to maintain the sitemap and
lost that visual, conceptual idea of how my site was constructed. It is now
somewhat of a messy Web, which I sometimes try to restructure. Same
situation with E-mail accounts, Web hosts, and domain names.

Best wishes,

Roy

--
Roy S. Schestowitz | Othello for Win32/Linux: http://othellomaster.com
http://Schestowitz.com | Free as in Free Beer ¦ PGP-Key: 0x74572E8E
8:15am up 41 days 13:48, 11 users, load average: 0.95, 0.81, 0.77
http://iuron.com - semantic engine to gather information

