HighDots Forums  

blocking robots.txt from non-robots

Search Engine Optimization Discussion about SEO/Search Engine Optimization (alt.internet.search-engines)


  #41  
John Bokma

Re: blocking robots.txt from non-robots - 02-21-2008, 09:26 PM






Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

Quote:
John Bokma <john (AT) castleamber (DOT) com> wrote in
[..]

Quote:
You tell me. I'm one of the several hundred thousand bloggers that
have had their page rank bitchslapped by google, first to zero pr,
then to status "unranked".

I haven't had that problem (yet), although my PR dropped from 7 to 6 and
my AdSense income dropped from 700+ USD to 300 USD/month (the drop started
around April/May or so). So I feel the pain a bit, although I have the
feeling that the PR drop has more to do with the AddThis link I put on
every link (dropped now, also because it was useless: in months it was
only used a few times, and that with 14k visitors a day).

[PR value -> to advertiser important]
Yeah, I hear you on this one.

I've decided to start looking for my own advertisers soon. The AdSense
drop I probably can fix by tweaking colors, moving them to other
positions, etc., but that seems pointless work to me, especially if I have
to do it every 6 months or so. Also, I have the feeling that I can make
more money with my 14k/day blog by finding my own advertisers.

However, I have already decided *not* to link to them with a link that
propagates PR. I'll use JS like Google does. I agree with Google that
one shouldn't be able to buy PR.

Quote:
Andy Beard's blog had an idea recently about using robots.txt to tell
google not to index pages that contain the paid links that they're so
upset about. I'm thinking that this gives the best of both worlds..
the advertiser gets their paid link that doesn't have rel="nofollow"
on it and google gets told "don't crawl this page".

A good question is: does a page get PR if it's linked to, but Google
can't index it? I have no idea, to be honest.

Quote:
The problem is with intermediaries that might decide this is no better
than putting rel="nofollow" on the link (which I don't see that it
is). The idea is to keep them from being able to read the robots.txt
that is being given to Google should they think of it.

Like I wrote in another reply: people who worry about you cloaking
robots.txt are probably smart enough to figure out that if all pages
that have their ads on them don't appear on Google, something is amiss.
Moreover, it wouldn't amaze me if that's the way they find out in the
first place (with vanity searches, or even a vanity search limited to
your site).

So as a co-blogger, I would advise not to do this. If your advertisers
figure this out... Moreover, if some blogger finds this out and blogs
about it...

Quote:
Advertiser wants links without rel="nofollow"

Google doesn't seem to have a problem finding advertisers who don't care
about this. On the other hand, maybe give advertorials a try? Or paid-for
product reviews (make sure you make clear it's paid for on the post
itself).

Quote:
Google doesn't like paid links unless they have rel="nofollow".

Yeah, I can understand this, otherwise one could buy a PR7 or even a PR8
site with little money, and then the whole PR thing would become a big
joke. It reflects how much money one has, and says nothing about
quality.

Quote:
Please forgive my attitude earlier.... Real life is
.... being a problem

Heh, no worries, and I have had my share (and still have, one might say)
of real life problems to know that it doesn't make one a nice and
friendly person while going through it.

Quote:
I'm not trying to defraud, I just want to get back to earning a
living. I was doing pretty good until the "BitchSlap of '07"

I understand the why now, yet I still advise against it.

--
John Bokma http://johnbokma.com/


  #42  
Don

Re: blocking robots.txt from non-robots - 02-21-2008, 09:37 PM






John Bokma <john (AT) castleamber (DOT) com> wrote in
news:Xns9A4BD75DC6628castleamber (AT) 130 (DOT) 133.1.4:

Quote:
Don <lostinspace (AT) 123-universe (DOT) com> wrote:

It's done ALL the time.
What matters is that it's done for an appropriate reason and is
accomplished server side.

To me, what matters, is that a user doesn't click on a search result,
and comes on a page that doesn't make the expected data available.

John,
I believe this is the goal of most webmasters; however, there is specific
traffic that individual webmasters simply have no desire for.
That's their own decision, and each webmaster must determine what is
beneficial or detrimental to their own site(s).

Quote:
webmasterworld (IIRC) did use cloaking, maybe still does. I've
reported this several times to Google, but no use (unless it has been
fixed)

Webmaster World has had many problems in getting their extensive forums
and pages spidered properly without allowing harvesting by other forums.
Brett does a superb job at providing multiple forums for participants the
world over.
Here's a 2005 explanation:
http://www.webmasterworld.com/forum9/9618.htm



  #43  
Joe Fox

Re: blocking robots.txt from non-robots - 02-21-2008, 10:35 PM



Don <lostinspace (AT) 123-universe (DOT) com> wrote in
news:Xns9A4BE192F75F7lostinspace123univer (AT) 207 (DOT) 115.17.102:

Quote:
Joe Fox <ny152 (AT) none (DOT) invalid> wrote in
news:Xns9A4BD1118343D891563@127.0.0.1:

Don <lostinspace (AT) 123-universe (DOT) com> wrote in
news:Xns9A4BD360F498Clostinspace123univer (AT) 207 (DOT) 115.33.102:

Joe Fox <ny152 (AT) none (DOT) invalid> wrote in
news:Xns9A49EC6462EC5891563 (AT) 127 (DOT) 0.0.1:


I'm using a robots.txt file to control what is and is not crawled
by search engine bots, but I'd like to make sure that anything that
isn't a known search engine bot doesn't get the file I'm feeding to
google, yahoo and the others.

From what I've read this could be done with .htaccess but I've not
been able to make heads or tails out of that.

I'd really be grateful for some help here.

Thanks

Some tutorials
http://baremetal.com/gadgets/htaccess/
http://evolt.org/node/226
http://www.edginet.org/techie/website/htaccess.html
http://www.dimi.uniud.it/labs/docume.../Challenger1.2/User/htaccess/htaccess.html
http://www.webhelpinghand.com/htaccess_deny.htm
http://www.javascriptkit.com/howto/htaccess.shtml
http://www.serverwatch.com/tutorials...0825_1127711_1
http://www.verio.com/support/documen...fm?doc_id=3624


Some of those are familiar but I'll take a look at 'em anyway.

My big problem is I'm not a coder. Simple stuff I can handle, but
figuring out docs and help files takes forever.


Joe,
There are more beneficial forums for htaccess and Apache.
The Apache Server forum at Webmaster World is excellent and the
moderator makes a superb effort to assist far too many people.

The Search Engine Spider ID forum was the predecessor to the Apache
forum as far as htaccess coding goes.

Registration is free for most forums.

I may be able to assist you, however my extensive use of htaccess has
been limited to the "KISS" thought.
When it comes to simulated-wildcards and complicated expressions, I'm
daft!

Thanks.. I'll check out the forums and reread the links you provided
along with John's suggested code



  #44  
Paul

Re: blocking robots.txt from non-robots - 02-21-2008, 10:54 PM



On 22 Feb 2008 02:53:29 GMT, John Bokma <john (AT) castleamber (DOT) com> wrote:

Quote:
Paul <noone (AT) houstoncrafts (DOT) com> wrote:

You have email John.

Thanks Paul, looking into it (got the Gecko one as well, haven't had time
to check it out, thanks).

nps John,
i'll hear from you when you are ready.
plh
paul



  #45  
Big Bill

Re: blocking robots.txt from non-robots - 02-21-2008, 11:18 PM



On Thu, 21 Feb 2008 20:00:21 -0600, Joe Fox <ny152 (AT) none (DOT) invalid>
wrote:

Quote:
John Bokma <john (AT) castleamber (DOT) com> wrote in
news:Xns9A4BB7830D344castleamber (AT) 130 (DOT) 133.1.4:

Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

As I said in another post, Can we save discussion of *why* for
another time and talk about *how*?

Like I wrote in reply to that other post: we're trying to help you for
*free*. But even if you were paying me, I would ask you *why*.

How about that. A voice of reason on UseNet. Nearly an extinct species
these days. ;-)

He was extinct for a while. He came back.

Quote:
Too often people think they have an X problem, and try to find all
kinds of solutions to that, while the real problem is Y. If you ever
have been helping others on Usenet, you certainly know what I mean.

NOW I see what you're saying and because you're saying it so reasonably,
I'll go into why.

It sounds to me like you're afraid to educate others (give away your
secrets) by hiding your robots.txt. If that is indeed the case you
*do* have a X -> Y problem.

You tell me. I'm one of the several hundred thousand bloggers that have
had their page rank bitchslapped by google, first to zero pr, then to
status "unranked".

There's a big problem with this because up until then pr has been a
determining factor in the income of these bloggers.

Because you can sell links priced according to your PR?

Quote:
Obviously a lot of
people want to get pagerank back and still be able to do the work
they enjoy and get paid for it as before... without google hitting them
with pr zero or unranking them entirely. Getting advertisers and
intermediaries to stop using pr as one of their value assessment metrics
is being attempted,

How'd I guess?

Quote:
but simply put, advertisers want "link juice" and
visibility in search engines and they're never going to stop wanting the
links they pay for on pages with a certain pagerank.

They will when they get educated about it.

Quote:
Andy Beard's blog had an idea recently about using robots.txt to tell
google not to index pages that contain the paid links that they're so
upset about. I'm thinking that this gives the best of both worlds.. the
advertiser gets their paid link that doesn't have rel="nofollow" on it
and google gets told "don't crawl this page".

If it isn't indexed or crawled, how will it get any PR? The home page
PR is irrelevant, by the way...

Quote:
Google gets to keep their index "pure" by not crawling (and thus
indexing ) the page with the paid links, and the advertisers get some pr
because while the page won't be crawled, there will still be links to it
so that it can pass pagerank (though maybe not as much as otherwise).

Um. Difficult to see how Google could give PR to a page that's not
even indexed.

Quote:
The problem is with intermediaries that might decide this is no better
than putting rel="nofollow" on the link (which I don't see that it is).
The idea is to keep them from being able to read the robots.txt that is
being given to Google should they think of it.

Advertiser wants links without rel="nofollow"
Google doesn't like paid links unless they have rel="nofollow".

There are bloggers who need the money and must find a way to do both at
the same time.

There are bloggers who shouldn't give up their day job.

Quote:
Seems to me that this method should work as long as
intermediaries don't get the robots.txt being given to google. Thus the
need to ensure that ONLY google or other Search engines get the "real"
robots.txt.

Problem is, I'm not a coder.

I agree :-)

Quote:
I'm trying to figure out how to do this
with .htaccess and it's very slow going. There are seemingly more
pitfalls than answers simply because I do not understand the language.
Thus I seek help from those who Do know the language.

Please forgive my attitude earlier.... Real life is
.... being a problem

I'm not trying to defraud, I just want to get back to earning a living.
I was doing pretty good until the "BitchSlap of '07"

Doesn't mean you yourself personally were doing anything wrong
(although it sounds like you were actually). Plenty of people, myself
included, lost PR because people linking to them were involved in
buying/selling links and got penalised accordingly. I don't get any
fewer visitors as a consequence, though. PR is not the metric people
think it is.

Mind you, it's five am here. I might feel different when I've had a
tea and rubbed my eyes.

BB
--

http://www.kruse.co.uk/
http://www.fat-odin.com/
http://www.here-be-posters.co.uk/


  #46  
Big Bill

Re: blocking robots.txt from non-robots - 02-21-2008, 11:18 PM



On Thu, 21 Feb 2008 19:23:18 -0600, Joe Fox <ny152 (AT) none (DOT) invalid>
wrote:

Quote:
John Bokma <john (AT) castleamber (DOT) com> wrote in
news:Xns9A4BB6E764926castleamber (AT) 130 (DOT) 133.1.4:

Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

Can we simply agree to disagree and save discussion of *why* for
another time and go into some details about *how*?

Welcome to Usenet. Remember people try to help you in *their* spare
time, for *free*.

Yes, that's right, they do. And I have always appreciated the help and
input I find on UseNet and other sources. That's why I didn't see the
need for a big deal about why. I didn't see a need to waste people's
time with *why*.

Because when people come on here asking how to do weird stuff, it's
usually because they're asking the wrong question in the first place.
As you are yourself, I think. I don't think what you suggest is
practical, as you'd need to know technical info about your visitors
that I don't imagine you'd be able to get.

BB
--

http://www.kruse.co.uk/
http://www.fat-odin.com/
http://www.here-be-posters.co.uk/


  #47  
John Bokma

Re: blocking robots.txt from non-robots - 02-22-2008, 12:10 AM



Don <lostinspace (AT) 123-universe (DOT) com> wrote:

Quote:
John Bokma <john (AT) castleamber (DOT) com> wrote in
[..]
Quote:
RewriteCond %{HTTP_USER_AGENT} =UA1 [OR]
RewriteCond %{HTTP_USER_AGENT} =UA2 [OR]
RewriteCond %{HTTP_USER_AGENT} =UA3 [OR]
RewriteRule ^robots.txt$ real-robots.txt [L]

with UA1..UAn the *exact* UA plain string, e.g.
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

See: http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html


John,
Just a heads up (not critique).
The last "[OR]" is invalid.
:-) yup, 100% right, so much for being lazy and copying one line twice.
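
For reference, a sketch of the rule set quoted above with the trailing [OR] dropped, roughly as it might sit in an .htaccess file. UA2 and UA3 are the placeholders from the quoted example, real-robots.txt is the file name used there, and the double quotes around the Googlebot string are an assumption about how the server splits an argument that contains spaces, so check the mod_rewrite documentation for your Apache version before relying on it:

```apache
RewriteEngine On

# Each condition is an exact string comparison ("=" prefix, not a regex).
# [OR] chains a condition to the one that follows it, so the last
# condition in the group must not carry [OR].
RewriteCond %{HTTP_USER_AGENT} "=Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" [OR]
RewriteCond %{HTTP_USER_AGENT} =UA2 [OR]
RewriteCond %{HTTP_USER_AGENT} =UA3

# Requests whose User-Agent matched one of the conditions are served
# real-robots.txt; all other requests fall through to the ordinary
# robots.txt. The dot is escaped so it only matches a literal ".".
RewriteRule ^robots\.txt$ real-robots.txt [L]
```

Keep in mind that the User-Agent header is set by the client, so anyone who sends one of the listed strings will be served the same file as the real bot.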

--
John Bokma http://johnbokma.com/


  #48  
Joe Fox

Re: blocking robots.txt from non-robots - 02-23-2008, 10:17 AM



Big Bill <bill (AT) kruse (DOT) co.uk> wrote in
news:jjlsr3dr6o5vmg83osm8m3q3n9mel6fc0v (AT) 4ax (DOT) com:

Quote:
On Thu, 21 Feb 2008 20:00:21 -0600, Joe Fox <ny152 (AT) none (DOT) invalid>
wrote:

John Bokma <john (AT) castleamber (DOT) com> wrote in
news:Xns9A4BB7830D344castleamber (AT) 130 (DOT) 133.1.4:

Joe Fox <ny152 (AT) none (DOT) invalid> wrote:

As I said in another post, Can we save discussion of *why* for
another time and talk about *how*?

Like I wrote in reply to that other post: we're trying to help you
for *free*. But even if you were paying me, I would ask you *why*.

How about that. A voice of reason on UseNet. Nearly an extinct
species these days. ;-)

He was extinct for a while. He came back.

Too often people think they have an X problem, and try to find all
kinds of solutions to that, while the real problem is Y. If you ever
have been helping others on Usenet, you certainly know what I mean.

NOW I see what you're saying and because you're saying it so
reasonably, I'll go into why.

It sounds to me like you're afraid to educate others (give away
your secrets) by hiding your robots.txt. If that is indeed the case
you *do* have a X -> Y problem.

You tell me. I'm one of the several hundred thousand bloggers that
have had their page rank bitchslapped by google, first to zero pr,
then to status "unranked".

There's a big problem with this because up until then pr has been a
determining factor in the income of these bloggers.

Because you can sell links priced according to your PR?

Obviously a lot of
people want to get pagerank back and still be able to do the work
they enjoy and get paid for it as before... without google hitting
them with pr zero or unranking them entirely. Getting advertisers and
intermediaries to stop using pr as one of their value assessment
metrics is being attempted,

How'd I guess?

but simply put, advertisers want "link juice" and
visibility in search engines and they're never going to stop wanting
the links they pay for on pages with a certain pagerank.

They will when they get educated about it.

Andy Beard's blog had an idea recently about using robots.txt to tell
google not to index pages that contain the paid links that they're so
upset about. I'm thinking that this gives the best of both worlds..
the advertiser gets their paid link that doesn't have rel="nofollow"
on it and google gets told "don't crawl this page".

If it isn't indexed or crawled, how will it get any PR? The home page
PR is irrelevant, by the way...

Advertisers don't think so.

Quote:
Google gets to keep their index "pure" by not crawling (and thus
indexing ) the page with the paid links, and the advertisers get some
pr because while the page won't be crawled, there will still be links
to it so that it can pass pagerank (though maybe not as much as
otherwise).

Um. Difficult to see how Google could give PR to a page that's not
even indexed.

Because even though it doesn't get crawled, there are still links going
to the page that do get crawled. Beard referred to it as a "dangling
page".

Quote:
The problem is with intermediaries that might decide this is no better
than putting rel="nofollow" on the link (which I don't see that it
is). The idea is to keep them from being able to read the robots.txt
that is being given to Google should they think of it.

Advertiser wants links without rel="nofollow"
Google doesn't like paid links unless they have rel="nofollow".

There are bloggers who need the money and must find a way to do both
at the same time.

There are bloggers who shouldn't give up their day job.

Then there are those of us for whom blogging *IS* their day job.

Quote:
Seems to me that this method should work as long as
intermediaries don't get the robots.txt being given to google. Thus
the need to ensure that ONLY google or other Search engines get the
"real" robots.txt.

Problem is, I'm not a coder.

I agree :-)

I'm trying to figure out how to do this
with .htaccess and it's very slow going. There are seemingly more
pitfalls than answers simply because I do not understand the language.
Thus I seek help from those who Do know the language.

Please forgive my attitude earlier.... Real life is
.... being a problem

I'm not trying to defraud, I just want to get back to earning a
living. I was doing pretty good until the "BitchSlap of '07"

Doesn't mean you yourself personally were doing anything wrong
(although it sounds like you were actually). Plenty of people, myself
included, lost PR because people linking to them were involved in
buying/selling links and got penalised accordingly. I don't get any
fewer visitors as a consequence, though. PR is not the metric people
think it is.

I agree, but how do you convince advertisers of that? How do you
convince payperpost of that? Ted Murphy won't listen to anybody.. the
man is living in a dream world, one that a lot of bloggers are trapped
in because of the vicious circle effect of pr, paid links, and google's
reaction to paid links by lowering pr or de-listing sites.



