HighDots Forums  

Re: Keep search engines off indexing temporary

Search Engine Optimization Discussion about SEO/Search Engine Optimization (alt.internet.search-engines)


#1
ato_zee@hotmail.com
Re: Keep search engines off indexing temporary - 10-22-2006, 03:47 PM

On 22-Oct-2006, "Lars Bonnesen" <none (AT) none (DOT) זרו> wrote:

Quote:
Is there a common way to temporarily disable indexing and link following by
these search engine crawlers?

Google for robots.txt

Many references. You can have different folders on your server
for testing a new site build, and use robots.txt to exclude the
search engines. The robots.txt has to go in the root directory.
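
As a rough sketch of what that looks like, the snippet below uses Python's standard urllib.robotparser to read a small robots.txt the way a well-behaved crawler would. The folder name /newsite/ and the host example.com are made up for illustration; substitute your own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, as it would be served from the site root
# (e.g. http://example.com/robots.txt). "/newsite/" is an assumed name
# for the folder that holds the test build.
ROBOTS_TXT = """\
User-agent: *
Disallow: /newsite/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str) -> bool:
    """Return True if a compliant crawler may fetch the given path."""
    return parser.can_fetch("*", "http://example.com" + path)


print(allowed("/index.html"))          # the live site stays crawlable
print(allowed("/newsite/index.html"))  # the test build is excluded
```

This only models crawlers that honour robots.txt; it does nothing against ones that ignore it.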


#2
Roy Schestowitz
Re: Keep search engines off indexing temporary - 10-22-2006, 04:09 PM

__/ [ ato_zee (AT) hotmail (DOT) com ] on Sunday 22 October 2006 20:47 \__

Quote:
On 22-Oct-2006, "Lars Bonnesen" <none (AT) none (DOT) זרו> wrote:

Is there a common way to temporarily disable indexing and link following by
these search engine crawlers?

Google for robots.txt

Many references. You can have different folders on your server
for testing a new site build, and use robots.txt to exclude the
search engines. The robots.txt has to go in the root directory.

Here's a golden reference... one that I used years ago.

http://www.robotstxt.org/wc/robots.html

Hope it helps,

Roy

--
Roy S. Schestowitz, Ph.D. Candidate (Medical Biophysics)
http://Schestowitz.com | SuSE Linux | PGP-Key: 0x74572E8E
9:05pm up 4 days 6:19, 8 users, load average: 0.33, 0.46, 0.52
http://iuron.com - Open Source knowledge engine project


#3
z
Re: Keep search engines off indexing temporary - 10-22-2006, 04:44 PM

ato_zee (AT) hotmail (DOT) com wrote:

Quote:
On 22-Oct-2006, "Lars Bonnesen" <none (AT) none (DOT) זרו> wrote:

Is there a common way to temporarily disable indexing and link following by
these search engine crawlers?

Google for robots.txt

The only drawback of robots.txt is that it tells anyone who looks at your
robots.txt that you have a development version in a "hidden" directory.
Also, search engines may still list pages forbidden by robots.txt, just
without caching them. I've seen pages forbidden by robots.txt in both Google
and MSN -- just without a text snippet.

If you are using a template for the files in the hidden directory, you can
use a robots meta tag in the header temporarily. That way your "hidden"
directory wouldn't be advertised in your robots.txt. Just be sure to
remove it before that section of the site goes live.
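
A minimal sketch of that template idea in Python, assuming a hypothetical render_head helper and a hidden flag that you switch off at launch. The tag itself is the standard robots meta tag; everything else here is illustrative.

```python
import html


def render_head(title: str, hidden: bool) -> str:
    """Build a <head> fragment; emit the robots meta tag only while hidden."""
    lines = ["<head>", f"  <title>{html.escape(title)}</title>"]
    if hidden:
        # Asks compliant crawlers not to index the page or follow its links.
        lines.append('  <meta name="robots" content="noindex,nofollow">')
    lines.append("</head>")
    return "\n".join(lines)


# While the section is under development:
print(render_head("New site (draft)", hidden=True))
```

Keeping the tag behind a single flag makes it harder to forget to remove it when the section goes live.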

Or, if you have a static IP that you work from, you can put a snippet of
code at the top of the template:

$myIP = xxx.xxx.xxx.xx
if $remoteIP != $myIP send 404 header
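
One way the pseudocode above might look in practice, sketched here as a small Python WSGI wrapper. The names MY_IP, restrict_to_my_ip and dev_site are made up for this example, and the address is a placeholder from the documentation range, not a real one.

```python
MY_IP = "203.0.113.10"  # placeholder; replace with your own static IP


def restrict_to_my_ip(app):
    """Wrap a WSGI app so that every other address gets a 404."""
    def wrapper(environ, start_response):
        if environ.get("REMOTE_ADDR") != MY_IP:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found"]
        return app(environ, start_response)
    return wrapper


def dev_site(environ, start_response):
    """Stand-in for the real development pages."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"development build"]


application = restrict_to_my_ip(dev_site)
```

Note that if the server sits behind a proxy or load balancer, REMOTE_ADDR is the proxy's address rather than the visitor's, so the check would need adjusting.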


#4
Borek
Re: Keep search engines off indexing temporary - 10-22-2006, 05:54 PM

On Sun, 22 Oct 2006 22:44:50 +0200, z <news01.web (AT) mailnull (DOT) com> wrote:

Quote:
Is there a common way to temporarily disable indexing and link following by
these search engine crawlers?

Google for robots.txt

The only drawback of robots.txt is that it tells anyone who looks at your
robots.txt that you have a development version in a "hidden" directory.

Not necessarily:

User-agent: *
Disallow: /my_dir

and use /my_dir_guess_that as the development directory
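
This works because Disallow rules are matched as path prefixes, so /my_dir also covers any path that merely starts with /my_dir. A quick check with Python's standard urllib.robotparser (the host example.com is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The two rules from the post above.
RULES = ["User-agent: *", "Disallow: /my_dir"]

rp = RobotFileParser()
rp.parse(RULES)


def blocked(path: str) -> bool:
    """Return True if a compliant crawler is told to stay away from path."""
    return not rp.can_fetch("*", "http://example.com" + path)


print(blocked("/my_dir_guess_that/index.html"))  # covered by the prefix rule
print(blocked("/public/index.html"))             # not covered
```

The real directory name never appears in robots.txt, only its prefix does.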

Borek
--
http://www.chembuddy.com
http://www.ph-meter.info
http://www.terapia-kregoslupa.waw.pl


#5
MaxPowers
Re: Keep search engines off indexing temporary - 10-23-2006, 03:45 AM

z wrote:
Quote:
ato_zee (AT) hotmail (DOT) com wrote:

$myIP = xxx.xxx.xxx.xx
if $remoteIP != $myIP send 404 header

Use a 304 header instead... The 404 says it isn't there anymore, which
you don't want. The 304 says "Not Modified", which usually sends spiders
(particularly Google) in another direction, assuming the page has not
changed since their last visit.

I have never tested this exactly, but it seems to make sense from the
HTTP standpoint.



#6
z
Re: Keep search engines off indexing temporary - 10-23-2006, 12:43 PM

MaxPowers wrote:

Quote:
z wrote:
ato_zee (AT) hotmail (DOT) com wrote:

$myIP = xxx.xxx.xxx.xx
if $remoteIP != $myIP send 404 header

Use a 304 header instead... The 404 says it isn't there anymore, which
you don't want. The 304 says "Not Modified" which usually sends spiders
(particularly Google) in another direction assuming the page has not
been changed since the last time it visited.

404 is a "not found" header -- that is what you would want to tell visitors
if you don't want anyone to know that a folder exists. If it's just a
temporary development folder, then 404 would be better than risking a
spidering. 304 says "Not Modified", but if they haven't indexed it, what
would stop them from indexing it?

You could accidentally let search engines know about your hidden directory
by following a link from it to a site that publishes its referrers in a
free stats counter. Your browser would send the hidden directory's URL as
the referrer, and it would end up in the stats counter's list of referrers.
Search engines could then follow the link from the stats counter back to
your hidden directory. Then your hidden directory could haunt you for a
year or more, published in Yahoo's cache.

Sending a 404 would be the sneaky way to hide a hidden development directory
because it makes the directory look like it doesn't exist or was removed.
403 would work also, but would tell people that something does exist there.
304 might encourage a spidering if it wasn't previously indexed.
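
For reference, the three status codes discussed here can be looked up in Python's standard http.HTTPStatus enum. The reason helper is just a name chosen for this sketch; it only returns the standard reason phrases and says nothing about how a particular crawler reacts to them.

```python
from http import HTTPStatus


def reason(code: int) -> str:
    """Return the status line text for an HTTP status code, e.g. '404 Not Found'."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


for code in (304, 403, 404):
    print(reason(code))
```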

(Also good to disable referrers in general... Firefox Web Developer Toolbar
is good for that.)






#7
z
Re: Keep search engines off indexing temporary - 10-23-2006, 12:45 PM

Borek wrote:

Quote:
On Sun, 22 Oct 2006 22:44:50 +0200, z <news01.web (AT) mailnull (DOT) com> wrote:

Is there a common way to temporarily disable indexing and link following by
these search engine crawlers?

Google for robots.txt

The only drawback of robots.txt is that it tells anyone who looks at your
robots.txt that you have a development version in a "hidden" directory.

Not necessarily:

User-agent: *
Disallow: /my_dir

and use /my_dir_guess_that as the development directory

Good idea... if the search engines obey the robots.txt...

