HighDots Forums  

Alexa Monkey Business

Search Engine Optimization Discussion about SEO/Search Engine Optimization (alt.internet.search-engines)


#1
Roy Schestowitz
Alexa Monkey Business - 10-13-2005, 07:23 AM

I have just found in my logs a request for one hidden file. That request was
/not/ from an IP address of mine, which made me very worried...

I then ran a reverse DNS lookup on it and guess what? The Alexa/Amazon/A9
toolbars are not only keeping track of traffic, but also retain the URLs of
pages that you visit and /use/ them, i.e. visit them and maybe crawl them.
Knowing that the Web Archive (AKA the Wayback Machine) is fed by Alexa's
crawls, this is scary, to say the least. These visits from Alexa might
include hidden pages that you keep on your file server and occasionally
access. Handle with care!

Roy
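The check Roy describes (spotting a request for a hidden file from an IP that isn't yours, then reverse-resolving that IP) can be sketched in Python. This is a minimal sketch assuming an Apache-style "common"/"combined" access log; the function names, the hidden-path prefix and the sample addresses are hypothetical. Because whoever controls a reverse zone can make its PTR record say anything, the sketch also forward-confirms the name before trusting it.

```python
import re
import socket

# Prefix of an Apache/NCSA "common" or "combined" log line (assumed format):
# IP ident user [time] "METHOD PATH PROTOCOL" status ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3})')


def parse_line(line):
    """Return (ip, path, status) for one access-log line, or None if it doesn't match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    ip, path, status = m.groups()
    return ip, path, int(status)


def foreign_hits(lines, hidden_prefix, my_ips):
    """Requests under hidden_prefix that did NOT come from one of my own IPs."""
    hits = []
    for line in lines:
        rec = parse_line(line)
        if rec and rec[1].startswith(hidden_prefix) and rec[0] not in my_ips:
            hits.append(rec)
    return hits


def reverse_dns(ip):
    """PTR lookup; None when there is no reverse record or DNS is unreachable."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def forward_confirmed(ip):
    """Trust a PTR name only if it resolves back to the same IP (forward-confirmed rDNS)."""
    host = reverse_dns(ip)
    if host is None:
        return None
    try:
        addrs = {info[4][0] for info in socket.getaddrinfo(host, None)}
    except OSError:
        return None
    return host if ip in addrs else None
```

Running forward_confirmed(ip) over each result of foreign_hits(open("access.log"), "/hidden/", {"your.own.ip"}) would show who fetched the hidden path and what that address resolves to; the log file name and prefix are placeholders.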

#2
Borek
Re: Alexa Monkey Business - 10-13-2005, 09:18 AM

On Thu, 13 Oct 2005 13:23:25 +0200, Roy Schestowitz
<newsgroups (AT) schestowitz (DOT) com> wrote:

Quote:
toolbars are not only keeping track of traffic, but also retain URL's of
pages that you visit and /use/ them, i.e. visit them and maybe crawling

That's nothing new. I believe I have read reports of unlinked and
otherwise invisible pages being indexed by Google once they were visited
in IE with the Google toolbar installed. Not that it is unexpected: once
the URL is sent to Google for a PageRank (PR) check, it is already there,
so why not add it to the URLs to be spidered? Someone visited the site,
so perhaps it is worth something.

Best,
Borek
--
http://www.chembuddy.com - chemical calculators for labs and education
BATE - program for pH calculations
CASC - Concentration and Solution Calculator
pH lectures - guide to hand pH calculation with examples
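Borek's reasoning, that a URL sent to the toolbar's server for a PR lookup can simply be dropped into the list of URLs to be spidered, amounts to a crawl queue with more than one input. The sketch below is hypothetical and makes no claim about how Google or Alexa actually built their pipelines; the class name, the normalisation rules and the "toolbar"/"link" source labels are all assumptions.

```python
from collections import deque
from urllib.parse import urlsplit, urlunsplit


class CrawlFrontier:
    """Hypothetical crawl queue fed both by discovered links and by toolbar lookups."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()

    @staticmethod
    def normalize(url):
        """Lower-case scheme and host and drop the #fragment, so trivial variants dedupe."""
        parts = urlsplit(url)
        return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                           parts.path or "/", parts.query, ""))

    def add(self, url, source):
        """Queue url once; source records where we heard of it ("link", "toolbar", ...)."""
        key = self.normalize(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._queue.append((key, source))
        return True

    def pop_next(self):
        """Next (url, source) to fetch, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None
```

The point of the design is that an unlinked page never arrives as a "link", but it is queued as a "toolbar" entry the first time anyone with the toolbar visits it.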


#3
wd
Re: Alexa Monkey Business - 10-13-2005, 10:53 AM

On Thu, 13 Oct 2005 12:23:25 +0100, Roy Schestowitz wrote:

Quote:
I have just found in my logs a request for one hidden file. That request was
/not/ from an IP address that is mine. Made me very worried...
[...]

From what I've read (and experienced), if you put a page online you risk
getting it indexed. There are a couple of ways the URLs can be discovered.
You can add a robots.txt file, but won't that just tell people where your
hidden directories are? I've found my hidden directories in Yahoo's cache
many months after I'd deleted them. I don't have any hidden directories
anymore.


From Google:
http://www.google.com/webmasters/bot.html#secretserver
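wd's robots.txt objection can be made concrete with Python's standard urllib.robotparser: a well-behaved crawler will skip a disallowed path, but the same file hands every human reader the list of paths you wanted kept quiet. The file contents and the helper names below are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site with two "hidden" directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""


def disallowed_paths(robots_txt):
    """Every Disallow path, i.e. exactly what any curious human learns by reading the file."""
    paths = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def polite_crawler_may_fetch(robots_txt, url, agent="*"):
    """What a crawler that honours robots.txt would decide for url."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)
```

Compliance is voluntary, so robots.txt only keeps out crawlers that choose to read it, while the disallowed paths stay readable by anyone.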



#4
www.1-script.com
Re: Alexa Monkey Business - 10-13-2005, 11:03 AM

Roy Schestowitz wrote:

Quote:
I have just found in my logs a request for one hidden file. That
request was /not/ from an IP address that is mine. Made me very worried...
[...]

Hey, Roy!

You thought I sounded paranoid when I replied to your "Semantic Searches -
Knowledge Engines" post, didn't you? What do you think about the search
engines vs. privacy case now? ;-)

I guess the best way to hide a file would be to:

#1 get rid of the Alexa toolbar
#2 password-protect the directory it's in, if you must have it
Web-accessible
#3 move anything that does not absolutely have to be accessible
above the /public_html/ (or your other Web root) folder
#4 change its location every so often. Make that name and location if you
feel a bit more paranoid now ;-)
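Tips #2 and #4 can be sketched in Python: an HTTP Basic auth check of the kind a server performs before serving a protected directory, and an unguessable, rotatable directory name. In practice the Web server's own authentication configuration would do the checking; the function names here are hypothetical, and because Basic auth only base64-encodes the password it should be used over HTTPS. A random name alone is obscurity, not protection, since a toolbar can still leak it once you visit.

```python
import base64
import binascii
import hmac
import secrets


def random_location(nbytes=16):
    """An unguessable directory name (tip #4); rotate it every so often."""
    return secrets.token_urlsafe(nbytes)


def basic_auth_ok(header, user, password):
    """Check an HTTP Basic 'Authorization' header value against one user (tip #2)."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    got_user, sep, got_pass = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking, via timing, how much of the secret matched
    user_ok = hmac.compare_digest(got_user.encode(), user.encode())
    pass_ok = hmac.compare_digest(got_pass.encode(), password.encode())
    return user_ok and pass_ok
```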


--
Cheers,
Dmitri
See Site Sig Below
------------

--
##-----------------------------------------------#
Article posted with Web Developer's USENET Archiv
http://www.1-script.com/forum
Web and RSS gateway to your favorite newsgroup -
alt.internet.search-engines - 15929 messages and counting
##-----------------------------------------------##


#5
Roy Schestowitz
Re: Alexa Monkey Business - 10-13-2005, 11:44 AM

__/ [wd] on Thursday 13 October 2005 15:53 \__

Quote:
From what I've read (and experienced) if you put a page online you risk
getting it indexed.
There are a couple of ways the URLs can be discovered. You can put a
robots.txt file but won't that just tell people where your
hidden directories are? I've found my
hidden directories in Yahoo's cache many months after I've deleted them.
I don't have any hidden directories anymore

This reminds me of the time when I was looking for conversion scripts for
Palm's TODO module and stumbled upon somebody's entire set of Palm data. It
turned out to be the data of a *nix sysadmin at MIT (you'd think they would
know their way around, right?), and it contained all of his passwords along
with some very personal details. I contacted him immediately to let him
know. He had to contact Google to have their cache removed, but I never
heard how that story ended.

Roy

--
Roy S. Schestowitz | Useless fact: the buttocks is the largest muscle
http://Schestowitz.com | SuSE Linux | PGP-Key: 74572E8E
4:40pm up 49 days 4:54, 3 users, load average: 0.48, 0.52, 0.48
http://iuron.com - next generation of search paradigms


#6
Roy Schestowitz
Re: Alexa Monkey Business - 10-13-2005, 11:53 AM

__/ [www.1-script.com] on Thursday 13 October 2005 16:03 \__

Quote:
Hey, Roy!

You thought I sound paranoid when I replied to your "Semantic Searches -
Knowledge Engines", haven’t you? What do you think about the search
engines vs. privacy case now? ;-)

*smile*


Quote:
I guess, the best way to hide a file would be to

#1 get rid of Alexa toolbar

Quite frankly, I only use it for falsified ranks (I won't deny it; I'll be
bluntly honest, as I usually am). I do, however, use the Netcraft toolbar
quite heavily (i.e. my eyes keep drifting towards it). It spies as well,
but it provides many valuable facts that complement the page and give some
context. I can't live without it any longer.


Quote:
#2 password-protect the directory it's in if you must have it
Web-accessible

Until a year ago, my personal directory (schestowitz.com/RSS) was supposedly
'hidden' rather than password-protected (the directory is almost 3 years
old). I learned my lesson when people could reach my so-called
dashboard/portal just by looking at the browser 'History' on a public or
friend's terminal.


Quote:
#3 anything that does not absolutely have to be accessible should be moved
above the /public_html/ (or your other Web root) folder

Yes, I now use /tmp for the SQL dumps that the cron jobs take care of. I
also use the top-level directory (above the Web root) for occasional data
transfers. I didn't use to do that, which was glaringly dangerous.
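A cron job that writes dumps can refuse to put them anywhere the Web server would serve them. This Python sketch assumes a single Web root directory; the function names and paths are hypothetical. It compares resolved paths rather than string prefixes, so a sibling such as public_html_old is not mistaken for the Web root, and it creates the file readable by its owner only.

```python
import os
from pathlib import Path


def is_under_web_root(path, web_root):
    """True if path is the Web root itself or lives anywhere inside it."""
    p = Path(path).resolve()
    root = Path(web_root).resolve()
    # Path.parents avoids the string-prefix trap (/public_html_old is not inside /public_html)
    return p == root or root in p.parents


def write_private_dump(data, dest, web_root):
    """Write bytes to dest with mode 0600, refusing any location the server would publish."""
    if is_under_web_root(dest, web_root):
        raise ValueError(f"refusing to write {dest}: it is inside the Web root {web_root}")
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
```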


Quote:
#4 every so often change its location. Make it name and location if you
feel a bit more paranoid now ;-)

It no longer exists. The reason I noticed Alexa's behaviour is that a 404
had been flagged in my logs. It was a good reminder of why I must check it
every so often.
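A flagged 404 on a path you removed is exactly the kind of entry worth surfacing regularly. A minimal Python sketch, assuming the same Apache-style log lines as before (the function name and report shape are hypothetical), lists the most-requested missing paths together with the IPs that asked for them, ready for a reverse lookup.

```python
import re
from collections import Counter, defaultdict

# IP at the start of the line, then the quoted request and the status code.
LINE = re.compile(r'^(\S+) .*?"(?:GET|HEAD|POST) (\S+)[^"]*" (\d{3})\b')


def not_found_report(lines, top=10):
    """[(path, hit_count, sorted_requesting_ips), ...] for paths that returned 404."""
    hits = Counter()
    who = defaultdict(set)
    for line in lines:
        m = LINE.match(line)
        if m and m.group(3) == "404":
            ip, path = m.group(1), m.group(2)
            hits[path] += 1
            who[path].add(ip)
    return [(path, n, sorted(who[path])) for path, n in hits.most_common(top)]
```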

Regards,

Roy

--
Roy S. Schestowitz | Useless fact: Sharks are immune to cancer


#7
www.1-script.com
Re: Alexa Monkey Business - 10-13-2005, 12:31 PM

Roy Schestowitz wrote:

Quote:
Quite frankly, I only use it for falsified ranks

Like so many other people, which makes the whole concept of the Alexa rank
pretty much useless. Hey, I've got an idea: you could add your site's Alexa
rank to your list of useless facts in the sig ;-)

Quote:
I do, however, use the Netcraft toolbar quite heavily (i.e. roll my eyes
towards it). It is spying well,

Not familiar, gotta take a look. Is it Netcraft for Firefox you are
talking about?
http://toolbar.netcraft.com/

I've put your Iuron.com link up on my homepage. The link needed a new image
and alt tags (this is a search engines group, after all ;-). Check it out,
and let me know if you OK it.

Speaking of Iuron: the concept diagram is a bit confusing. This is a
knowledge engine, right? So why aren't some of the facts (28.3 deg.
elongation or 40% mapped) harvested from the page? I'm assuming the
underlined ones are the ones that make it into the "fact storage (base)".
Is it interactive, i.e. whatever gets asked about is stored? In that case
it would need to be re-indexed upon every request. I'm sure that is not
what you envisioned...

--
Cheers,
Dmitri


#8
Roy Schestowitz
Re: Knowledge Engines Discussion - 10-13-2005, 12:58 PM

__/ [www.1-script.com] on Thursday 13 October 2005 17:31 \__

Quote:
Roy Schestowitz wrote:

Quite frankly, I only use it for falsified ranks

Like so many other people that the whole concept of the Alexa rank is
pretty much useless. Hey, I got an idea: you could add your site's Alexa
rank to your list of useless facts in the sig ;-)

"Useless facts" in my signature are often both meaningful and true. Alexa
ranks are utter garbage. Five co-workers with the Alexa toolbar at a
consulting firm and... voila! Ranked 30,000th in the world. I think not! *smile*
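Why five co-workers can move a site so far can be shown with a toy panel-based ranking. Alexa's exact method is not described in this thread, so this Python sketch simply assumes, for illustration, that rank is driven by the number of distinct toolbar users seen visiting each site; the site names and counts are made up.

```python
from collections import defaultdict


def panel_ranking(panel_visits):
    """Rank sites by the number of distinct toolbar users seen visiting them.

    panel_visits: iterable of (user_id, site) pairs reported by the toolbar panel.
    Returns sites ordered best rank first; ties are broken alphabetically.
    """
    reach = defaultdict(set)
    for user, site in panel_visits:
        reach[site].add(user)
    return sorted(reach, key=lambda site: (-len(reach[site]), site))


def rank_of(site, ranking):
    """1-based rank of site, or None if no panel user ever visited it."""
    return ranking.index(site) + 1 if site in ranking else None
```

Sites whose real audience does not run the toolbar are invisible to such a panel, so one office full of toolbar users can outrank them.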


Quote:
I do, however, use the Netcraft
toolbar quite heavily (i.e. roll my eyes towards it). It is spying
well,

Not familiar, gotta take a look. Is it Netcraft for Firefox you are
talking about?
http://toolbar.netcraft.com/

Yes. Once you get used to it, that becomes the first thing you look at when
you reach a page. A picture is worth a thousand words:

http://www.schestowitz.com/temp/1-script.jpg


Quote:
Got your Iuron.com link up on my homepage. Needed a new image alt tags
(this is a search engines group, after all ;-) in the link. Check it out;
let me know if you OK it.

Wow!! Thanks for that. I can add you to my blogroll (~400 pages ranging
from PR0 to PR5). Let me know your desired anchor text...


Quote:
Speaking of Iuron: the concept diagram is a bit confusing. This is a
knowledge engine, right? So why some of the facts (28.3 deg. elongation or
40% mapped) aren't harvested from the page?

I imposed a limit on the scale to keep the diagram simple. Whenever I look
back at it, I find flaws and things I wish to change... but I will probably
never bother.


Quote:
I'm assuming the underscored
ones are the ones that make it into the "fact storage (base)".

No, those are just Wikipedia's links; I nicked the screenshot from Wikipedia.


Quote:
Is it
interactive, like whatever gets asked about, is stored? In this case it
would need to be re-indexed upon every request. I'm sure this is not
something you'd envisioned...

I think there is a misconception here; the diagram needs the draft proposal
as a verbal complement. Facts are learned offline by fetching plenty of data
from the Net. Facts that repeat themselves are reinforced, whereas negated
or rare facts get discouraged. It's a genetic algorithms/machine learning
approach.
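The reinforcement idea (recurring facts gain weight, negated or rare ones lose it) can be illustrated with a plain vote count. This is not a genetic algorithm and not Iuron's actual method, which lives in the draft proposal; it is a hypothetical Python sketch in which facts are subject/relation/object triples and the support threshold is an arbitrary assumption.

```python
from collections import Counter


def score_facts(observations, min_support=2):
    """Net support for each candidate fact harvested offline from crawled pages.

    observations: iterable of (fact, negated) pairs, where fact is a hashable
    triple such as ("Moon", "orbits", "Earth") and negated says the page denied it.
    A supporting mention adds 1, a negating mention subtracts 1; facts whose net
    score falls below min_support are dropped as too rare (or too contested).
    """
    score = Counter()
    for fact, negated in observations:
        score[fact] += -1 if negated else 1
    return {fact: s for fact, s in score.items() if s >= min_support}
```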

An interesting aspect that you led me to think about is learning from user
queries and responses. I suspect that existing search engines are doing
this already (they ought to have figured it out by now). You can monitor
which pages are followed from the SERPs to understand users' expectations
and re-shuffle the SERPs (indices) in accordance with the users' selection
of results.
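Re-shuffling SERPs according to what users click can be sketched as blending each result's original score with its click-through rate. Nothing in the thread says how any real engine does this; the blend weight, the smoothing toward a baseline CTR and the assumption that base scores lie in [0, 1] are all hypothetical choices.

```python
def rerank(results, clicks, impressions, weight=0.5, base_ctr=0.1, prior=10):
    """Re-order a result list using click-through data from SERP logs.

    results: list of (url, base_score) in the engine's original order (scores in [0, 1]).
    clicks, impressions: dicts url -> counts observed on the SERPs.
    The CTR is smoothed toward base_ctr as if each URL had `prior` extra impressions,
    so a result shown once and clicked once is not treated as a sure thing.
    """
    def smoothed_ctr(url):
        return (clicks.get(url, 0) + prior * base_ctr) / (impressions.get(url, 0) + prior)

    def blended(item):
        url, base = item
        return (1 - weight) * base + weight * smoothed_ctr(url)

    # sorted() is stable, so equal scores keep the engine's original order
    return [url for url, _ in sorted(results, key=blended, reverse=True)]
```

With no click data every URL gets the same smoothed CTR, so the original order is preserved; only consistent user behaviour moves a result.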

Another implication that I am inclined to ponder: Google have so many users
wandering around their SERPs. It would be very easy for them to automate the
re-ordering of SERPs based on the behaviour of that vast number of users,
some of whom will run rare and obscure search strings. This essentially
means that they have a kind of momentum that money cannot buy. Even if
Yahoo bought more machines for crawling, that would not give them as much
information as Google gets from its users (spying). The same goes for
advertising. Why do you think the NYT wants you to register to read
articles? Many newspapers do the same these days. They want to bind a name
to the IP address, or have a cookie that binds an address, occupation,
etc. to readers. Know your readers and you will know how to serve them.
Tailored (on-the-fly) content comes to mind. Imagine a front page that is
based on articles you have read. Amazon/Alexa/A9 do the same with books,
but WORSE... they filed a patent for that immensely genius, unprecedented
(sarcasm) idea.
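A front page "based on articles you have read" can be sketched as ranking today's candidates by how often their topic tags appear in the reader's history. This is a hypothetical Python illustration, not how the NYT or Amazon does it; the tag sets and article ids are invented.

```python
from collections import Counter


def front_page(articles, history, k=3):
    """Pick k of today's articles, ordered by overlap with the reader's past reading.

    articles: dict article_id -> set of topic tags (today's candidates).
    history: iterable of tag sets, one per article the reader has already read.
    """
    interest = Counter(tag for tags in history for tag in tags)

    def score(article_id):
        return sum(interest[tag] for tag in articles[article_id])

    # highest interest first; ties broken by id so the output is deterministic
    return sorted(articles, key=lambda a: (-score(a), a))[:k]
```

For a reader whose history is mostly technology pieces, a Linux story tagged "tech" would lead the page ahead of sport.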

As for learning from users, this reminds me of:

http://www.espgame.org/

Regards,

Roy

--
Roy S. Schestowitz | UNIX: Because a PC is a terrible thing to waste

