HighDots Forums  

Correct way to handle these requests?

Website Design comp.infosystems.www.authoring.site-design


#1
Stan Brown
Correct way to handle these requests? - 08-10-2004, 05:59 PM

My server logs show skillions of requests for a nonexistent file,
http://oakroadsystems.com/sharware/whistl19.zip
In the past, I got these about once a minute from a sina.com
address, day after day, so I ended up redirecting all such requests
back to sina.com since they ignored my e-mails.

But recently I had occasion to look back at the server logs for
something else, and saw many requests from a variety of numeric IP
addresses for that same nonexistent file. I guess the simplest thing
to do is return a status of "Forbidden" or "gone" for such requests.
I entered
Redirect 403 /sharware/whistl19.zip
in my .htaccess file, and when I request the above URL I do indeed
get a "Forbidden" response.

Is it better to have "Forbidden", "gone", or something else
entirely?

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/

#2
Mark Parnell
Re: Correct way to handle these requests? - 08-10-2004, 06:13 PM

On Tue, 10 Aug 2004 18:59:04 -0400, Stan Brown
<the_stan_brown (AT) fastmail (DOT) fm> declared in
comp.infosystems.www.authoring.site-design:

Quote:
> My server logs show skillions of requests for a nonexistent file,
> http://oakroadsystems.com/sharware/whistl19.zip
> snip
> Is it better to have "Forbidden", "gone", or something else
> entirely?

400 seems the most logical to me. 403 is probably OK - technically it's
not forbidden as such, but it certainly isn't gone, since presumably it
was never there in the first place.

--
Mark Parnell
http://www.clarkecomputers.com.au


#3
Mark Parnell
Re: Correct way to handle these requests? - 08-10-2004, 10:58 PM

On Tue, 10 Aug 2004 23:55:39 -0400, Brian
<usenet3 (AT) julietremblay (DOT) com.invalid> declared in
comp.infosystems.www.authoring.site-design:

Quote:
> Why 400? That's for a bad request, bad on the level of HTTP. If
> the request is not malformed, then 400 seems inappropriate.

Fair enough. I don't know all that much about HTTP. Should learn to keep
my mouth shut. :-)

Then again, I wouldn't have learnt anything if I hadn't said anything...

My head hurts.



#4
Stan Brown
Re: Correct way to handle these requests? - 08-11-2004, 12:16 AM

"Mark Parnell" <webmaster (AT) clarkecomputers (DOT) com.au> wrote in
comp.infosystems.www.authoring.site-design:
Quote:
> On Tue, 10 Aug 2004 18:59:04 -0400, Stan Brown
> <the_stan_brown (AT) fastmail (DOT) fm> declared in
> comp.infosystems.www.authoring.site-design:
>
>> My server logs show skillions of requests for a nonexistent file,
>> http://oakroadsystems.com/sharware/whistl19.zip
>> snip
>> Is it better to have "Forbidden", "gone", or something else
>> entirely?
>
> 400 seems the most logical to me. 403 is probably OK - technically it's
> not forbidden as such, but it certainly isn't gone, since presumably it
> was never there in the first place.

Actually it was. When sina.com kept downloading it many, many times
a day, I changed its name and changed references to it in my site.
(The robots.txt file had already assured that well-behaved spiders
didn't index it.) While I believe site authors have a duty not to
break links, in this case nobody would bookmark the ZIP file -- they
would bookmark the "download" HTML file or more likely just download
the ZIP and not bookmark anything. So I don't believe I broke
bookmarks for any real users.



#5
Mark Parnell
Re: Correct way to handle these requests? - 08-11-2004, 12:38 AM

On Wed, 11 Aug 2004 01:16:00 -0400, Stan Brown
<the_stan_brown (AT) fastmail (DOT) fm> declared in
comp.infosystems.www.authoring.site-design:

Quote:
> Actually it was. When sina.com kept downloading it many, many times
> a day, I changed its name and changed references to it in my site.

In that case, gone would be appropriate. Or, as Brian suggested, just
completely redirect any requests for that file.
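
For reference, a minimal sketch of how "gone" could look in the same .htaccess file, assuming Apache's mod_alias is what processes the Redirect line from the first post. The two lines are alternative spellings of one rule, so only one of them would be used:

```apache
# Answer requests for the retired file with 410 Gone.
# "gone" is the mod_alias keyword for status 410; no target URL is given.
Redirect gone /sharware/whistl19.zip

# Equivalent numeric form (use one or the other, not both):
# Redirect 410 /sharware/whistl19.zip
```

A 410 signals that the removal is deliberate and permanent, whereas a 403 only says the server refuses to serve the request.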

Quote:
> (The robots.txt file had already assured that well-behaved spiders
> didn't index it.) While I believe site authors have a duty not to
> break links, in this case nobody would bookmark the ZIP file -- they
> would bookmark the "download" HTML file or more likely just download
> the ZIP and not bookmark anything.

No other sites would have linked directly to it?

Quote:
> So I don't believe I broke bookmarks for any real users.

Fair enough - I think that was probably the best option under the
circumstances.



#6
Stan Brown
Re: Correct way to handle these requests? - 08-11-2004, 08:11 PM

"Brian" <usenet3 (AT) julietremblay (DOT) com.invalid> wrote in
comp.infosystems.www.authoring.site-design:
Quote:
> Stan Brown wrote:
>> http://oakroadsystems.com/sharware/whistl19.zip
>>
>> Is it better to have "Forbidden", "gone", or something else
>
> "Mark Parnell" wrote:
>> presumably it was never there in the first place.
>
> Stan Brown wrote:
>> Actually it was.
>
> Ah, I didn't realize that. In that case, probably 410 is the right
> choice, unless there's some part of this I'm ignorant of.

I see your logic, but I think I'm going to go with 403 anyway.
Here's why: "Gone, no forwarding address" (Apache message) seems
offputting to anyone who did bookmark the old file -- not that
anyone should have, but still. "Forbidden" might lead such a person
to write to me asking for access, at which point I can redirect
them.
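
If the wording of the stock 410 page is the main objection, one option is to keep the status and swap the page body. This is only a sketch: /sharware/gone.html is a hypothetical page that does not exist on the site, and it assumes the host allows ErrorDocument in .htaccess (AllowOverride FileInfo).

```apache
# Serve a custom body whenever a request under this .htaccess gets 410 Gone.
# /sharware/gone.html is a made-up path; the page could explain that the
# file was renamed and point to /sharware/whis.htm.
ErrorDocument 410 /sharware/gone.html

# The same approach works if 403 is kept instead:
# ErrorDocument 403 /sharware/gone.html
```

The status code in the response stays the same; only the page shown to a human visitor changes.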

Of course a real redirect would be best, but then I'll just get
hammered again. As I mentioned, when I checked the logs a few days
ago there were lots of hits, nearly all from numeric IP addresses.

Quote:
> Stan Brown wrote:
>> When sina.com kept downloading it many, many times a day,
>
> Had they hotlinked to it?

Sorry, I don't understand "hotlinked" in this context. Can you
explain please?

Quote:
> If you don't mind my prying, what was the
> resource, why was it so popular, and why was that a problem for you?

No, I don't mind at all. It was a little DOS shareware program to
play any tune through the PC system speaker, without the use of any
Windows facilities or sound card or anything of the sort.

But it wasn't "popular". The requesting server either was broken or
was doing some sort of attack. Since it was a DOS program in a ZIP
file, it _couldn't_ be run off the Web. You would have to download
it to your computer to do anything useful with it.

If you're curious, have a look at
http://oakroadsystems.com/sharware/whis.htm
That page links to the user guide and the download file. That file
is 49K (yes, forty-nine kilobytes) and includes a few sample tunes.



#7
jmm-list-gn
Re: Correct way to handle these requests? - 08-12-2004, 12:54 AM

Brian wrote:
Quote:
> If the requests appear to be malicious, then I'd just redirect them,
> and save your server the trouble of serving up an error page. Just
> send them to a non-existent server.
>
> Redirect /sharware/whistl19.zip http://getlost.invalid

Thank you! I had been bothered by the constant probes on our site for
/cgi-bin/formmail.pl, /mail.cgi, etc., but never took the time to do
something about it.
The Redirect is a good idea for reducing the waste of time and log
space these requests use. The following three redirects get rid of 80% of
garbage. (They could probably be combined into one.) Nothing runs directly
from /cgi-bin/, only from subdirectories.

RedirectMatch .*(Form|form).*(\.cgi|\.pl)? http://nowhere.pit/
RedirectMatch .*(Mail|mail).*(\.cgi|\.pl)? http://nowhere.pit/
RedirectMatch .*(send|contact|friend).*(\.cgi|\.pl)? http://nowhere.pit/
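
As a rough sketch of the "combined into one" idea: the three patterns above share the same tail, so the keywords can go into a single alternation. Two things here are assumptions rather than part of the original rules. First, the script extension is made mandatory, because with the trailing "?" the original patterns match any URL path that merely contains "form", "mail" and so on (for example /information.html). Second, the target uses the reserved .invalid TLD with a made-up host name.

```apache
# One rule in place of three: a keyword somewhere in the last path segment,
# followed by a .cgi or .pl extension at the very end of the path.
# Caution: this also catches legitimate scripts with such names, so real
# scripts would need different names.
RedirectMatch "([Ff]orm|[Mm]ail|send|contact|friend)[^/]*\.(cgi|pl)$" http://nowhere.invalid/
```

With this pattern /cgi-bin/formmail.pl and /mail.cgi are redirected, while a page such as /information.html is left alone.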

--
jmm dash list (at) sohnen-moe (dot) com
(Remove .AXSPAMGN for email)


#8
Stan Brown
Re: Correct way to handle these requests? - 08-12-2004, 07:38 AM

"Brian" <usenet3 (AT) julietremblay (DOT) com.invalid> wrote in
comp.infosystems.www.authoring.site-design:
Quote:
> Stan Brown wrote:
>> Sorry, I don't understand "hotlinked" in this context. Can you
>> explain please?
>
> To avoid generating 404 log entries (these groups are all in web forums
> of one kind or another), let's pretend that your domain is example.com
> instead of oakroadsystems. Now let's say that I put on my personal page
> (http://www.julietremblay.com/brian/) the following:
>
> Get free <a href="http://www.example.com/sharware/whistl19.zip">DOS
> audio software</a>.
>
> That's hotlinking. Unless I've received permission, I'm stealing
> bandwidth from your domain, passing off your program as my content.

Ah, thanks. That's the same thing as what I've been calling "deep
linking".

I don't think that would be enough to explain it, though. These
requests were coming very fast, and there were a lot of them, and
all from the same domain. (Sorry I can't be more specific, but it's
been more than a year.) The time between them was so small that I
thought at the time it _had_ to be an automated process, not someone
clicking a link many times.

Thanks again for your help and advice.



#9
Alan J. Flavell
Re: Correct way to handle these requests? - 08-12-2004, 09:28 AM

On Thu, 12 Aug 2004, Brian wrote:

Quote:
> BTW, *please* change the tld in your redirect, unless you're sure that
> .pit will never exist. The "invalid" tld is set up to always be, well,
> invalid, and is thus an obvious choice.

Have you thought of redirecting them to http://127.0.0.1 ?

(If they were Windows users, you could redirect them to file:///c:/
which sometimes fools the occasional newbie. SCNR)
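
To make the two redirect targets concrete, here is a sketch for the file discussed earlier in the thread. The lines are alternatives, and getlost.invalid is just a placeholder host under the reserved .invalid TLD:

```apache
# Option 1: send the client to a name that can never resolve.
Redirect /sharware/whistl19.zip http://getlost.invalid/

# Option 2: send the client back to its own loopback address.
# Redirect /sharware/whistl19.zip http://127.0.0.1/
```

With no status argument, Redirect answers 302 (temporary); writing "Redirect permanent" instead would answer 301.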

