HighDots Forums  

RFC3986, backslash in URI/URLs

HTML Writing HTML for the Web (comp.infosystems.www.authoring.html)


  #1  
Alan J. Flavell
 
Posts: n/a

Default RFC3986, backslash in URI/URLs - 06-10-2006 , 07:25 AM







[Sorry, there isn't a newsgroup for discussing URLs as such - this
seemed a reasonably on-topic place to discuss it...?]

The story so far: on a somewhat unrelated newsgroup, my attention
fell upon the URL:
http://www.speedtouchdsl.com/prod706.htm
which contains a link to the purported URL:
http://www.speedtouchdsl.com/pdf\dat...06WL-780WL.pdf

Comparing the latter with other URLs in that area, it appeared that
the "\" was a probable blunder for "/". However, since their web
server is IIS, it appears that their server silently fixes-up this
blunder[1], and delivers the intended content. My recollection of
RFC1738 was that an unencoded "\" ought not to appear in a URL, so I
was initially inclined to rate this URL as broken...

However, this then led me down the trail of RFC2396, which 'updates
and merges "Uniform Resource Locators" [RFC1738] and "Relative Uniform
Resource Locators" [RFC1808]', and RFC3986, which "obsoletes rfc 1808
and updates rfc 1738".

In RFC2396 2.4.3, the backslash is listed under "Excluded US-ASCII
characters", under the subcategory of "unwise", with the "must"
requirement:

Quote:
Data corresponding to excluded characters must be escaped in order to
be properly represented within a URI.

So far, so good.
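As a rough illustration of that RFC2396 rule, here is a minimal Python sketch that reports any "unwise" character appearing unescaped in a URL. The character set is the "unwise" list from RFC2396 2.4.3; the function name and the example.com URLs are hypothetical, chosen only for illustration.

```python
# The "unwise" characters listed in RFC 2396 section 2.4.3.
RFC2396_UNWISE = set('{}|\\^[]`')


def unescaped_unwise(url):
    """Return the sorted RFC 2396 "unwise" characters that occur literally in url."""
    return sorted({ch for ch in url if ch in RFC2396_UNWISE})


# Hypothetical URLs, for illustration only.
print(unescaped_unwise("http://example.com/pdf\\data.pdf"))    # literal backslash is reported
print(unescaped_unwise("http://example.com/pdf%5Cdata.pdf"))   # escaped form is not
```

Under RFC2396 a non-empty result would mean the URL needs escaping before it is properly represented.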

But in RFC3986, this character "\" seems to have been stealthily
dropped from the list of characters needing to be escaped. I find no
mention of this change in Appendix D, "Changes from RFC2396".

The only substantive mention of "\" which I can find is in section 7.3
under the main heading of "7. Security Considerations":

Quote:
Special care should be taken when the URI path interpretation process
involves the use of a back-end file system or related system
functions. File systems typically assign an operational meaning to
special characters, such as the "/", "\", ":", "[", and "]"

Aside from this potential security exposure, it appears to me that the
cited URL, which I would like to have categorised as defective, would
be rated as OK by this latest RFC. And since the server returns the
desired resource when this misbegotten URL is presented, I can't even
rate it as a blunder - can I?

Any suggestions why this apparently risky, and IMHO undesirable,
change was smuggled into the RFC without mentioning it in the changes?

In http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

Quote:
Shouldn't backslash itself be included in the must-be-escaped
list?

Shouldn't it?

regards

[1] Of course, this isn't a situation that I meet in my own
serveradmin-ing using Apache. If the author codes "\" instead of "/"
in a URL, and attempts to follow the link with a www-conforming
browser, the link does not work. If they use IE instead, however, it
appears that it silently fixes-up the error on the *client* side. It
seems from my tests that IE6 makes no attempt to access the cited URL
directly - it replaces the "\" by "/" before even trying (whereas
Mozilla replaces the "\" by "%5C", after which, Apache, he say "no").

So it looks as if MS give themselves two bites at this fuxup: once in
their browser-like object, and once in their web server.

(Another reason why authors are misguided if they use MS software as
their only test of their web pages. But I digress.)
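The two client-side behaviours described in this footnote can be sketched in Python as follows. This is only a simplified model of what the tests above appeared to show, not the browsers' actual code; both function names and the sample path are hypothetical.

```python
from urllib.parse import quote


def ie_style_fixup(path):
    """Model of the reported IE6 behaviour: silently turn "\\" into "/"."""
    return path.replace("\\", "/")


def mozilla_style_fixup(path):
    """Model of the reported Mozilla behaviour: percent-encode "\\" as %5C.

    quote() keeps "/" literal because it is passed as safe, and encodes
    the backslash, which is not among its always-safe characters.
    """
    return quote(path, safe="/")


sample = "/pdf\\data.pdf"            # hypothetical path with the stray backslash
print(ie_style_fixup(sample))        # the request the IE model would send
print(mozilla_style_fixup(sample))   # the request the Mozilla model would send
```

The first result names a different resource from the one the author actually wrote, which is why a server without its own fix-up only "works" for the first client.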

--




  #2  
Jack
 
Posts: n/a

Default Re: RFC3986, backslash in URI/URLs - 06-10-2006 , 08:06 AM






Alan J. Flavell inquired:

Quote:
In http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?

I think it should. I think it's rather obvious[1]; regardless of
language, an escape character should always itself be escaped if it is
to take its "normal" value in some expression.

Quote:
It seems from my tests that IE6 makes no attempt to access the cited
URL directly - it replaces the "\" by "/" before even trying

Yes, IE mangles URLs from the address-bar in several ways before sending
them off over the interweb.

--
Jack.

[1] The expressions "obviously..." and "it's obvious that..." are
frequently encountered when the author is about to perpetrate some
inadvertent fallacy.


  #3  
Harlan Messinger
 
Posts: n/a

Default Re: RFC3986, backslash in URI/URLs - 06-10-2006 , 09:04 AM



Jack wrote:
Quote:
Alan J. Flavell inquired:


In http://lists.w3.org/Archives/Public/...5May/0004.html , I
found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?

I think it should. I think it's rather obvious[1]; regardless of
language, an escape character should always itself be escaped if it is
to take its "normal" value in some expression.

How can you mean "regardless of language", given that which characters
are escape characters depends entirely on what language is in use?


  #4  
Harlan Messinger
 
Posts: n/a

Default Re: RFC3986, backslash in URI/URLs - 06-10-2006 , 09:07 AM



Alan J. Flavell wrote:
Quote:
The only substantive mention of "\" which I can find is in section 7.3
under the main heading of "7. Security Considerations":

|Special care should be taken when the URI path interpretation process
| involves the use of a back-end file system or related system
| functions. File systems typically assign an operational meaning to
| special characters, such as the "/", "\", ":", "[", and "]"

Aside from this potential security exposure, it appears to me that the
cited URL, which I would like to have categorised as defective, would
be rated as OK by this latest RFC. And since the server returns the
desired resource when this misbegotten URL is presented, I can't even
rate it as a blunder - can I?

Any suggestions why this apparently risky, and IMHO undesirable,
change was smuggled into the RFC without mentioning it in the changes?

May I ask what the source of risk is? You mention that backslash being
the path delimiter on a back-end file system, but that can't be the
problem, since the forward slash is the path delimiter on other file
systems, and the interpretation of the forward slash in URIs as a path
delimiter doesn't create risk on that account.


  #5  
Jukka K. Korpela
 
Posts: n/a

Default Re: RFC3986, backslash in URI/URLs - 06-10-2006 , 10:27 AM



Harlan Messinger <hmessinger.removethis (AT) comcast (DOT) net> scripsit:

Quote:
Jack wrote:
- -
I think it should. I think it's rather obvious[1]; regardless of
language, an escape character should always itself be escaped if it
is to take its "normal" value in some expression.

How can you mean "regardless of language", given that which characters
are escape characters depends entirely on what language is in use?

Well, I'd say that the _principle_ of escaping an escape character,
regardless of language (notation), is adequate within broad limits. Jack's
error is that he assumes that the backslash is an escape character in the
"language" of URLs, i.e. URL syntax.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/



  #6  
Alan J. Flavell
 
Posts: n/a

Default Re: RFC3986, backslash in URI/URLs - 06-10-2006 , 11:54 AM



On Sat, 10 Jun 2006, Harlan Messinger wrote:

Quote:
Alan J. Flavell wrote:
The only substantive mention of "\" which I can find is in section
7.3 under the main heading of "7. Security Considerations":

|Special care should be taken when the URI path interpretation process
| involves the use of a back-end file system or related system
| functions. File systems typically assign an operational meaning to
| special characters, such as the "/", "\", ":", "[", and "]"
[...]

Quote:
Any suggestions why this apparently risky, and IMHO undesirable,
change was smuggled into the RFC without mentioning it in the
changes?

May I ask what the source of risk is? You mention that backslash
being the path delimiter on a back-end file system,

Well, I might have done so, if I had thought about it; but in fairness
it wasn't *my* mention, it was a quote from the RFC. :-}

Quote:
but that can't be the problem, since
the forward slash is the path delimiter on other file systems,

I don't agree. In principle, the "/" has a defined meaning in a URL
(it's a hierarchy separator, if I can put it loosely), and anyone
interpreting a URL is required to attribute that meaning to it - no
matter what their local file system separator might be.

Whereas "\" has no defined meaning in the structure of a URL, and
could (given an insufficiently paranoid parser) possibly find its way
into a filesystem reference. Which could have significant
consequences on, say, Windows.

Quote:
and the interpretation of the forward slash in URIs as a path
delimiter doesn't create risk on that account.

Because the URL "/" (the one that functions as a URL hierarchy
separator) never gets that far. By then it would have been turned
into the filesystem hierarchy separator, whatever that might be.
Yes, it might sometimes be "/", but don't let that fool you. It might
just as well have been turned into ":" for a different filesystem, or into
a hierarchical database query or whatever, in the general case.

I think that's the sort of thing that the RFC authors have in mind,
anyway.
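To make that concrete, here is a minimal Python sketch of a "sufficiently paranoid" mapping from a URL path to a Windows file path. It splits only on the URL "/", decodes each segment, and refuses any segment that still carries a character with filesystem meaning. The function names, the forbidden set and the C:\www root are assumptions made for illustration; a real server would need more checks (device names, for instance).

```python
from pathlib import PureWindowsPath
from urllib.parse import unquote

# Characters that mean something to a Windows filesystem (illustrative, not exhaustive).
FORBIDDEN = set('\\/:')


def url_path_to_segments(url_path):
    """Split on the URL hierarchy separator, decode, and vet each segment."""
    segments = []
    for raw in url_path.split("/"):
        if raw == "":
            continue  # skip the empty piece before a leading "/" or between "//"
        seg = unquote(raw)  # decode %XX only after splitting on the URL "/"
        if seg in (".", "..") or any(ch in FORBIDDEN for ch in seg):
            raise ValueError(f"unsafe path segment: {seg!r}")
        segments.append(seg)
    return segments


def to_windows_path(root, url_path):
    """Join vetted segments under root using the filesystem's own separator."""
    return PureWindowsPath(root, *url_path_to_segments(url_path))


print(to_windows_path("C:\\www", "/pdf/data.pdf"))
```

Here the URL "/" never reaches the filesystem as such: it is consumed by the split and replaced by whatever separator PureWindowsPath uses, while a literal or percent-encoded "\" is rejected rather than passed through.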


  #7  
Jack
 
Posts: n/a

Default Re: RFC3986, backslash in URI/URLs - 06-10-2006 , 01:56 PM



Harlan Messinger wrote:
Quote:
Jack wrote:
Alan J. Flavell inquired:


In
http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?

I think it should. I think it's rather obvious[1]; regardless of
language, an escape character should always itself be escaped if it
is to take its "normal" value in some expression.

How can you mean "regardless of language", given that which
characters are escape characters depends entirely on what language is
in use?

Try this:

"For any language x, that set of characters which are escape characters
in x should themselves be escaped if they are to take their normal
values in some expression."

I thought that was obviously my meaning, and it seems to require some
perverse gymnastics to get my original utterance to mean something
different.

--
Jack.


  #8  
David E. Ross
 
Posts: n/a

Default Re: RFC3986, backslash in URI/URLs - 06-10-2006 , 08:56 PM



John Dunlop wrote:
Quote:
Alan J. Flavell

In http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?

I suppose you could say backslashes *are* included in the
must-be-escaped list, if you recognise the list as implied. The
explicit list seems to have been silently dropped: the word 'excluded'
appears in RFC3986 only in unrelated contexts, and I can't find mention
of this removal anywhere in the changeover notes:

http://www.gbiv.com/protocols/uri/rev-2002/issues.html

Anyway, backslashes still can't occur in URLs, since no production
allows them.

The real questions are: What is the meaning of a back-slash in a URL?
Does it have a special meaning, the way reserved characters (/, $, &, ?,
etc) have? If it has a special meaning, where is it documented so that
browser developers and Web page authors will know about it?

Note that it cannot be just another character in the name of a path or
file. RFC 3986, Appendix A, indicates a path can have a name consisting
only of alphabetic characters, numerals, -, +, ., _, ~, and
percent-encoded characters. A path may also have @, :, and certain
reserved characters; but all these have special meanings within a path
(taking us back to the third question in my first paragraph).
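The Appendix A grammar can be transcribed into a small Python check. This is a sketch of the `segment = *pchar` production only, with `pchar = unreserved / pct-encoded / sub-delims / ":" / "@"`; note that in Appendix A "+" belongs to sub-delims rather than to unreserved. The names below are illustrative and not part of any library.

```python
import re

# Character classes transcribed from RFC 3986 Appendix A.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|%[0-9A-Fa-f]{{2}})"
SEGMENT_RE = re.compile(rf"{PCHAR}*")


def is_valid_segment(segment):
    """True if segment matches the RFC 3986 production segment = *pchar."""
    return SEGMENT_RE.fullmatch(segment) is not None


print(is_valid_segment("data.pdf"))         # ordinary file name
print(is_valid_segment("pdf\\data.pdf"))    # literal backslash
print(is_valid_segment("pdf%5Cdata.pdf"))   # percent-encoded backslash
```

No alternative in the pattern admits a literal "\", so the only way a backslash can appear in a segment under this grammar is as %5C.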

In a very loose sense, percent-encoding is a form of escaping a
character. However, a percent-encoded character might have a different
meaning in a URL than the related literal character. For example, "%25"
represents the character "%". Obviously, the former (just a character
in a string of characters) is not treated the same as the latter (the
signal for percent-encoding).
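The "%25" example can be tried with Python's standard urllib.parse, as a quick sketch; the sample strings are made up for illustration.

```python
from urllib.parse import quote, unquote

# A literal "%" in data has to be written as "%25" in the URL.
encoded = quote("100%")

# Decoding turns "%25" back into the single data character "%".
decoded = unquote(encoded)

# The same mechanism carries a backslash as data: "%5C".
backslash = unquote("%5C")

print(encoded, decoded, backslash)
```

In the encoded string the "%" acts as the signal that two hex digits follow; after decoding it is just a character in a string.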

I have yet to see a use of back-slash in a URL that was not an error,
generally a typo by the Web page author.

--

David E. Ross
<http://www.rossde.com/>

Concerned about someone (e.g., Pres. Bush) snooping
into your E-mail? Use PGP.
See my <http://www.rossde.com/PGP/>


  #9  
Harlan Messinger
 
Posts: n/a

Default Re: RFC3986, backslash in URI/URLs - 06-11-2006 , 09:17 AM



Jack wrote:
Quote:
Harlan Messinger wrote:
Jack wrote:
Alan J. Flavell inquired:


In
http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?

I think it should. I think it's rather obvious[1]; regardless of
language, an escape character should always itself be escaped if it
is to take its "normal" value in some expression.

How can you mean "regardless of language", given that which
characters are escape characters depends entirely on what language is
in use?

Try this:

"For any language x, that set of characters which are escape characters
in x should themselves be escaped if they are to take their normal
values in some expression."

I thought that was obviously my meaning, and it seems to require some
perverse gymnastics to get my original utterance to mean something
different.

It was obvious that that's what you should have meant, but it wasn't
obvious, if you did mean that, why you would have thought a remark about
the escaping of escape characters to be relevant, given that backslashes
*aren't* escape characters in the context being discussed (URL syntax).


  #10  
Harlan Messinger
 
Posts: n/a

Default Re: RFC3986, backslash in URI/URLs - 06-11-2006 , 09:19 AM






Alan J. Flavell wrote:
Quote:
On Sat, 10 Jun 2006, Harlan Messinger wrote:

Alan J. Flavell wrote:
The only substantive mention of "\" which I can find is in section
7.3 under the main heading of "7. Security Considerations":

|Special care should be taken when the URI path interpretation process
| involves the use of a back-end file system or related system
| functions. File systems typically assign an operational meaning to
| special characters, such as the "/", "\", ":", "[", and "]"
[...]

Any suggestions why this apparently risky, and IMHO undesirable,
change was smuggled into the RFC without mentioning it in the
changes?
May I ask what the source of risk is? You mention that backslash
being the path delimiter on a back-end file system,

Well, I might have done so, if I had thought about it; but in fairness
it wasn't *my* mention, it was a quote from the RFC. :-}

but that can't be the problem, since
the forward slash is the path delimiter on other file systems,

I don't agree. In principle, the "/" has a defined meaning in a URL
(it's a hierarchy separator, if I can put it loosely), and anyone
interpreting a URL is required to attribute that meaning to it - no
matter what their local file system separator might be.

Whereas "\" has no defined meaning in the structure of a URL, and
could (given an insufficiently paranoid parser) possibly find its way
into a filesystem reference. Which could have significant
consequences on, say, Windows.

and the interpretation of the forward slash in URIs as a path
delimiter doesn't create risk on that account.

Because the URL "/" (the one that functions as a URL hierarchy
separator) never gets that far. By then it would have been turned
into the filesystem hierarchy separator, whatever that might be.
Yes, it might sometimes be "/", but don't let that fool you. It might
just as well have been turned into ":" for a different filesystem, or into
a hierarchical database query or whatever, in the general case.

OK, I see. Thanks for the explanation.

