RFC3986, backslash in URI/URLs - 06-10-2006 , 06:25 AM
[Sorry, there isn't a newsgroup for discussing URLs as such - this
seemed a reasonably on-topic place to discuss it...?]
The story so far: on a somewhat unrelated newsgroup, my attention
fell upon the URL:
which contains a link to the purported URL:
Comparing the latter with other URLs in that area, it appeared that
the "\" was a probable blunder for "/". However, since their web
server is IIS, it appears that their server silently fixes-up this
blunder, and delivers the intended content. My recollection of
RFC1738 was that an unencoded "\" ought not to appear in a URL, so I
was initially inclined to rate this URL as broken...
However, this then led me down the trail of RFC2396, which 'updates
and merges "Uniform Resource Locators" [RFC1738] and "Relative Uniform
Resource Locators" [RFC1808]', and RFC3986, which "obsoletes rfc 1808
and updates rfc 1738".
In RFC2396 2.4.3, the backslash is listed under "Excluded US-ASCII
characters", under the subcategory of "unwise", with the "must"
But in RFC3986, this character "\" seems to have been stealthily
dropped from the list of characters needing to be escaped. I find no
mention of this change in Appendix D, "Changes from RFC2396".
The only substantive mention of "\" which I can find is in section 7.3
under the main heading of "7. Security Considerations":
cited URL, which I would like to have categorised as defective, would
be rated as OK by this latest RFC. And since the server returns the
desired resource when this misbegotten URL is presented, I can't even
rate it as a blunder - can I?
Any suggestions why this apparently risky, and IMHO undesirable,
change was smuggled into the RFC without mentioning it in the changes?
In http://lists.w3.org/Archives/Public/...5May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -
 Of course, this isn't a situation that I meet in my own
serveradmin-ing using Apache. If the author codes "\" instead of "/"
in a URL, and attempts to follow the link with a www-conforming
browser, the link does not work. If they use IE instead, however, it
appears that it silently fixes-up the error on the *client* side. It
seems from my tests that IE6 makes no attempt to access the cited URL
directly - it replaces the "\" by "/" before even trying (whereas
Mozilla replaces the "\" by "%5C", after which, Apache, he say "no").
So it looks as if MS give themselves two bites at this fuxup: once in
their browser-like object, and once in their web server.
(Another reason why authors are misguided if they use MS software as
their only test of their web pages. But I digress.)
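To make the two client-side behaviours described above concrete, here is a minimal sketch in Python. The function names and the sample path are hypothetical, chosen only for illustration; the code mimics what the post reports observing and is not the actual logic of either browser.

```python
from urllib.parse import quote


def encode_like_mozilla(path: str) -> str:
    """Percent-encode the backslash, as the post reports Mozilla doing."""
    # quote() leaves "/" untouched (safe="/") and encodes "\" as %5C.
    return quote(path, safe="/")


def fix_up_like_ie(path: str) -> str:
    """Silently turn "\\" into "/", as the post reports IE6 doing."""
    return path.replace("\\", "/")


# Hypothetical path with a backslash typed where a slash was intended.
sample = "/dir\\page.html"

print(encode_like_mozilla(sample))  # /dir%5Cpage.html
print(fix_up_like_ie(sample))       # /dir/page.html
```

The first result is a different resource name from the one the author intended, which is why a server that does no fix-up of its own would be expected to refuse it.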
Re: RFC3986, backslash in URI/URLs - 06-10-2006 , 07:06 AM
Alan J. Flavell inquired:
language, an escape character should always itself be escaped if it is
to take its "normal" value in some expression.
them off over the interweb.
 The expressions "obviously..." and "it's obvious that..." are
frequently encountered when the author is about to perpetrate some
Re: RFC3986, backslash in URI/URLs - 06-10-2006 , 08:04 AM
are escape characters depends entirely on what language is in use?
Re: RFC3986, backslash in URI/URLs - 06-10-2006 , 08:07 AM
Alan J. Flavell wrote:
the path delimiter on a back-end file system, but that can't be the
problem, since the forward slash is the path delimiter on other file
systems, and the interpretation of the forward slash in URIs as a path
delimiter doesn't create risk on that account.
Re: RFC3986, backslash in URI/URLs - 06-10-2006 , 09:27 AM
Harlan Messinger <hmessinger.removethis (AT) comcast (DOT) net> scripsit:
regardless of language (notation), is adequate within broad limits. Jack's
error is that he assumes that the backslash is an escape character in the
"language" of URLs, i.e. URL syntax.
Jukka K. Korpela ("Yucca")
Re: RFC3986, backslash in URI/URLs - 06-10-2006 , 10:54 AM
On Sat, 10 Jun 2006, Harlan Messinger wrote:
it wasn't *my* mention, it was a quote from the RFC. :-}
(it's a hierarchy separator, if I can put it loosely), and anyone
interpreting a URL is required to attribute that meaning to it - no
matter what their local file system separator might be.
Whereas "\" has no defined meaning in the structure of a URL, and
could (given an insufficiently paranoid parser) possibly find its way
into a filesystem reference. Which could have significant
consequences on, say, Windows.
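A rough sketch of the kind of "insufficiently paranoid parser" meant here, in Python. Everything in it is assumed for illustration: the function names, the sample path and the naive ".." check are hypothetical and do not come from any real server.

```python
from pathlib import PureWindowsPath


def naive_segments(url_path: str) -> list[str]:
    """Split a URL path on "/" only, the one separator URL syntax defines."""
    return [s for s in url_path.split("/") if s]


def looks_safe(url_path: str) -> bool:
    """Naive traversal check: reject only segments that are exactly ".."."""
    return ".." not in naive_segments(url_path)


# Hypothetical request path; the backslash hides a ".." from the check.
url_path = "/docs/..\\secret.txt"

print(naive_segments(url_path))  # ['docs', '..\\secret.txt']
print(looks_safe(url_path))      # True

# A Windows-style filesystem layer treats "\" as a separator too,
# so the hidden ".." reappears once the segments are handed over.
print(PureWindowsPath(*naive_segments(url_path)).parts)
# ('docs', '..', 'secret.txt')
```

The URL-level check and the filesystem disagree about what counts as a separator, and that disagreement is the risk.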
separator) never gets that far. By then it would have been turned
into the filesystem hierarchy separator, whatever that might be.
Yes, it might sometimes be "/", but don't let that fool you. It might
just as well have been turned into ":" for a different filesystem, or into
a hierarchical database query or whatever, in the general case.
I think that's the sort of thing that the RFC authors have in mind,
Re: RFC3986, backslash in URI/URLs - 06-10-2006 , 12:56 PM
Harlan Messinger wrote:
"For any language x, that set of characters which are escape characters
in x should themselves be escaped if they are to take their normal
values in some expression."
I thought that was obviously my meaning, and it seems to require some
perverse gymnastics to get my original utterance to mean something
Re: RFC3986, backslash in URI/URLs - 06-10-2006 , 07:56 PM
John Dunlop wrote:
Does it have a special meaning, the way reserved characters (/, $, &, ?,
etc) have? If it has a special meaning, where is it documented so that
browser developers and Web page authors will know about it?
Note that it cannot just be another character in the name of a path or
file. RFC 3986, Appendix A, indicates a path can have a name consisting
only of alphabetic characters, numerals, -, +, ., _, ~, and
percent-encoded characters. A path may also have @, :, and certain
reserved characters; but all these have special meanings within a path
(taking us back to the third question in my first paragraph).
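As a sketch of the point about Appendix A, the following Python check tests a single path segment against the pchar production as I read it in RFC 3986 (pchar = unreserved / pct-encoded / sub-delims / ":" / "@"). The function name and the sample segments are hypothetical.

```python
import re

# One pchar, per my reading of RFC 3986 Appendix A:
#   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
#   pct-encoded = "%" HEXDIG HEXDIG
#   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
#   plus ":" and "@"
PCHAR = r"(?:[A-Za-z0-9\-._~]|%[0-9A-Fa-f]{2}|[!$&'()*+,;=]|[:@])"
SEGMENT = re.compile(PCHAR + "*")


def is_valid_segment(segment: str) -> bool:
    """True if the whole segment consists only of pchar characters."""
    return SEGMENT.fullmatch(segment) is not None


print(is_valid_segment("page.html"))        # True
print(is_valid_segment("dir\\page.html"))   # False: "\" matches no production
print(is_valid_segment("dir%5Cpage.html"))  # True: backslash percent-encoded
```

On this reading a literal backslash is simply absent from the grammar rather than explicitly permitted.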
In a very loose sense, percent-encoding is a form of escaping a
character. However, a percent-encoded character might have a different
meaning in a URL than the related literal character. For example, "%25"
represents the character "%". Obviously, the former (just a character
in a string of characters) is not treated the same as the latter (the
signal for percent-encoding).
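A small Python illustration of that difference, using the standard library's urllib.parse. The sample strings are made up for the example.

```python
from urllib.parse import quote, unquote

# A literal "%" must itself be percent-encoded to appear as data.
print(quote("100%"))      # 100%25
print(unquote("100%25"))  # 100%

# The same holds for "/": encoded, it is just a character in a name;
# literal, it is the hierarchy separator.
print("a%2Fb".split("/"))  # ['a%2Fb']  one segment
print("a/b".split("/"))    # ['a', 'b'] two segments
print(unquote("a%2Fb"))    # a/b, but only after the path has been split
```

Decoding before splitting would erase exactly the distinction described above.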
I have yet to see a use of back-slash in a URL that was not an error,
generally a typo by the Web page author.
David E. Ross
Re: RFC3986, backslash in URI/URLs - 06-11-2006 , 08:17 AM
obvious, if you did mean that, why you would have thought a remark about
the escaping of escape characters to be relevant, given that backslashes
*aren't* escape characters in the context being discussed (URL syntax).
Re: RFC3986, backslash in URI/URLs - 06-11-2006 , 08:19 AM
Alan J. Flavell wrote: