HighDots Forums  

RFC3986, backslash in URI/URLs

HTML Writing HTML for the Web (comp.infosystems.www.authoring.html)


#11
Jack

Re: RFC3986, backslash in URI/URLs - 06-11-2006, 11:56 AM






Harlan Messinger wrote:
Quote:
"For any language x, that set of characters which are escape
characters in x should themselves be escaped if they are to take
their normal values in some expression."

I thought that was obviously my meaning, and it seems to require
some perverse gymnastics to get my original utterance to mean
something different.

It was obvious that that's what you should have meant, but it wasn't
obvious, if you did mean that, why you would have thought a remark
about the escaping of escape characters to be relevant, given that
backslashes *aren't* escape characters in the context being discussed
(URL syntax).

Indeed. Without giving it much thought, I had temporarily formed the
mistaken impression that backslashes *were* escape characters in URLs,
and that was the context of my post. Sorry for confusion.

--
Jack.


#12
Alan J. Flavell

Re: RFC3986, backslash in URI/URLs - 06-11-2006, 04:15 PM






On Sun, 10 Jun 2006, John Dunlop wrote:

Quote:
Alan J. Flavell

In http://lists.w3.org/Archives/Public/www-style/2005May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?

I suppose you could say backslashes *are* included in the
must-be-escaped list, if you recognise the list as implied. The
explicit list seems to have been silently dropped:

ah, thanks for explaining that...

Quote:
the word 'excluded' appears in RFC3986 only in unrelated contexts,
and I can't find mention of this removal anywhere in the changeover
notes:

http://www.gbiv.com/protocols/uri/rev-2002/issues.html

Well, appendix D does say:

Section 2, on characters, has been rewritten to explain what
characters are reserved, when they are reserved, and why they are
reserved, even when they are not used as delimiters by the generic
syntax.

So that, at least, does still define "reserved" characters.

But it also says

... URI normalizers
are now given license to decode any percent-encoded octets
corresponding to unreserved characters.

Oh, hang on! "Reserved Characters" are listed in 2.2, while
"Unreserved characters" are listed in 2.3. And the backslash is
neither "reserved" nor "unreserved".

So, "unreserved" does not mean "any characters which are not
reserved". There are characters which are neither. Is that
confusing?

Indeed, if I take an ASCII table, and mark off the (non-control)
characters which they designate as reserved, and as unreserved, I'm
left with quite a few which are neither, as follows.

"%" is obviously special ...

Leaving (by my reckoning) space, ", <, >, \, ^, `, {, |, and }

Is there a name for this category of characters - that are neither
reserved nor unreserved?
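
[Editorial sketch, not part of the original post.] The reckoning above can be reproduced mechanically. The snippet below is a minimal illustration that assumes the reserved set from RFC 3986 section 2.2 (gen-delims plus sub-delims) and the unreserved set from section 2.3; the variable names are made up for this example.

```python
import string

# Character sets as given in RFC 3986 sections 2.2 and 2.3.
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# Printable (non-control) US-ASCII: 0x20 (space) through 0x7E (tilde).
PRINTABLE = {chr(code) for code in range(0x20, 0x7F)}

# "%" is set aside because it introduces a percent-encoded octet.
NEITHER = sorted(PRINTABLE - RESERVED - UNRESERVED - {"%"})

print(NEITHER)
# [' ', '"', '<', '>', '\\', '^', '`', '{', '|', '}']
```

The ten characters left over match the list given in the post: space, ", <, >, \, ^, `, {, |, and }.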

Quote:
Anyway, backslashes still can't occur in URLs, since no production
allows them.

I must admit I hadn't tried approaching the problem from that
direction, but, now that you put it that way, it does make sense.

thanks


#13
David E. Ross

Re: RFC3986, backslash in URI/URLs - 06-11-2006, 04:19 PM



Alan J. Flavell wrote:
Quote:
On Sun, 10 Jun 2006, John Dunlop wrote:

Alan J. Flavell

In http://lists.w3.org/Archives/Public/www-style/2005May/0004.html ,
I found a top-posted "answer" which is resolutely ignoring the
bottom-quoted question -

| > Shouldn't backslash itself be included in the must-be-escaped
| > list?

Shouldn't it?
I suppose you could say backslashes *are* included in the
must-be-escaped list, if you recognise the list as implied. The
explicit list seems to have been silently dropped:

ah, thanks for explaining that...

the word 'excluded' appears in RFC3986 only in unrelated contexts,
and I can't find mention of this removal anywhere in the changeover
notes:

http://www.gbiv.com/protocols/uri/rev-2002/issues.html

Well, appendix D does say:

Section 2, on characters, has been rewritten to explain what
characters are reserved, when they are reserved, and why they are
reserved, even when they are not used as delimiters by the generic
syntax.

So that, at least, does still define "reserved" characters.

But it also says

... URI normalizers
are now given license to decode any percent-encoded octets
corresponding to unreserved characters.

Oh, hang on! "Reserved Characters" are listed in 2.2, while
"Unreserved characters" are listed in 2.3. And the backslash is
neither "reserved" nor "unreserved".

So, "unreserved" does not mean "any characters which are not
reserved". There are characters which are neither. Is that
confusing?

Indeed, if I take an ASCII table, and mark off the (non-control)
characters which they designate as reserved, and as unreserved, I'm
left with quite a few which are neither, as follows.

"%" is obviously special ...

Leaving (by my reckoning) space, ", <, >, \, ^, `, {, |, and }

Is there a name for this category of characters - that are neither
reserved nor unreserved?

Anyway, backslashes still can't occur in URLs, since no production
allows them.

I must admit I hadn't tried approaching the problem from that
direction, but, now that you put it that way, it does make sense.

thanks

My inference is that, if a character is neither reserved nor unreserved,
it must be prohibited.
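
[Editorial sketch, not part of the original post.] One way to test that inference is to check a path against the relevant grammar production. The sketch below assumes the `pchar` and `path-abempty` productions from RFC 3986 section 3.3; the helper name `is_valid_path` is hypothetical, and the regular expression covers only the path component, not a whole URI.

```python
import re
from urllib.parse import quote

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
# path-abempty = *( "/" segment ), where segment = *pchar
PATH_ABEMPTY = re.compile(rf"(?:/{PCHAR}*)*")


def is_valid_path(path: str) -> bool:
    """Return True if `path` matches the path-abempty production."""
    return PATH_ABEMPTY.fullmatch(path) is not None


good = "/pdf/datasheet706WL-780WL.pdf"
bad = "/pdf\\datasheet706WL-780WL.pdf"  # literal backslash, as in the thread

print(is_valid_path(good))        # True
print(is_valid_path(bad))         # False: no production admits "\"
print(quote(bad))                 # /pdf%5Cdatasheet706WL-780WL.pdf
print(is_valid_path(quote(bad)))  # True once the backslash is percent-encoded
```

So a backslash can still be carried as data, but only in its percent-encoded form, %5C.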

--

David E. Ross
<http://www.rossde.com/>

Concerned about someone (e.g., Pres. Bush) snooping
into your E-mail? Use PGP.
See my <http://www.rossde.com/PGP/>


#14
David E. Ross

Re: RFC3986, backslash in URI/URLs - 06-11-2006, 04:23 PM



John Dunlop wrote:
Quote:
David E. Ross:

The real questions are:

I think these are different points of discussion, no more real or
imaginary than the original, but more off-topic. The original was
about the status of backslashes wrt URLs, which has a direct bearing on
whether or not a doc violates or conforms to the spec.

I used the phrase "The real questions are" somewhat facetiously to point
out that the original poster should tell us how he thinks back-slash
should be interpreted. I don't think he can, thus concluding the
discussion.

--

David E. Ross
<http://www.rossde.com/>

Concerned about someone (e.g., Pres. Bush) snooping
into your E-mail? Use PGP.
See my <http://www.rossde.com/PGP/>


#15
Michael Winter

Re: RFC3986, backslash in URI/URLs - 06-11-2006, 05:29 PM



On 11/06/2006 21:15, Alan J. Flavell wrote:

[snip]

Quote:
Oh, hang on! "Reserved Characters" are listed in 2.2, while
"Unreserved characters" are listed in 2.3. And the backslash is
neither "reserved" nor "unreserved".

So, "unreserved" does not mean "any characters which are not
reserved". There are characters which are neither. Is that confusing?

I wouldn't say so. An unreserved character is one that will never have a
special meaning, whereas a reserved character may have one, in either the
generic or a scheme-specific URI syntax.

The definition of reserved and unreserved characters has, as far as I
can see, the most significance within the percent-encoding mechanism
and, by extension, syntax-based normalisation where URI normalisers may
decode a subset of percent-encoded sequences. The remaining
characters have no real significance; a scheme should not use them as
delimiters. As such, special treatment from URI normalisers is not
necessary, so they do not feature in either category.

<rationale>

2.2 Reserved Characters says:

URI producing applications should percent-encode data octets
that correspond to characters in the reserved set unless
these characters are specifically allowed by the URI scheme
to represent data in that component.

So, unless present to delimit subcomponents within a URI, all
reserved characters should be percent-encoded.

Furthermore, "characters in the reserved set are protected from
normalization". That is to say that premature decoding may
render delimiters ambiguous so reserved characters should not
be decoded until the URI has been completely parsed, as noted
in 2.4 When to Encode or Decode:

When a URI is dereferenced, the components and subcomponents
significant to the scheme-specific dereferencing process (if
any) must be parsed and separated before the percent-encoded
octets within those components can be safely decoded, as
otherwise the data may be mistaken for component delimiters.
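
[Editorial sketch, not part of the original post.] The ordering requirement in that passage can be shown with a small example. The path below is invented for illustration; `unquote` is the standard-library percent-decoder.

```python
from urllib.parse import unquote

# Hypothetical path whose middle segment contains an encoded "/" (%2F) as data.
raw_path = "/reports/a%2Fb/summary"

# Parse first, decode second: split on the literal "/" delimiters,
# then percent-decode each segment separately.
segments = [unquote(segment) for segment in raw_path.split("/")[1:]]

# Decode first, parse second: the encoded slash becomes a delimiter
# and the middle segment is wrongly split in two.
premature = unquote(raw_path).split("/")[1:]

print(segments)   # ['reports', 'a/b', 'summary']
print(premature)  # ['reports', 'a', 'b', 'summary']
```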

Finally, 2.3 Unreserved Characters states:

For consistency, percent-encoded octets in the ranges of
ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D),
period (%2E), underscore (%5F), or tilde (%7E) [ed.: the
unreserved set] should not be created by URI producers and,
when found in a URI, should be decoded to their corresponding
unreserved characters by URI normalizers.

</rationale>
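
[Editorial sketch, not part of the original post.] The normalisation step described in the 2.3 excerpt might look roughly like the following. The function name `decode_unreserved` and the example URI are assumptions made for illustration; real normalisers do more than this (case and path-segment normalisation, for instance).

```python
import re
import string

# Unreserved set from RFC 3986 section 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
PCT_OCTET = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri: str) -> str:
    """Decode percent-encoded octets only where they map to unreserved characters."""
    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        # Reserved characters, "%", and characters in neither set stay encoded.
        return char if char in UNRESERVED else match.group(0)

    return PCT_OCTET.sub(replace, uri)


print(decode_unreserved("http://example.com/%7Euser/a%2Fb%5Cc%2Dd"))
# http://example.com/~user/a%2Fb%5Cc-d
```

Here %7E and %2D are decoded because tilde and hyphen are unreserved, while %2F (reserved) and %5C (backslash, in neither set) are left alone.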

[snip]

Quote:
Is there a name for this category of characters - that are neither
reserved nor unreserved?

Not to my knowledge, and I doubt it warrants a name, either. :-)

[snip]

Mike


Apologies for the quantity of quotes, though at least it saves searching
the RFC. For the record, the text was taken exclusively from #3986.

--
Michael Winter
Prefix subject with [News] before replying by e-mail.


#16
Pierre Goiffon

Re: RFC3986, backslash in URI/URLs - 06-12-2006, 04:41 AM



Alan J. Flavell wrote:
Quote:
The story so far: on a somewhat unrelated newsgroup, my attention
fell upon the URL:
http://www.speedtouchdsl.com/prod706.htm
which contains a link to the purported URL:
http://www.speedtouchdsl.com/pdf\datasheet706WL-780WL.pdf

Comparing the latter with other URLs in that area, it appeared that
the "\" was a probable blunder for "/". However, since their web
server is IIS, it appears that their server silently fixes-up this
blunder[1], and delivers the intended content.

A set of tools for IIS, IIS Lockdown (which should certainly be installed
on all IIS servers), provides URLScan. By default, the latter rejects
URLs containing "\" (see http://www.securityfocus.com/infocus/1755 and
the default values for the DenyUrlSequences parameter).
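
[Editorial sketch, not part of the original post.] The two server behaviours mentioned in this thread, silently fixing up the backslash versus rejecting the request, can be contrasted in a few lines. This is only an illustration: it is not how IIS or URLScan is implemented, the function names are invented, and the deny list holds just the one sequence discussed here rather than the tool's real defaults.

```python
# Assumption: only the sequence discussed in the thread, not the real default list.
DENIED_SEQUENCES = ("\\",)


def is_denied(url: str, denied: tuple = DENIED_SEQUENCES) -> bool:
    """Strict behaviour: refuse any URL containing a denied sequence."""
    return any(sequence in url for sequence in denied)


def lenient_fixup(url: str) -> str:
    """Lenient behaviour: silently treat "\\" as if it had been "/"."""
    return url.replace("\\", "/")


url = "http://www.speedtouchdsl.com/pdf\\datasheet706WL-780WL.pdf"

print(is_denied(url))      # True
print(lenient_fixup(url))  # http://www.speedtouchdsl.com/pdf/datasheet706WL-780WL.pdf
```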

