HighDots Forums  

Permissible characters in attribute names

HTML forum: Writing HTML for the Web (comp.infosystems.www.authoring.html)





  #1  
D.M. Procida

Permissible characters in attribute names - 02-06-2008, 01:04 PM






What characters are permissible in (for example) id and class
attributes?

What happens when using characters that are used in HTML to encode other
characters, such as '&'?

Since HTML, XHTML and CSS are all different languages, I presume it's
theoretically possible that the specification for one could allow
characters in, say, a class name that wouldn't be permitted by another.

Daniele

  #2  
Nikita the Spider

Re: Permissible characters in attribute names - 02-07-2008, 09:12 AM






In article
<1ibwkms.1v3lops1ik23w2N%real-not-anti-spam-address (AT) apple-juice (DOT) co.uk>,
real-not-anti-spam-address (AT) apple-juice (DOT) co.uk (D.M. Procida) wrote:

Quote:
What characters are permissible in (for example) id and class
attributes?

What happens when using characters that are used in HTML to encode other
characters, such as '&'?

Since HTML, XHTML and CSS are all different languages, I presume it's
theoretically possible that the specification for one could allow
characters in say a class name that wouldn't be permitted by another.

That's true. Why not have a look at the specs? Here's a list of HTML
4.01's attributes, with links to the formal definition of each:
http://www.w3.org/TR/html4/index/attributes.html

XHTML 1.0 is a reformulation of HTML in XML and carries this special
note about the ampersand you mentioned:
http://www.w3.org/TR/xhtml1/#C_12
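As a rough illustration of what C.12 asks for (a literal "&" in an attribute value written as "&amp;"), here is a minimal Python sketch that puts an arbitrary string into a double-quoted class attribute. It relies on the standard library's html.escape, which turns &, <, >, " and ' into character references; the helper name class_attribute is hypothetical.

```python
import html


def class_attribute(value: str) -> str:
    """Build a class="..." attribute (hypothetical helper).

    html.escape(quote=True) converts & < > " ' into character references,
    so the value can sit safely inside double quotes in HTML or XHTML.
    """
    return 'class="' + html.escape(value, quote=True) + '"'
```

For example, class_attribute("a&b") yields class="a&amp;b", while a value such as 10% passes through unchanged because "%" has no special meaning in markup.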

XHTML 1.1 doesn't mention any attribute-specific differences from XHTML
1.0 Strict:
http://www.w3.org/TR/xhtml11/changes.html#a_changes

The whitespace and attribute normalization rules are slightly different
between HTML and XHTML, but only in pretty esoteric ways that don't
affect normal usage.

A glance at the CSS syntax makes it look as if the CSS rules for
identifiers allow a superset of what's allowed in (X)HTML:
http://www.w3.org/TR/CSS1#appendix-b


HTH

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more


  #3  
D.M. Procida

Re: Permissible characters in attribute names - 02-07-2008, 04:27 PM



Nikita the Spider <NikitaTheSpider (AT) gmail (DOT) com> wrote:

Quote:
In article
<1ibwkms.1v3lops1ik23w2N%real-not-anti-spam-address (AT) apple-juice (DOT) co.uk>,
real-not-anti-spam-address (AT) apple-juice (DOT) co.uk (D.M. Procida) wrote:

What characters are permissible in (for example) id and class
attributes?

What happens when using characters that are used in HTML to encode other
characters, such as '&'?

Since HTML, XHTML and CSS are all different languages, I presume it's
theoretically possible that the specification for one could allow
characters in say a class name that wouldn't be permitted by another.

That's true. Why not have a look at the specs? Here's a list of HTML
4.01's attributes, with links to the formal definition of each:
http://www.w3.org/TR/html4/index/attributes.html

That's excellent, thanks, exactly what I was looking for.

The answer I wanted is right there in:

<http://www.w3.org/TR/html4/types.html#type-cdata>
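For reference, the CDATA section linked above says user agents should replace character entities with characters, ignore line feeds, and replace each carriage return or tab with a single space. A rough Python sketch of that reading follows; it assumes html.unescape is an acceptable stand-in for HTML 4 entity handling, and interpret_cdata is a hypothetical name.

```python
import html


def interpret_cdata(raw: str) -> str:
    """Rough reading of the HTML 4.01 CDATA rules (hypothetical helper)."""
    value = html.unescape(raw)                           # replace character references
    value = value.replace("\n", "")                      # ignore line feeds
    return value.replace("\r", " ").replace("\t", " ")   # CR or tab -> one space
```

Note that nothing here restricts which characters may appear; that is the sense in which CDATA accepts practically anything.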

Quote:
A glance at the CSS syntax makes it look as if the CSS rules for
identifiers allow a superset of what's allowed in (X)HTML:
http://www.w3.org/TR/CSS1#appendix-b

Thanks again - I couldn't actually make much sense of that particular
document, but I'm happy to take your word for it.

Daniele


  #4  
Jukka K. Korpela

Re: Permissible characters in attribute names - 02-10-2008, 06:00 AM



Scripsit D.M. Procida:

Quote:
The answer I wanted is right there in:

http://www.w3.org/TR/html4/types.html#type-cdata

Unfortunately, it's not that simple. As you wrote in your question,
"HTML, XHTML and CSS are all different languages".

The id attributes are special, since they have the declared type ID,
which imposes simple and rather strict restrictions in HTML (a value
must begin with a letter A to Z or a to z, optionally followed by
letters, digits, hyphens, underscores, colons and periods), whereas in
XHTML, by XML rules, the syntax is far more permissive - the
"letter" and "digit" concepts have Unicode meanings, covering not just
all the letters you can imagine but also ideographic characters.
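The HTML 4.01 rule for ID values (section 6.2 of that spec) is simple enough to express as a regular expression. This is a minimal Python sketch; is_html4_id is a hypothetical helper, and it deliberately implements only the HTML 4 rule, not the broader XML Name rules that apply in XHTML.

```python
import re

# HTML 4.01, section 6.2: ID tokens start with a letter [A-Za-z] and may be
# followed by letters, digits, hyphens, underscores, colons and periods.
_HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")


def is_html4_id(value: str) -> bool:
    """True if value is a valid ID token under HTML 4.01 rules."""
    return _HTML4_ID.fullmatch(value) is not None
```

So "main-nav" and "a:b.c" pass, while "10%" and an ideographic id such as "日本" fail here, although the latter would be fine under XHTML's XML rules.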

The class attribute has obscure syntax in HTML (including XHTML), since
it's defined as a whitespace-separated list of class names, without
defining the syntax of class names. This is probably intentional and
reflects the original idea that _different_ style sheet languages,
potentially with different class name syntax, could be used in
conjunction with HTML.

In effect, by HTML rules, any string is allowed as the value of a class
attribute; the specific syntax is delegated to style sheet languages.
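Because HTML only defines class as a white-space-separated list, about the only processing HTML itself implies is splitting the value. A minimal sketch, assuming the HTML 4.01 definition of white space (space, tab, form feed, zero-width space, plus the line-break characters); class_names is a hypothetical helper.

```python
import re

# HTML 4.01 white space: space, tab, form feed, zero-width space (U+200B),
# plus the line-break characters CR and LF.
_HTML4_WHITE_SPACE = re.compile("[ \t\f\r\n\u200b]+")


def class_names(value: str) -> list:
    """Split a class attribute value into its individual class names."""
    return [name for name in _HTML4_WHITE_SPACE.split(value) if name]
```

A value like " nav  10%\tmain " yields ["nav", "10%", "main"]; nothing at this level checks whether "10%" is usable in any style sheet language. U+200B is listed explicitly because Python's str.split() does not treat it as white space.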
All the excuses for a CSS standard avoid saying directly, in prose, what
the acceptable class names are. You need to go down to the formal syntax
(in the CSS 2.1 draft, Appendix G), which says that a class name is an
IDENT. This concept is only defined implicitly by the formal description
of the lexical scanner. This gets us to the pattern
-?{nmstart}{nmchar}*
with
nmstart [_a-z]|{nonascii}|{escape}
nmchar [_a-z0-9-]|{nonascii}|{escape}
nonascii [\200-\377]
escape {unicode}|\\[^\r\n\f0-9a-f]
unicode \\{h}{1,6}(\r\n|[ \t\r\n\f])?
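For readers who find regular expressions easier to follow as code, here is the same grammar transcribed into a Python regex. Assumptions: {h}, which the excerpt does not show, is [0-9a-f]; matching is case-insensitive within ASCII, as CSS syntax is; and {nonascii} is taken literally as octal 200 to 377, i.e. U+0080 to U+00FF. is_css_ident is a hypothetical helper.

```python
import re

# Pieces of the CSS 2.1 draft lexical grammar quoted above, as Python regex.
H = r"[0-9a-f]"                      # assumed definition of {h}
NONASCII = r"[\x80-\xff]"            # octal \200-\377
UNICODE = r"\\" + H + r"{1,6}(?:\r\n|[ \t\r\n\f])?"
ESCAPE = r"(?:" + UNICODE + r"|\\[^\r\n\f0-9a-f])"
NMSTART = r"(?:[_a-z]|" + NONASCII + "|" + ESCAPE + ")"
NMCHAR = r"(?:[_a-z0-9-]|" + NONASCII + "|" + ESCAPE + ")"

# IDENT: -?{nmstart}{nmchar}*  (case folding limited to ASCII letters)
IDENT = re.compile("-?" + NMSTART + NMCHAR + "*", re.IGNORECASE | re.ASCII)


def is_css_ident(name: str) -> bool:
    """True if the whole string is a single IDENT token under this grammar."""
    return IDENT.fullmatch(name) is not None
```

Under this transcription "foo-bar" and "café" are identifiers, "10%" is not, and the escaped form \31 0\% is.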

This is understandable (well, decipherable) to anyone familiar with
regular expressions - and plain gibberish to the rest of mankind,
probably including well over 99% of all web authors. Anyway, this means
that the correct answer to the question "what characters are allowed in
class names by the specs?" is not simple at all. And the correct answer
would not even be very helpful, since it does not address the question
"which characters can we (safely) use in class names?". The short
answer to _this_ question is "letters A thru Z and a thru z, digits 0
thru 9, and the Ascii hyphen '-'". The underline "_" is fairly safe,
too, but it doesn't really have much benefit over the hyphen.
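A conservative checker for the "safe" subset recommended above might look like this. Requiring the first character to be a letter is an added assumption, chosen to avoid the leading-digit problem in CSS; the helper and its allow_underscore flag are hypothetical.

```python
import re


def is_safe_class_name(name: str, allow_underscore: bool = False) -> bool:
    """Check a class name against a conservative, widely safe subset.

    Hypothetical helper: ASCII letters, digits and hyphens (optionally
    underscores); the leading-letter requirement is an added assumption.
    """
    rest = "A-Za-z0-9_-" if allow_underscore else "A-Za-z0-9-"
    return re.fullmatch("[A-Za-z][" + rest + "]*", name) is not None
```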

So it's a mess, to make an understatement. Sometimes I think that there
should be standards for HTML and CSS (as opposed to the wannabe
"standards" of the W3C or the nominal ISO "standard" on HTML, which is
just the sloppy HTML 4 with an ISO stamp on it, together with some added
incomprehensible mess). Then I remember how standards organizations
work, and I consider climbing the walls and screaming.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/



  #5  
D.M. Procida

Re: Permissible characters in attribute names - 02-10-2008, 07:15 AM



Jukka K. Korpela <jkorpela (AT) cs (DOT) tut.fi> wrote:

Quote:
http://www.w3.org/TR/html4/types.html#type-cdata

Unfortunately, it's not that simple. As you wrote in your question,
"HTML, XHTML and CSS are all different languages".

-?{nmstart}{nmchar}*
with
nmstart [_a-z]|{nonascii}|{escape}
nmchar [_a-z0-9-]|{nonascii}|{escape}
nonascii [\200-\377]
escape {unicode}|\\[^\r\n\f0-9a-f]
unicode \\{h}{1,6}(\r\n|[ \t\r\n\f])?

This is understandable (well, decipherable) to anyone familiar with
regular expressions - and plain gibberish to the rest of mankind,
probably including well over 99% of all web authors. Anyway, this means
that the correct answer to the question "what characters are allowed in
class names by the specs?" is not simple at all. And the correct answer
would not even be very helpful, since it does not address the question
"which characters can we (safely) use in class names?". The short
answer to _this_ question is "letters A thru Z and a thru z, digits 0
thru 9, and the Ascii hyphen '-'". The underline "_" is fairly safe,
too, but it doesn't really have much benefit over the hyphen.

Hmm. I was (partly) interested in finding out whether % signs were
permitted. And as far as the validators I tried were concerned, they
were happy with a class="10%", for example.

Daniele


  #6  
Jukka K. Korpela

Re: Permissible characters in attribute names - 02-10-2008, 01:13 PM



Scripsit D.M. Procida:

Quote:
-?{nmstart}{nmchar}*
with
nmstart [_a-z]|{nonascii}|{escape}
nmchar [_a-z0-9-]|{nonascii}|{escape}
nonascii [\200-\377]
escape {unicode}|\\[^\r\n\f0-9a-f]
unicode \\{h}{1,6}(\r\n|[ \t\r\n\f])?
...
Hmm.

Well, you didn't have to quote so extensively my message just to tell
that you didn't understand it, even though massive quoting is common on
Usenet to signal massive lack of understanding. I have now trimmed your
quotation to contain the essential part.

Quote:
I was (partly) interested in finding out whether % signs were
permitted.

In class attributes, I presume. It's always useful to ask the specific
question you have in your mind, instead of mere abstract issues;
sometimes you get specific answers.

The syntax quoted above means that in CSS, the "%" character is not
allowed _as such_ in a class selector in CSS. It does not match any of
the patterns; note that the "nonascii" (a misnomer, really) part refers
to the Latin 1 supplement, and "%" is in Ascii. So it's only allowed
using the "escape" notation \% or a "unicode" notation like \25. (The
magic number 25 is the hexadecimal code number of "%" in Unicode.)
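To see that \% and \25 name the same character, here is a rough decoder for the two escape forms in the quoted grammar. unescape_css_ident is a hypothetical helper, not part of any library, and it does not attempt the refinements of later CSS levels.

```python
import re

# The two escape forms from the quoted grammar:
#   {unicode}: backslash, 1-6 hex digits, then one optional whitespace
#              character (CR LF counts as one), which is swallowed;
#   {escape}:  backslash followed by any character that is not a
#              newline, form feed or hex digit.
_CSS_ESCAPE = re.compile(
    r"\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?"
    r"|\\([^\r\n\f0-9a-fA-F])"
)


def unescape_css_ident(text: str) -> str:
    """Decode CSS escapes in an identifier (hypothetical helper).

    chr() raises ValueError for hex values above 0x10FFFF; this sketch
    simply lets that error propagate.
    """
    def decode(m):
        if m.group(1) is not None:
            return chr(int(m.group(1), 16))
        return m.group(2)

    return _CSS_ESCAPE.sub(decode, text)
```

Both r"\%" and r"\25" decode to "%". The same decoder also shows why the space matters in the selector discussed below: r"\31 0\%" decodes to "10%", whereas r"\310\%" decodes to "\u0310%", because the hex digits run together into one escape.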

Quote:
And as far as the validators I tried were concerned, they
were happy with a class="10%", for example.

Markup validators are happy with anything you might wish to put in a
class attribute value, including plain garbage, with very few formal
limitations. That's what CDATA means. And as I wrote, even the prose in
HTML specs does not specify further restrictions.

Quite apart from this, the class name 10% is not allowed in CSS, as
mentioned. In modern versions of CSS, it's disallowed for another reason
too: it begins with a digit.

You can see this if you use the "W3C CSS Validator" on a document
containing a purported class selector like
.10%
You will also get partly bogus instructions, since when escaping the
digit 1 using the "unicode" notation \31, you need to append a space -
otherwise \310 would be taken as a single "unicode" notation! So a
correct way, by the specs, to write a selector that matches elements
with class="10%" is
.\31 0\%
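The escaping steps described above can be automated. Below is a sketch of a hypothetical class_selector helper that follows the CSS 2.1 draft grammar quoted in this thread: a digit (or hyphen) where an identifier must start gets a hex "unicode" escape, other ASCII punctuation gets a plain backslash, and Latin-1 supplement characters are kept as such. Every hex escape is followed by a space, so the next character can never be read as part of it.

```python
def class_selector(name: str) -> str:
    """Build a CSS class selector for an arbitrary class name.

    Hypothetical helper based on the CSS 2.1 draft grammar quoted in this
    thread; every hex escape gets a trailing space so that a following
    hex digit (as in \\31 0) is not swallowed into the escape.
    """
    out = []
    # {nmstart} must match right after an optional leading "-".
    start = 1 if name.startswith("-") and len(name) > 1 else 0
    for i, ch in enumerate(name):
        code = ord(ch)
        if i == start and ("0" <= ch <= "9" or ch == "-"):
            out.append("\\%x " % code)  # digit/hyphen cannot start an IDENT
        elif ch.isascii() and (ch.isalnum() or ch in "_-"):
            out.append(ch)
        elif 0x80 <= code <= 0xFF:
            out.append(ch)              # {nonascii} as quoted: allowed as such
        elif 0x21 <= code <= 0x7E:
            out.append("\\" + ch)       # ASCII punctuation, e.g. \%
        else:
            out.append("\\%x " % code)  # whitespace, controls, other Unicode
    return "." + "".join(out)
```

class_selector("10%") returns .\31 0\%, the selector given in the post. Always appending the space keeps the result safe to concatenate: ".\31 " + " .foo" still reads as a descendant selector.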

Warning: Though this works on modern browsers, older software has
problems with it, and it's better to use simpler class names. After all,
class names are almost always invisible to users and need to be
understandable only to people who work with the HTML and CSS code.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/


