HighDots Forums  

Combining diacritical marks and HTML+CSS

Cascading Style Sheets Layout/presentation on the WWW (comp.infosystems.www.authoring.stylesheets)

#1 Tristan Miller, 11-12-2007, 02:07 PM

Greetings.

Is it possible using HTML and CSS to represent a combining diacritical mark
in a different style from the letter it modifies? For example, say I want
to render ő (Latin small letter o with a double acute accent), but with
the o in black and the double acute accent in green. Are either of the
following valid?

1. <span style="color: black;">o</span><span style="color: green;">&#x030B;</span>

2. <span style="color: black;">o<span style="color: green;">&#x030B;</span></span>

Neither of the two browsers I tested (SeaMonkey 1.1.6 and Konqueror 3.5.8,
both on GNU/Linux) render the examples as intended. Is there some part of
the HTML, CSS, or Unicode standards which says that combining diacritical
marks can't be styled independently, or are my browsers buggy?
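
For reference, the character reference &#x030B; in both variants is U+030B COMBINING DOUBLE ACUTE ACCENT, so each variant encodes two code points, not one. The following is a minimal Python sketch (Python and its standard unicodedata module are this example's choice; nothing in the thread uses them) that lists those code points with their Unicode properties:

```python
import unicodedata

# The sequence both markup variants encode: the base letter "o" followed by
# U+030B COMBINING DOUBLE ACUTE ACCENT (the &#x030B; character reference).
sequence = "o\u030b"

for ch in sequence:
    # category(): "Ll" = lowercase letter, "Mn" = nonspacing mark.
    # combining(): canonical combining class; 230 means "above the base".
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  "
          f"category={unicodedata.category(ch)}  "
          f"class={unicodedata.combining(ch)}")
```

The accent is a nonspacing mark: it has no advance width of its own and is drawn relative to the preceding letter.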

Regards,
Tristan

--
_
_V.-o Tristan Miller [en,(fr,de,ia)] >< Space is limited
/ |`-' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-= <> In a haiku, so it's hard
(7_\\ http://www.nothingisreal.com/ >< To finish what you

#2 David E. Ross, 11-12-2007, 06:39 PM

On 11/12/2007 11:07 AM, Tristan Miller wrote:
> Is it possible using HTML and CSS to represent a combining diacritical mark
> in a different style from the letter it modifies?

In general, the mark is actually part of the character and not separate.
That is, an "ñ" is not merely an "n" with a tilde. Instead, it's quite
distinct (at least in Spanish) from an "n".

Thus, having separate colors would be inappropriate. What you want
would be the same as having an "i" with the dot a different color than
the stroke or having an "A" with the cross-bar a different color than
the two diagonals.

--
David E. Ross
<http://www.rossde.com/>

Natural foods can be harmful: Look at all the
people who die of natural causes.

#3 Jukka K. Korpela, 11-12-2007, 07:39 PM

Tristan Miller wrote:

> Is it possible using HTML and CSS to represent a combining
> diacritical mark in a different style from the letter it modifies?

Maybe. Not reliably. And not on most browsers these days.

> For example, say I want to render ő (Latin small letter o with a
> double acute accent), but with the o in black and the double acute
> accent in green. Are either of the following valid?
>
> 1. <span style="color: black;">o</span><span style="color: green;">&#x030B;</span>
>
> 2. <span style="color: black;">o<span style="color: green;">&#x030B;</span></span>

A simple automated test would tell you that they are valid, but this does
not mean much. Validity has nothing to do with meaning, to begin with, and
it does not imply correctness.

We might intuitively expect that the diacritic appears in the color
specified, but this won't happen in general. Instead, it appears in the
color of the base character.

> Neither of the two browsers I tested (SeaMonkey 1.1.6 and Konqueror
> 3.5.8, both on GNU/Linux) render the examples as intended.

Neither does IE 7.

> Is there some part of the HTML, CSS, or Unicode standards which says
> that combining diacritical marks can't be styled independently, or are
> my browsers buggy?

No.

Well, any browser (or any nontrivial program) is buggy, but not in this
matter.

The HTML and CSS specifications do not specify the treatment of diacritic
marks, and they don't even require support for them. The document character
set is Unicode (or UCS or ISO 10646 or whatever you call it), but this does
not imply any requirement to support all Unicode characters, or even a
particular subset thereof.

And if you do process combining diacritic marks by Unicode rules, then o
with double acute is to be treated as compatibility equivalent to the single
character U+0151 (Latin small letter o with double acute). In general,
programs should not be expected to treat compatibility equivalent characters
differently; they may do so, but they are surely not required to do so.

In particular, a browser may well internally map the combination to U+0151
at the character level. It's nothing that complicated, though. They probably
just ignore the styles you set for a combining diacritic mark.
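
A minimal Python sketch of that equivalence, using the standard unicodedata module as an illustrative tool. One terminological note: the Unicode Standard calls this particular relation canonical equivalence. The decomposition recorded for U+0151 carries no formatting tag such as <compat>, and the NFC and NFD normalization forms are what convert between the two spellings.

```python
import unicodedata

decomposed = "o\u030b"   # o + U+030B COMBINING DOUBLE ACUTE ACCENT
precomposed = "\u0151"   # U+0151 LATIN SMALL LETTER O WITH DOUBLE ACUTE

# NFC (canonical composition) turns the two-code-point spelling into one ...
composed = unicodedata.normalize("NFC", decomposed)
# ... and NFD (canonical decomposition) splits it back apart.
split = unicodedata.normalize("NFD", precomposed)

print(composed == precomposed, split == decomposed)  # True True
# The database entry has no <tag> prefix, i.e. it is a canonical mapping.
print(unicodedata.decomposition(precomposed))       # 006F 030B
```

A renderer that composes to NFC first is left with the single character U+0151, and so with no separate accent to which a separate color could apply.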

You're not the first one to ask for the feature. It has been discussed at
length in the Unicode mailing list. See e.g. the discussion "Coloured
diacritics",
http://www.unicode.org/mail-arch/uni...-m12/0379.html

The bottom line is that no, you can't expect to be able to do such things.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

#4 Tristan Miller, 11-12-2007, 09:10 PM

Greetings.

In article <rk6_i.251456$1y6.35250 (AT) reader1 (DOT) news.saunalahti.fi>, Jukka K.
Korpela wrote:
> And if you do process combining diacritic marks by Unicode rules, then o
> with double acute is to be treated as compatibility equivalent to the
> single character U+0151 (Latin small letter o with double acute). In
> general, programs should not be expected to treat compatibility
> equivalent characters differently; they may do so, but they are surely
> not required to do so.
>
> In particular, a browser may well internally map the combination to
> U+0151 at the character level. It's nothing that complicated, though.
> They probably just ignore the styles you set for a combining diacritic
> mark.

Yes; this much is obvious, since the result is the same with other
combinations of characters and combining diacritical marks for which there
is no equivalent single character -- say, U+0040 U+030B (@̋).
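
That observation can be checked at the character level. In this small Python sketch (standard unicodedata module, chosen for illustration), NFC composition has nothing to compose @ with, so the pair survives as two code points. Normalization alone therefore cannot explain why the separate style on the mark is lost.

```python
import unicodedata

at_acute = "@\u030b"   # U+0040 COMMERCIAL AT + U+030B COMBINING DOUBLE ACUTE ACCENT
nfc = unicodedata.normalize("NFC", at_acute)

# No precomposed "@ with double acute" exists, so NFC leaves both code points.
print(len(nfc), nfc == at_acute)  # 2 True
```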

> You're not the first one to ask for the feature. It has been discussed at
> length in the Unicode mailing list. See e.g. the discussion "Coloured
> diacritics",
> http://www.unicode.org/mail-arch/uni...-m12/0379.html

That page is password-protected.

> The bottom line is that no, you can't expect to be able to do such
> things.

I suspected as much, but (except in the case of compatibility equivalent
characters) you haven't provided any normative reason for this. Are you
saying that this "bottom line" is simply an implementation choice?

Regards,
Tristan

#5 Tristan Miller, 11-12-2007, 09:50 PM

Greetings.

In article <hbqdnecGG_CjeaXanZ2dnUVZ_viunZ2d (AT) softcom (DOT) net>, David E. Ross
wrote:
> In general, the mark is actually part of the character and not separate.
> That is, an "ñ" is not merely an "n" with a tilde. Instead, it's quite
> distinct (at least in Spanish) from an "n".

In general, yes. But not always. Sometimes the diacritical marks have an
inherent semantic meaning. For example, ö in Hungarian is a glyph in its
own right, and should not be considered as decomposable into a letter o
with a diaeresis. However, in Newtonian calculus, ö is not a glyph in its
own right; it is the variable o with the addition of two dots to indicate
a second derivative.
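
The letter/operator distinction drawn here is not visible in plain text. In this minimal Python sketch (standard unicodedata module, an illustrative choice), the derivative spelled as o plus U+0308 COMBINING DIAERESIS becomes, after NFC normalization, exactly the code point used for the Hungarian letter:

```python
import unicodedata

hungarian_letter = "\u00f6"   # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
newton_dot_dot = "o\u0308"    # variable o + U+0308 COMBINING DIAERESIS

# Normalization knows nothing about meaning: both spellings end up identical.
print(unicodedata.normalize("NFC", newton_dot_dot) == hungarian_letter)  # True
```

Software that normalizes before rendering therefore cannot tell the two uses apart.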

> Thus, having separate colors would be inappropriate.

It depends on one's purpose. Say I am collaborating on a mathematical
paper and using software which tracks the various authors' changes to the
document by marking them up in a different style. (For example, the
default text is black, but changes by author A are in red.) If author A
changes the equation f(x) = 2x + o to f(x) = 2x + ö, then I might expect
that the o remains black but the diaeresis would appear in red, since she
is not changing the variable but rather specifying an operation on the
variable.

> What you want
> would be the same as having an "i" with the dot a different color than
> the stroke or having an "A" with the cross-bar a different color than
> the two diagonals.

Again, that might be entirely appropriate in some cases, such as in
typographical education. If I want to illustrate things like ascenders,
descenders, tittles, serifs, crossbars, etc., then I might well use
diagrams with those parts highlighted in another colour. Indeed, that's
exactly what is done in some Wikipedia articles (e.g.,
<http://en.wikipedia.org/wiki/Tittle>,
<http://en.wikipedia.org/wiki/Serif>). Now, I'm not so naïve as to expect
that HTML and/or CSS should be able to decompose atomic glyphs so that
their constituent typographical parts can be individually styled --
obviously this is a job for a graphic markup language and not a text
markup language. But I think it *is* reasonable to think it might be
possible or desirable that a non-atomic glyph which includes one or more
combining diacritical marks, each with its own code point in the character
set, could have each component individually styleable.
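
A tool of the change-tracking kind described above would first have to find those components. A minimal Python sketch, built around a hypothetical helper named split_clusters, groups each base character with the combining marks that follow it. It is only a simplification of real grapheme-cluster segmentation (Unicode Standard Annex #29), which has many more rules.

```python
import unicodedata

def split_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplification: a character counts as a mark only if its general
    category starts with "M"; the full UAX #29 rules (Hangul syllables,
    emoji sequences, etc.) are ignored.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1][1].append(ch)   # attach mark to the current base
        else:
            clusters.append((ch, []))    # start a new cluster
    return clusters

# NFD first, so a precomposed ö also exposes its diaeresis as a mark.
for base, marks in split_clusters(unicodedata.normalize("NFD", "2x + ö")):
    print(repr(base), [f"U+{ord(m):04X}" for m in marks])
```

Each mark list could then be wrapped in its own element for styling, though browsers remain free to ignore such styles, and NFD would equally split a Hungarian ö that was meant as a single letter.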

Regards,
Tristan

#6 Jukka K. Korpela, 11-13-2007, 06:32 AM

Tristan Miller wrote:

>> http://www.unicode.org/mail-arch/uni...-m12/0379.html
>
> That page is password-protected.

It's pseudo-protection: the password is announced on the Unicode pages, and
it is "unicode".

>> The bottom line is that no, you can't expect to be able to do such
>> things.
>
> I suspected as much, but (except in the case of compatibility
> equivalent characters) you haven't provided any normative reason for
> this.

There is no normative reason, even for compatibility equivalent characters.
Programs are allowed to treat a precomposed character differently from its
decomposed form, though they should generally not be expected to do so.
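
The gap between "allowed to" and "expected to" is easy to see in code. In this minimal Python sketch (standard unicodedata module; the helper name canonically_equal is hypothetical), a plain string comparison treats the two forms as different text, and only a program that normalizes first treats them as the same:

```python
import unicodedata

precomposed = "\u0151"   # ő as one code point
decomposed = "o\u030b"   # ő as o + combining double acute

def canonically_equal(a, b):
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(precomposed == decomposed)                    # False: code points differ
print(canonically_equal(precomposed, decomposed))   # True
```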

Consider the display issue. A non-supporting (though conforming)
implementation can just ignore combining diacritic marks, showing a generic
glyph for an unrepresentable character. A simplistic implementation effectively
just overprints the diacritic, as taken from a font, on the base character.
A better implementation takes into account the shape of the base character
and positions, for example, a diacritic on "O" differently from the same
diacritic on "o". An even better implementation additionally checks whether
the combination exists as a precomposed character (or just as a glyph in a
font) and uses it when possible, since a glyph designed by a font designer
should be expected to be as good as or better than a combination of glyphs
generated by software. This is the intended behavior - but not required.
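
The "even better implementation" amounts to a decision a renderer makes for each base-plus-mark pair. Here is a minimal Python sketch under stated assumptions: the function choose_rendering and the font_has_glyph callback are hypothetical, and a real text engine with OpenType shaping is far more involved.

```python
import unicodedata

def choose_rendering(base, mark, font_has_glyph):
    """Decide how to draw base + combining mark (hypothetical renderer logic).

    font_has_glyph is an assumed callback that reports whether the current
    font contains a glyph for a given single character.
    """
    composed = unicodedata.normalize("NFC", base + mark)
    if len(composed) == 1 and font_has_glyph(composed):
        # Use the designer-drawn precomposed glyph; the mark no longer
        # exists as a separate item, so it cannot carry its own style.
        return ("precomposed", composed)
    # Fall back to positioning the mark's glyph over the base in software.
    return ("overlay", base + mark)

# A toy "font" that only knows two precomposed letters.
toy_font = {"\u0151", "\u00f6"}
print(choose_rendering("o", "\u030b", toy_font.__contains__))  # precomposed path
print(choose_rendering("@", "\u030b", toy_font.__contains__))  # overlay path
```

Once the precomposed path is taken, the accent no longer exists as a separate item, which is one reason separate coloring is hard to offer consistently.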

> Are you saying that this "bottom line" is simply an
> implementation choice?

Yes, but not necessarily just a matter of laziness. It might appear to be
simple to let diacritics be colored separately, but then the effects
would depend on the use of precomposed forms (where no such coloring would
be applied). Moreover, treating colors in an ad hoc manner would not be that
natural, and letting _any_ font formatting apply to diacritics would open a
few cans of worms. For example, font size and weight changes might have
rather odd effects and would generally ruin the work of a sophisticated
algorithm that tries to place a diacritic optimally.

If you want to play with colored diacritics, you could use a spacing
diacritic (either as a separately coded character or as a no-break space
followed by a combining diacritic), which can be colored, and use some piece
of CSS to make it overprint the preceding character. This gets tricky of
course, and nasty - you would effectively imitate the simplistic
implementation of combining diacritics (as described above). There's no sure
way of getting even the horizontal position right, since there is no CSS
unit for the width of a character.
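
What follows is a minimal Python sketch of the markup that workaround produces. The helper overprint_markup and its shift_em parameter are hypothetical. The negative left margin is a guess that has to be tuned per font and per letter, precisely because CSS has no unit for the width of a character. U+02DD DOUBLE ACUTE ACCENT is the spacing counterpart of U+030B.

```python
from html import escape

def overprint_markup(base, mark, color, shift_em=0.5):
    """Return HTML that pulls a separately colored mark back over base.

    shift_em is a guessed overlap in em units; there is no CSS unit for
    "the width of the preceding character", so it must be tuned by hand.
    """
    return (
        f"<span>{escape(base)}</span>"
        f'<span style="color: {escape(color)}; margin-left: -{shift_em}em;">'
        f"{escape(mark)}</span>"
    )

# Option 1: a spacing diacritic, U+02DD DOUBLE ACUTE ACCENT.
print(overprint_markup("o", "\u02dd", "green"))
# Option 2: a no-break space followed by the combining diacritic U+030B.
print(overprint_markup("o", "\u00a0\u030b", "green"))
```

Vertical placement is a further problem: a spacing accent is typically drawn at one fixed height, so it cannot adapt to, say, "o" versus "O" the way a well-positioned combining mark can.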

#7 Andreas Prilop, 11-13-2007, 12:35 PM

On Mon, 12 Nov 2007, Tristan Miller wrote:

> Is it possible using HTML and CSS to represent a combining diacritical mark
> in a different style from the letter it modifies?

This question is not specific to HTML or CSS. In fact, you could try it
with any word processor. It is rather a question of font technology.
Even if you write, say, í as ASCII i followed by a non-spacing,
combining acute, OpenType fonts and similar font formats will
use a single glyph for display. This is usually desired - think of
capital and small letters! There is only one non-spacing, combining
acute for both capital and small letters. And you don't want
the acute to mess with the dot on the i.
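
The character data behind this point can be inspected with a short Python sketch (standard unicodedata module, illustrative). Capital and small a with acute decompose to the very same combining acute, U+0301. Likewise, í decomposes to a plain, dotted i plus that acute, even though the finished glyph shows no dot. Fonts that assemble í from parts commonly use the dotless i as the base; that detail of font internals is an assumption here, not something the character data states.

```python
import unicodedata

# One combining acute (U+0301) serves both capital and small letters.
print(unicodedata.decomposition("\u00c1"))  # Á -> 0041 0301
print(unicodedata.decomposition("\u00e1"))  # á -> 0061 0301

# í is canonically i + acute; the dot of the base i disappears in the glyph.
print(unicodedata.decomposition("\u00ed"))  # í -> 0069 0301
print(unicodedata.name("\u0131"))           # LATIN SMALL LETTER DOTLESS I
```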

Only when no single glyph is available (say, b with acute)
are the two glyphs for b and acute combined. See
http://www.unics.uni-hannover.de/nht...ombimarks.html
for some examples.

The situation is different for Arabic and Indic scripts where
no precomposed glyphs for letter and vowel sign are available.
Browsers behave differently here. See
http://www.unics.uni-hannover.de/nht...arks-indic.htm
for some examples.
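
In those scripts normalization has nothing to compose. In this minimal Python sketch (standard unicodedata module, an illustrative choice), Devanagari ka followed by the vowel sign i stays two code points under NFC. The vowel sign is classified as a spacing combining mark (Mc), and it is drawn to the left of the consonant it follows in memory, so the font and layout engine must assemble the visible syllable themselves.

```python
import unicodedata

ki = "\u0915\u093f"   # U+0915 DEVANAGARI LETTER KA + U+093F DEVANAGARI VOWEL SIGN I
nfc = unicodedata.normalize("NFC", ki)

print(len(nfc))  # 2: no precomposed "ki" code point exists
for ch in nfc:
    # category: Lo = other letter, Mc = spacing combining mark
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
```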

--
In memoriam Alan J. Flavell
http://groups.google.com/groups/sear...Alan.J.Flavell

