HighDots Forums  

Multiple coding systems, and filesystems

HTML Writing HTML for the Web (comp.infosystems.www.authoring.html)





#1
gentsquash@gmail.com

Multiple coding systems, and filesystems - 06-03-2008, 05:08 PM






On some of my course pages, I quote (with attribution)
small sections of Wikipedia and the like. E.g., the top
of
http://en.wiktionary.org/wiki/entropy

has "entropia" in Greek font,

http://en.wikipedia.org/wiki/Goedel

has the o-umlaut from German, and

http://en.wikipedia.org/wiki/Origami

has a Japanese font. What is the correct --maybe "coding
system" is the term?-- so that I could quote all three of
these on the same HTML page?

And can the HTML-page be set up so that it will validate?
====================================================

Actually, I'm ahead of myself. In the past I've cut&pasted
a snippet from, say, wiki/entropy, into an Emacs buffer,
adjoined a "From Wiktionary http://..." and attempted to
save the buffer. Sometimes Emacs asked me which coding
system to use --and I don't know how to placate it.

If I'm using multiple coding systems on the same webpage,
do I have to save the different snippets in different files
stored with different coding systems, and then

<!--#include ... -->

each of them into one webpage? Or can the file system
permit a file that simultaneously has Greek, German and
Japanese characters?

FWIW, my home OS is MacOSX and I need to upload my webpages
to school. The math dept. server is probably running
Unix; when I manipulate the html files (when at work), I'm
using Emacs running on a Solaris (unix) system.

Sincerely,
Prof. Jonathan King (gentsquash)
Mathematics dept, Univ. of Florida

#2
Stanimir Stamenkov

Re: Multiple coding systems, and filesystems - 06-04-2008, 12:37 AM






Tue, 3 Jun 2008 14:08:25 -0700 (PDT), /gentsquash (AT) gmail (DOT) com/:

Quote:
Or can the file system
permit a file that simultaneously has Greek, German and
Japanese characters?
Files generally store bytes. How these bytes are interpreted is
up to the application reading them. Characters are encoded into
bytes using different encoding schemes, each of which can generally
represent the characters of one specific character set. The
Unicode character set covers the characters of practically all
writing systems, so if you use a UTF (Unicode Transformation Format)
variant you can have all the characters you need in a single file.
So make sure your text editor supports reading and saving files as
UTF-8, for example.
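
To make the bytes-versus-characters point concrete, here is a minimal Python sketch (Python is chosen only for illustration; the thread names no language). The three sample words are assumptions picked to match the Greek, German and Japanese cases in the question, not text copied from the linked pages.

```python
# Sample words are illustrative assumptions, not quotes from the linked pages.
samples = {
    "Greek": "\u03b5\u03bd\u03c4\u03c1\u03bf\u03c0\u03af\u03b1",
    "German": "G\u00f6del",
    "Japanese": "\u6298\u308a\u7d19",
}


def utf8_bytes(text: str) -> bytes:
    """Encode text as UTF-8; any well-formed Unicode string can be encoded."""
    return text.encode("utf-8")


def fits_in(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# One string holding all three scripts round-trips through one byte stream.
combined = " / ".join(samples.values())
assert utf8_bytes(combined).decode("utf-8") == combined

for name, word in samples.items():
    print(name, "-", len(word), "characters,", len(utf8_bytes(word)),
          "bytes in UTF-8, fits ISO-8859-1:", fits_in(word, "iso-8859-1"))
```

The same file can therefore hold all three scripts as long as it is written and read as UTF-8; a single-script encoding such as ISO-8859-1 can represent only the German word.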

--
Stanimir


#3
Jukka K. Korpela

Re: Multiple coding systems, and filesystems - 06-04-2008, 03:14 AM



Scripsit gentsquash (AT) gmail (DOT) com:

Quote:
On some of my course pages, I quote (with attribution)
small sections of Wikipedia and the like. E.g., the top
of
http://en.wiktionary.org/wiki/entropy

has "entropia" in Greek font,
Technically, it has the word in Greek _characters_ (letters). This is
the key issue; fonts are secondary. The page has a style sheet that
makes special suggestions on the font of such words, in a most confusing
and tricky way.

Quote:
What is the correct --maybe "coding
system" is the term?-- so that I could quote all three of
these on the same HTML page?
The proper _character encoding_ is UTF-8 in such cases. As soon as you
have Japanese, Greek, and umlaut Latin letters on one page, that's
definitely the best option. If there were just a few "special"
characters, you could present them using entity references like &ouml;
or character references like &#261;, but this gets clumsy (or requires
suitable software for generating them) if you have full sentences that
consist of "special" characters.
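
As an example of the "suitable software" option, a short Python sketch can generate numeric character references automatically. The function names are hypothetical, and the sketch only replaces non-ASCII characters; it does not escape `&` or `<`, which real markup would also need.

```python
import html


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_char_refs(markup: str) -> str:
    """Resolve entity and character references back into characters."""
    return html.unescape(markup)


print(to_char_refs("G\u00f6del"))
# The named entity and the numeric reference denote the same character.
print(from_char_refs("G&ouml;del") == from_char_refs("G&#246;del"))
```

The output of `to_char_refs` is pure ASCII, so it survives any of the encodings discussed here, at the cost of being unreadable in the source for longer passages.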

It's not possible (in practice on web pages) to switch the character
encoding in the middle of an HTML document.

Quote:
In the past I've cut&pasted
a snippet from, say, wiki/entropy, into an Emacs buffer,
adjoined a "From Wiktionary http://..." and attempted to
save the buffer. Sometimes Emacs asked me which coding
system to use --and I don't know how to placate it.
UTF-8, if Emacs can really produce it. The version of Emacs I've been
using does not deal with "special" characters, but I recently looked at
the newest version of Emacs for Windows, and it seems to have
impressive support for "special" characters.

Note that the server should be configured to send an appropriate HTTP
header. You normally do this by adding something to your .htaccess file,
and in practice you need to use the same encoding for all ".html" files
in a directory (folder), though you could use, for example, ISO-8859-1
for ".html" and UTF-8 for ".htm" files.
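
What matters to the browser is the charset parameter of the Content-Type header. A minimal Python sketch of how that parameter can be read from a header value follows; the header strings are assumed examples of what a server might send, and no network access is involved.

```python
from email.message import Message


def charset_from_content_type(header_value: str):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lower-cases the value it finds.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```

If the second case (no charset at all) is what your server sends, browsers have to guess, which is the situation the .htaccess setting is meant to avoid.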

Quote:
If I'm using multiple coding systems on the same webpage,
do I have to save the different snippets in different files
stored with different coding systems, and then

<!--#include ... -->

each of them into one webpage?
No, it won't work that way, even if your server supports SSI includes.
They result in a single document, which can have one encoding only. (I
won't mention <iframe>, because it's really a poor hack for things like
this, but it performs sort-of include where the included document is
displayed "autonomously" inside the main canvas and may have a different
encoding.)
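
If snippets have already been saved in different legacy encodings, they can be transcoded once and merged into a single UTF-8 document. The following Python sketch assumes you know which encoding each snippet was saved in; the function name, the sample words and the encodings listed are hypothetical.

```python
def merge_as_utf8(snippets):
    """Decode each (raw_bytes, source_encoding) pair; return one UTF-8 byte string."""
    decoded = [raw.decode(encoding) for raw, encoding in snippets]
    return "\n".join(decoded).encode("utf-8")


# Hypothetical snippets, each saved earlier in a different legacy encoding.
greek = "\u03b5\u03bd\u03c4\u03c1\u03bf\u03c0\u03af\u03b1"
snippets = [
    (greek.encode("iso-8859-7"), "iso-8859-7"),                # Greek
    ("G\u00f6del".encode("iso-8859-1"), "iso-8859-1"),         # German
    ("\u6298\u308a\u7d19".encode("shift_jis"), "shift_jis"),   # Japanese
]

page = merge_as_utf8(snippets)
print(page.decode("utf-8"))

# Gluing the raw bytes together without transcoding does not give valid UTF-8.
glued = b"\n".join(raw for raw, _ in snippets)
try:
    glued.decode("utf-8")
except UnicodeDecodeError:
    print("raw concatenation is not valid UTF-8")
```

This is the reason a byte-level include of differently encoded files cannot work: the result has to be decodable under one declared encoding.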

Quote:
FWIW, my home OS is MacOSX and I need to upload my webpages
to school. The math dept. server is probably running
Unix; when I manipulate the html files (when at work), I'm
using Emacs running on a Solaris (unix) system.
A nice mess :-) but it should be manageable when using UTF-8. When
uploading with FTP, use binary (not ASCII) mode, since no character
conversion should be performed - the data is already in a
system-independent encoding.
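
For a scripted upload, a minimal Python sketch using the standard ftplib module could look like this. The helper name, host and credentials are hypothetical; `storbinary` switches the connection to binary (image) mode before sending, so the UTF-8 bytes arrive unchanged.

```python
from ftplib import FTP


def upload_binary(ftp, local_path: str, remote_name: str) -> None:
    """Send a file byte-for-byte; storbinary uses FTP binary (image) mode."""
    with open(local_path, "rb") as fh:
        ftp.storbinary(f"STOR {remote_name}", fh)


# Usage (hypothetical host and credentials, so left commented out):
# with FTP("ftp.example.edu") as ftp:
#     ftp.login("user", "password")
#     upload_binary(ftp, "page.html", "page.html")
```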

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/



#4
Andreas Prilop

Re: Multiple coding systems, and filesystems - 06-04-2008, 11:00 AM



On Tue, 3 Jun 2008, gentsquash (AT) gmail (DOT) com wrote:

Quote:
Greek
German
Japanese
What is the correct --maybe "coding
system" is the term?-- so that I could quote all three of
these on the same HTML page?
Use Unicode in the encoding ("charset") UTF-8:
http://www.unics.uni-hannover.de/nht...ilingual1.html

Quote:
Sometimes Emacs asked me for what coding
system to use --and I don't know how to placate it.
Choose UTF-8 for the web.

Quote:
Or can the file system
permit a file that simultaneously has Greek, German and
Japanese characters?
Yes - with Unicode.

Quote:
when I manipulate the html files (when at work), I'm
using Emacs running on a Solaris (unix) system.
Either use a UTF-8 locale such as

export LC_ALL="en_US.UTF-8"
export LANG="en_US.UTF-8"

or write all non-ASCII characters as character references
&#number;
http://www.unics.uni-hannover.de/nht...ilingual2.html

--
In memoriam Alan J. Flavell
http://groups.google.com/groups/sear...Alan.J.Flavell


#5
Andreas Prilop

Re: Multiple coding systems, and filesystems - 06-04-2008, 11:15 AM



On Wed, 4 Jun 2008, Jukka K. Korpela wrote:

Quote:
though you could use, for example, ISO-8859-1
for ".html" and UTF-8 for ".htm" files.
A better idea is to separate content-type and charset.
For example, use "utf8" for UTF-8 and "iso1" for ISO-8859-1.
On Apache, you can write into your .htaccess file:

Options +MultiViews
DefaultType text/html
AddCharset iso-8859-1 iso1
AddCharset utf-8 utf8

Name the files as "mypage.html.iso1" and "anotherpage.html.utf8"
or simply as "mypage.iso1" and "anotherpage.utf8";
and don't forget "stylesheet.css.utf8".

In the URLs, omit ".iso1" and ".utf8" of course:

<a href="mypage.html">
<a href="anotherpage.html">


/* One wonders if you need ISO-8859-1 at all
when you can have documents in UTF-8. */

--
Solipsists of the world - unite!

