Character encoding

HTML Writing HTML for the Web (comp.infosystems.www.authoring.html)


#1
Mambo Bananapatch
Character encoding - 04-25-2008, 08:50 PM

I'm preparing a site for a client which includes several pages
containing Cyrillic characters. I used the UTF-8 charset, but the
Cyrillic characters appeared as question marks (and, oddly, some
Chinese characters as well.) I tried every Cyrillic charset I could
find and nothing worked.

I usually just hand-code all my PHP and HTML, but I swallowed hard and
went to Dreamweaver CS3, searched around, and found that I could set
each file's encoding to UTF-8 using the Modify => Page Properties =>
Title/Encoding command.

Now it works fine, but I don't really understand what the command did.
It didn't add any code, and it didn't change the http-equiv tag. In
fact, I have to perform the command on every file that is included in
the PHP file.

So: a) what exactly did Dreamweaver do, and b) how could I have
hand-coded whatever it is?

Thank you in advance.

(Also posted in alt.html -- my apologies if I've violated etiquette.)

#2
Martin Honnen
Re: Character encoding - 04-26-2008, 06:53 AM

Mambo Bananapatch wrote:
Quote:
I'm preparing a site for a client which includes several pages
containing Cyrillic characters. I used the UTF-8 charset, but the
Cyrillic characters appeared as question marks (and, oddly, some
Chinese characters as well.) I tried every Cyrillic charset I could
find and nothing worked.

Quote:
So: a) what exactly did Dreamweaver do, and b) how could I have
hand-coded whatever it is?

Well it all depends on what exactly you do when you say "I used the
UTF-8 charset" or "I tried every Cyrillic charset"? Have you used an
editor that supports saving as UTF-8 (or a Cyrillic charset) and have
you used it so that it saved your documents as UTF-8 (or a Cyrillic
charset)? That is all you need to do to ensure your files are
properly encoded. Then, when serving them over HTTP you need to make
sure the server sends an HTTP Content-Type response header indicating the
charset used as a parameter, e.g.
Content-Type: text/html; charset=UTF-8
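
The two steps above (save the bytes as UTF-8, then declare UTF-8 in the response header) can be sketched in Python. This is only an illustration under assumptions: Python 3 stands in for whatever editor and server are really in use, and the helper names save_as_utf8 and content_type_header are hypothetical. Only the header string itself comes from the post.

```python
from pathlib import Path
import tempfile


def save_as_utf8(path: Path, text: str) -> None:
    # Hypothetical helper: does what an editor's "Save as UTF-8" does,
    # i.e. turns the characters into UTF-8 bytes on disk.
    path.write_text(text, encoding="utf-8")


def content_type_header(charset: str = "UTF-8") -> str:
    # Hypothetical helper: builds the response header line the server must send.
    return f"Content-Type: text/html; charset={charset}"


page = "<p>Привет, мир</p>"

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "page.html"
    save_as_utf8(target, page)
    raw = target.read_bytes()  # the bytes a server would send as the body

header = content_type_header()
print(header)
print(raw.decode("utf-8") == page)  # the declared charset matches the bytes
```

The point of the sketch is that the file's bytes and the declared charset are two separate things, and they have to agree.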

--

Martin Honnen
http://JavaScript.FAQTs.com/


#3
Jukka K. Korpela
Re: Character encoding - 04-26-2008, 11:39 AM

Scripsit Mambo Bananapatch:

Quote:
(Also posted in alt.html -- my apologies if I've violated etiquette.)

Oh, you'll just be ignored in the sequel. No problem.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/


#4
Mambo Bananapatch
Re: Character encoding - 04-27-2008, 03:20 PM

On Apr 26, 7:53 am, Martin Honnen <mahotr... (AT) yahoo (DOT) de> wrote:
Quote:
Mambo Bananapatch wrote:
I'm preparing a site for a client which includes several pages
containing Cyrillic characters. I used the UTF-8 charset, but the
Cyrillic characters appeared as question marks (and, oddly, some
Chinese characters as well.) I tried every Cyrillic charset I could
find and nothing worked.
So: a) what exactly did Dreamweaver do, and b) how could I have
hand-coded whatever it is?

Well it all depends on what exactly you do when you say "I used the
UTF-8 charset" or "I tried every Cyrillic charset"? Have you used an
editor that supports saving as UTF-8 (or a Cyrillic charset) and have
you used it so that it saved your documents as UTF-8 (or a Cyrillic
charset)? That is all you need to do to ensure your files are
properly encoded. Then, when serving them over HTTP you need to make
sure the server sends an HTTP Content-Type response header indicating the
charset used as a parameter, e.g.
Content-Type: text/html; charset=UTF-8

--

Martin Honnen
http://JavaScript.FAQTs.com/

Thanks Martin, that's exactly what I did. Dreamweaver saved the files
with the correct encoding, and I used the response header you
suggested, and all's well.

I guess my question was more about what Dreamweaver did; if I were to
hand-code a page with Cyrillic characters, and didn't have access to
Dreamweaver, how would I encode each file? And why must I encode each
file, in addition to including the UTF-8 Content-Type response
header?

I just wanted to understand what I was doing.

Thanks for your time.

MB


#5
Paul Gorodyansky
Re: Character encoding - 04-27-2008, 10:00 PM

Hello!

You did not really answer Martin's question: what did you do _before_
you decided to use Dreamweaver?
On a non-Russian OS one can get question marks in many cases, for
example:
- typing in an editor such as Notepad and saving as "ANSI", that is, in
the character encoding of the system code page
- using copy/paste between Unicode and non-Unicode programs
- converting to UTF-8 without explicitly providing the source encoding,
so that the system code page is assumed
- etc.
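
The first case in the list can be reproduced in a few lines of Python. This is a sketch under assumptions: cp1252 stands in for the "ANSI" code page of a Western European system, and errors="replace" mimics what a lossy save does. Neither detail comes from the thread.

```python
text = "Привет"

# Encoding Cyrillic into a Western European code page cannot succeed;
# with errors="replace" every unencodable character becomes "?".
lost = text.encode("cp1252", errors="replace")
print(lost)  # b'??????'

# The original letters are gone for good: decoding gives question marks back.
print(lost.decode("cp1252"))  # ??????

# Encoding as UTF-8 keeps every character, so the round trip is lossless.
kept = text.encode("utf-8")
print(kept.decode("utf-8") == text)  # True
```

Once the question marks are in the file, no charset declaration can bring the letters back, which would explain why trying different Cyrillic charsets did not help.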

You may want to read some explanations on my site:
- section "for developers: Cyrillic (Russian) in HTML"
- section "for developers: Cyrillic (Russian) in Multilingual HTML -
UTF-8"
- chapter "Copy/Paste; Word, .TXT" in the section
"Unicode and Cyrillic"



--
Regards,
Paul
http://RusWin.net

#6
Andreas Prilop
Re: Character encoding - 04-28-2008, 09:54 AM

On Sun, 27 Apr 2008, Mambo Bananapatch wrote:

Quote:
if I were to
hand-code a page with Cyrillic characters, and didn't have access to
Dreamweaver, how would I encode each file?

You do not write with a pencil, do you? You have some editor
(word-processor, etc.) on some operating system on some computer.
We don't know what they are - but you know. Your editor saves
files in some character set, such as

MacCyrillic
http://www.unics.uni-hannover.de/nhtcapri/cyrillic.mac

ISO-8859-5
http://www.unics.uni-hannover.de/nht...cyrillic.html5

Windows-1251
http://www.unics.uni-hannover.de/nhtcapri/cyrillic.win

Unicode UTF-8
http://www.unics.uni-hannover.de/nht...gual1#cyrillic
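
To see that these four character sets really store the same letter differently, here is a small Python sketch. The codec names are Python's own names for the encodings listed above; the choice of the letter Я is arbitrary.

```python
text = "Я"  # CYRILLIC CAPITAL LETTER YA, U+042F

# Python codec names for the four character sets listed above.
codecs_to_try = ("mac_cyrillic", "iso8859_5", "cp1251", "utf_8")

encoded = {name: text.encode(name) for name in codecs_to_try}
for name, data in encoded.items():
    print(f"{name:13} {data.hex()}")

# Each byte sequence decodes back to the same letter only with its own codec.
assert all(data.decode(name) == text for name, data in encoded.items())
```

The three single-byte encodings each use one byte, but a different one, while UTF-8 uses two bytes for this letter.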

Quote:
And why must I encode each
file, in addition to including the UTF-8 Content-Type response
header?

I don't understand what this question means.

--
Top-posting.
What's the most irritating thing on Usenet?


#7
David Trimboli
Re: Character encoding - 05-01-2008, 01:29 PM

Andreas Prilop wrote:
Quote:
On Sun, 27 Apr 2008, Mambo Bananapatch wrote:

if I were to hand-code a page with Cyrillic characters, and didn't
have access to Dreamweaver, how would I encode each file?

And why must I encode each file, in addition to including the UTF-8
Content-Type response header?

I don't understand what this question means.

I wonder if Mambo is confusing file encoding with an http-equiv
declaration in a file.

Mambo, when you save a text file, you're not actually saving letters;
you're saving numbers that correspond to letters. 65="A", and so on.
(Well, yeah, it's actually saved in bits, which are actually electrical
charges...) Your text editor and my browser know how to turn those
numbers into letters to display the file. This mapping of characters to
numbers is the file's "encoding." There are many standard encodings. In
order for my browser to read your file, it needs to know which encoding
you've used; it needs to know what scheme you used to translate letters
into numbers, so that it can use the same scheme to turn numbers back
into letters.
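
The letters-to-numbers mapping can be shown directly in Python. This is a sketch only: Python is an assumption here, and cp1251 is picked arbitrarily as the "wrong" scheme to show what happens when reader and writer disagree.

```python
letter = "A"
print(ord(letter))  # 65

# The Cyrillic letter Я becomes two numbers (bytes) when saved as UTF-8.
data = "Я".encode("utf-8")
print(list(data))  # [208, 175]

# Reading the numbers back with the same scheme recovers the letter.
right = data.decode("utf-8")
print(right == "Я")  # True

# Reading them with a different scheme (here Windows-1251) yields other
# characters entirely: the numbers are the same, the mapping is not.
wrong = data.decode("cp1251")
print(wrong == right)  # False
```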

Normally the browser learns which encoding to use from the server's HTTP
headers. An http-equiv declaration in an HTML file is a way to override
a server's content-type (encoding). You only use this if your server
isn't serving files with the correct content-type.

If I'm wrong and you already knew this stuff, I apologize.

--
David
Stardate 8333.3


#8
Ben C
Re: Character encoding - 05-01-2008, 03:02 PM

On 2008-05-01, David Trimboli <david (AT) trimboli (DOT) name> wrote:
[...]
Quote:
Normally the browser learns which encoding to use from the server's HTTP
headers. An http-equiv declaration in an HTML file is a way to override
a server's content-type (encoding).

It doesn't override it: if both are present, the server header wins.

Quote:
You only use this if your server isn't serving files with the correct
content-type.

Yes, or because you're using file:// URLs during development.
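
The precedence described above can be written down as a small Python function. This is a deliberately simplified model, not how any particular browser is implemented: real browsers look at further signals (a byte order mark, for instance), and the function name effective_charset and the windows-1252 fallback are assumptions made for the sketch.

```python
from typing import Optional


def effective_charset(
    http_charset: Optional[str],
    meta_charset: Optional[str],
    fallback: str = "windows-1252",  # assumed default, for illustration only
) -> str:
    """Pick the charset a reader would use, in the order described above."""
    if http_charset:       # the HTTP Content-Type header wins when present
        return http_charset
    if meta_charset:       # http-equiv only matters when no header charset exists
        return meta_charset
    return fallback        # nothing declared: the reader has to guess


# Served over HTTP with both declarations: the header decides.
print(effective_charset("UTF-8", "windows-1251"))  # UTF-8

# Opened from a file:// URL, so there is no HTTP header at all.
print(effective_charset(None, "UTF-8"))  # UTF-8

# Nothing declared anywhere.
print(effective_charset(None, None))  # windows-1252
```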


#9
Andreas Prilop
Re: Character encoding - 05-02-2008, 08:00 AM

On Thu, 1 May 2008, David Trimboli wrote:

Quote:
An http-equiv declaration in an HTML file is a way to override
a server's content-type (encoding).

No, it is not. See
http://www.unics.uni-hannover.de/nht...a-http-equiv.1
http://www.unics.uni-hannover.de/nht...a-http-equiv.2

--
Bugs in Internet Explorer 7
http://www.unics.uni-hannover.de/nhtcapri/ie7-bugs

