Oldest known version of this page was edited on 2005-10-08 14:18:08 by Jakub81

Unicode: UTF-8

UTF-8 is currently standardized as RFC 3629 (UTF-8, a transformation format of ISO 10646).

In summary, the bits of a Unicode character's code point are split into several groups, which are then distributed among the low-order bit positions of the UTF-8 bytes.

Characters below 128 (decimal) are encoded as a single byte containing their value; these correspond exactly to the 128 7-bit ASCII characters.
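As an illustration of how the bit groups are distributed, here is a minimal Python sketch of an encoder for a single code point. The function name utf8_encode is made up for this page; in real code you would simply call str.encode("utf-8"). The byte ranges follow RFC 3629.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch only)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point < 0x80:
        # 0xxxxxxx: one byte, identical to 7-bit ASCII
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx: 5 + 6 payload bits
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx: 4 + 6 + 6 payload bits
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 3 + 6 + 6 + 6 payload bits
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])
```

For example, utf8_encode(0x41) gives the single byte b"A", while utf8_encode(0x20AC) (the euro sign) gives the three bytes E2 82 AC. The result should match Python's built-in encoder for any valid code point.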

For details, see http://en.wikipedia.org/wiki/UTF-8

Content Type and Charset via Apache .htaccess

To serve your static HTML pages as UTF-8, you can add the following to the Apache configuration files (if you have access to them):

AddType 'text/html; charset=utf-8' html

If you serve your pages as application/xhtml+xml, then adding the charset is just as easy. An example:

AddType 'application/xhtml+xml; charset=utf-8' xhtml

Apache Defaults

Unfortunately, Apache's default is ISO-8859-1 (Latin-1), which covers English and most Western European languages but not the rest of the world. If you allow user-submitted content, such as comments, UTF-8 is the way to go.
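To see why Latin-1 falls short for user-submitted text, the following Python sketch checks whether a string can be represented in a given charset. The helper name can_encode and the sample strings are invented for this illustration.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


samples = ["comment", "café", "Łódź", "Привет"]
for sample in samples:
    # Latin-1 handles the first two samples only; UTF-8 handles all of them.
    print(sample, can_encode(sample, "iso-8859-1"), can_encode(sample, "utf-8"))
```

A comment written in Polish or Russian cannot be stored or served as Latin-1 without loss, whereas UTF-8 can represent every Unicode character.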

Headers and Metainformation

Server-sent headers take precedence over character sets specified on the page, so if you have access to server configuration via .htaccess, you should avoid using meta tags. Furthermore, if you serve your pages as application/*, then it's illegal to add such meta tags.
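The precedence rule can be modelled roughly as follows. This Python sketch is a simplification, not a full HTTP or HTML parser: the function names are hypothetical, and the meta charset is assumed to have been extracted from the page already.

```python
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Pull the charset parameter out of a Content-Type header value."""
    for part in value.split(";")[1:]:
        name, _, val = part.partition("=")
        if name.strip().lower() == "charset":
            # Strip optional quotes and normalise case; empty means absent.
            return val.strip().strip("\"'").lower() or None
    return None


def effective_charset(header_value: Optional[str],
                      meta_charset: Optional[str]) -> Optional[str]:
    """The server-sent header wins; the meta tag is only a fallback."""
    header_charset = (charset_from_content_type(header_value)
                      if header_value else None)
    return header_charset or meta_charset
```

With a header of "text/html; charset=utf-8" and a meta tag claiming iso-8859-1, effective_charset returns "utf-8". The meta value is used only when the header carries no charset parameter.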

If you don't have access to server configuration, it's time to change your hosting service.
