Unicode and e-mail
Many email clients now offer some support for Unicode in email bodies. Most do not send Unicode by default, because the recipient's client might not support it, but as time passes, more and more systems are likely to be set up with fonts capable of displaying the full range of Unicode characters (or at least the set likely to be of interest to the user).

To use Unicode in email subject lines and email addresses, two different standards are needed to retrofit the handling of non-ASCII data onto the originally ASCII-only email protocol:
  • RFC 2047 provides support for encoding non-ASCII values such as real names and subject lines in email headers
  • RFC 3490 provides support for encoding non-ASCII domain names in the Domain Name System
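
As an illustration of the two standards listed above, the following Python sketch encodes a subject line as an RFC 2047 encoded-word and a domain name in its ASCII form. It relies on the standard library's email.header module and built-in "idna" codec; the helper names encode_subject and encode_domain, the sample subject and the domain bücher.example are assumptions made for this example only.

```python
from email.header import Header, decode_header


def encode_subject(text: str) -> str:
    """Return text as an RFC 2047 encoded-word, with UTF-8 as the charset."""
    return Header(text, charset="utf-8").encode()


def encode_domain(domain: str) -> str:
    """Return the ASCII form of an internationalized domain name."""
    # Python's built-in "idna" codec follows RFC 3490.
    return domain.encode("idna").decode("ascii")


subject = encode_subject("Grüße")          # hypothetical subject line
domain = encode_domain("bücher.example")   # hypothetical domain name

# Both results contain only ASCII characters and can be decoded again.
raw, charset = decode_header(subject)[0]
print(subject, "->", raw.decode(charset))
print(domain, "->", domain.encode("ascii").decode("idna"))
```

Both encoded forms are pure ASCII, which is what allows them to pass through software that predates Unicode support.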


Unicode support in message bodies

As with all encodings apart from US-ASCII, when using Unicode text in email, MIME must be used to specify that a Unicode transformation format is being used for the text. To use Unicode in email headers, the Unicode text has to be encoded using a MIME "Encoded-Word" with a Unicode encoding as the charset.
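
A minimal sketch of such a message, built with Python's standard email package, is shown below. The addresses, subject and body text are placeholders, and the choice of a policy restricted to 7-bit output is an assumption made so that both the header and the body end up in ASCII-safe form.

```python
from email import policy
from email.message import EmailMessage

# Assumption: the message must survive a 7-bit transport, so the policy
# is told to avoid 8-bit content transfer encodings.
seven_bit_policy = policy.SMTP.clone(cte_type="7bit")

body = "Schöne Grüße, 你好, привет"

msg = EmailMessage(policy=seven_bit_policy)
msg["From"] = "sender@example.com"        # placeholder address
msg["To"] = "recipient@example.com"       # placeholder address
msg["Subject"] = "Grüße aus Zürich"       # serialized as a MIME encoded-word
msg.set_content(body, charset="utf-8")    # sets Content-Type: text/plain; charset="utf-8"

wire = msg.as_bytes()                     # the bytes that would be handed to SMTP
print(wire.decode("ascii"))
```

The Content-Type header declares UTF-8 as the charset of the body, while the Subject header carries its own charset label inside the encoded-word.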

UTF-7, although sometimes considered deprecated, has an advantage over other Unicode encodings in that it does not require a transfer encoding to fit within the seven-bit limits of many legacy Internet mail servers. UTF-8 and UTF-16, on the other hand, must be transfer encoded in base64 or quoted-printable to allow safe transmission across seven-bit mail servers (i.e., those that do not advertise 8BITMIME).
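
The difference can be checked with a short Python sketch that uses only the standard library. The sample string and the helper is_seven_bit are assumptions introduced for illustration.

```python
import base64
import quopri

text = "Grüße"  # sample text containing non-ASCII characters


def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte fits in seven bits."""
    return all(byte < 0x80 for byte in data)


utf7 = text.encode("utf-7")            # UTF-7 output is 7-bit on its own
utf8 = text.encode("utf-8")            # UTF-8 output contains bytes >= 0x80

# A transfer encoding turns the 8-bit UTF-8 bytes into 7-bit data.
utf8_b64 = base64.b64encode(utf8)
utf8_qp = quopri.encodestring(utf8)

for label, data in [("UTF-7", utf7), ("UTF-8", utf8),
                    ("UTF-8 + base64", utf8_b64),
                    ("UTF-8 + quoted-printable", utf8_qp)]:
    print(f"{label}: 7-bit safe = {is_seven_bit(data)}")
```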

Some document formats, such as HTML, PostScript and Rich Text Format, can use 7-bit codes for Unicode characters and can thus be sent without using any special email encodings. For example, HTML email can use HTML entities to include characters from anywhere in Unicode even if the HTML source text for the email is in a legacy encoding (e.g. 7-bit ASCII). For details, see Unicode and HTML. The rest of this article deals with email messages where the actual raw text (whether markup or plain text) is in an encoding that covers the whole of Unicode.
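
For the HTML case, one way to produce such 7-bit source text is to replace every non-ASCII character with a numeric character reference. The Python sketch below does this with the standard "xmlcharrefreplace" error handler; the function name to_ascii_html and the sample text are assumptions made for this example.

```python
import html


def to_ascii_html(text: str) -> str:
    """Return an HTML fragment for text that uses only ASCII characters."""
    # Escape markup-significant characters first, then replace each
    # non-ASCII character with a numeric character reference.
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


original = "Grüße & 你好"
fragment = to_ascii_html(original)

print(fragment)
print(html.unescape(fragment) == original)
```

A fragment produced this way can be placed in an HTML message body labelled as US-ASCII, and a conforming HTML renderer will still display the original characters.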

See also

  • Comparison of email clients
  • List of Unicode fonts
  • Free software Unicode fonts

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.