ISO/IEC 8859-1:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. It is informally referred to as Latin-1. It is generally intended for “Western European” languages (see below for a list). It is by far the most popular 8-bit character set in the world, and virtually every character set in modern use shares some similarity to it (for instance, it defines the first 256 code point assignments in Unicode).
ISO-8859-1 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. The following other aliases are registered for ISO-8859-1: ISO_8859-1, iso-ir-100, csISOLatin1, latin1, l1, IBM819, CP819.
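As an illustration of how such aliases behave in practice, the following minimal sketch asks Python's standard `codecs` module to resolve each registered alias. It assumes that the Python codec registry recognises all of these labels; the helper name `resolve` is hypothetical and not part of any standard.

```python
import codecs

# Aliases as listed in the text above.
ALIASES = ["ISO_8859-1", "iso-ir-100", "csISOLatin1", "latin1", "l1", "IBM819", "CP819"]


def resolve(name):
    """Return the canonical codec name Python uses for a charset label."""
    return codecs.lookup(name).name


# Resolve the preferred name once, then compare every alias against it.
canonical = resolve("ISO-8859-1")
resolved = {alias: resolve(alias) for alias in ALIASES}

for alias, name in resolved.items():
    print(f"{alias:12} -> {name}")
```

If every alias maps to the same codec, all lines of output show the same canonical name.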
The Windows-1252 codepage coincides with ISO-8859-1 for all codes except the range 128 to 159 (hex 80 to 9F), where the little-used C1 controls are replaced with additional characters. Windows-28591 is the actual ISO-8859-1 codepage.
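The difference between the two encodings can be checked empirically. The sketch below compares how Python's built-in `latin-1` and `cp1252` codecs decode each byte value. It reflects the behaviour of those two codecs only, and the function name `differing_bytes` is an assumption made for illustration.

```python
def differing_bytes():
    """Map each byte value that cp1252 and latin-1 decode differently
    to the character cp1252 assigns ('' if cp1252 leaves it undefined)."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        latin1_char = raw.decode("latin-1")  # all 256 values are mapped
        try:
            cp1252_char = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp1252_char = ""  # Python's cp1252 table has no entry here
        if cp1252_char != latin1_char:
            diffs[value] = cp1252_char
    return diffs


diffs = differing_bytes()
print(f"{len(diffs)} differing values, {min(diffs):#04x} to {max(diffs):#04x}")
print(repr(b"\x80".decode("cp1252")))  # the euro sign in Windows-1252
```

Outside the range hex 80 to 9F the two codecs agree, so text that avoids those values decodes identically under either label.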
Coverage
ISO 8859-1 encodes what it refers to as "Latin alphabet no. 1," consisting of 191 characters from the Latin script. This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa. It is also commonly used in most standard romanizations of East-Asian languages.
Each character is encoded as a single eight-bit code value. These code values can be used in almost any data interchange system to communicate in the following European languages (with a few exceptions due to missing characters, as noted):
Languages with complete coverage
- Afrikaans
- Albanian
- Basque
- Breton
- Catalan
- Danish
- English (UK and US)
- Faroese
- Galician
- German
- Icelandic
- Irish (new orthography)
- Italian
- Latin (basic classical orthography)
- Leonese
- Luxembourgish (basic classical orthography)
- Norwegian (Bokmål and Nynorsk)
- Occitan
- Portuguese
- Rhaeto-Romanic
- Scottish Gaelic
- Spanish
- Swahili
- Swedish
- Walloon
Languages commonly supported but with incomplete coverage
| Language | Missing characters | Typical workaround | Supported by |
| Dutch | IJ, ij | digraphs IJ, ij | |
| Estonian | Š, š, Ž, ž (only present in loanwords) | Sh, sh, Zh, zh | ISO-8859-15, Windows-1252 |
| Finnish | Š, š, Ž, ž (only present in loanwords) | Sh, sh, Zh, zh | ISO-8859-15, Windows-1252 |
| French | Œ, œ, and the very rare Ÿ | digraphs OE, oe, and Y without the diaeresis | ISO-8859-15, Windows-1252 |
| Hungarian | Ő, ő, Ű, ű | | ISO-8859-2, Windows-1250 |
| Irish (traditional orthography) | Ḃ, ḃ, Ċ, ċ, Ḋ, ḋ, Ḟ, ḟ, Ġ, ġ, Ṁ, ṁ, Ṡ, ṡ, Ṫ, ṫ | Bh, bh, Ch, ch, Dh, dh, Fh, fh, Gh, gh, Mh, mh, Sh, sh, Th, th | ISO-8859-14 |
| Latin with macrons | Ā, ā, Ē, ē, Ī, ī, Ō, ō, Ū, ū | | |
| Māori | Ā, ā, Ē, ē, Ī, ī, Ō, ō, Ū, ū | Ä, ä, Ë, ë, Ï, ï, Ö, ö, Ü, ü | ISO-8859-13, Windows-1257 |
| Welsh | Ŵ, ŵ, Ŷ, ŷ | | ISO-8859-14 |
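Whether a given text falls within the coverage described above can be tested by attempting to encode it. The following sketch uses Python's `latin-1` codec. The helper names and the `WORKAROUNDS` mapping are hypothetical, and the mapping covers only a few of the workarounds from the table.

```python
def missing_from_latin1(text):
    """Return the distinct characters of text that ISO-8859-1 cannot
    encode, in order of first appearance."""
    missing = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# Illustrative subset of the table's workarounds (hypothetical mapping).
WORKAROUNDS = {
    "\u0152": "OE",  # Œ
    "\u0153": "oe",  # œ
    "\u0178": "Y",   # Ÿ
    "\u0160": "Sh",  # Š
    "\u0161": "sh",  # š
    "\u017d": "Zh",  # Ž
    "\u017e": "zh",  # ž
}


def to_latin1(text):
    """Apply the workarounds, then encode. Raises UnicodeEncodeError if an
    unmapped character outside ISO-8859-1 remains."""
    replaced = "".join(WORKAROUNDS.get(ch, ch) for ch in text)
    return replaced.encode("latin-1")


print(missing_from_latin1("c\u0153ur"))  # the French word "cœur"
print(to_latin1("c\u0153ur"))
```

A substitution table of this kind loses information, since the digraph can no longer be distinguished from the original letter once the text has been encoded.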
Quotation marks
For some languages listed above, the correct typographical quotation marks are missing, as only « », " ", and ' ' are included. The scheme also does not provide oriented (6- or 9-shaped) single or double quotation marks. Some fonts display the spacing grave accent (0x60) and the apostrophe (0x27) as a matching pair of oriented single quotation marks, but this is not considered part of the modern standard.
History
ISO 8859-1 was based on the Multinational Character Set used by Digital Equipment Corporation in the popular VT220 terminal. It was developed within ECMA, the European Computer Manufacturers Association, and published in March 1985 as ECMA-94, by which name it is still sometimes known. The second edition of ECMA-94 (June 1986) also included ISO 8859-2, ISO 8859-3, and ISO 8859-4 as part of the specification.
In 1985 Commodore officially adopted the ANSI/ISO 8859-1 layout for the codepage and all internal operations of its new AmigaOS operating system, in order to follow internationally approved standards rather than proprietary ones, as was common at the time with MS-DOS and Mac OS. The standard was therefore also used for the keyboard layout of the Amiga 1000 computer, which was launched in July 1985. All versions of AmigaOS up to 3.1 used ISO 8859-1. After the demise of Commodore International in 1994, later versions of AmigaOS (3.5, 3.9) continued to use the ISO 8859-1 codepage, enhanced with the euro currency character. Without a leading firm capable of imposing official standards, however, neither Amiga nor its clone variants (MorphOS, AROS) officially updated to ISO 8859-15 or followed a common approach when the euro character was introduced in 2001. MorphOS 2.0 and later versions are Unicode UTF-8 compliant.
In 1992, the IANA registered the character map ISO_8859-1:1987, more commonly known by its preferred MIME name of ISO-8859-1 (note the extra hyphen compared with ISO 8859-1), a superset of ISO 8859-1, for use on the Internet. This map assigns the C0 and C1 control characters to the unassigned code values and thus provides for 256 characters via every possible 8-bit value.
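The split between control and graphic positions can be counted directly. The sketch below assumes that Python's `latin-1` codec implements the full 256-value map described here, and it uses the Unicode general category `Cc` as the definition of a control character.

```python
import unicodedata

# Every possible 8-bit value decodes, and the result round-trips.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")

# Category "Cc" covers the C0 controls, DEL, and the C1 controls.
controls = [ch for ch in text if unicodedata.category(ch) == "Cc"]
graphic = [ch for ch in text if unicodedata.category(ch) != "Cc"]

print(len(text), len(controls), len(graphic))
```

The number of non-control positions matches the 191 graphic characters of the underlying ISO 8859-1 standard, with the remaining 65 values taken up by control codes.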
ISO-8859-1 is (according to the standards at least) the default encoding of documents delivered via HTTP with a MIME type beginning with "text/" (however, the draft HTML 5 specification requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding). It is the default encoding of the values of certain descriptive HTTP headers, and defines the repertoire of characters allowed in HTML 3.2 documents (HTML 4.0, however, is based on Unicode). It and Windows-1252 are often assumed to be the encoding of text on Unix and Microsoft Windows in the absence of locale or other information; this is only gradually being replaced with Unicode encodings such as UTF-8 or UTF-16.
Codepage layout
[Code page table: a 16 × 16 grid of code values 00 to FF. ISO/IEC 8859-1 assigns graphic characters to 20–7E and A0–FF and leaves 00–1F and 7F–9F unassigned. The individual cell contents did not survive extraction.]
Similar character sets
ISO-8859-1 was incorporated as the first 256 code points of ISO/IEC 10646 and Unicode.
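The correspondence with Unicode can be checked with a short sketch. It assumes that Python's built-in "iso-8859-1" codec follows the IANA map (controls included), so that every byte value decodes; the variable names are illustrative only.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Assumption: Python's "iso-8859-1" codec implements the IANA character map,
# so all 256 byte values decode without error.
text = all_bytes.decode("iso-8859-1")

# Each decoded character has the same code point as the byte it came from.
code_points = [ord(ch) for ch in text]
identity = code_points == list(range(256))
print(identity)  # True
```

Because the mapping is the identity on 0 to 255, decoding ISO-8859-1 never fails, which is also why mislabeled data passes through silently instead of raising an error.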
The lower range 32 to 126 (hex 20 to 7E, the G0 subset) maps exactly to the same coded G0 subset of the ISO 646 US variant (commonly known as ASCII), whose ISO 2022 standard switch sequence is "ESC ( B". The higher range 160 to 255 (hex A0 to FF, the G1 subset) maps exactly to the same subset initiated by the ISO 2022 standard switch sequence "ESC . A".
ISO/IEC 8859-1 is missing some characters for French and Finnish text and the euro sign. In order to provide some of these characters, ISO/IEC 8859-15 was developed as an update of ISO/IEC 8859-1. This required, however, the removal of some infrequently used characters from ISO/IEC 8859-1, including fraction symbols and letter-free diacritics: ¤, ¦, ¨, ´, ¸, ¼, ½, and ¾.
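The positions that changed between the two parts can be listed by decoding each byte value with both encodings. This is a minimal sketch that assumes Python's "iso-8859-1" and "iso-8859-15" codecs reflect the two standards; the name `differences` is illustrative.

```python
# Collect every byte value that decodes differently in the two encodings.
differences = {}
for value in range(256):
    raw = bytes([value])
    latin1 = raw.decode("iso-8859-1")
    latin9 = raw.decode("iso-8859-15")
    if latin1 != latin9:
        differences[value] = (latin1, latin9)

# Print each changed position with the removed and the added character.
for value, (old, new) in differences.items():
    print(f"0x{value:02X}: {old} -> {new}")
```

Only the eight positions that held the removed characters differ; every other byte value decodes identically under both codecs.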
The popular Windows-1252 character set adds all the missing characters provided by ISO/IEC 8859-15, plus a number of typographic symbols, by replacing the rarely used C1 controls in the range 128 to 159 (hex 80 to 9F). It is very common to mislabel text data with the charset label ISO-8859-1, even though the data is really Windows-1252 encoded. Many web browsers and e-mail clients interpret ISO-8859-1 control codes as Windows-1252 characters in order to accommodate such mislabeling, but this is not standard behaviour, and care should be taken to avoid generating these characters in ISO-8859-1 labeled content.
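The effect of such mislabeling can be illustrated with a few bytes in the 80 to 9F range. The sketch below uses Python's "cp1252" and "iso-8859-1" codecs; `decode_labelled_latin1` is a hypothetical helper written for this illustration, not the algorithm of any particular browser, and its whole-string fallback is a simplification.

```python
# Bytes as a Windows-1252 writer might produce them: curly quotes and a euro sign.
data = b"\x93quoted\x94 \x80 5"

as_cp1252 = data.decode("cp1252")       # typographic quotes and the euro sign
as_latin1 = data.decode("iso-8859-1")   # the same bytes become C1 control characters


def decode_labelled_latin1(raw: bytes) -> str:
    """Hypothetical lenient decoder for content labeled ISO-8859-1.

    Tries Windows-1252 first; Python's cp1252 codec rejects the five byte
    values it leaves undefined, in which case plain ISO-8859-1 is used.
    """
    try:
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


print(as_cp1252)
print(repr(as_latin1))
```

Text that stays outside the 80 to 9F range decodes identically under both labels, so the difference only shows up when these bytes are present.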
The Apple Macintosh computer introduced a character encoding called Mac Roman, or Mac-Roman, in 1984. It was meant to be suitable for Western European desktop publishing. It is a superset of ASCII, like ISO-8859-1, and has most of the characters that are in ISO-8859-1 but in a totally different arrangement. A later version, registered with IANA as "Macintosh", replaced the generic currency sign ¤ with the euro sign €. The few printable characters that are in ISO 8859-1 but not in this set are often a source of trouble when editing text on websites using older Macintosh browsers (including the last version of Internet Explorer for Mac). However, most of the extra characters that Windows-1252 has in the C1 code point range are also supported in Mac Roman; the exceptions are Š, š, Ž and ž.
DOS had code page 850, which had all the printable characters that ISO-8859-1 had (albeit in a totally different arrangement) plus the most widely used graphic characters from code page 437.
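The repertoire claims for Mac Roman and code page 850 can be tested by trying to encode each printable ISO-8859-1 character in the upper range. This is a sketch that assumes Python's "mac_roman" and "cp850" codecs match the encodings described here; `missing_from` is a hypothetical helper name.

```python
def missing_from(codec: str) -> list[str]:
    """Return the ISO-8859-1 characters in A0 to FF that `codec` cannot encode."""
    missing = []
    for value in range(0xA0, 0x100):
        ch = chr(value)
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Code page 850 is expected to cover the whole range; Mac Roman is not.
print("cp850:", missing_from("cp850"))
print("mac_roman:", missing_from("mac_roman"))
```

The check covers only the A0 to FF range, because the ASCII range 20 to 7E is shared by all three encodings.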
See also
- ISO/IEC 8859-15 – a derivative of ISO-8859-1
- Latin characters in Unicode
- Unicode
- Universal Character Set
- UTF-8
External links