Bi-directional text

 









Bi-directional text arises because some writing systems of the world, notably the Arabic (including variants such as Nasta'liq), Persian and Hebrew scripts, are written in a form known as right-to-left (RTL), in which writing begins at the right-hand side of a page and concludes at the left-hand side. This is different from the left-to-right (LTR) direction used by most languages in the world. When LTR text is mixed with RTL in the same paragraph, each type of text should be written in its own direction, which is known as bi-directional text. This can get rather complex when multiple levels of quotation are used.

Many computer programs fail to display bi-directional text correctly. For example, the Hebrew name Sarah (שרה) should be spelled shin, resh, heh from right to left. Some Web browsers may display the Hebrew text in this article in the opposite direction.

Languages using bi-directional text

There are very few scripts that can be written in either direction.

Such was the case with Egyptian hieroglyphics, where the signs had a distinct "head" that faced the beginning of a line and a "tail" that faced the end.

Chinese characters can also be written in either direction, especially in signs (but the orientation of the individual characters is never changed). This can often be seen on tour buses in China, where the company name customarily runs from the front of the vehicle to its rear - that is, from right to left on the right side of the bus, and from left to right on the left side of the bus.

Another variety of writing style, called boustrophedon, was used in some ancient Greek inscriptions, Tuareg, and Hungarian runes. This method of writing alternates direction, and usually reverses the individual characters, on each successive line.

Unicode support

Bidirectional script support is the capability of a computer system to correctly display bi-directional text. The term is often shortened to the jargon term BiDi or bidi.

Early computer installations were designed to support only a single writing system, typically a left-to-right script based on the Latin alphabet. Adding new character sets and character encodings enabled a number of other left-to-right scripts to be supported, but did not easily support right-to-left scripts such as Arabic or Hebrew, and mixing the two was not practical. It is possible to simply flip the left-to-right display order to a right-to-left display order, but doing this sacrifices the ability to correctly display left-to-right scripts. With bidirectional script support, it is possible to mix characters from different scripts on the same page, regardless of writing direction.
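The cost of simply flipping the display order can be seen in a minimal Python sketch. It assumes a placeholder convention that is not part of the article: uppercase Latin letters stand in for right-to-left letters, lowercase letters are ordinary left-to-right text, and strings are held in writing order. The function name `naive_flip` is hypothetical.

```python
def naive_flip(line: str) -> str:
    """Flip the display order of a whole line, ignoring which script each part uses."""
    return line[::-1]


# Writing order: the RTL word "ABC" (placeholder letters) followed by the LTR word "def".
stored = "ABC def"

# Flipping everything lays out the RTL word correctly ("CBA"),
# but the LTR word comes out backwards ("fed").
print(naive_flip(stored))  # fed CBA
```

A display that handles both directions has to reverse only the right-to-left parts, which is what bidirectional script support provides.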

In particular, the Unicode standard provides foundations for complete BiDi support, with detailed rules as to how mixtures of left-to-right and right-to-left scripts are to be encoded and displayed.

In Unicode encoding, all non-punctuation characters are stored in writing order, and each such character carries its own writing direction as one of its properties. A character with an inherent direction is called "strong". Punctuation characters, however, can appear in both LTR and RTL languages. They are called "weak" characters because they do not contain any directional information, so it is up to the software to decide in which direction these "weak" characters will be placed. In mixed-direction text this sometimes leads to display errors: the bidi algorithm runs through the text, identifies the LTR and RTL strong characters, and assigns a direction to each weak character according to the algorithm's rules.
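The directional property of a character can be inspected with Python's standard `unicodedata` module. The sketch below maps the Unicode class codes to this article's terms; the helper name `direction_kind` and the wording of its labels are assumptions made for illustration. Unicode reports strong characters as `L`, `R` or `AL` (Arabic letter), and gives punctuation other codes such as `ON`.

```python
import unicodedata


def direction_kind(ch: str) -> str:
    """Describe one character in the article's terms, using its Unicode bidi class."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "strong LTR"
    if bidi_class in ("R", "AL"):
        return "strong RTL"
    # Punctuation, spaces, digits and symbols carry no strong direction.
    return "no strong direction"


for ch in ["a", "\u05e9", "\u0633", "!", "\u2122"]:
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), direction_kind(ch))
```

Here U+05E9 is the Hebrew letter shin, U+0633 is the Arabic letter seen, and U+2122 is the trademark sign.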

In the algorithm, each sequence of concatenated strong characters is called a "run". A weak character located between two strong characters with the same orientation inherits their orientation. A weak character located between two strong characters with different writing directions inherits the main context's writing direction (in an LTR document the character becomes LTR; in an RTL document it becomes RTL). If a "weak" character is followed by another "weak" character, the algorithm looks at the first neighbouring "strong" character. Sometimes this leads to unintentional display errors. To correct or prevent these errors, you can use "pseudo-strong" characters. These Unicode control characters are called "marks". A mark (U+200E LEFT-TO-RIGHT MARK or U+200F RIGHT-TO-LEFT MARK) is inserted at a location to make an enclosed weak character inherit its writing direction.
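The three rules for weak characters described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the full Unicode bidirectional algorithm, which also has special handling for digits, explicit embeddings and paired brackets. The function names are hypothetical, and treating the start and end of the text as if they had the document's direction is an assumption of this sketch.

```python
import unicodedata


def strong_dir(ch):
    """Return "L" or "R" for a strong character, or None otherwise."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "L"
    if bidi_class in ("R", "AL"):
        return "R"
    return None


def resolve_directions(text, base):
    """Assign "L" or "R" to every character; base is the document direction."""
    dirs = [strong_dir(ch) for ch in text]
    resolved = list(dirs)
    for i, d in enumerate(dirs):
        if d is not None:
            continue
        # Skip over other weak characters to the nearest strong one on each side.
        # Assumption: the edges of the text behave like the document direction.
        before = next((dirs[j] for j in range(i - 1, -1, -1) if dirs[j]), base)
        after = next((dirs[j] for j in range(i + 1, len(text)) if dirs[j]), base)
        # Same direction on both sides: inherit it. Otherwise: document direction.
        resolved[i] = before if before == after else base
    return resolved


# "!" between a Latin letter and the Hebrew letter alef (U+05D0)
print(resolve_directions("a!\u05d0", "L"))  # ['L', 'L', 'R']
print(resolve_directions("a!\u05d0", "R"))  # ['L', 'R', 'R']
```

The same stored text gives a different direction for the "!" depending on the document direction, which is the source of the display errors mentioned above.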

For example, to have the trademark symbol ™ (TM; U+2122) for an English name brand (LTR) in an Arabic (RTL) passage display correctly, you need to add an LTR mark after the trademark symbol if the symbol is not followed by LTR text. This is because if you do not add the LTR mark, the weak character ™ will be neighboured by a strong LTR character and a strong RTL character. Hence, in an RTL context, it will be considered to be RTL, and displayed in an incorrect order.
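The effect of the LTR mark in this example can be checked with a short Python sketch that looks up the nearest strong characters on either side of the ™ symbol. The brand name "Acme", the two Arabic letters used as sample text, and the helper names are all hypothetical; only the standard `unicodedata` module is used.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
STRONG = {"L", "R", "AL"}


def nearest_strong(text, index, step):
    """Bidi class of the nearest strong character from index, moving by step."""
    i = index + step
    while 0 <= i < len(text):
        bidi_class = unicodedata.bidirectional(text[i])
        if bidi_class in STRONG:
            return bidi_class
        i += step
    return None


def neighbours(text, index):
    """Strong classes found before and after the character at index."""
    return nearest_strong(text, index, -1), nearest_strong(text, index, +1)


arabic = "\u0645\u0646"  # sample Arabic letters (meem, noon), stored in writing order
plain = "Acme\u2122 " + arabic
marked = "Acme\u2122" + LRM + " " + arabic

tm = plain.index("\u2122")
print(neighbours(plain, tm))   # ('L', 'AL')
print(neighbours(marked, tm))  # ('L', 'L')
```

Without the mark, the symbol sits between an LTR and an RTL strong character and falls back to the direction of the RTL passage. With the mark, both neighbours are LTR, so the symbol stays with the brand name.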

See also

  • Internationalization and localization
  • Horizontal and vertical writing in East Asian scripts
  • Writing system (section on directionality)


External links

  • The Bidirectional Algorithm
  • - includes examples and good explanations
  • A free implementation of the Unicode bidirectional algorithm
  • International Components for Unicode contains an implementation of the bidirectional algorithm, along with other internationalization services
  • A small and fast bidirectional reordering algorithm that works fairly well, but is not necessarily compliant with the Unicode algorithm
  • Working group for supporting BiDi in Free Software. Contains several links to readings and implementations regarding BiDi in computer systems.