Internationalization and localization

In computing, internationalization and localization (also spelled internationalisation and localisation) are means of adapting computer software to different languages, regional differences and technical requirements of a target market. Internationalization is the process of designing a software application so that it can be adapted to various languages and regions without engineering changes. Localization is the process of adapting internationalized software for a specific region or language by adding locale-specific components and translating text.

The terms are frequently abbreviated to the numeronyms i18n (where 18 stands for the number of letters between the first i and last n in internationalization, a usage coined at DEC in the 1970s or 80s) and L10n respectively, due to the length of the words. The capital L in L10n helps to distinguish it from the lowercase i in i18n.
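
The abbreviation rule described above can be sketched in a few lines of Python. The function name `numeronym` is only illustrative, and the sketch returns the abbreviation in the case of the input word, so the conventional capital L of L10n is not reproduced.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + number of inner letters + last letter."""
    if len(word) < 3:
        # Nothing lies between the first and last letter, so leave the word alone.
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("internationalization"))  # i18n
print(numeronym("localization"))          # l10n (conventionally written L10n)
```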

Some companies, like IBM and Sun Microsystems, use the term "globalization" for the combination of internationalization and localization.

Microsoft defines Internationalization as a combination of World-Readiness and localization. World-Readiness is a developer task that enables a product to be used with multiple scripts and cultures (globalization) and separates user interface resources into a localizable format (localizability, abbreviated to L12y).

This concept is also known as NLS (National Language Support or Native Language Support).

Nomenclature


The support of multiple languages by computer systems can be considered a continuum between localization ("L10n"), through multilingualization (or "m17n"), to internationalization ("i18n").
  • A localized system has been adapted or converted for use in a particular locale (other than the one it was originally developed for), including the language of the user interface (UI), input, and display, and features such as time/date display and currency. Each instance of the system only supports a single locale, and there is no explicit support for languages that are not part of that locale (although the character set may coincidentally be usable for other languages).
  • Multilingualized software supports multiple languages for display and input, but has a single UI language which cannot be changed after installation of the software. Multi-locale support for other features like date, time, number, and currency formats varies as the system tends towards full internationalization. At present, most multilingual software relies on the host operating system (e.g., Microsoft Windows or Mac OS X) of the machine on which it runs for these features, and may thus be able to support character sets for different languages within the same document. In general, a multilingualized system is intended for use in one specific locale, but is capable of handling multilingual content as data.
  • An internationalized system is equipped for use in a range of "locales" (or by users of multiple languages), by allowing the co-existence of several languages and character sets for input, display, and UI. In particular, a system may not be considered internationalized in the fullest sense unless the UI language is selectable by the user at runtime. Full internationalization may extend beyond support for multiple languages and orthography to compliance with jurisdiction-specific legislation (in respect of copyright, for instance) and other non-linguistic conventions.


The distinction arises because it is significantly more difficult to create a multi-lingual UI than simply to support the character sets and keyboards needed to express multiple languages. To internationalize a UI, every text string employed in interaction must be translated into all supported languages; then all output of literal strings, and literal parsing of input in UI code must be replaced by hooks to i18n libraries.

"Internationalized" does not necessarily mean that a system can be used absolutely anywhere, since simultaneous support for all possible locales is almost impossible in practice and very hard to justify commercially. In many cases an internationalized system includes full support only for the most widely spoken languages, plus any others of particular relevance to the application.

Scope


Focal points of internationalization and localization efforts include:
  • Language
    • Computer-encoded text
      • Alphabets/scripts; most recent systems use the Unicode standard to solve many of the character encoding problems.
      • Different systems of numerals
      • Writing direction: left to right in most European languages (e.g. German), right-to-left in Hebrew and Arabic, vertical in some Asian languages
      • Complex text layout
      • Text processing differences, such as the concept of capitalization which exists in some scripts and not in others, different text sorting rules, etc.
      • Plural forms in text output, which differ depending upon language
    • Input
      • Enablement of keyboard shortcuts on any keyboard layout
    • Graphical representations of text (printed materials, online images containing text)
    • Spoken (Audio)
    • Subtitling of film and video
  • Culture
    • Images and colors: issues of comprehensibility and cultural appropriateness
    • Names and titles
    • Government assigned numbers (such as the Social Security number in the US, National Insurance number in the UK, Isikukood in Estonia, and Resident registration number in South Korea) and passports
    • Telephone numbers, addresses and international postal codes
    • Currency (symbols, positions of currency markers)
    • Weights and measures
    • Paper sizes
  • Writing conventions
    • Date/time format, including use of different calendars
    • Time zones (UTC in internationalized environments)
    • Formatting of numbers (decimal separator, digit grouping)
    • Differences in symbols (e.g. quoting text using double-quotes (" "), as in English, or guillemets (« »), as in French).
  • Any other aspect of the product or service that is subject to regulatory compliance
    • Disputed borders shown on maps (e.g. failing to show Kashmir as Indian is a crime in India)
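
Of the items listed above, plural forms are a common source of bugs, because the number of forms and the rule for choosing between them differ by language. The following Python sketch illustrates the idea with selection rules written in the style of gettext plural expressions. The catalog contents, the `PLURAL_RULES` and `MESSAGES` tables and the `format_count` helper are hypothetical names introduced for illustration, not part of any library.

```python
from typing import Callable, Dict, List


def _polish(n: int) -> int:
    """Polish has three forms: one, few (2-4 but not 12-14), and many."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2


# Each rule maps a count to the index of the message form to use.
PLURAL_RULES: Dict[str, Callable[[int], int]] = {
    "en": lambda n: 0 if n == 1 else 1,  # singular only for exactly 1
    "fr": lambda n: 1 if n > 1 else 0,   # 0 and 1 both take the singular
    "pl": _polish,
}

# Hypothetical message catalog: one list of forms per language.
MESSAGES: Dict[str, List[str]] = {
    "en": ["{n} file", "{n} files"],
    "fr": ["{n} fichier", "{n} fichiers"],
    "pl": ["{n} plik", "{n} pliki", "{n} plików"],
}


def format_count(lang: str, n: int) -> str:
    """Pick the plural form for n in the given language and fill in the count."""
    form = PLURAL_RULES[lang](n)
    return MESSAGES[lang][form].format(n=n)


for count in (0, 1, 2, 5, 22):
    print(format_count("en", count), "|", format_count("fr", count), "|", format_count("pl", count))
```

Concatenating a count with a fixed noun and an optional "s" therefore only works for English-like languages; keeping the rule next to the translated forms lets each language define its own.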


The distinction between internationalization and localization is subtle but important. Internationalization is the adaptation of products for potential use virtually everywhere, while localization is the addition of special features for use in a specific locale. Internationalization is done once per product, while localization is done once for each combination of product and locale. The processes are complementary, and must be combined to lead to the objective of a system that works globally. Subjects unique to localization include the following:
  • Language translation
  • National varieties of languages (see language localization)
  • Special support for certain languages such as East Asian languages
  • Local customs
  • Local content
  • Symbols
  • Order of sorting (Collation)
  • Aesthetics
  • Cultural values and social context
  • Differing laws/regulations (e.g. taxation laws, labour laws, etc.)

Business process for internationalizing software


In order to internationalize a product, it is important to look at the variety of markets that the product will foreseeably enter. Details such as the field length for street addresses, country-specific address formats, the ability to make the ZIP code field optional for countries that do not have postal codes, and the introduction of new registration flows that adhere to local laws are just some of the examples that make internationalization a complex project.
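
As a rough illustration of the address example above, the following Python sketch treats the postal code as a per-country requirement instead of a universally mandatory field. The `Address` type, the `POSTAL_CODE_REQUIRED` table and its entries are assumptions made for this sketch; a real product would take such rules from maintained address metadata.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical rule table: whether a postal code must be supplied per country.
POSTAL_CODE_REQUIRED: Dict[str, bool] = {
    "US": True,
    "DE": True,
    "HK": False,  # assumed here to be a market without postal codes
}


@dataclass
class Address:
    country: str
    street_lines: List[str]  # a list, because the number of lines varies by market
    city: str
    postal_code: Optional[str] = None


def validation_errors(address: Address) -> List[str]:
    """Return human-readable problems; an empty list means the address passes."""
    errors: List[str] = []
    if not any(line.strip() for line in address.street_lines):
        errors.append("street address is required")
    # Unknown countries default to an optional postal code.
    required = POSTAL_CODE_REQUIRED.get(address.country, False)
    if required and not (address.postal_code or "").strip():
        errors.append("postal code is required")
    return errors


print(validation_errors(Address("US", ["1 Main St"], "Springfield")))
print(validation_errors(Address("HK", ["1 Queen's Road"], "Central")))
```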

A broader approach takes into account cultural factors, for example the adaptation of the business process logic or the inclusion of individual cultural (behavioral) aspects.

Coding practice


The current prevailing practice is for applications to place text in resource strings which are loaded during program execution as needed. These strings, stored in resource files, are relatively easy to translate. Programs are often built to reference resource libraries depending on the selected locale data. One software library that aids this is gettext.
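
The resource-string approach can be sketched without any particular library. In the Python example below, the in-memory `RESOURCES` table stands in for translated resource files, and the names `lookup`, `fallback_chain` and `DEFAULT_LOCALE` are hypothetical. It is a minimal sketch of the idea, not the way gettext itself stores or loads catalogs.

```python
from typing import Dict, List

# Stand-in for resource files: one table of key -> text per locale.
RESOURCES: Dict[str, Dict[str, str]] = {
    "en": {"greeting": "Hello", "farewell": "Goodbye"},
    "fr": {"greeting": "Bonjour", "farewell": "Au revoir"},
    "fr_CA": {"greeting": "Salut"},  # regional override of a single string
}
DEFAULT_LOCALE = "en"


def fallback_chain(locale: str) -> List[str]:
    """Most specific locale first, then its language, then the default."""
    chain = [locale]
    if "_" in locale:
        chain.append(locale.split("_", 1)[0])
    if DEFAULT_LOCALE not in chain:
        chain.append(DEFAULT_LOCALE)
    return chain


def lookup(key: str, locale: str) -> str:
    """Return the text for key in the best available locale."""
    for candidate in fallback_chain(locale):
        text = RESOURCES.get(candidate, {}).get(key)
        if text is not None:
            return text
    # Returning the key makes untranslated strings visible during testing.
    return key


print(lookup("greeting", "fr_CA"))  # Salut
print(lookup("farewell", "fr_CA"))  # Au revoir
print(lookup("greeting", "de"))     # Hello
```

Application code then refers only to keys such as `greeting`, so adding a language means adding a resource table and leaves the program logic unchanged.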

Thus to get an application to support multiple languages one would design the application to select the relevant language resource file at runtime. Resource files are translated to the required languages. This method tends to be application-specific and, at best, vendor-specific. The code required to manage date entry verification and many other locale-sensitive data types also must support differing locale requirements. Modern development systems and operating systems include sophisticated libraries for international support of these types.
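
Date entry verification is one place where the same input means different things in different locales. The Python sketch below shows the idea with a small table of numeric date patterns. The `DATE_FORMATS` table and the `parse_date_entry` helper are assumptions for illustration; production code would normally rely on the locale data shipped with the platform or an i18n library.

```python
from datetime import date, datetime
from typing import Dict, Optional

# Hypothetical per-locale patterns for all-numeric date entry.
DATE_FORMATS: Dict[str, str] = {
    "en_US": "%m/%d/%Y",  # month first
    "en_GB": "%d/%m/%Y",  # day first
    "de_DE": "%d.%m.%Y",  # day first, dot-separated
}


def parse_date_entry(text: str, locale: str) -> Optional[date]:
    """Parse user input under the locale's convention; None if it is not valid."""
    fmt = DATE_FORMATS.get(locale, "%Y-%m-%d")  # fall back to ISO 8601 order
    try:
        return datetime.strptime(text.strip(), fmt).date()
    except ValueError:
        return None


print(parse_date_entry("03/04/2021", "en_US"))  # 2021-03-04
print(parse_date_entry("03/04/2021", "en_GB"))  # 2021-04-03
print(parse_date_entry("31/12/2021", "en_US"))  # None (there is no month 31)
```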

Some tools help in detecting i18n issues and guiding software resolution of those issues, such as Lingoport's Globalyzer or Parasoft Test.

Difficulties


While translating existing text to other languages may seem easy, it is more difficult to maintain the parallel versions of texts throughout the life of the product. For instance, if a message displayed to the user is modified, all of the translated versions must be changed. This in turn results in a somewhat longer development cycle.
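
One way to keep parallel versions in step is to record, with each translation, a fingerprint of the source text it was made from, so that modified messages can be flagged for retranslation. The Python sketch below assumes a hypothetical catalog layout (key mapped to a pair of source fingerprint and translated text); it illustrates the bookkeeping and is not the format of any specific tool.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(text: str) -> str:
    """Stable digest of a source string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_keys(
    source: Dict[str, str],
    translated: Dict[str, Tuple[str, str]],
) -> List[str]:
    """Keys whose translation is missing or was made from an older source text."""
    stale = []
    for key, text in source.items():
        entry = translated.get(key)
        if entry is None or entry[0] != fingerprint(text):
            stale.append(key)
    return sorted(stale)


# The "save" message was reworded after it had been translated.
source = {"save": "Save file", "quit": "Quit", "open": "Open"}
french = {
    "save": (fingerprint("Save"), "Enregistrer"),
    "quit": (fingerprint("Quit"), "Quitter"),
}
print(stale_keys(source, french))  # ['open', 'save']
```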

Many localization issues (e.g. writing direction, text sorting) require more profound changes in the software than text translation. For example, OpenOffice.org achieves this with compilation switches.

To some degree (e.g. for quality assurance), the development team needs someone who understands foreign languages and cultures and has a technical background. In large societies with one dominant language/culture, it may be difficult to find such a person.

One example of the pitfalls of localization is the attempt made by Microsoft to keep some keyboard shortcuts significant in local languages. This has resulted in some (but not all) programs in the Italian version of Microsoft Office using "CTRL + S" (sottolineato) as a replacement for "CTRL + U" (underline), rather than the (almost) universal "Save" function.

Cost vs benefit tradeoff


In a commercial setting, the benefit from localization is access to more markets. However, there are considerable costs involved, which go far beyond just engineering. First, software must generally be re-engineered to make it 'World-Ready'. Thereafter, providing a localization package for a given language is in itself a non-trivial undertaking, requiring specialized technical writers to construct a culturally appropriate syntax for potentially complicated concepts, coupled with engineering resources to deploy and test the localization elements. Further, business operations must adapt to manage the production, storage and distribution of multiple discrete localized products, which are often being sold in completely different currencies, regulatory environments and tax regimes. Finally, sales, marketing and technical support must also conduct their own operations in the new languages in order to support customers for the localized products. Particularly for relatively small language populations, it may thus never be economically viable to offer a localized product. Even where large language populations could justify localization for a given product, and where a product's internal structure already permits localization, a given software developer/publisher may lack the size and sophistication to manage the ancillary functions associated with operating in multiple locales.

One alternative, most often used by open source software communities, is self-localization by teams of end-users and volunteers. The KDE project, for example, has been translated into over 100 languages. However, self-localization requires that the underlying product first be engineered to support such activities, which is a non-trivial endeavor.

See also

  • SDL Passolo
  • Alchemy Catalyst
  • Bidirectional script support
  • CJK
  • Computer russification, localization into Russian language
  • Game localization
  • Global information system
  • Globalization
  • Globalization Management System
  • Globalocal
  • Glocalization
  • Input method editor
  • International Components for Unicode
  • Language code
  • Language industry
  • Language localization
  • Multilanguage Electronic Phototypesetting System (MEPS)
  • Pseudolocalization, a software testing method for testing a software product's readiness for localization
  • Punycode, translating Unicode into the character sets for network host names
  • Region code
  • Separation of concerns
  • Software Localization
  • Translation
  • Website localization
  • Web Translate It


External links