Character (computing)



 
 








For other uses, see character.


In computer and machine-based telecommunications terminology, a character is a unit of information that roughly corresponds to a grapheme, grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language.

Examples of characters include letters, numerals, and punctuation marks. The concept also includes control characters, which do not correspond to symbols in a particular natural language, but rather to other bits of information used to process text in one or more languages. Examples of control characters include carriage return and tab, as well as instructions to printers or other devices that display or otherwise process text.
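The distinction between printable characters and control characters can be illustrated with a minimal Python sketch. It assumes that Unicode's general category "Cc" is an adequate stand-in for "control character" as used above; the helper name `is_control` and the sample values are chosen here for illustration only.

```python
import unicodedata

def is_control(ch: str) -> bool:
    """Return True if ch falls in Unicode general category Cc (control)."""
    return unicodedata.category(ch) == "Cc"

# Illustrative samples: two control characters and three printable ones.
samples = {
    "carriage return": "\r",
    "tab": "\t",
    "letter": "A",
    "numeral": "7",
    "punctuation": ",",
}

for label, ch in samples.items():
    code_point = f"U+{ord(ch):04X}"
    print(f"{label:16} {code_point} control={is_control(ch)}")
```

Carriage return and tab are reported as control characters, while the letter, numeral, and punctuation mark are not.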

Character encoding

Computers and communication equipment represent characters using a character encoding that assigns each character to something that can be stored or transmitted through a network, typically an integer quantity represented by a sequence of bits. Two examples of popular encodings are ASCII and the UTF-8 encoding for Unicode. According to statistics collected by Google, UTF-8 is the most common encoding used on web pages. While most character encodings map characters to numbers and/or bit sequences, Morse code instead represents characters using a series of electrical impulses of varying length.
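As a rough illustration of the mapping from characters to integers and then to bit sequences, the following Python sketch encodes a short string in ASCII and UTF-8. The sample string and the helper `to_bits` are assumptions made for this example, not part of any standard.

```python
def to_bits(data: bytes) -> str:
    """Render each byte as eight binary digits, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in data)

text = "A\u00e9"  # LATIN CAPITAL LETTER A followed by LATIN SMALL LETTER E WITH ACUTE

# Each character is first identified by an integer (its Unicode code point).
code_points = [ord(ch) for ch in text]

# ASCII covers "A" but has no code for U+00E9.
ascii_bytes = "A".encode("ascii")
try:
    text.encode("ascii")
    ascii_can_encode_all = True
except UnicodeEncodeError:
    ascii_can_encode_all = False

# UTF-8 is variable-width: "A" takes one byte, U+00E9 takes two.
utf8_bytes = text.encode("utf-8")

print("code points:", [f"U+{cp:04X}" for cp in code_points])
print("ASCII bits for 'A':", to_bits(ascii_bytes))
print("UTF-8 bits:", to_bits(utf8_bytes))
print("ASCII can encode whole string:", ascii_can_encode_all)
```

The single byte produced for "A" is the same in both encodings, which reflects UTF-8's backward compatibility with ASCII.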

Terminology

Historically, the term character has been widely used by industry professionals to refer to an encoded character (often only as exposed via a programming language's API). Likewise, character set has been widely used to refer to a specific repertoire of abstract characters that have been mapped to specific bit sequences. With the advent of Unicode and bit-agnostic encoding forms, more precise terminology is increasingly favored.

It is important, in some contexts, to make the distinction that a character is a unit of information, and thus does not imply any particular visual manifestation. For example, the Hebrew letter Aleph ("א") is often used by mathematicians to denote certain kinds of infinity, but it is also used in ordinary Hebrew text. In Unicode, these two uses are different characters and are signified by two different codes, though they may be rendered identically. Conversely, the Chinese logogram for water ("水") may have a slightly different appearance in Japanese texts than it does in Chinese texts, and local typefaces may reflect this. But they nonetheless represent the same information, are considered the same character, and share the same Unicode code point.
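The two cases above can be inspected with Python's standard `unicodedata` module. This is a small sketch, assuming that the code points U+05D0, U+2135, and U+6C34 are the characters the text refers to; the variable names are illustrative.

```python
import unicodedata

hebrew_alef = "\u05d0"  # letter used in ordinary Hebrew text
alef_symbol = "\u2135"  # symbol used in mathematical notation
water = "\u6c34"        # logogram for water, shared by Chinese and Japanese text

for ch in (hebrew_alef, alef_symbol, water):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Two characters that may look alike are still distinct units of information.
print("alef letter equals alef symbol:", hebrew_alef == alef_symbol)
```

The water logogram has a single code point regardless of language; any regional difference in appearance is left to the typeface.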

The term glyph is used to describe a particular physical appearance of a character. Many computer fonts consist of glyphs that are indexed by the Unicode code point of the character that each glyph represents.

The Unicode Standard and ISO/IEC 10646 share a definition of character, or abstract character: "a member of a set of elements used for the organisation, control, or representation of data." Unicode's definition supplements this with explanatory notes that encourage the reader to differentiate between characters, graphemes, and glyphs, among other things. The standards also differentiate between these abstract characters and coded characters (or encoded characters), which have been paired with numeric codes that facilitate their representation in computers.
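One way to see why characters and graphemes are kept distinct is that a single grapheme can correspond to more than one coded character. The Python sketch below uses the letter e with an acute accent as an assumed example; it is not taken from either standard's text.

```python
import unicodedata

precomposed = "\u00e9"  # one coded character: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # two coded characters: "e" plus COMBINING ACUTE ACCENT

# Both sequences are normally displayed as the same grapheme,
# yet they contain different numbers of coded characters.
print("precomposed length:", len(precomposed))
print("decomposed length:", len(decomposed))
print("equal as code point sequences:", precomposed == decomposed)

# Normalization Form C composes the pair into the single coded character.
composed = unicodedata.normalize("NFC", decomposed)
print("equal after NFC normalization:", composed == precomposed)
```

Software that counts or compares "characters" therefore needs to state whether it means coded characters or user-perceived graphemes.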

See also

  • Characters are often combined into strings
  • Fill character
  • Non-spacing character


External links

  • by The Linux Information Project (LINFO)
  • summarizes the ISO/IEC's character model, focusing on terminology definitions and differentiating between characters and glyphs