Plain text
Encyclopedia
In computing
Computing
Computing is usually defined as the activity of using and improving computer hardware and software. It is the computer-specific part of information technology...

, plain text is the contents of an ordinary sequential file readable as textual material without much processing, usually opposed to formatted text
Formatted text
Formatted text, styled text or rich text, as opposed to plain text, has styling information beyond the minimum of semantic elements: colours, styles , sizes and special features .-Terminology:...

.

The encoding
Character encoding
A character encoding system consists of a code that pairs each character from a given repertoire with something else, such as a sequence of natural numbers, octets or electrical pulses, in order to facilitate the transmission of data through telecommunication networks or storage of text in...

 has traditionally been either ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...

, one of its many derivatives such as ISO/IEC 646
ISO/IEC 646
ISO/IEC 646:1991, Information technology — ISO 7-bit coded character set for information interchange, is an ISO standard that since its first edition in 1972 has specified a 7-bit character code from which several national standards are derived...

 etc., or sometimes EBCDIC
EBCDIC
Extended Binary Coded Decimal Interchange Code is an 8-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems....

. Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

-based encodings such as UTF-8
UTF-8
UTF-8 is a multibyte character encoding for Unicode. Like UTF-16 and UTF-32, UTF-8 can represent every character in the Unicode character set. Unlike them, it is backward-compatible with ASCII and avoids the complications of endianness and byte order marks...

 and UTF-16 are gradually replacing the older ASCII derivatives limited to 7 or 8 bit codes.

Usage

The purpose of using plain text today is primarily a "lowest common denominator" independence from programs that require their very own special encoding or formatting (with due sacrifices and limitations). Plain text files can be opened, read, and edited with most text editor
Text editor
A text editor is a type of program used for editing plain text files.Text editors are often provided with operating systems or software development packages, and can be used to change configuration files and programming language source code....

s. Examples include Notepad (Windows
Microsoft Windows
Microsoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...

), edit (DOS
DOS
DOS, short for "Disk Operating System", is an acronym for several closely related operating systems that dominated the IBM PC compatible market between 1981 and 1995, or until about 2000 if one includes the partially DOS-based Microsoft Windows versions 95, 98, and Millennium Edition.Related...

), ed
Ed (text editor)
ed is a line editor for the Unix operating system. It was one of the first end-user programs hosted on the system and has been standard in Unix-based systems ever since. ed was originally written in PDP-11/20 assembler by Ken Thompson in 1971...

, emacs
Emacs
Emacs is a class of text editors, usually characterized by their extensibility. GNU Emacs has over 1,000 commands. It also allows the user to combine these commands into macros to automate work.Development began in the mid-1970s and continues actively...

, vi
Vi
vi is a screen-oriented text editor originally created for the Unix operating system. The portable subset of the behavior of vi and programs based on it, and the ex editor language supported within these programs, is described by the Single Unix Specification and POSIX.The original code for vi...

, vim, Gedit
Gedit
gedit is a text editor for the GNOME desktop environment, Mac OS X and Microsoft Windows. Designed as a general purpose text editor, gedit emphasizes simplicity and ease of use...

 or nano
Nano (text editor)
nano is a text editor for Unix-like computing systems or operating environments using a command line interface. It emulates the Pico text editor, part of the Pine email client, and also provides additional functionality....

 (Unix
Unix
Unix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...

, Linux
Linux
Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution. The defining component of any Linux system is the Linux kernel, an operating system kernel first released October 5, 1991 by Linus Torvalds...

), SimpleText
SimpleText
SimpleText is the native text editor for the Classic Mac OS. SimpleText allows editing including text formatting , fonts, and sizes. It can be considered similar to Windows' WordPad application...

 (Mac OS
Mac OS
Mac OS is a series of graphical user interface-based operating systems developed by Apple Inc. for their Macintosh line of computer systems. The Macintosh user experience is credited with popularizing the graphical user interface...

), or TextEdit
TextEdit
TextEdit is a simple, open source word processor and text editor, first featured in NeXT's NEXTSTEP and OPENSTEP. It is now distributed with Mac OS X since Apple Inc.'s acquisition of NeXT, and available as a GNUstep application for other Unix-compatible operating systems such as Linux...

 (Mac OS X
Mac OS X
Mac OS X is a series of Unix-based operating systems and graphical user interfaces developed, marketed, and sold by Apple Inc. Since 2002, has been included with all new Macintosh computer systems...

). Other computer programs are also capable of reading and importing plain text.
It can also be used by simple computer tools such as line printing text commands like type
Type (command)
In computing, type is a command in various VMS. AmigaDOS, CP/M, DOS, OS/2 and Microsoft Windows command line interpreters such as COMMAND.COM, cmd.exe, 4DOS/4NT and Windows PowerShell. It is used to display the contents of specified files...

(DOS
DOS
DOS, short for "Disk Operating System", is an acronym for several closely related operating systems that dominated the IBM PC compatible market between 1981 and 1995, or until about 2000 if one includes the partially DOS-based Microsoft Windows versions 95, 98, and Millennium Edition.Related...

 and Windows
Microsoft Windows
Microsoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...

) and cat
Cat (Unix)
The cat command is a standard Unix program used to concatenate and display files. The name is from catenate, a synonym of concatenate.- Specification :...

(Unix), but also for more complex activities like web browsers, i.e. Lynx
Lynx (web browser)
Lynx is a text-based web browser for use on cursor-addressable character cell terminals and is very configurable.-Usage:Browsing in Lynx consists of highlighting the chosen link using cursor keys, or having all links on a page numbered and entering the chosen link's number. Current versions support...

 and the Line Mode Browser.

Plain text files are almost universal in programming; a source code file containing instructions in a programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

 is almost always a plain text file. Plain text is also commonly used for configuration file
Configuration file
In computing, configuration files, or config files configure the initial settings for some computer programs. They are used for user applications, server processes and operating system settings. The files are often written in ASCII and line-oriented, with lines terminated by a newline or carriage...

s, which are read for saved settings at the startup of a program.

Plain text is the original and ever popular method of conveying e-mail
E-mail
Electronic mail, commonly known as email or e-mail, is a method of exchanging digital messages from an author to one or more recipients. Modern email operates across the Internet or other computer networks. Some early email systems required that the author and the recipient both be online at the...

. HTML
HTML
HyperText Markup Language is the predominant markup language for web pages. HTML elements are the basic building-blocks of webpages....

 formatted e-mail messages often include an automatically generated plain text copy as well, for compatibility reasons.

Character encodings

Text was once commonly encoded in ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...

, using 8 bit
Bit
A bit is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states...

s for one letter or other character, encoding 7 bits, allowing 128 values, and using the 8th as a checksum
Checksum
A checksum or hash sum is a fixed-size datum computed from an arbitrary block of digital data for the purpose of detecting accidental errors that may have been introduced during its transmission or storage. The integrity of the data can be checked at any later time by recomputing the checksum and...

 bit when transferring a file. This just allowed the ordinary Latin
Latin
Latin is an Italic language originally spoken in Latium and Ancient Rome. It, along with most European languages, is a descendant of the ancient Proto-Indo-European language. Although it is considered a dead language, a number of scholars and members of the Christian clergy speak it fluently, and...

 alphabet, transfer control codes, parentheses and interpunction, which annoyed computer users, especially Portuguese and Swedish users.

When data transfer became more stable, the 8th bit stopped being used as a checksum and was used to extend the character set by another 128 characters; these non-standard characters were encoded differently in different countries, and in a way that made multilingual texts impossible to encode. For instance, a browser may display ¬A rather than ` if it tries to interpret one character set as another.

At last Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

 was defined, which currently allows for 1,114,112 code values used for any modern text writing system and a lot of extinct ones, and is universal. For example, Unicode encodes characters for Chinese, Hebrew, and Cyrillic scripts as well as Latin. Some of these text formats may be quite complicated to process correctly, but they still contain no structural data, such as bold start and end markers, and are therefore plain text.

Control codes

The ASCII codes before SPACE (= 32 = 20H) are not intended as displayable characters, but instead as control character
Control character
In computing and telecommunication, a control character or non-printing character is a code point in a character set, that does not in itself represent a written symbol.It is in-band signaling in the context of character encoding....

s. They are used for diverse interpreted meanings. For example, the code NULL (= 0, sometimes denoted Ctrl-@) is used as string end markers in the programming language C and successors. Most troublesome of these are the codes LF (= LINE FEED = 10 = 0AH) and CR (= CARRIAGE RETURN = 13 = 0DH). Windows and OS/2
OS/2
OS/2 is a computer operating system, initially created by Microsoft and IBM, then later developed by IBM exclusively. The name stands for "Operating System/2," because it was introduced as part of the same generation change release as IBM's "Personal System/2 " line of second-generation personal...

 require the sequence CR,LF to represent a newline, while Unix
Unix
Unix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...

 and relatives use just the LF, and Classic Mac OS
Mac OS
Mac OS is a series of graphical user interface-based operating systems developed by Apple Inc. for their Macintosh line of computer systems. The Macintosh user experience is credited with popularizing the graphical user interface...

 (but not Mac OS X
Mac OS X
Mac OS X is a series of Unix-based operating systems and graphical user interfaces developed, marketed, and sold by Apple Inc. Since 2002, has been included with all new Macintosh computer systems...

) uses just the code CR. This was once a slight problem when transferring files between Windows and Unix systems, but today most computer programs treat this seamlessly.

See also

  • Plaintext
    Plaintext
    In cryptography, plaintext is information a sender wishes to transmit to a receiver. Cleartext is often used as a synonym. Before the computer era, plaintext most commonly meant message text in the language of the communicating parties....

    , most commonly used in a cryptographic context
  • Cleartext usually refers to lack of protection from eavesdropping
    Eavesdropping
    Eavesdropping is the act of secretly listening to the private conversation of others without their consent, as defined by Black's Law Dictionary...

  • E-text
    E-text
    An e-text is, generally, any text-based information that is available in a digitally encoded human-readable format and read by electronic means, but more specifically it refers to files in the ASCII character encoding.E-text has the broad meaning of something electronic that represents words, a...

  • MIME Content-type
  • File format
    File format
    A file format is a particular way that information is encoded for storage in a computer file.Since a disk drive, or indeed any computer storage, can store only bits, the computer must have some way of converting information to 0s and 1s and vice-versa. There are different kinds of formats for...

  • Binary file
    Binary file
    A binary file is a computer file which may contain any type of data, encoded in binary form for computer storage and processing purposes; for example, computer document files containing formatted text...

  • Text file
    Text file
    A text file is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists within a computer file system...

  • Editor wars
  • Source code
    Source code
    In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK