In
computingComputing is usually defined as the activity of using and improving computer hardware and software. It is the computer-specific part of information technology...
,
plain text is the contents of an ordinary sequential file readable as textual material without much processing, usually opposed to
formatted textFormatted text, styled text or rich text, as opposed to plain text, has styling information beyond the minimum of semantic elements: colours, styles , sizes and special features .-Terminology:...
.
The
encodingA character encoding system consists of a code that pairs each character from a given repertoire with something else, such as a sequence of natural numbers, octets or electrical pulses, in order to facilitate the transmission of data through telecommunication networks or storage of text in...
has traditionally been either
ASCIIThe American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...
, one of its many derivatives such as
ISO/IEC 646ISO/IEC 646:1991, Information technology — ISO 7-bit coded character set for information interchange, is an ISO standard that since its first edition in 1972 has specified a 7-bit character code from which several national standards are derived...
etc., or sometimes
EBCDICExtended Binary Coded Decimal Interchange Code is an 8-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems....
.
UnicodeUnicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...
-based encodings such as
UTF-8UTF-8 is a multibyte character encoding for Unicode. Like UTF-16 and UTF-32, UTF-8 can represent every character in the Unicode character set. Unlike them, it is backward-compatible with ASCII and avoids the complications of endianness and byte order marks...
and UTF-16 are gradually replacing the older ASCII derivatives limited to 7 or 8 bit codes.
Usage
The purpose of using
plain text today is primarily a "lowest common denominator" independence from programs that require their very own special encoding or formatting (with due sacrifices and limitations). Plain text files can be opened, read, and edited with most
text editorA text editor is a type of program used for editing plain text files.Text editors are often provided with operating systems or software development packages, and can be used to change configuration files and programming language source code....
s. Examples include Notepad (
WindowsMicrosoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...
), edit (
DOSDOS, short for "Disk Operating System", is an acronym for several closely related operating systems that dominated the IBM PC compatible market between 1981 and 1995, or until about 2000 if one includes the partially DOS-based Microsoft Windows versions 95, 98, and Millennium Edition.Related...
),
eded is a line editor for the Unix operating system. It was one of the first end-user programs hosted on the system and has been standard in Unix-based systems ever since. ed was originally written in PDP-11/20 assembler by Ken Thompson in 1971...
,
emacsEmacs is a class of text editors, usually characterized by their extensibility. GNU Emacs has over 1,000 commands. It also allows the user to combine these commands into macros to automate work.Development began in the mid-1970s and continues actively...
,
vivi is a screen-oriented text editor originally created for the Unix operating system. The portable subset of the behavior of vi and programs based on it, and the ex editor language supported within these programs, is described by the Single Unix Specification and POSIX.The original code for vi...
, vim,
Geditgedit is a text editor for the GNOME desktop environment, Mac OS X and Microsoft Windows. Designed as a general purpose text editor, gedit emphasizes simplicity and ease of use...
or
nanonano is a text editor for Unix-like computing systems or operating environments using a command line interface. It emulates the Pico text editor, part of the Pine email client, and also provides additional functionality....
(
UnixUnix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...
,
LinuxLinux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution. The defining component of any Linux system is the Linux kernel, an operating system kernel first released October 5, 1991 by Linus Torvalds...
),
SimpleTextSimpleText is the native text editor for the Classic Mac OS. SimpleText allows editing including text formatting , fonts, and sizes. It can be considered similar to Windows' WordPad application...
(
Mac OSMac OS is a series of graphical user interface-based operating systems developed by Apple Inc. for their Macintosh line of computer systems. The Macintosh user experience is credited with popularizing the graphical user interface...
), or
TextEditTextEdit is a simple, open source word processor and text editor, first featured in NeXT's NEXTSTEP and OPENSTEP. It is now distributed with Mac OS X since Apple Inc.'s acquisition of NeXT, and available as a GNUstep application for other Unix-compatible operating systems such as Linux...
(
Mac OS XMac OS X is a series of Unix-based operating systems and graphical user interfaces developed, marketed, and sold by Apple Inc. Since 2002, has been included with all new Macintosh computer systems...
). Other computer programs are also capable of reading and importing plain text.
It can also be used by simple computer tools such as line printing text commands like
typeIn computing, type is a command in various VMS. AmigaDOS, CP/M, DOS, OS/2 and Microsoft Windows command line interpreters such as COMMAND.COM, cmd.exe, 4DOS/4NT and Windows PowerShell. It is used to display the contents of specified files...
(
DOSDOS, short for "Disk Operating System", is an acronym for several closely related operating systems that dominated the IBM PC compatible market between 1981 and 1995, or until about 2000 if one includes the partially DOS-based Microsoft Windows versions 95, 98, and Millennium Edition.Related...
and
WindowsMicrosoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...
) and
catThe cat command is a standard Unix program used to concatenate and display files. The name is from catenate, a synonym of concatenate.- Specification :...
(Unix), but also for more complex activities like web browsers, i.e.
LynxLynx is a text-based web browser for use on cursor-addressable character cell terminals and is very configurable.-Usage:Browsing in Lynx consists of highlighting the chosen link using cursor keys, or having all links on a page numbered and entering the chosen link's number. Current versions support...
and the Line Mode Browser.
Plain text files are almost universal in programming; a source code file containing instructions in a
programming languageA programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....
is almost always a plain text file. Plain text is also commonly used for
configuration fileIn computing, configuration files, or config files configure the initial settings for some computer programs. They are used for user applications, server processes and operating system settings. The files are often written in ASCII and line-oriented, with lines terminated by a newline or carriage...
s, which are read for saved settings at the startup of a program.
Plain text is the original and ever popular method of conveying
e-mailElectronic mail, commonly known as email or e-mail, is a method of exchanging digital messages from an author to one or more recipients. Modern email operates across the Internet or other computer networks. Some early email systems required that the author and the recipient both be online at the...
.
HTMLHyperText Markup Language is the predominant markup language for web pages. HTML elements are the basic building-blocks of webpages....
formatted e-mail messages often include an automatically generated plain text copy as well, for compatibility reasons.
Character encodings
Text was once commonly encoded in
ASCIIThe American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...
, using 8
bitA bit is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states...
s for one letter or other character, encoding 7 bits, allowing 128 values, and using the 8th as a
checksumA checksum or hash sum is a fixed-size datum computed from an arbitrary block of digital data for the purpose of detecting accidental errors that may have been introduced during its transmission or storage. The integrity of the data can be checked at any later time by recomputing the checksum and...
bit when transferring a file. This just allowed the ordinary
LatinLatin is an Italic language originally spoken in Latium and Ancient Rome. It, along with most European languages, is a descendant of the ancient Proto-Indo-European language. Although it is considered a dead language, a number of scholars and members of the Christian clergy speak it fluently, and...
alphabet, transfer control codes, parentheses and interpunction, which annoyed computer users, especially Portuguese and Swedish users.
When data transfer became more stable, the 8th bit stopped being used as a checksum and was used to extend the character set by another 128 characters; these non-standard characters were encoded differently in different countries, and in a way that made multilingual texts impossible to encode. For instance, a browser may display
¬A rather than
` if it tries to interpret one character set as another.
At last
UnicodeUnicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...
was defined, which currently allows for 1,114,112 code values used for any modern text writing system and a lot of extinct ones, and is universal. For example, Unicode encodes characters for Chinese, Hebrew, and Cyrillic scripts as well as Latin. Some of these text formats may be quite complicated to process correctly, but they still contain no structural data, such as bold start and end markers, and are therefore plain text.
Control codes
The ASCII codes before
SPACE (=
32 =
20H) are not intended as displayable characters, but instead as
control characterIn computing and telecommunication, a control character or non-printing character is a code point in a character set, that does not in itself represent a written symbol.It is in-band signaling in the context of character encoding....
s. They are used for diverse interpreted meanings. For example, the code
NULL (=
0, sometimes denoted
Ctrl-@) is used as string end markers in the programming language C and successors. Most troublesome of these are the codes
LF (=
LINE FEED =
10 =
0AH) and
CR (=
CARRIAGE RETURN =
13 =
0DH). Windows and
OS/2OS/2 is a computer operating system, initially created by Microsoft and IBM, then later developed by IBM exclusively. The name stands for "Operating System/2," because it was introduced as part of the same generation change release as IBM's "Personal System/2 " line of second-generation personal...
require the sequence
CR,LF to represent a newline, while
UnixUnix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...
and relatives use just the
LF, and Classic Mac OSMac OS is a series of graphical user interface-based operating systems developed by Apple Inc. for their Macintosh line of computer systems. The Macintosh user experience is credited with popularizing the graphical user interface...
(but not Mac OS XMac OS X is a series of Unix-based operating systems and graphical user interfaces developed, marketed, and sold by Apple Inc. Since 2002, has been included with all new Macintosh computer systems...
) uses just the code CR. This was once a slight problem when transferring files between Windows and Unix systems, but today most computer programs treat this seamlessly.
See also
- Plaintext
In cryptography, plaintext is information a sender wishes to transmit to a receiver. Cleartext is often used as a synonym. Before the computer era, plaintext most commonly meant message text in the language of the communicating parties....
, most commonly used in a cryptographic context
- Cleartext usually refers to lack of protection from eavesdropping
Eavesdropping is the act of secretly listening to the private conversation of others without their consent, as defined by Black's Law Dictionary...
- E-text
An e-text is, generally, any text-based information that is available in a digitally encoded human-readable format and read by electronic means, but more specifically it refers to files in the ASCII character encoding.E-text has the broad meaning of something electronic that represents words, a...
- MIME Content-type
- File format
A file format is a particular way that information is encoded for storage in a computer file.Since a disk drive, or indeed any computer storage, can store only bits, the computer must have some way of converting information to 0s and 1s and vice-versa. There are different kinds of formats for...
- Binary file
A binary file is a computer file which may contain any type of data, encoded in binary form for computer storage and processing purposes; for example, computer document files containing formatted text...
- Text file
A text file is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists within a computer file system...
- Editor wars
- Source code
In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...