Text file

A text file (sometimes spelled "textfile"; an old alternate name is "flatfile") is a kind of computer file that is structured as a sequence of lines. A text file exists within a computer file system. On some systems, such as CP/M and MS-DOS, the end of a text file may be marked by placing one or more special characters, known as an end-of-file marker, after the last line. Most modern file systems instead record the length of the file, so no marker is needed.

"Text file" refers to a type of container, while plain text refers to a type of content. Text files can contain plain text, but they are not limited to it.

At a generic level of description, there are two kinds of computer files: text files and binary files.
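Telling the two kinds apart in practice usually relies on a heuristic. The minimal Python sketch below assumes text files are ASCII or UTF-8. Under that assumption, a file is treated as binary if a sample of it contains a NUL byte or bytes that are not valid UTF-8. The function name `looks_like_text` and the 8000-byte sample size are illustrative choices, not a standard.

```python
import codecs

def looks_like_text(data: bytes, sample_size: int = 8000) -> bool:
    """Heuristically guess whether `data` is text (assumed ASCII/UTF-8) rather than binary."""
    sample = data[:sample_size]
    # Most binary formats contain NUL bytes; ASCII or UTF-8 text essentially never does.
    if b"\x00" in sample:
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the end of the sample.
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True

print(looks_like_text(b"hello\nworld\n"))                    # True
print(looks_like_text(b"\x89PNG\r\n\x1a\n\x00\x00\x00\r"))  # False
```

Text stored in other encodings, such as ISO 8859-1 or UTF-16, would be misclassified as binary by this sketch.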

Data storage

Because of their simplicity, text files are commonly used for storage of information. They avoid some of the problems encountered with other file formats, such as endianness, padding bytes, or differences in the number of bytes in a machine word. Further, when data corruption occurs in a text file, it is often easier to recover and continue processing the remaining contents. A disadvantage of text files is that they usually have low entropy per byte, meaning that the information occupies more storage than is strictly necessary.
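The trade-off can be seen in a short Python sketch; the value 305419896 (hexadecimal 12345678) is arbitrary. In binary, the same 32-bit integer has two different representations depending on byte order. As text, it has a single decimal representation, but that representation takes more than twice as many bytes.

```python
import struct

value = 305419896  # 0x12345678, an arbitrary example value

# Binary: the bytes depend on the byte-order convention chosen.
little = struct.pack("<I", value)  # bytes 78 56 34 12
big = struct.pack(">I", value)     # bytes 12 34 56 78

# Text: the same decimal digits on every machine, but more bytes.
text = str(value).encode("ascii")  # b'305419896'

print(len(little), len(big), len(text))  # 4 4 9

# Reading big-endian bytes as little-endian silently yields a different number,
# while the text form parses back to the original value everywhere.
print(struct.unpack("<I", big)[0] == value)  # False
print(int(text.decode("ascii")) == value)    # True
```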

Formats


ASCII

The ASCII standard allows ASCII-only text files (unlike most other file types) to be freely interchanged and read on Unix, Macintosh, Microsoft Windows, DOS, and other systems. These systems differ in their preferred line ending convention and in their interpretation of values outside the ASCII range (their character encoding).
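The main line-ending conventions are:

  • CR LF on DOS and Windows
  • LF on Unix and macOS
  • a lone CR on the classic Mac OS

A few string replacements can normalize them. The following is a minimal Python sketch; `normalize_newlines` is a hypothetical helper name.

```python
def normalize_newlines(text: str, newline: str = "\n") -> str:
    """Convert CR LF, lone CR and LF line endings to a single convention."""
    # Replace CR LF first so that its CR is not counted as a separate line ending.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", newline)

mixed = "unix\nwindows\r\nclassic mac\rend"
print(normalize_newlines(mixed).split("\n"))  # ['unix', 'windows', 'classic mac', 'end']
print(repr(normalize_newlines(mixed, "\r\n")))
```

Python's built-in `str.splitlines()` also recognizes all three conventions. However, it additionally splits on characters such as form feed and vertical tab, which is why this sketch handles the three line endings explicitly.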

MIME

Text files usually have the MIME type "text/plain", often with an additional parameter indicating the encoding (for example, "text/plain; charset=UTF-8"). Prior to the advent of Mac OS X, the Mac OS system regarded the content of a file (the data fork) to be a text file when the file's type code was "TEXT". Under the Windows operating system, a file is regarded as a text file if the suffix of its name (the "extension") is ".txt". However, many other suffixes are used for text files with specific purposes. For example, source code for computer programs is usually kept in text files whose suffixes indicate the programming language in which the source is written.
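A "text/plain" media type and its charset parameter can be read with Python's standard `email.message.Message` class, as sketched below. The helper name `parse_text_content_type` is hypothetical. The US-ASCII fallback follows RFC 2046, which gives US-ASCII as the default charset for "text/plain".

```python
from email.message import Message

def parse_text_content_type(header_value: str, default_charset: str = "us-ascii"):
    """Split a Content-Type header value into (media type, charset)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() lower-cases the media type; get_content_charset()
    # lower-cases the charset parameter, or returns the fallback if it is absent.
    return msg.get_content_type(), msg.get_content_charset(default_charset)

print(parse_text_content_type("text/plain; charset=UTF-8"))  # ('text/plain', 'utf-8')
print(parse_text_content_type("text/plain"))                 # ('text/plain', 'us-ascii')
```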

.txt

.txt is a filename extension for files consisting of text, usually with very little formatting (for example, no bold or italics). The precise definition of the .txt format is not specified, but it typically matches the format accepted by the system terminal or a simple text editor. Files with the .txt extension can easily be read or opened by virtually any program that reads text and, for that reason, are considered universal (or platform independent).

The ASCII character set is the most common format for English-language text files, and is generally assumed to be the default file format in many situations. For accented and other non-ASCII characters, it is necessary to choose a character encoding. In many systems, this is chosen on the basis of the default locale setting of the computer on which the file is read. Common character encodings include ISO 8859-1 for many Western European languages.
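The following Python sketch shows why the choice of encoding matters. The same word encoded in ISO 8859-1 and in UTF-8 yields different bytes. Decoding with the wrong encoding then either produces the wrong characters or fails outright. The locale encoding printed on the first line differs from machine to machine.

```python
import locale

# An encoding derived from the current locale; it differs between machines.
print("locale encoding:", locale.getpreferredencoding(False))

latin1_bytes = "café".encode("iso-8859-1")  # b'caf\xe9'
utf8_bytes = "café".encode("utf-8")         # b'caf\xc3\xa9'

# UTF-8 bytes read as ISO 8859-1 decode without error, but to the wrong characters.
print(utf8_bytes.decode("iso-8859-1"))      # cafÃ©

# ISO 8859-1 bytes read as UTF-8 fail: 0xE9 starts a multi-byte sequence that never completes.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err)
```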

Because many encodings have only a limited repertoire of characters, they are often usable only to represent text in a limited subset of human languages. Unicode is an attempt to create a common standard for representing the text of all known languages, and most known character sets are subsets of the very large Unicode character set. Although there are multiple character encodings available for Unicode, the most common is UTF-8, which has the advantage of being backward-compatible with ASCII: every ASCII text file is also a UTF-8 text file with identical meaning.
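The following Python sketch checks the ASCII compatibility of UTF-8 directly. It also shows that non-ASCII characters take two to four bytes, each with a value of 0x80 or above, so they can never be mistaken for ASCII characters. The helper name `utf8_byte_lengths` is illustrative.

```python
def utf8_byte_lengths(text: str) -> dict[str, int]:
    """Map each character of `text` to the number of bytes it occupies in UTF-8."""
    return {ch: len(ch.encode("utf-8")) for ch in text}

ascii_line = "Plain ASCII text\n"
# For ASCII-only text, the ASCII and UTF-8 encodings produce identical bytes.
assert ascii_line.encode("ascii") == ascii_line.encode("utf-8")

print(utf8_byte_lengths("Aé€😀"))  # {'A': 1, 'é': 2, '€': 3, '😀': 4}

# Every byte of a multi-byte UTF-8 sequence lies outside the ASCII range.
for ch in "é€😀":
    assert all(byte >= 0x80 for byte in ch.encode("utf-8"))
```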

Standard Windows .txt files


Microsoft MS-DOS and Windows use a common text file format, in which each line of text is separated by a two-character combination: CR (carriage return) and LF (line feed), which have ASCII codes 13 and 10. It is common for the last line of text not to be terminated with a CR LF marker, and many text editors (including Notepad) do not automatically insert one on the last line.
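Because the final CR LF is optional, a reader has to handle both cases. The following minimal Python sketch splits raw bytes on CR LF and reports whether the last line was terminated. The function name `split_crlf_lines` is hypothetical.

```python
def split_crlf_lines(data: bytes) -> tuple[list[bytes], bool]:
    """Split DOS/Windows text into lines; also report whether the last line ended with CR LF."""
    if not data:
        return [], False
    lines = data.split(b"\r\n")
    terminated = lines[-1] == b""
    if terminated:
        lines.pop()  # drop the empty element that follows a final CR LF
    return lines, terminated

print(split_crlf_lines(b"first\r\nsecond\r\nlast"))  # ([b'first', b'second', b'last'], False)
print(split_crlf_lines(b"first\r\nsecond\r\n"))      # ([b'first', b'second'], True)
```

When such files are read through Python's text mode instead, `open(path, newline="")` preserves the CR LF pairs. The default mode translates them to "\n".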

Most Windows text files use a form of ANSI, OEM or Unicode encoding. What Windows terminology calls "ANSI encodings" are usually single-byte code pages such as Windows-1252, which are based on, but not identical to, the ISO 8859 encodings. Locales such as Chinese, Japanese and Korean instead require double-byte character sets. Before the transition to Unicode, ANSI encodings were traditionally the default encodings of system locales within Windows. By contrast, OEM encodings, also known as MS-DOS code pages, were defined by IBM for use in the original IBM PC text mode display system. They typically include graphical and line-drawing characters common in full-screen MS-DOS applications. Newer Windows text files may use a Unicode encoding such as UTF-16LE or UTF-8.
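The difference between ANSI and OEM code pages shows up when the same bytes are decoded with each. The Python sketch below uses Windows-1252 and code page 437 as representative examples. It also includes a hypothetical helper, `guess_windows_text_encoding`, which checks for a byte order mark, then tries UTF-8, and otherwise falls back to an assumed ANSI code page. This is a heuristic, not how Windows itself decides.

```python
import codecs

raw = bytes([0x80, 0xC9, 0xCD, 0xBB])
print(raw.decode("cp1252"))  # €ÉÍ»  (Windows-1252, a Western European "ANSI" code page)
print(raw.decode("cp437"))   # Ç╔═╗  (code page 437, the original IBM PC "OEM" code page)

def guess_windows_text_encoding(data: bytes, ansi_code_page: str = "cp1252") -> str:
    """Guess a Python codec name for a Windows text file (heuristic sketch)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # UTF-8 with a byte order mark; this codec strips the BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # the utf-16 codec reads the BOM to choose the byte order
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return ansi_code_page  # assumed: the ANSI code page of the user's locale
```

This sketch ignores UTF-32, whose little-endian byte order mark begins with the same two bytes as the UTF-16LE one.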

Rendering


When a text file is opened in a text editor, its human-readable content is presented to the user; usually this is the file's plain text. Depending on the application, control codes may either be acted upon by the editor as literal instructions or be rendered as visible escape characters that can be edited as plain text. Although a text file may contain plain text, control characters within the file (especially the end-of-file character) can hide that text from a particular method of display. For example, a program that stops reading at the end-of-file character never shows anything that follows it.
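Both behaviours can be imitated in a few lines of Python. In the sketch below, `caret_notation` renders control characters visibly, in the caret style that `cat -v` uses on Unix-like systems. `read_until_eof_marker` mimics a program that treats Ctrl-Z (byte 0x1A, the end-of-file marker of CP/M and MS-DOS) as the end of the file. Both function names are hypothetical.

```python
CTRL_Z = "\x1a"  # SUB, used as the end-of-file marker by CP/M and MS-DOS

def caret_notation(text: str) -> str:
    """Render C0 control characters (except newline and tab) visibly, e.g. SUB as '^Z'."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\n\t":
            out.append(ch)
        elif code < 0x20:
            out.append("^" + chr(code + 0x40))  # 0x1A -> '^Z', 0x0D -> '^M'
        elif code == 0x7F:
            out.append("^?")                    # DEL
        else:
            out.append(ch)
    return "".join(out)

def read_until_eof_marker(text: str) -> str:
    """Mimic a program that stops at Ctrl-Z: everything after it stays unseen."""
    return text.split(CTRL_Z, 1)[0]

sample = "visible line\n\x1ahidden line\n"
print(caret_notation(sample))         # the marker appears as ^Z before "hidden line"
print(read_until_eof_marker(sample))  # visible line
```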

See also

  • List of file formats
  • File extensions
  • ASCII
  • EBCDIC
  • Text editor
  • Unicode


External links