A document file format is a text or binary file format for storing documents on a storage medium, especially for use by computers.
There currently exist a multitude of incompatible document file formats.
A rough consensus has been established that XML is to be the basis for future document file formats. Examples of XML-based open standards are DocBook, XHTML, and, more recently, the ISO/IEC standards OpenDocument (ISO 26300:2006) and Office Open XML (ISO 29500:2008).
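Because these formats share XML syntax, a generic XML parser can read the element structure of any of them. The following is a minimal sketch in Python using the standard library; the sample XHTML fragment and the helper name `element_names` are illustrative assumptions, not part of any of the standards named above.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal XHTML document, used only for illustration.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>Hello, document.</p></body>
</html>"""


def element_names(xml_text):
    """Return the local (namespace-free) tag names in document order."""
    root = ET.fromstring(xml_text)
    # ElementTree renders namespaced tags as "{namespace}local";
    # keep only the local part after the closing brace.
    return [el.tag.split("}")[-1] for el in root.iter()]


print(element_names(SAMPLE_XHTML))
# → ['html', 'head', 'title', 'body', 'p']
```

The same function would work unchanged on a DocBook file, since it relies only on well-formed XML and not on any vocabulary-specific knowledge.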
In 1993, the ITU-T tried to establish a standard for document file formats, known as the Open Document Architecture (ODA), which was supposed to replace all competing document file formats. It is described in ITU-T documents T.411 through T.421, which are equivalent to ISO 8613. It did not succeed.
Page description languages such as PostScript and PDF have become the de facto standards for documents that a typical user should only be able to create and read, not edit. PDF has been the subject of ISO standards since 2001: the PDF/X and PDF/A subsets were published as ISO 15930-1:2001 and ISO 19005-1:2005, and the full format as ISO 32000-1:2008.
HTML is the most widely used open international standard, and it is also used as a document file format. It has also become an ISO/IEC standard (ISO 15445:2000).
The default binary file format used by Microsoft Word (.doc) has become a widespread de facto standard for office documents, but it is a proprietary format and is not always fully supported by other word processors.
Common document file formats
- ASCII, UTF-8: plain text formats
- AmigaGuide
- .doc for Microsoft Word: structured binary format developed by Microsoft (specifications available since 2008 under the Open Specification Promise)
- DjVu: file format designed primarily to store scanned documents
- DocBook: an XML format for technical documentation
- HTML (.html, .htm): open standard, ISO standard from 2000, used in combination with any image files it refers to
- FictionBook (.fb2): open XML-based e-book format
- Office Open XML: .docx (XML-based standard for office documents, ISO standard from 2008)
- OpenDocument: .odt (XML-based standard for office documents, ISO standard from 2006)
- OpenOffice.org XML: .sxw (open, XML-based format for office documents)
- OXPS: Open XML Paper Specification
- PalmDoc: common handheld document format
- Plucker: widely used, navigable document format for handheld devices
- .pages for Pages
- PDF: open standard for document exchange, with ISO standards from 2001, 2005 and 2008. It is readable on almost every platform with free or open-source readers; open-source PDF creators are also available.
- Rich Text Format (RTF): proprietary document format with a published specification, developed by Microsoft since 1987 for Microsoft products and cross-platform document interchange
- SYmbolic LinK (SYLK)
- TeX: popular open-source typesetting program and format. First successful mathematical notation language.
- TEI: XML format for digital publication
- Troff
- Uniform Office Format: Chinese standard
- WordPerfect (.wpd, .wp, .wp7, .doc) (note: the .doc extension can be confused with the Microsoft Word format)
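Because an extension such as .doc can belong to more than one format, software often inspects the first bytes of a file instead of trusting its name. The sketch below illustrates that idea for a few formats from the list; the names `SIGNATURES`, `sniff_format` and `sniff_file` are hypothetical, the signature table is deliberately incomplete, and a ZIP or OLE2 match identifies only the container, not the specific document type inside it.

```python
# Leading byte signatures for a few formats from the list above.
# Illustrative and far from exhaustive.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"{\\rtf", "RTF"),
    (b"\xffWPC", "WordPerfect"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. binary Word .doc)"),
    (b"PK\x03\x04", "ZIP container (e.g. .docx or .odt)"),
]


def sniff_format(head: bytes) -> str:
    """Guess a format from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to compare against every signature."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))


print(sniff_format(b"%PDF-1.7\n"))         # → PDF
print(sniff_format(b"\xffWPC\x10\x00"))    # → WordPerfect
print(sniff_format(b"plain text"))         # → unknown
```

With this approach, a WordPerfect file and a Word file that both carry the .doc extension are distinguished by their content rather than by their name.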
See also
- List of file formats
- List of document markup languages
- Comparison of document markup languages
- Open format