Document file format
A document file format is a text or binary file format for storing documents on storage media, especially for use by computers.
There currently exist a multitude of incompatible document file formats.

A rough consensus has emerged that XML should be the basis for future document file formats. Examples of XML-based open standards are DocBook, XHTML, and, more recently, the ISO/IEC standards OpenDocument (ISO/IEC 26300:2006) and Office Open XML (ISO/IEC 29500:2008).
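
One practical consequence of an XML basis is that a generic XML parser can read a document's structure without format-specific software. The following is a minimal sketch of that idea, not a DocBook implementation: the fragment is a hypothetical, simplified DocBook-style example (no DTD, no namespace), and the function name `outline` is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified DocBook-style fragment used only for illustration.
SAMPLE = """<article>
  <title>Sample document</title>
  <section>
    <title>Introduction</title>
    <para>First paragraph.</para>
    <para>Second paragraph.</para>
  </section>
</article>"""


def outline(xml_text):
    """Return the document title and the text of every <para> element."""
    root = ET.fromstring(xml_text)
    # findtext looks only at direct children, so this is the article title,
    # not the section title.
    title = root.findtext("title")
    # iter walks the whole tree in document order.
    paragraphs = [p.text for p in root.iter("para")]
    return title, paragraphs


if __name__ == "__main__":
    print(outline(SAMPLE))
```

Real OpenDocument and Office Open XML files are ZIP packages that contain several XML parts, so reading them would need an extra unpacking step that this sketch leaves out.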

In 1993, the ITU-T tried to establish a standard for document file formats, known as the Open Document Architecture (ODA), which was intended to replace all competing document file formats. It is described in ITU-T documents T.411 through T.421, which are equivalent to ISO 8613. The effort did not succeed.

Page description languages such as PostScript and PDF have become the de facto standard for documents that a typical user should be able to create and read, but not edit. Beginning in 2001, PDF and its subsets were published as ISO standards (ISO 15930-1:2001, ISO 19005-1:2005, ISO 32000-1:2008).

HTML is among the most widely used open international standards and is also used as a document file format. It has also become an ISO/IEC standard (ISO/IEC 15445:2000).

The default binary file format used by Microsoft Word (.doc) has become a widespread de facto standard for office documents, but it is a proprietary format and is not always fully supported by other word processors.

Common document file formats

  • ASCII, UTF-8 — plain text formats
  • AmigaGuide
  • .doc for Microsoft Word — structured binary format developed by Microsoft (specifications available since 2008 under the Open Specification Promise)
  • DjVu — file format designed primarily to store scanned documents
  • DocBook — an XML format for technical documentation
  • HTML (.html, .htm; open standard, ISO/IEC standard from 2000), together with any image files the document refers to
  • FictionBook (.fb2) — open XML-based e-book format
  • Office Open XML — .docx (XML-based standard for office documents, ISO standard from 2008)
  • OpenDocument — .odt (XML-based standard for office documents, ISO standard from 2006)
  • OpenOffice.org XML — .sxw (open, XML-based format for office documents)
  • OXPS — Open XML Paper Specification
  • PalmDoc — common handheld document format
  • Plucker — widely used, navigable document format for handheld devices
  • .pages for Pages
  • PDF — Open standard for document exchange. ISO standards from 2001, 2005 and 2008. It is readable on almost every platform with free or open source readers. Open source PDF creators are also available.
  • Rich Text Format (RTF) — proprietary document format with a published specification, developed by Microsoft since 1987 for Microsoft products and cross-platform document interchange
  • SYmbolic LinK (SYLK)
  • TeX — Popular open-source typesetting program and format. First successful mathematical notation language.
  • TEI — XML format for digital publication
  • Troff
  • Uniform Office Format — Chinese standard
  • WordPerfect (.wpd, .wp, .wp7, .doc) (note: the .doc extension can be confused with the Word format)
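
Because an extension such as .doc can belong to more than one format, tools often inspect a file's leading bytes instead of its name. The sketch below illustrates that approach under stated assumptions: the signature table covers only a few of the formats listed above, the function names `sniff_format` and `sniff_file` are hypothetical, and the WordPerfect signature applies to files written by version 5 and later. A ZIP or OLE match identifies only the container, so it narrows the candidates rather than naming one format.

```python
# Leading-byte signatures for a few of the formats listed above.
# This table is illustrative and far from exhaustive.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"{\\rtf", "Rich Text Format"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",
     "OLE compound file (for example binary Word .doc)"),
    (b"PK\x03\x04", "ZIP container (for example .docx, .odt, .sxw)"),
    (b"\xffWPC", "WordPerfect"),
    (b"AT&TFORM", "DjVu"),
]


def sniff_format(head: bytes) -> str:
    """Guess a format from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 16 bytes of the file at path and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))
```

With this approach, a file named report.doc that begins with the bytes FF 57 50 43 would be reported as WordPerfect, while one that begins with the OLE signature would be treated as a candidate binary Word document.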

See also

  • List of file formats
  • List of document markup languages
  • Comparison of document markup languages
  • Open format

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 