Document-oriented database
A document-oriented database is a computer program designed for storing, retrieving, and managing document-oriented, or semi-structured, information. Document-oriented databases are one of the main categories of so-called NoSQL databases, and the popularity of the term "document-oriented database" (or "document store") has grown with the use of the term NoSQL itself.

Documents

The central concept of a document-oriented database is the notion of a document. While each document-oriented database implementation differs on the details of this definition, in general they all assume that documents encapsulate and encode data (or information) in some standard format or encoding. Encodings in use include XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on).

Documents inside a document-oriented database are similar, in some ways, to records or rows in relational databases, but they are less rigid. They are not required to adhere to a standard schema, nor do they all have the same sections, slots, parts, keys, or the like. For example, here is a document:
FirstName="Bob", Address="5 Oak St.", Hobby="sailing".


Another document could be:
FirstName="Jonathan", Address="15 Wanamassa Point Road", Children=[{Name:"Michael",Age:10}, {Name:"Jennifer", Age:8}, {Name:"Samantha", Age:5}, {Name:"Elena", Age:2}].


Both documents have some information in common and some that differs. Unlike a relational database, where each record would have the same set of fields and unused fields might be kept empty, there are no empty 'fields' in either document (record) in this case. This approach allows new information to be added to one document without requiring any explicit statement that the same information is absent from the others.
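As a rough illustration, the two documents above can be written as Python dictionaries. This is only a sketch of the idea, not the storage format of any particular database; the variable names are assumptions made for the example.

```python
# The two example documents, expressed as plain Python dicts.
# Neither dict carries placeholder entries for fields it does not use.
bob = {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"}

jonathan = {
    "FirstName": "Jonathan",
    "Address": "15 Wanamassa Point Road",
    "Children": [
        {"Name": "Michael", "Age": 10},
        {"Name": "Jennifer", "Age": 8},
        {"Name": "Samantha", "Age": 5},
        {"Name": "Elena", "Age": 2},
    ],
}

# Compare which fields the documents share and which are unique to each.
shared_fields = set(bob) & set(jonathan)
only_in_bob = set(bob) - set(jonathan)
only_in_jonathan = set(jonathan) - set(bob)
```

Here the shared fields are FirstName and Address, while Hobby and Children each appear in only one document, with no empty slot reserved in the other.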

Keys

Documents are addressed in the database via a unique key that represents that document. Often, this key is a simple string. In some cases, this string is a URI or path. Regardless, you can use this key to retrieve the document from the database. Typically, the database maintains an index on the key so that document retrieval is fast.
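Key-based addressing can be sketched with an in-memory mapping from key to document. The class name KeyedDocumentStore, its methods, and the path-style keys are hypothetical and chosen only for illustration; a real implementation would persist the data and maintain an on-disk index.

```python
class KeyedDocumentStore:
    """Toy in-memory store: a dict stands in for the index on the key."""

    def __init__(self):
        self._by_key = {}

    def put(self, key, document):
        # Storing under an existing key replaces the previous document.
        self._by_key[key] = document

    def get(self, key):
        # Average-case constant-time lookup; None if the key is absent.
        return self._by_key.get(key)


store = KeyedDocumentStore()
store.put("/people/bob", {"FirstName": "Bob", "Hobby": "sailing"})
store.put("/people/jonathan", {"FirstName": "Jonathan"})

found = store.get("/people/bob")
missing = store.get("/people/nobody")
```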

Retrieval

One of the other defining characteristics of a document-oriented database is that, beyond the simple key-document (or key-value) lookup that you can use to retrieve a document, the database will offer an API or query language that will allow you to retrieve documents based on their contents. For example, you may want a query that gets you all the documents with a certain field set to a certain value. The set of query APIs or query language features available, as well as the expected performance of the queries, varies significantly from one implementation to the next.
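A content-based query of the kind described above might look like the following sketch. The find function and the third document are assumptions for the example; real implementations expose their own query APIs and usually rely on indexes rather than scanning every document.

```python
def find(documents, criteria):
    """Return the documents whose fields match every key/value in criteria.

    A linear scan over a list; shown only to illustrate querying by content.
    """
    return [
        doc
        for doc in documents
        if all(field in doc and doc[field] == value
               for field, value in criteria.items())
    ]


documents = [
    {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"},
    {"FirstName": "Jonathan", "Address": "15 Wanamassa Point Road"},
    {"FirstName": "Alice", "Hobby": "sailing"},  # hypothetical extra document
]

sailors = find(documents, {"Hobby": "sailing"})
nobody = find(documents, {"Hobby": "chess"})
```

Documents that lack the queried field are simply skipped, which matches the schema-free model: the absence of a field is not an error.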

Organization

Implementations offer a variety of ways of organizing documents, including notions of
  • Collections
  • Tags
  • Non-visible Metadata
  • Directory hierarchies
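Two of these notions, collections and tags, can be sketched together as follows. The OrganizedStore class, its method names, and the collection and tag names are all hypothetical; actual implementations differ in which of these organizing schemes they offer and how.

```python
from collections import defaultdict


class OrganizedStore:
    """Toy store that groups documents by collection and labels them with tags."""

    def __init__(self):
        self._collections = defaultdict(dict)  # collection -> {key: document}
        self._tags = defaultdict(set)          # tag -> {(collection, key)}

    def put(self, collection, key, document, tags=()):
        self._collections[collection][key] = document
        for tag in tags:
            self._tags[tag].add((collection, key))

    def in_collection(self, collection):
        # .get avoids creating an empty collection as a side effect of reading.
        return list(self._collections.get(collection, {}).values())

    def tagged(self, tag):
        # Sorted so the result order does not depend on set iteration order.
        return [self._collections[c][k] for c, k in sorted(self._tags.get(tag, ()))]


store = OrganizedStore()
store.put("people", "bob", {"FirstName": "Bob"}, tags=["sailor"])
store.put("people", "jonathan", {"FirstName": "Jonathan"}, tags=["parent"])
store.put("archive", "old-bob", {"FirstName": "Bob"}, tags=["sailor"])

people = store.in_collection("people")
sailors = store.tagged("sailor")
```

A tag can cut across collections, as the "sailor" tag does here, whereas each document belongs to exactly one collection in this sketch.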

Implementations

| Name | Publisher | License | Language | Notes | RESTful API |
|---|---|---|---|---|---|
| Lotus Notes | IBM | Proprietary | | | (unknown) |
| askSam | askSam Systems | Proprietary | | | (unknown) |
| Apstrata | Apstrata | Proprietary | | | (unknown) |
| Datawasp | Significant Data Systems | Proprietary | | | (unknown) |
| Clusterpoint | Clusterpoint Ltd. | Free community license / Commercial | C++ | Scalable, high-performance, schema-free, document-oriented database management system platform with server-based data storage, fast full text search engine functionality, information ranking for search relevance, and clustering. | Yes |
| CRX | Day Software | Proprietary | | | (unknown) |
| MUMPS Database | | Proprietary and GNU Affero GPL | MUMPS | Commonly used in health applications. | (unknown) |
| UniVerse | Rocket Software | Proprietary | | | Yes (Beta) |
| UniData | Rocket Software | Proprietary | | | Yes (Beta) |
| Jackrabbit | Apache Software Foundation | Apache License | Java | | (unknown) |
| CouchDB | Couchbase, Apache Software Foundation | Apache License | Erlang | JSON over REST/HTTP with Multi-Version Concurrency Control and ACID properties. Uses map and reduce for views and queries. | Yes (there is only RESTful API) |
| FleetDB | FleetDB | MIT License | Clojure | A JSON-based schema-free database optimized for agile development. | (unknown) |
| MongoDB | 10gen, Inc | GNU AGPL v3.0 | C, C++, Erlang, Haskell, Java, JavaScript, .NET (C#, F#, PowerShell, etc.), Perl, PHP, Python, Ruby, Scala | Fast, document-oriented database optimized for highly transient data. | Optional using external tools |
| GemFire Enterprise (http://www.vmware.com/products/vfabric-gemfire) | VMware | Commercial | Java, .NET | Memory-oriented, fast, key-value database with indexing and querying support. | Yes |
| OrientDB | Orient Technologies | Apache License | Java | JSON over HTTP | Yes |
| RavenDB | RavenDB | Commercial or GNU AGPL v3.0 | .NET | A .NET LINQ-enabled document database, focused on providing a high-performance, transactional, schema-less, flexible, and scalable NoSQL data store for the .NET and Windows platforms. | Yes |
| Redis | | BSD License | ANSI C | Key-value store supporting lists and sets with a fast, simple, and binary-safe protocol. | (unknown) |
| StrokeDB (http://strokedb.com/) | | MIT License | | Alpha software. | (unknown) |
| Terrastore | | Apache License | Java | JSON/HTTP | (unknown) |
| ThruDB | | BSD License | C++, Java | Built on top of the Apache Thrift framework; provides indexing and document storage services for building and scaling websites. An alternate implementation is being developed in Java. Alpha software. | (unknown) |
| Persevere | Persevere | BSD License | | A JSON database and JavaScript application server. Provides a RESTful JSON interface for create, read, update, and delete access to data. Also supports JSONQuery/JSONPath querying. | Yes |
| DBSlayer | DBSlayer | Apache License | C | Database abstraction layer (over MySQL) used by the New York Times. JSON over HTTP. | (unknown) |
| Eloquera DB | Eloquera | Proprietary | .NET | High performance. Based on dynamic objects. Supports LINQ and SQL queries. | (unknown) |


XML database implementations

All XML databases are document-oriented databases.

See also

  • Internet Message Access Protocol (IMAP)
  • Database theory
  • In-memory database
  • NoSQL
  • Object database
  • Online database
  • Real time database
  • Relational database
  • Data hierarchy

External links

  • http://solprovider.com/articles/20020612&cat=Lotus/IBM
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 