Column (data store)
Encyclopedia
A column of a distributed data store
Distributed data store
A distributed data store is a blurred concept and means either a distributed database where users store their information on a number of nodes, or a network in which a user stores their information on a number of peer network nodes ....

 is a NoSQL object of the lowest level in a keyspace. It is a tuple
Tuple
In mathematics and computer science, a tuple is an ordered list of elements. In set theory, an n-tuple is a sequence of n elements, where n is a positive integer. There is also one 0-tuple, an empty sequence. An n-tuple is defined inductively using the construction of an ordered pair...

 (a key-value pair) consisting of three elements:
  • Unique name: Used to reference the column
  • Value: The content of the column. It can have different types, like AsciiType, LongType, TimeUUIDType, UTF8Type among others.
  • Timestamp
    Timestamp
    A timestamp is a sequence of characters, denoting the date or time at which a certain event occurred. A timestamp is the time at which an event is recorded by a computer, not the time of the event itself...

    : The system timestamp used to determine the valid content.

Usage

The column is used as a store for the value and has a timestamp that is used to differentiate the valid content from stale ones. According to the CAP theorem
CAP theorem
In theoretical computer science the CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:...

, distributed data stores cannot guarantee consistency, as availability
Availability
In telecommunications and reliability theory, the term availability has the following meanings:* The degree to which a system, subsystem, or equipment is in a specified operable and committable state at the start of a mission, when the mission is called for at an unknown, i.e., a random, time...

 is a more important issue. Therefore, the data store or the application programmer will use the timestamp to find out which of the stored values in the backup nodes are up-to-date.

Some data stores, like Apache Cassandra 0.7, may use the more sophisticated vector clock instead of the timestamp to resolve stale information.

Differences to a relational database

In relational database
Relational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...

s, a column is a part of a relational table that can be seen in each row of the table. This is not the case in distributed data stores, where the concept of a table only vaguely exists. A column can be part of a ColumnFamily that resembles at most a relational row, but it may appear in one row and not in the others. Also, the number of columns may change from row to row, and new updates to the data store model may also modify the column number. So, all the work of keeping up with changes relies on the application programmer.

Examples

In JSON-like
JSON
JSON , or JavaScript Object Notation, is a lightweight text-based open standard designed for human-readable data interchange. It is derived from the JavaScript scripting language for representing simple data structures and associative arrays, called objects...

notation, three column definitions are given:


street: {name: "street", value: "1234 x street", timestamp: 123456789},
city: {name: "city", value: "san francisco", timestamp: 123456789},
zip: {name: "zip", value: "94107", timestamp: 123456789},
}

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK