Comma-separated values



 
 


A comma-separated values (CSV) file is a plain-text data file that implements a tried and true organizational tool, the comma-separated list. It stores tabular data: each line in the CSV file corresponds to a row in the table, and within a line, fields are separated by commas, each field belonging to one table column. CSV files are often used for moving tabular data between two different computer programs, for example between a database program and a spreadsheet program.

Technical background

A file format is a particular way to encode information for storage in a computer file. In particular, files encoded using the CSV format are used to store tabular data. The format dates back to the early days of business computing and is widely used to pass data between computers with different internal word sizes, data formatting needs, and so forth. For this reason, CSV files are common on all computer platforms.

CSV is one implementation of a delimited text file, which uses a comma to separate values (although many CSV import/export tools allow an alternate separator to be used). However, CSV differs from other delimiter-separated file formats in using a " (double quote) character around fields that contain reserved characters (such as commas or newlines). Most other delimiter formats either use an escape character such as a backslash, or have no support for reserved characters. The benefit of CSV is that it allows data to be transferred across different applications.
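The difference between quoting and escaping can be sketched with Python's standard `csv` module. This is only an illustration: the sample row and the `write_row` helper are made up for the example, and the backslash style shown second is one possible non-CSV convention, not a format defined by any particular tool.

```python
import csv
import io

# Sample row; the last field contains both a comma and double quotes.
row = ["1997", "Ford", "E350", 'Super, "luxurious" truck']

def write_row(row, **fmt):
    """Serialize one row with the given csv formatting options (hypothetical helper)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerow(row)
    return buf.getvalue()

# CSV style: the field is wrapped in double quotes and embedded quotes are doubled.
csv_style = write_row(row)
print(csv_style, end="")  # 1997,Ford,E350,"Super, ""luxurious"" truck"

# Escape-character style: no quoting, reserved characters get a backslash prefix.
escape_style = write_row(row, quoting=csv.QUOTE_NONE, escapechar="\\")
print(escape_style, end="")
```

Both lines carry the same four fields; only the convention for protecting the reserved characters differs.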

In computer science terms, this type of format is called a "flat file" because only one table can be stored in a CSV file. Most systems use a series of tables to store their information, which must be "flattened" into a single table, often with information repeated over several rows, to create a text file.
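As a rough sketch of what "flattening" means, the Python snippet below joins two small related tables into one CSV table. The dealer names, cities, and the `flatten` and `to_csv` helpers are all hypothetical and exist only for this illustration.

```python
import csv
import io

# Hypothetical two-table data set: dealers, and the cars each dealer sells.
dealers = {
    1: {"dealer": "Main Street Motors", "city": "Springfield"},
    2: {"dealer": "Lakeside Autos", "city": "Riverton"},
}
cars = [
    {"year": 1997, "make": "Ford", "model": "E350", "dealer_id": 1},
    {"year": 2000, "make": "Mercury", "model": "Cougar", "dealer_id": 1},
    {"year": 1996, "make": "Jeep", "model": "Grand Cherokee", "dealer_id": 2},
]

def flatten(cars, dealers):
    """Join each car to its dealer so that every output row is self-contained."""
    for car in cars:
        dealer = dealers[car["dealer_id"]]
        yield [car["year"], car["make"], car["model"],
               dealer["dealer"], dealer["city"]]

def to_csv(rows, header):
    """Write a header and rows to a CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

flat_csv = to_csv(flatten(cars, dealers),
                  ["Year", "Make", "Model", "Dealer", "City"])
print(flat_csv, end="")
```

The dealer name and city appear once in the dealer table but are repeated on every car row of the flat file, which is the repetition the paragraph above refers to.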

History

Comma-separated value lists are very old technology and predate personal computers by more than a decade; the IBM Fortran (level G) compiler under OS/360 supported these in 1967, and they were not a new idea at the time. Comma-separated value lists were often easier to type into punched cards than fixed-column-aligned data, and were less prone to producing incorrect results if a value was punched one-column-off from its intended location (an easy mistake to make).

The comma separated list (CSL) is a data format originally known as comma-separated values (CSV) in the oldest days of simple computers. In the early personal computer industry (when the machines were more commonly known as "home computers"), the most common early use was by small businesses for generating solicitations using boilerplate form letters, via mailing lists.

Some early software applications, such as word processors, allowed a stream of "variable data" to be merged from two files: a form letter, and a CSL of names, addresses, and other data fields. Many still do, simply because tasks requiring human input (such as constructing lists) are natural and easy with comma delimiting. CSL/CSV files were also used to exchange data between desktop computers of different architectures, and for simple database uses.

Specification

Comma separated lists date from before the earliest personal computers, but were widely used on the earliest pre-IBM PC era personal computers for tape storage backup and for interchange of database information between machines of different architectures. In those days, affordable hard drives did not exist, and many small businesses tried to achieve the benefits of computing using floppy-disk-based software.

No general standard specification for CSV exists. Variations between CSV implementations in different programs are quite common and can lead to interoperation difficulties. For Internet communication of CSV files, an Informational IETF document (RFC 4180 from October 2005) describes the format for the "text/csv" MIME type registered with the IANA. Another relevant specification is provided by Fielded Text, which also covers the CSV format.

Many informal documents exist that describe the CSV format. One such document provides an overview of the CSV format in the most widely used applications and explains how it can best be used and supported.

The basic rules common to many of these specifications are as follows:

CSV is a delimited data format that has fields/columns separated by the comma character and records/rows separated by newlines. Fields that contain a special character (comma, newline, or double quote) must be enclosed in double quotes. In addition, if a line contains a single entry which is the empty string, it may be enclosed in double quotes. If a field's value contains a double quote character, it is escaped by placing another double quote character next to it. The CSV file format does not require a specific character encoding, byte order, or line terminator format.
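The quoting rule in the paragraph above is small enough to write out by hand. The following Python sketch is one possible reading of it; `quote_field` and `format_record` are hypothetical names, and a real program would more likely rely on an existing CSV library.

```python
def quote_field(value: str) -> str:
    """Quote a field only if it contains a comma, a line break, or a double quote."""
    if any(ch in value for ch in ',"\r\n'):
        # Embedded double quotes are escaped by doubling them.
        return '"' + value.replace('"', '""') + '"'
    return value

def format_record(fields) -> str:
    """Join already-stringified fields into one CSV record (no line terminator)."""
    return ",".join(quote_field(f) for f in fields)

print(format_record(["1997", "Ford", "E350", 'Super "luxurious" truck']))
# 1997,Ford,E350,"Super ""luxurious"" truck"
```

Fields without special characters pass through unchanged, which keeps simple files readable as plain text.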

  • Each record is one line terminated by a line feed (ASCII LF, 0x0A) or a carriage return and line feed pair (ASCII CRLF, 0x0D 0x0A); line breaks can, however, be embedded within quoted fields.
  • Fields are separated by commas (although in locales where the comma is used as a decimal separator, the semicolon is used instead as a delimiter).
1997,Ford,E350
  • In some CSV implementations, leading and trailing spaces or tabs adjacent to commas are trimmed. This practice is contentious and is in fact specifically prohibited by RFC 4180, which states, "Spaces are considered part of a field and should not be ignored."
1997, Ford , E350
same as
1997,Ford,E350
  • Fields with embedded commas must be enclosed within double-quote characters.
1997,Ford,E350,"Super, luxurious truck"
  • Fields with embedded double-quote characters must be enclosed within double-quote characters, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.
1997,Ford,E350,"Super ""luxurious"" truck"
  • Fields with embedded line breaks must be enclosed within double-quote characters.
1997,Ford,E350,"Go get one now
they are going fast"
  • Fields with leading or trailing spaces must be enclosed within double-quote characters. (See comment about leading and trailing spaces above.)
1997,Ford,E350," Super luxurious truck "
  • Fields may always be enclosed within double-quote characters, whether necessary or not.
"1997","Ford","E350"
  • The first record in a CSV file may contain column names in each of the fields.
Year,Make,Model
1997,Ford,E350
2000,Mercury,Cougar
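Read in the other direction, the rules listed above describe a small state machine. The Python function below is a simplified sketch of such a reader, not a complete or authoritative parser: the name `parse_csv` is made up for this example, spaces are kept as part of the field (the RFC 4180 reading), and edge cases such as a final record consisting only of an empty quoted field are not handled.

```python
def parse_csv(text: str) -> list[list[str]]:
    """Parse CSV text into rows of fields (simplified sketch)."""
    rows, row, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')  # doubled quote -> one literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)  # commas and line breaks are data here
        elif ch == '"':
            in_quotes = True  # opening quote
        elif ch == ",":
            row.append("".join(field))  # end of field
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # treat CRLF as a single terminator
            row.append("".join(field))  # end of record
            rows.append(row)
            row, field = [], []
        else:
            field.append(ch)
        i += 1
    if field or row:  # last record without a trailing line break
        row.append("".join(field))
        rows.append(row)
    return rows

sample = (
    'Year,Make,Model,Note\n'
    '1997,Ford,E350,"Super ""luxurious"" truck"\r\n'
    '1997,Ford,E350,"Go get one now\nthey are going fast"\n'
)
for record in parse_csv(sample):
    print(record)
```

The only state the reader needs is whether it is currently inside a quoted field, which is what lets commas and line breaks appear as ordinary data.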

Example

1997 | Ford  | E350                       | ac, abs, moon          | 3000.00
1999 | Chevy | Venture "Extended Edition" |                        | 4900.00
1996 | Jeep  | Grand Cherokee             | MUST SELL!             | 4799.00
     |       |                            | air, moon roof, loaded |


The above table of data may be represented in CSV format as follows:

1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00

This CSV example illustrates that:

  • fields that contain commas, double-quotes, or line-breaks must be quoted,
  • a quote within a field must be escaped with an additional quote immediately preceding the literal quote,
  • space before and after delimiter commas may be trimmed (which is prohibited by RFC 4180), and
  • a line break within an element must be preserved.
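These points can be checked by reading the example back with a CSV library. The sketch below uses Python's standard `csv` module; the variable names are arbitrary, and the data is the example from this section written as a string literal.

```python
import csv
import io

data = (
    '1997,Ford,E350,"ac, abs, moon",3000.00\n'
    '1999,Chevy,"Venture ""Extended Edition""","",4900.00\n'
    '1996,Jeep,Grand Cherokee,"MUST SELL!\n'
    'air, moon roof, loaded",4799.00\n'
)

# newline="" leaves line endings untouched so the reader sees embedded breaks.
rows = list(csv.reader(io.StringIO(data, newline="")))

for row in rows:
    print(row)
# ['1997', 'Ford', 'E350', 'ac, abs, moon', '3000.00']
# ['1999', 'Chevy', 'Venture "Extended Edition"', '', '4900.00']
# ['1996', 'Jeep', 'Grand Cherokee', 'MUST SELL!\nair, moon roof, loaded', '4799.00']
```

Four physical lines yield three records of five fields each, because the line break inside the quoted field belongs to the data.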


Application support


The CSV file format is very simple and supported by almost all spreadsheets and database management systems. Many programming languages have libraries available that support CSV files. Even modern software applications support CSV imports and/or exports because the format is so widely recognized. In fact, many applications allow .csv-named files to use any delimiter character.
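As one example of such library support, Python's standard `csv` module accepts the delimiter as a parameter. The sketch below reads a semicolon-delimited file of the kind produced in locales that use a decimal comma; the sample data and the `read_delimited` helper are hypothetical.

```python
import csv
import io

# Hypothetical export from a locale that writes decimals with a comma.
data = (
    "Year;Make;Model;Price\n"
    "1997;Ford;E350;3000,00\n"
    "1999;Chevy;Venture;4900,00\n"
)

def read_delimited(text, delimiter=","):
    """Parse delimiter-separated text into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))

rows = read_delimited(data, delimiter=";")

# Convert the decimal-comma prices to floats, skipping the header row.
prices = [float(r[3].replace(",", ".")) for r in rows[1:]]
print(rows[1], prices)
```

The caller has to know the delimiter in advance here, since the file itself does not record which separator it uses.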

See also

  • Delimiter-separated values
  • Fielded text


External links


  • RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files