Bencode
Encyclopedia
Bencode is the encoding used by the peer-to-peer
Peer-to-peer
Peer-to-peer computing or networking is a distributed application architecture that partitions tasks or workloads among peers. Peers are equally privileged, equipotent participants in the application...

 file sharing system BitTorrent for storing and transmitting loosely structured data.

It supports four different types of values:
  • byte strings
    String (computer science)
    In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet....

    ,
  • integer
    Integer
    The integers are formed by the natural numbers together with the negatives of the non-zero natural numbers .They are known as Positive and Negative Integers respectively...

    s,
  • lists, and
  • dictionaries (associative arrays)
    Associative array
    In computer science, an associative array is an abstract data type composed of a collection of pairs, such that each possible key appears at most once in the collection....

    .


Bencoding is most commonly used in torrent file
Torrent file
A torrent file stores metadata used for BitTorrent. It is defined in the BitTorrent specification.Simply, a torrent is data about a target file, though it contains no information about the content of the file. The only data that the torrent holds is information about the location of different...

s. These metadata
Metadata
The term metadata is an ambiguous term which is used for two fundamentally different concepts . Although the expression "data about data" is often used, it does not apply to both in the same way. Structural metadata, the design and specification of data structures, cannot be about data, because at...

 files are simply bencoded dictionaries.

While less efficient than a pure binary encoding, bencoding is simple and (because numbers are encoded in decimal notation) is unaffected by endianness
Endianness
In computing, the term endian or endianness refers to the ordering of individually addressable sub-components within the representation of a larger data item as stored in external memory . Each sub-component in the representation has a unique degree of significance, like the place value of digits...

, which is important for a cross-platform
Cross-platform
In computing, cross-platform, or multi-platform, is an attribute conferred to computer software or computing methods and concepts that are implemented and inter-operate on multiple computer platforms...

 application like BitTorrent. It is also fairly flexible, as long as applications ignore unexpected dictionary keys, so that new ones can be added without creating incompatibilities.

Encoding algorithm

Bencode uses ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...

 characters as delimiters and digits.
  • An integer is encoded as ie. Leading zeros are not allowed (although the number zero is still represented as "0"). Negative values are encoded by prefixing the number with a minus sign. The number 42 would thus be encoded as "i42e", 0 as "i0e", and -42 as "i-42e". Negative zero is not permitted.
  • A byte string (a sequence of byte
    Byte
    The byte is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer and for this reason it is the basic addressable element in many computer...

    s, not necessarily characters) is encoded as :. (This is similar to netstrings, but without the final comma.) The length is encoded in base 10, like integers, but must be non-negative (zero is allowed); the contents are just the bytes that make up the string. The string "spam" would be encoded as "4:spam". The specification does not deal with encoding
    Character encoding
    A character encoding system consists of a code that pairs each character from a given repertoire with something else, such as a sequence of natural numbers, octets or electrical pulses, in order to facilitate the transmission of data through telecommunication networks or storage of text in...

     of characters outside the ASCII set; to mitigate this, some BitTorrent applications explicitly communicate the encoding (most commonly UTF-8
    UTF-8
    UTF-8 is a multibyte character encoding for Unicode. Like UTF-16 and UTF-32, UTF-8 can represent every character in the Unicode character set. Unlike them, it is backward-compatible with ASCII and avoids the complications of endianness and byte order marks...

    ) in various non-standard ways.
  • A list of values is encoded as le . The contents consist of the bencoded elements of the list, in order, concatenated. A list consisting of the string "spam" and the number 42 would be encoded as: "l4:spami42ee". Note the absence of separators between elements.
  • A dictionary is encoded as de. The elements of the dictionary are encoded each key immediately followed by its value. All keys must be byte strings and must appear in lexicographical order
    Lexicographical order
    In mathematics, the lexicographic or lexicographical order, , is a generalization of the way the alphabetical order of words is based on the alphabetical order of letters.-Definition:Given two partially ordered sets A and B, the lexicographical order on...

    . A dictionary that associates the values 42 and "spam" with the keys "foo" and "bar", respectively, would be encoded as follows: "d3:bar4:spam3:fooi42ee". (This might be easier to read by inserting some spaces: .)

There are no restrictions on what kind of values may be stored in lists and dictionaries; they may (and usually do) contain other lists and dictionaries. This allows for arbitrarily complex data structures to be encoded.

Features

  • For each possible (complex) value, there is only a single valid bencoding; i.e. there is a bijection
    Bijection
    A bijection is a function giving an exact pairing of the elements of two sets. A bijection from the set X to the set Y has an inverse function from Y to X. If X and Y are finite sets, then the existence of a bijection means they have the same number of elements...

     between values and their encodings. This has the advantage that applications may compare bencoded values by comparing their encoded forms, eliminating the need to decode the values.
  • Many encodings can be decoded manually, but since the bencoded values often contain binary data
    Binary file
    A binary file is a computer file which may contain any type of data, encoded in binary form for computer storage and processing purposes; for example, computer document files containing formatted text...

    , and may become quite complex, it is generally not considered a human-readable
    Human-readable
    A human-readable medium or human-readable format is a representation of data or information that can be naturally read by humans.In computing, human-readable data is often encoded as ASCII or Unicode text, rather than presented in a binary representation...

     encoding.
  • Bencoding serves similar purposes as data languages like JSON
    JSON
    JSON , or JavaScript Object Notation, is a lightweight text-based open standard designed for human-readable data interchange. It is derived from the JavaScript scripting language for representing simple data structures and associative arrays, called objects...

     and YAML
    YAML
    YAML is a human-readable data serialization format that takes concepts from programming languages such as C, Perl, and Python, and ideas from XML and the data format of electronic mail . YAML was first proposed by Clark Evans in 2001, who designed it together with Ingy döt Net and Oren Ben-Kiki...

    , allowing complex yet loosely structured data to be stored in a platform independent way.

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK