Git (software)



 
 
Git is a free distributed revision control, or software source code management, project with an emphasis on being fast. Git was initially created by Linus Torvalds for Linux kernel development.

Every Git working directory is a full-fledged repository with complete history and full revision tracking capabilities, not dependent on network access or a central server.

Several high-profile software projects now use Git for revision control, most notably the Linux kernel, Perl, Samba, X.Org Server, Qt, One Laptop per Child (OLPC) core development, VLC, Wine, SWI Prolog, GStreamer, and the Android mobile platform.

Git's current software maintenance is overseen by Junio Hamano.

Name

Linus Torvalds has quipped about the name “git”, which is British English slang for a stupid or unpleasant person. The self-deprecating humor is tongue-in-cheek, as Torvalds was actually pressured into naming Linux after himself (see History of Linux).

The official Git wiki also gives a number of alternative explanations for the name, including "Global Information Tracker".

Characteristics


Git's design was inspired by BitKeeper and Monotone. Git was originally designed only as a low-level engine that others could use to write front ends such as Cogito or StGIT. However, the core Git project has since become a complete revision control system that is usable directly.

Git's design is a synthesis of Torvalds's experience maintaining a large distributed development project, his intimate knowledge of file system performance, and an urgent need to produce a working system in short order. (See the history section for details.) These influences led to the following implementation choices:

  • Strong support for non-linear development. Git supports rapid branching and merging, and includes specific tools for visualizing and navigating a non-linear development history. A core assumption in Git is that a change will be merged more often than it is written, as it is passed around various reviewers.
  • Distributed development. Like Darcs, BitKeeper, Mercurial, SVK, Bazaar and Monotone, Git gives each developer a local copy of the entire development history, and changes are copied from one such repository to another. These changes are imported as additional development branches, and can be merged in the same way as a locally developed branch.
  • Repositories can be published via HTTP, FTP, rsync, or a Git protocol over either a plain socket or ssh. Git also has a CVS server emulation, which enables the use of existing CVS clients and IDE plugins to access Git repositories.
  • Subversion and svk repositories can be used directly with git-svn.
  • Efficient handling of large projects. Torvalds has described Git as being very fast and scalable, and performance tests done by Mozilla showed it was an order of magnitude faster than other revision control systems, and two orders of magnitude faster on some operations.
  • Cryptographic authentication of history. The Git history is stored in such a way that the name of a particular revision (a "commit" in Git terms) depends upon the complete development history leading up to that commit. Once it is published, it is not possible to change the old versions without it being noticed. (Mercurial and Monotone also have this property.)
  • Toolkit design. Git was designed as a set of programs written in C, and a number of shell scripts that provide wrappers around those programs. Although most of those scripts have been rewritten in C as part of an ongoing effort to port Git to Microsoft Windows, the design remains, and it is easy to chain the components together to do other clever things.
  • Pluggable merge strategies. As part of its toolkit design, Git has a well-defined model of an incomplete merge, and it has multiple algorithms for completing it, culminating in telling the user that it is unable to complete the merge automatically and manual editing is required.
  • Garbage accumulates unless collected. Aborting operations or backing out changes will leave useless dangling objects in the database. These are generally a small fraction of the continuously growing history of wanted objects, but reclaiming the space using git gc --prune can be slow.
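The cryptographic authentication described in the list above can be illustrated with a small hash chain. This is only a sketch: the function names `commit_name` and `build_history` are hypothetical, and the text encoding of a commit used here is a simplification for illustration, not Git's actual on-disk format. It shows the general idea that a commit's name covers the names of its parents, so altering an old revision changes the name of every later one.

```python
import hashlib


def commit_name(tree_text: str, message: str, parents: list[str]) -> str:
    """Name a commit by hashing its content together with its parents' names."""
    lines = ["tree " + tree_text]
    lines += ["parent " + p for p in parents]
    lines += ["", message]
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


def build_history(snapshots: list[tuple[str, str]]) -> list[str]:
    """Return commit names for a linear history of (tree_text, message) pairs."""
    names: list[str] = []
    for tree_text, message in snapshots:
        parents = names[-1:]  # empty list for the very first commit
        names.append(commit_name(tree_text, message, parents))
    return names


original = build_history([("v1", "initial"), ("v2", "fix bug"), ("v3", "add feature")])
tampered = build_history([("v1-altered", "initial"), ("v2", "fix bug"), ("v3", "add feature")])

# Compare the names commit by commit: the altered first snapshot
# propagates into every descendant's name.
changed = [a != b for a, b in zip(original, tampered)]
print(changed)
```

Anyone holding the name of the latest commit can therefore detect a rewrite of earlier history, because the recomputed names no longer match.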


One property of Git is that it snapshots directory trees of files. The earliest systems for tracking versions of source code, SCCS and RCS, worked on individual files and emphasized the space savings to be gained from delta encoding the (mostly similar) versions. Later revision control systems maintained this notion of a file having an identity across multiple revisions of a project.

Torvalds rejected this concept; consequently, Git does not explicitly record file revision relationships at any level below the source code tree. This has some significant consequences:

  • It is slightly more expensive to examine the change history of a single file than the whole project. To obtain a history of changes affecting a given file, Git must walk the global history and then determine whether each change modified that file. This method of examining history does, however, let Git produce with equal efficiency a single history showing the changes to an arbitrary set of files. For example, a subdirectory of the source tree plus an associated global header file is a very common case.
  • Renames are handled implicitly rather than explicitly. A common complaint with CVS is that it uses the name of a file to identify its revision history, so moving or renaming a file is not possible without either interrupting its history, or renaming the history and thereby making the history inaccurate. Most post-CVS revision control systems solve this by giving a file a unique long-lived name (a sort of inode number) that survives renaming. Git does not record such an identifier, and this is claimed as an advantage. Source code files are sometimes split or merged as well as simply renamed, and recording this as a simple rename would freeze an inaccurate description of what happened in the (immutable) history. Git addresses the issue by detecting renames while browsing the history of snapshots rather than recording it when making the snapshot. (Briefly, given a file in revision N, a file of the same name in revision N-1 is its default ancestor. However, when there is no like-named file in revision N-1, Git searches for a file that existed only in revision N-1 and is very similar to the new file.) However, this approach requires more CPU-intensive work every time history is reviewed, and Git provides a number of options to adjust the heuristics.
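Both consequences in the list above can be sketched over a toy history in which each revision is a plain mapping from path to file content. The names `file_history` and `find_ancestor`, the use of `difflib` as the similarity measure, and the 0.6 threshold are all assumptions made for illustration; Git's real heuristics and defaults differ.

```python
from difflib import SequenceMatcher
from typing import Optional

Snapshot = dict[str, str]  # path -> file content


def file_history(snapshots: list[Snapshot], paths: set[str]) -> list[int]:
    """Walk the whole history; keep revisions that changed any path in `paths`."""
    touched = []
    previous: Snapshot = {}
    for number, current in enumerate(snapshots):
        if any(previous.get(p) != current.get(p) for p in paths):
            touched.append(number)
        previous = current
    return touched


def find_ancestor(previous: Snapshot, current: Snapshot, path: str,
                  threshold: float = 0.6) -> Optional[str]:
    """Guess which file in the previous revision `path` descends from."""
    if path in previous:
        return path  # a like-named file is the default ancestor
    best, best_score = None, threshold
    # Candidates are files that existed only in the previous revision.
    for candidate in (p for p in previous if p not in current):
        score = SequenceMatcher(None, previous[candidate], current[path]).ratio()
        if score >= best_score:
            best, best_score = candidate, score
    return best


ADD = "int add(int a, int b) { return a + b; }\n"
history = [
    {"util.c": ADD, "main.c": "int main(void) { return 0; }\n"},
    {"util.c": ADD, "main.c": "int main(void) { return add(1, 2); }\n"},
    {"math.c": ADD + "/* moved */\n", "main.c": "int main(void) { return add(1, 2); }\n"},
]

print(file_history(history, {"main.c"}))
print(find_ancestor(history[1], history[2], "math.c"))
```

Note that `file_history` accepts a set of paths, which mirrors the point that a history for an arbitrary group of files costs no more than a history for one file.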


Additionally, people are sometimes upset by the storage model:

  • Periodic explicit object packing. Git stores each newly created object as a separate file. Although individually compressed, this takes a great deal of space and is inefficient. This is solved by the use of "packs" that store a large number of objects in a single file (or network byte stream), delta-compressed among themselves. Packs are compressed using the heuristic that files with the same name are probably similar, but do not depend on it for correctness. Newly created objects (newly added history) are still stored singly, and periodic repacking is required to maintain space efficiency. Git does periodic repacking automatically but manual repacking is also possible with the git gc command.
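The space saving from packing can be approximated with a toy comparison between compressing every version on its own and storing one full version plus deltas against its predecessor. This is a rough sketch under stated assumptions: `make_delta`, `apply_delta`, and the JSON encoding are hypothetical stand-ins, and Git's real pack format and base-selection heuristics are considerably more elaborate.

```python
import json
import zlib
from difflib import SequenceMatcher


def make_delta(base: list[str], target: list[str]) -> list:
    """Describe `target` as copies from `base` plus newly inserted lines."""
    delta = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(["copy", i1, i2])
        elif tag in ("replace", "insert"):
            delta.append(["insert", target[j1:j2]])
        # "delete" needs no instruction: those base lines are simply not copied
    return delta


def apply_delta(base: list[str], delta: list) -> list[str]:
    """Rebuild the target lines from the base and a delta."""
    out: list[str] = []
    for op in delta:
        if op[0] == "copy":
            out.extend(base[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


def compressed_size(obj) -> int:
    """Size in bytes of the zlib-compressed JSON encoding of `obj`."""
    return len(zlib.compress(json.dumps(obj).encode("utf-8")))


# Twenty versions of a 200-line file, each changing one line of its predecessor.
versions = [[f"line {n} of the file\n" for n in range(200)]]
for rev in range(1, 20):
    nxt = list(versions[-1])
    nxt[rev * 7] = f"line changed in revision {rev}\n"
    versions.append(nxt)

# "Loose": every version compressed individually.
loose = sum(compressed_size(v) for v in versions)
# "Packed": one full version plus a chain of small deltas.
packed = compressed_size(versions[0]) + sum(
    compressed_size(make_delta(a, b)) for a, b in zip(versions, versions[1:])
)
print(loose, packed)
```

The packed total should come out well below the loose total for histories like this one, where successive versions are mostly similar.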


Git implements several merging strategies; a non-default strategy can be selected at merge time:

  • resolve: The traditional 3-way merge algorithm.
  • recursive: This is the default when pulling or merging one branch, and is a variant of the 3-way merge algorithm. "When there are more than one common ancestors that can be used for 3-way merge, it creates a merged tree of the common ancestors and uses that as the reference tree for the 3-way merge. This has been reported to result in fewer merge conflicts without causing mis-merges by tests done on actual merge commits taken from Linux 2.6 kernel development history. Additionally this can detect and handle merges involving renames."
  • octopus: This is the default when merging more than two heads.
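The 3-way merge that these strategies build on can be sketched at whole-file granularity. This is a deliberately simplified illustration: `merge_file` and `three_way_merge` are hypothetical names, each file is treated as an opaque value rather than merged line by line, and nothing here reproduces the ancestor-merging step that the recursive strategy adds.

```python
from typing import Optional

Tree = dict[str, str]  # path -> content


def merge_file(base: Optional[str], ours: Optional[str], theirs: Optional[str]):
    """Three-way merge of one file; None means the file is absent."""
    if ours == theirs:
        return ours, False      # both sides agree
    if ours == base:
        return theirs, False    # only their side changed
    if theirs == base:
        return ours, False      # only our side changed
    return ours, True           # both changed differently: conflict


def three_way_merge(base: Tree, ours: Tree, theirs: Tree):
    """Merge two trees against their common ancestor, listing conflicts."""
    merged: Tree = {}
    conflicts: list[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        content, conflict = merge_file(base.get(path), ours.get(path), theirs.get(path))
        if conflict:
            conflicts.append(path)
        if content is not None:
            merged[path] = content
    return merged, conflicts


base = {"a.c": "1", "b.c": "1", "c.c": "1", "e.c": "1"}
ours = {"a.c": "2", "b.c": "1", "c.c": "ours"}  # e.c deleted on our side
theirs = {"a.c": "1", "b.c": "1", "c.c": "theirs", "d.c": "new", "e.c": "1"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)
print(conflicts)
```

The common ancestor is what lets the merge tell "changed on one side" apart from "changed on both sides"; only the latter is reported as a conflict needing manual editing.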

Early history

Git development began after many Linux kernel developers were forced to give up access to the proprietary BitKeeper system (see BitKeeper - Pricing change). The ability to use BitKeeper free of charge had been withdrawn by the copyright holder Larry McVoy after he claimed Andrew Tridgell had reverse engineered the BitKeeper protocols in violation of the BitKeeper license. At linux.conf.au 2005, Tridgell demonstrated during his keynote that the reverse engineering process he had used was simply to telnet to the appropriate port of a BitKeeper server and type "help".

Torvalds wanted a distributed system that he could use like BitKeeper, but none of the available free systems met his needs, particularly his performance needs. From an e-mail he wrote on April 7, 2005 while writing the first prototype:

Torvalds had several design criteria:
  1. Take CVS as an example of what not to do; if in doubt, make the exact opposite decision. To quote Torvalds, speaking somewhat tongue-in-cheek:
    “For the first 10 years of kernel maintenance, we literally used tarballs and patches, which is a much superior source control management system than CVS is, but I did end up using CVS for 7 years at a commercial company [Transmeta] and I hate it with a passion. When I say I hate CVS with a passion, I have to also say that if there are any SVN (Subversion) users in the audience, you might want to leave. Because my hatred of CVS has meant that I see Subversion as being the most pointless project ever started. The slogan of Subversion for a while was ‘CVS done right’, or something like that, and if you start with that kind of slogan, there's nowhere you can go. There is no way to do CVS right.”
  2. Support a distributed, BitKeeper-like workflow
    “BitKeeper was not only the first source control system that I ever felt was worth using at all, it was also the source control system that taught me why there's a point to them, and how you actually can do things. So Git in many ways, even though from a technical angle it is very very different from BitKeeper (which was another design goal, because I wanted to make it clear that it wasn't a BitKeeper clone), a lot of the flows we use with Git come directly from the flows we learned from BitKeeper.”
  3. Very strong safeguards against corruption, either accidental or malicious
  4. Very high performance
The first three criteria eliminated every pre-existing version control system except for Monotone, and the fourth excluded everything. So, immediately after the 2.6.12-rc2 Linux kernel development release, he set out to write his own.

The development of Git began on April 3, 2005. The project was announced on April 6, and became self-hosting as of April 7. The first merge of multiple branches was done on April 18. Torvalds achieved his performance goals; on April 29, the nascent Git was benchmarked recording patches to the Linux kernel tree at the rate of 6.7 per second. On June 16, the kernel 2.6.12 release was managed by Git.

While strongly influenced by BitKeeper, Torvalds deliberately attempted to avoid conventional approaches, leading to a unique design. He developed the system until it was usable by technical users, then turned over maintenance on July 26, 2005 to Junio Hamano, a major contributor to the project. Hamano was responsible for the 1.0 release on December 21, 2005, and remains the maintainer.

Implementation

Like BitKeeper, Git does not use a centralized server. However, Git's primitives are not inherently an SCM system. Torvalds explains,

Git has two data structures: a mutable index that caches information about the working directory and the next revision to be committed, and an immutable, append-only object database containing four types of objects:
  • A blob object is the content of a file. Blob objects have no names, timestamps, or other metadata.
  • A tree object is the equivalent of a directory: it contains a list of filenames, each with some type bits and the name of a blob or tree object that is that file, symbolic link, or directory's contents. This object describes a snapshot of the source tree.
  • A commit object links tree objects together into a history. It contains the name of a tree object (of the top-level source directory), a timestamp, a log message, and the names of zero or more parent commit objects.
  • A tag object is a container that holds a reference to another object and can carry additional metadata about that object. Most commonly it is used to store a digital signature of a commit object corresponding to a particular release of the data being tracked by Git.


The object database can hold any kind of object. An intermediate layer, the index, serves as the connection point between the object database and the working tree.
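
The four object types and the append-only database described above can be modeled roughly as follows. This is a minimal sketch for illustration, not Git's implementation: the class names, the `ObjectDatabase` and `history` helpers, and the use of `repr()` as a stand-in serialization are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Blob:
    data: bytes  # file content only: no name, no timestamp


@dataclass(frozen=True)
class Tree:
    # Each entry: (filename, type bits, name of a blob or tree object)
    entries: Tuple[Tuple[str, str, str], ...]


@dataclass(frozen=True)
class Commit:
    tree: str                 # name of the top-level tree object
    parents: Tuple[str, ...]  # zero or more parent commit names
    timestamp: int
    message: str


@dataclass(frozen=True)
class Tag:
    target: str  # name of the object being tagged
    note: str    # extra metadata, e.g. a signature (hypothetical field)


GitObject = Union[Blob, Tree, Commit, Tag]


class ObjectDatabase:
    """Append-only store: objects are added, never modified in place."""

    def __init__(self) -> None:
        self._objects: Dict[str, GitObject] = {}

    def put(self, obj: GitObject) -> str:
        # Assumption: hashing repr() is a stand-in, not Git's real format.
        name = hashlib.sha1(repr(obj).encode("utf-8")).hexdigest()
        self._objects.setdefault(name, obj)
        return name

    def get(self, name: str) -> GitObject:
        return self._objects[name]


def history(db: ObjectDatabase, name: Optional[str]) -> List[str]:
    """Collect log messages from a commit back along its first parents."""
    messages = []
    while name is not None:
        commit = db.get(name)
        messages.append(commit.message)
        name = commit.parents[0] if commit.parents else None
    return messages


db = ObjectDatabase()
v1 = db.put(Blob(b"hello\n"))
t1 = db.put(Tree((("README", "100644", v1),)))
c1 = db.put(Commit(t1, (), 1, "first"))
v2 = db.put(Blob(b"hello, world\n"))
t2 = db.put(Tree((("README", "100644", v2),)))
c2 = db.put(Commit(t2, (c1,), 2, "second"))
release = db.put(Tag(c2, "v0.1"))
print(history(db, c2))
```

Because every object is named by a hash of what it contains, storing the same content twice yields the same name, and a commit name indirectly pins down the whole tree and all of its ancestors.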

Each object is identified by a SHA-1 hash of its contents. Git computes the hash, and uses this value for the object's name. The object is put into a directory matching the first two characters of its hash. The rest of the hash is used as the file name for that object.
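
The naming scheme can be sketched in a few lines of Python. Two details here go beyond the text above: Git hashes a short header (object type, a space, the size in bytes, and a NUL byte) followed by the content, and loose objects live under `.git/objects` in a default repository layout. The function names are hypothetical.

```python
import hashlib


def object_name(kind: str, body: bytes) -> str:
    """Return the SHA-1 name of an object of the given type."""
    # Header: "<type> <size>\0", followed by the raw content.
    header = f"{kind} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def loose_object_path(name: str) -> str:
    """Directory from the first two hex characters, file name from the rest."""
    return f".git/objects/{name[:2]}/{name[2:]}"


empty = object_name("blob", b"")
print(empty)                     # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(loose_object_path(empty))  # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The name of the empty blob shown here is the same value that `git hash-object` reports for an empty file.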

Git stores each revision of a file as a unique blob object. The relationships between the blobs can be found through examining the tree and commit objects. Newly added objects are stored in their entirety using zlib compression. This can consume a large amount of hard disk space quickly, so objects can be combined into packs, which use delta compression to save space, storing blobs as their changes relative to other blobs.
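
The difference between the two storage strategies can be illustrated with a toy example. The sketch below compresses a full revision with zlib, then expresses the same revision as copy and insert operations against an earlier one. It uses Python's `difflib` to find matching regions; this is an assumption made for illustration and is not Git's pack format or its delta algorithm. `make_delta` and `apply_delta` are hypothetical names.

```python
import zlib
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Describe target as copies from base plus literal inserted bytes."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))       # offset and length in base
        elif j2 > j1:
            ops.append(("insert", target[j1:j2]))   # new bytes not in base
        # A pure deletion needs no operation: those bytes are simply skipped.
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target revision from the base and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


v1 = b"".join(b"line %d\n" % i for i in range(50))
v2 = v1.replace(b"line 25\n", b"line twenty-five\n")

loose = zlib.compress(v2)          # whole revision, compressed on its own
delta = make_delta(v1, v2)         # same revision, relative to v1
literal = sum(len(op[1]) for op in delta if op[0] == "insert")

print(len(v2), len(loose), literal)
```

Only the bytes that changed have to be stored literally in the delta; everything else is a reference into the earlier revision, which is why packing saves space when a file has many similar versions.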

Portability

Git is primarily developed on Linux, but can be used on other Unix-like operating systems including BSD, Solaris and Darwin. Git is extremely fast on POSIX-based systems such as Linux.

Git also runs on Windows. There are two variants:

  • A native Microsoft Windows port (using MSYS from MinGW) is approaching completion. There are downloadable installers ready for testing (under the names "Git" and "msysgit", where "Git" is aimed at users). While somewhat slower than the Linux version, it is acceptably fast and is reported to be usable in production, with only minor awkwardness. In particular, some commands are not yet available from the GUIs, and must be invoked from the command line.


  • Git also runs on top of Cygwin (a POSIX emulation layer), although it is noticeably slower, especially for commands written as shell scripts. This is primarily due to the high cost of the fork emulation performed by Cygwin. However, the recent rewriting in C of many Git commands that were implemented as shell scripts has resulted in significant speed improvements on Windows. Regardless, many people find a Cygwin installation too large and invasive for typical Windows use.


Other alternatives for running Git include:

  • git-cvsserver (which emulates a CVS server, allowing use of Windows CVS clients)
  • Eclipse IDE-based Git client, based on a pure Java implementation of Git's internals
  • NetBeans IDE support for Git is under development.
  • A Windows Explorer extension (a TortoiseCVS/TortoiseSVN-lookalike) was started at and which is an explorer extension as well as a standalone GUI and a Visual Studio 2008 Plug-in


"Libifying" the lowest-level Git operations would in theory enable re-implementation of the higher-level components for Windows without rewriting the rest.

Criticisms

Older versions of Git, generally those before v1.5.0, have been criticized for their usability, documentation, and design. More recent comparisons indicate less criticism.

Until recently, Git's Windows support was poor enough to make projects that support both POSIX and Windows look elsewhere. Examples of projects that publicly ruled out any use of Git in 2006 include Mozilla and Ruby.

Other projects using Git

Ruby on Rails web framework, YUI, Merb, DragonFly BSD, and GPM.

See also

  • Distributed revision control
  • List of revision control software
  • Comparison of revision control software
  • Comparison of open source software hosting facilities
  • Mercurial
  • Repo (Script)

