Object database

Git's on-disk storage format is relatively simple.

SHA-1s

Values are just sequences of bytes, and they're indexed by the SHA-1 hash of their content prefixed with a short type-and-size header. git hash-object computes this hash.
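To make the hashing concrete, here is a minimal Python sketch of what git hash-object computes for a blob. The function name git_blob_id is made up for this example; the header layout ("blob", a space, the content length in bytes, a NUL byte) is the standard one for loose objects.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a short header followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

The second value should match the output of echo hello | git hash-object --stdin.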

Object storage

git init creates the .git directory. Each object is stored as a zlib-compressed file under objects/, at a path derived from its hash:

  • objects/
    • NN/ -- the first two characters of a SHA-1 hash
      • nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn -- the remaining 38 characters of the SHA-1 hash

git cat-file retrieves stored objects; -t gets type, -p pretty-prints content.
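The layout above can be sketched in Python as a tiny loose-object store. This is an illustration only, not Git's implementation: write_loose_object and read_loose_object are hypothetical names, the store lives in a temporary directory rather than a real .git, and packfiles are ignored.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def write_loose_object(objects_dir: Path, obj_type: str, body: bytes) -> str:
    """Store an object as objects/NN/<remaining 38 hex chars> and return its ID."""
    store = f"{obj_type} {len(body)}".encode("ascii") + b"\0" + body
    oid = hashlib.sha1(store).hexdigest()
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))  # loose objects are zlib-compressed
    return oid

def read_loose_object(objects_dir: Path, oid: str) -> tuple[str, bytes]:
    """Return (type, body), roughly what cat-file -t and cat-file -p work from."""
    raw = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body

with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"
    oid = write_loose_object(objects, "blob", b"hello\n")
    print(oid[:2], oid[2:])                 # directory name, file name
    print(read_loose_object(objects, oid))  # type and content
```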

Commits are objects

git commit creates and stores a commit object, and its SHA-1 hash is the commit's ID. The object contains:

  • Header:
    • tree nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn, containing the SHA-1 hash of a tree object.
    • parent nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn references the parent commit. This field is omitted from root commits and appears once per parent in merge commits.
    • author Name <email> epoch +HHMM
    • committer Name <email> epoch +HHMM
  • Commit message
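As a rough illustration of that layout, the following Python sketch splits a commit body into its header fields and message. parse_commit is a hypothetical helper, and the sample parent ID, names, and timestamps are invented. It assumes every header field fits on one line, which is not true for signed commits.

```python
def parse_commit(body: str):
    """Split a commit body into (header fields, message).

    Assumes single-line header fields; multi-line fields such as
    signatures are not handled by this sketch.
    """
    # A blank line separates the header from the commit message.
    header_text, _, message = body.partition("\n\n")
    headers = [tuple(line.split(" ", 1)) for line in header_text.split("\n")]
    return headers, message

# Hypothetical commit: the parent ID and identities are made up.
sample = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "parent 0123456789abcdef0123456789abcdef01234567\n"
    "author A U Thor <author@example.com> 1700000000 +0000\n"
    "committer C O Mitter <committer@example.com> 1700000000 +0000\n"
    "\n"
    "Add README\n"
)

headers, message = parse_commit(sample)
parents = [value for key, value in headers if key == "parent"]
print(headers[0])  # the tree field
print(parents)     # empty for a root commit, several entries for a merge
print(message)
```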

Trees are objects referencing other objects

A tree object is a series of entries, which git cat-file -p prints one per line with the following fields:

  • mode (e.g. 100644 for a regular file, 040000 for a directory)
  • type (blob, tree)
  • object-id contains the SHA-1 hash of the object.
  • name contains the filename.
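On disk, a tree is binary rather than line-based: each entry is the mode, a space, the name, a NUL byte, and the 20 raw bytes of the object ID. The type is not stored and is derived from the mode. A minimal Python sketch of a parser follows; parse_tree is a hypothetical name, and the sample entry is built by hand rather than read from a repository.

```python
def parse_tree(body: bytes):
    """Parse a raw tree body into (mode, type, object-id, name) tuples."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8")
        oid = body[nul + 1:nul + 21].hex()  # 20 raw bytes -> 40 hex characters
        if mode == "40000":
            obj_type = "tree"
        elif mode == "160000":
            obj_type = "commit"  # submodule entry
        else:
            obj_type = "blob"
        # Directory modes are stored as "40000"; pad to six digits for display.
        entries.append((mode.zfill(6), obj_type, oid, name))
        pos = nul + 21
    return entries

BLOB_ID = "ce013625030ba8dba906f756967f9e9ca394464a"  # blob ID of b"hello\n"
body = b"100644 hello.txt\0" + bytes.fromhex(BLOB_ID)
for mode, obj_type, oid, name in parse_tree(body):
    print(mode, obj_type, oid, name)
```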

git count-objects reports the number of loose (unpacked) objects and the disk space they consume.

Tags are objects referencing commit objects

Git has two types of tags:

  • Lightweight tags don't have labels (a message and tagger details) and point directly to a commit.
  • Annotated tags have labels, which are stored in tag objects.

Annotated tags are stored in the object database with the following fields:

  • Header:
    • object nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
    • type commit
    • tag name
    • tagger Name <email> epoch +HHMM
  • Message

Both types of tags are represented as files in the .git/refs/tags directory, where the content of each file is the SHA-1 hash of an object. Lightweight tags contain the hash of the commit they tag, as there's no additional data that needs to be stored. Annotated tags instead contain the hash of their tag object, which stores the label and, in turn, references the commit.
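The difference between the two kinds of tag shows up when resolving one to a commit. The Python sketch below uses an in-memory dictionary in place of the object database. peel_tag, the object IDs, and the tag contents are all hypothetical and only illustrate the lookup.

```python
def peel_tag(ref_value: str, objects: dict) -> str:
    """Follow the content of a refs/tags file to the commit it names."""
    oid = ref_value.strip()
    obj_type, body = objects[oid]
    # An annotated tag points at a tag object; follow its "object" field.
    while obj_type == "tag":
        first_line = body.split("\n", 1)[0]  # "object <hash>"
        oid = first_line.split(" ", 1)[1]
        obj_type, body = objects[oid]
    return oid

# Made-up IDs standing in for real SHA-1 hashes.
COMMIT_ID = "c" * 40
TAG_ID = "a" * 40
objects = {
    COMMIT_ID: ("commit", "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n\nInitial commit\n"),
    TAG_ID: ("tag", f"object {COMMIT_ID}\ntype commit\ntag v1.0\n"
                    "tagger A U Thor <author@example.com> 1700000000 +0000\n\nFirst release\n"),
}

lightweight_ref = COMMIT_ID + "\n"  # content of a lightweight tag's ref file
annotated_ref = TAG_ID + "\n"       # content of an annotated tag's ref file

print(peel_tag(lightweight_ref, objects) == COMMIT_ID)
print(peel_tag(annotated_ref, objects) == COMMIT_ID)
```

Both lookups end at the same commit; the annotated tag takes one extra hop through its tag object.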

In short, tags are just names pointing at commits -- exactly the same as a branch. The difference is in semantics: tags should be considered immutable, whereas branches can change.

Branches

At their core, branches are just references to their current commit. The references are stored in .git/refs/heads, in files named after the branches.

The current branch is stored in the .git/HEAD file:

ref: refs/heads/master
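Resolving the current commit is then a two-step file lookup. The Python sketch below builds a throwaway directory that mimics the relevant files. resolve_head and the commit ID are hypothetical, and the sketch only reads loose ref files, so it would miss a branch that exists only in packed-refs. It also handles the case where HEAD holds a raw hash (a detached HEAD) instead of a ref: line.

```python
import tempfile
from pathlib import Path

def resolve_head(git_dir: Path) -> str:
    """Return the commit hash HEAD currently points at (loose refs only)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Symbolic ref: the rest of the line is a path relative to .git.
        ref_path = git_dir / head[len("ref: "):]
        return ref_path.read_text().strip()
    # Detached HEAD: the file holds the commit hash directly.
    return head

COMMIT_ID = "c" * 40  # made-up hash for illustration

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "master").write_text(COMMIT_ID + "\n")
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
    print(resolve_head(git_dir) == COMMIT_ID)
```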

Remote branches can be found in a couple of locations within .git:

  • refs/
    • remotes/
      • origin/
        • HEAD
  • packed-refs

git show-ref lists local and remote references together:

$ git show-ref
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn refs/heads/x
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn refs/remotes/origin/x
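The packed-refs file holds the same hash-and-name pairs as git show-ref prints, one per line, plus comment lines starting with # and lines starting with ^ that record the commit an annotated tag peels to. A small Python sketch of a reader follows; parse_packed_refs is a hypothetical name, and the sample content uses made-up hashes.

```python
def parse_packed_refs(text: str) -> dict:
    """Map ref names to hashes from the content of a packed-refs file."""
    refs = {}
    for line in text.splitlines():
        # Skip blank lines, the header comment, and peeled-tag lines.
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        oid, name = line.split(" ", 1)
        refs[name] = oid
    return refs

# Made-up content; real files contain 40-character SHA-1 hashes like these.
sample = (
    "# pack-refs with: peeled fully-peeled sorted\n"
    + "a" * 40 + " refs/heads/x\n"
    + "b" * 40 + " refs/remotes/origin/x\n"
    + "c" * 40 + " refs/tags/v1.0\n"
    + "^" + "d" * 40 + "\n"
)

for name, oid in parse_packed_refs(sample).items():
    print(oid, name)
```

Git checks a loose file under refs/ first and falls back to packed-refs, so a complete lookup would combine this reader with a file read.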