Object database
Git's on-disk storage format is relatively simple.
SHA-1s
Values are just sequences of bytes, indexed by the SHA-1 hash of their content (prefixed with a short header giving the object's type and size). git hash-object computes this hash.
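As a rough illustration of where these hashes come from, the sketch below reproduces what git hash-object computes for a blob. It assumes the object is a plain blob; the function name git_blob_hash is made up for this example and is not part of Git.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 ID Git assigns to a blob holding `content`."""
    # Git hashes a header ("blob <size>" followed by a NUL byte)
    # and then the raw content, not the content alone.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known ID.
    print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the header is part of the hashed bytes, running a plain SHA-1 tool over the same file gives a different value.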
Object storage
git init creates the .git directory:
objects/
  NN/ -- the first two characters of a SHA-1 hash
    nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn -- the remaining 38 characters of the SHA-1 hash
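The layout above can be sketched in Python. This is a minimal approximation of how a loose object ends up on disk: the header and content are hashed, zlib-compressed, and written under objects/NN/. The helper name write_loose_object is hypothetical, and the sketch ignores packfiles and file permissions.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(git_dir: Path, obj_type: str, content: bytes) -> Path:
    """Store `content` as a loose object and return the file's path."""
    # The stored bytes are "<type> <size>\0" followed by the content.
    store = obj_type.encode() + b" %d\x00" % len(content) + content
    sha = hashlib.sha1(store).hexdigest()
    # First two hex characters name the directory, the other 38 the file.
    path = git_dir / "objects" / sha[:2] / sha[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        written = write_loose_object(Path(tmp), "blob", b"test content\n")
        print(written.relative_to(tmp))
```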
git cat-file retrieves stored objects; -t gets the type, -p pretty-prints the content.
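To give a sense of what git cat-file has to do for a loose object, here is a small sketch that decompresses an object's bytes and splits off the header. It only covers loose (zlib-compressed) objects, not packed ones, and read_object is an illustrative name rather than anything Git provides.

```python
import zlib
from typing import Tuple


def read_object(compressed: bytes) -> Tuple[str, bytes]:
    """Return (type, content) for the raw bytes of a loose object file."""
    raw = zlib.decompress(compressed)
    # The header ends at the first NUL byte; everything after is content.
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match content length")
    return obj_type, body


if __name__ == "__main__":
    sample = zlib.compress(b"blob 5\x00hello")
    obj_type, body = read_object(sample)
    print(obj_type)  # roughly what `git cat-file -t` reports
    print(body)      # roughly what `git cat-file -p` prints for a blob
```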
Commits are objects
git commit creates and stores a commit object; its SHA-1 hash is the commit's ID, which is the value refs point to. The object contains:
- Header:
  tree nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn -- the SHA-1 hash of a tree object.
  parent nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn -- references the parent commit. This field is omitted from root commits.
  author Name <email> epoch +HHMM
  committer Name <email> epoch +HHMM
- Commit message
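The fields listed above can be assembled by hand. The sketch below builds the body of a commit object and hashes it the same way as any other object. The names build_commit and object_id, and the author identity, are placeholders for illustration; the tree ID used is the well-known hash of the empty tree. It handles at most one parent, whereas merge commits carry several parent lines.

```python
import hashlib
from typing import Optional

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def build_commit(tree: str, parent: Optional[str], author: str,
                 committer: str, message: str) -> bytes:
    """Assemble the body of a commit object from its fields."""
    lines = [f"tree {tree}"]
    if parent is not None:  # root commits have no parent line
        lines.append(f"parent {parent}")
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the header from the commit message.
    return ("\n".join(lines) + "\n\n" + message).encode()


def object_id(obj_type: str, body: bytes) -> str:
    """Hash a body together with its "<type> <size>\\0" header."""
    header = obj_type.encode() + b" %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    who = "Example Author <author@example.com> 0 +0000"  # placeholder identity
    body = build_commit(EMPTY_TREE, None, who, who, "Initial commit\n")
    print(body.decode())
    print(object_id("commit", body))
```

Since the author and committer timestamps are part of the hashed bytes, two otherwise identical commits made at different times get different IDs.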
Trees are objects referencing other objects
The tree object is a series of lines, each composed of the following fields:
- mode (e.g. 100644)
- type (blob, tree)
- object-id -- the SHA-1 hash of the object
- name -- the filename
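The fields above match what git cat-file -p prints for a tree. The raw bytes inside the object are more compact: each entry is the mode, a space, the name, a NUL byte, and then the 20-byte binary hash, with the type implied by the mode. The sketch below decodes that raw form into the printed fields; parse_tree is a hypothetical helper, and it assumes SHA-1 (20-byte) object IDs.

```python
from typing import List, Tuple


def parse_tree(body: bytes) -> List[Tuple[str, str, str, str]]:
    """Decode raw tree bytes into (mode, type, object-id, name) tuples."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        sha = body[nul + 1:nul + 21].hex()  # 20 raw bytes -> 40 hex chars
        # The type is not stored; it follows from the mode.
        if mode == "40000":
            obj_type = "tree"
        elif mode == "160000":
            obj_type = "commit"  # submodule entry
        else:
            obj_type = "blob"
        # Modes are stored without leading zeros; pad to six digits for display.
        entries.append((mode.zfill(6), obj_type, sha, name))
        pos = nul + 21
    return entries


if __name__ == "__main__":
    empty_blob = bytes.fromhex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
    raw = b"100644 README\x00" + empty_blob
    for entry in parse_tree(raw):
        print(" ".join(entry[:3]) + "\t" + entry[3])
```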
git count-objects summarizes the number of loose (unpacked) objects and the disk space they consume.
Tags are objects referencing commit objects
Git has two types of tags:
- Lightweight tags don't have labels and point directly to a commit.
- Annotated tags have labels, which are stored in tag objects.
Annotated tags are stored in the object database with the following fields:
- Header:
object nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
type commit
tag name
tagger Name <email> epoch +HHMM
- Message
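As a sketch of how the fields above might be read back, the following splits a tag object's body into its header fields and message. The sample tag name, tagger, and message are invented for the example, and parse_tag is not a Git function. It assumes an unsigned tag with one value per header key.

```python
from typing import Dict, Tuple


def parse_tag(body: bytes) -> Tuple[Dict[str, str], str]:
    """Split a tag object's body into (header fields, message)."""
    # A blank line separates the header from the message.
    head, _, message = body.decode().partition("\n\n")
    headers = {}
    for line in head.split("\n"):
        key, _, value = line.partition(" ")
        headers[key] = value
    return headers, message


if __name__ == "__main__":
    sample = (
        b"object e69de29bb2d1d6434b8b29ae775ad8c2e48c5391\n"  # placeholder ID
        b"type commit\n"
        b"tag v1.0\n"
        b"tagger Example Tagger <tagger@example.com> 0 +0000\n"
        b"\n"
        b"First release\n"
    )
    headers, message = parse_tag(sample)
    print(headers["object"], headers["type"], headers["tag"])
    print(message, end="")
```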
Both types of tags are represented as files in the .git/refs/tags directory, where the content of each file is the SHA-1 hash of an object. Lightweight tags contain the hash of the commit they point to, as there's no additional data that needs to be stored. Annotated tags instead contain the hash of their tag object, which stores the label.
In short, tags are just labels pointing at commits -- exactly the same as a branch. The difference is in semantics: tags should be considered immutable, whereas branches can change.
Branches
At their core, branches are just references to their current commit. The references are stored in .git/refs/heads, indexed by branch name.
The current branch is stored in the .git/HEAD file:
ref: refs/heads/master
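Following HEAD to a commit can be sketched as two file reads. The helpers below, current_branch and resolve_head, are illustrative names only. They look solely at loose ref files, so a branch that exists only in packed-refs, or a branch with no commits yet, resolves to None here.

```python
from pathlib import Path
from typing import Optional


def current_branch(git_dir: Path) -> Optional[str]:
    """Return the checked-out branch name, or None for a detached HEAD."""
    head = (git_dir / "HEAD").read_text().strip()
    prefix = "ref: refs/heads/"
    if head.startswith(prefix):
        return head[len(prefix):]
    return None  # HEAD holds a raw commit hash instead of a ref


def resolve_head(git_dir: Path) -> Optional[str]:
    """Return the commit hash HEAD points to, if a loose ref file has it."""
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD already contains the hash
    ref_file = git_dir / head[len("ref: "):]
    if not ref_file.is_file():
        return None  # ref may be packed, or the branch has no commits
    return ref_file.read_text().strip()


if __name__ == "__main__":
    git_dir = Path(".git")
    if (git_dir / "HEAD").is_file():
        print(current_branch(git_dir), resolve_head(git_dir))
```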
Remote branches can be found in a couple of locations within .git:
refs/
  remotes/
    origin/
      HEAD
packed-refs
$ git show-ref
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn refs/heads/x
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn refs/remotes/x
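The packed-refs file mentioned earlier is a plain-text list with one "hash refname" pair per line, much like the git show-ref output above. A minimal sketch of reading it follows; parse_packed_refs is a made-up name, and the sketch simply skips comment lines (starting with #) and peeled-tag lines (starting with ^) rather than interpreting them.

```python
from typing import Dict


def parse_packed_refs(text: str) -> Dict[str, str]:
    """Map ref names to hashes from the contents of a packed-refs file."""
    refs = {}
    for line in text.splitlines():
        # Skip blanks, the "# pack-refs with: ..." header, and "^<hash>"
        # lines that record the commit an annotated tag points at.
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        sha, _, name = line.partition(" ")
        refs[name] = sha
    return refs


if __name__ == "__main__":
    sample = (
        "# pack-refs with: peeled fully-peeled sorted\n"
        "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 refs/heads/x\n"
        "d670460b4b4aece5915caf5c68d12f560a9fe3e4 refs/remotes/x\n"
    )
    for name, sha in parse_packed_refs(sample).items():
        print(sha, name)
```

The hashes in the sample are placeholders chosen for the example, not refs from a real repository.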