git index file format
The index file is crucial to understanding how git staging works. It contains an entry for every file git is tracking at the moment.
Contents of .git/index
- Let us consider the example of an empty `git` repo
- If we attempt to `cat` out the contents of `.git/index`, we are greeted with `DIRC 9ÿêûÂ5l~ır!lÎÕ'™A˘fl`
- This is a binary file
ELI5 version of .git/index
- Check out https://git-scm.com/docs/index-format for more details.
- The index file starts with a 12-byte header
  - The first 4 bytes are occupied by the signature `DIRC`
  - The next 4 bytes indicate the version number
  - The next 4 bytes (32 bits) indicate the number of index entries
    - This means that `git` can potentially only have $2^{32} - 1 = 4,294,967,295$ different files for staging, since that is the largest value an unsigned 32-bit count can hold
That means that for an empty repo, the header of `.git/index` looks like this:
% xxd -groupsize 4 -len 12 .git/index
00000000: 44495243 00000002 00000000 DIRC........
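The same three fields can be pulled apart programmatically. The following is a minimal sketch, assuming the header layout described above (signature, version, entry count, all big-endian); the function name `parse_index_header` is only illustrative and is not part of git.

```python
import struct


def parse_index_header(header: bytes):
    """Split the 12-byte index header into (version, entry_count)."""
    if len(header) < 12:
        raise ValueError("an index header needs at least 12 bytes")
    # ">4sII": big-endian, 4-byte signature, then two unsigned 32-bit integers.
    signature, version, entry_count = struct.unpack(">4sII", header[:12])
    if signature != b"DIRC":
        raise ValueError("missing DIRC signature")
    return version, entry_count


# The 12 header bytes shown in the xxd output above.
header = bytes.fromhex("444952430000000200000000")
print(parse_index_header(header))  # (2, 0)
```

To run it against a real repository, read the first 12 bytes of `.git/index` in binary mode and pass them to the function.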
The entire file looks as follows:
% xxd .git/index
00000000: 4449 5243 0000 0002 0000 0000 39d8 9013 DIRC........9...
00000010: 9ee5 356c 7ef5 7221 6ceb cd27 aa41 f9df ..5l~.r!l..'.A..
- What comes after our header? We don't have any files staged. So, what are those extra bytes?
- Those are the `sha1` checksum of everything that precedes them in the file (here, just the 12-byte header)
Let us extract the stored checksum and then recompute it from the header:
% xxd -s 12 -p .git/index
39d890139ee5356c7ef572216cebcd27aa41f9df
% echo -n "0x444952430000000200000000" | xxd -r | sha1
39d890139ee5356c7ef572216cebcd27aa41f9df
We see that the checksum matches.
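The comparison can also be scripted. This is a rough sketch, assuming the trailing 20 bytes of the file are the SHA-1 of everything before them, as described above; `checksum_ok` is a hypothetical helper name, and the sample index is built in memory rather than read from disk.

```python
import hashlib
import struct


def checksum_ok(index_bytes: bytes) -> bool:
    """Return True if the last 20 bytes are the SHA-1 of the rest of the file."""
    body, trailer = index_bytes[:-20], index_bytes[-20:]
    return hashlib.sha1(body).digest() == trailer


# Build an empty version-2 index in memory: 12-byte header plus its checksum.
header = struct.pack(">4sII", b"DIRC", 2, 0)
index_bytes = header + hashlib.sha1(header).digest()
print(checksum_ok(index_bytes))  # True

# Flipping a single byte of the header makes the check fail.
corrupted = bytes([index_bytes[0] ^ 0xFF]) + index_bytes[1:]
print(checksum_ok(corrupted))  # False
```

For a real repository, the same function can be applied to the bytes returned by `open(".git/index", "rb").read()`.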
What happens if we stage a file?
% xxd .git/index
00000000: 4449 5243 0000 0002 0000 0001 6952 d929 DIRC........iR.)
00000010: 37c5 4aba 6952 d929 37c5 4aba 0100 0012 7.J.iR.)7.J.....
00000020: 0584 e351 0000 81a4 0000 01f5 0000 0014 ...Q............
00000030: 0000 0000 e69d e29b b2d1 d643 4b8b 29ae ...........CK.).
00000040: 775a d8c2 e48c 5391 0007 666f 6f2e 7478 wZ....S...foo.tx
00000050: 7400 0000 e6d3 1019 bd29 d667 1061 e851 t........).g.a.Q
00000060: 07dd 0acc d2f1 6a9e ......j.
% git ls-files --stage
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 foo.txt
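- The output of the second command is essentially tacked on after the header: the mode (`0000 81a4`, which is 100644 in octal), the object id (`e69d e29b ...`) and the path `foo.txt` are all visible in the dump, preceded by the file's stat data
- We also see that the third field of the header, the number of index entries, is now 1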
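To make the mapping between the dump and the `git ls-files --stage` output concrete, here is a minimal sketch that decodes the first entry from the bytes shown above. It assumes the version-2 entry layout from the index-format documentation (ten 32-bit stat fields, a 20-byte object id, 16-bit flags whose low 12 bits hold the path length, then the path). The name `parse_first_entry` is illustrative only, and the sketch does not handle extended flags or later index versions.

```python
import struct

# Bytes copied from the xxd dump above (one staged file, foo.txt).
INDEX_HEX = """
4449 5243 0000 0002 0000 0001 6952 d929
37c5 4aba 6952 d929 37c5 4aba 0100 0012
0584 e351 0000 81a4 0000 01f5 0000 0014
0000 0000 e69d e29b b2d1 d643 4b8b 29ae
775a d8c2 e48c 5391 0007 666f 6f2e 7478
7400 0000 e6d3 1019 bd29 d667 1061 e851
07dd 0acc d2f1 6a9e
"""
data = bytes.fromhex("".join(INDEX_HEX.split()))


def parse_first_entry(index_bytes: bytes) -> dict:
    """Decode the header and the first entry of a version-2 index."""
    signature, version, entry_count = struct.unpack(">4sII", index_bytes[:12])
    if signature != b"DIRC" or entry_count < 1:
        raise ValueError("expected an index with at least one entry")
    # Fixed-size part of an entry: 10 x uint32, 20-byte object id, uint16 flags.
    fields = struct.unpack(">10I20sH", index_bytes[12:74])
    mode, size = fields[6], fields[9]
    object_id, flags = fields[10], fields[11]
    name_length = flags & 0x0FFF   # low 12 bits: length of the path
    stage = (flags >> 12) & 0x3    # 2-bit merge stage
    path = index_bytes[74:74 + name_length].decode("utf-8")
    return {
        "entry_count": entry_count,
        "mode": mode,
        "size": size,
        "object_id": object_id.hex(),
        "stage": stage,
        "path": path,
    }


entry = parse_first_entry(data)
print(f"{entry['mode']:o} {entry['object_id']} {entry['stage']} {entry['path']}")
# 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 foo.txt
```

The printed line matches what `git ls-files --stage` reported, and `entry["entry_count"]` is the `1` seen in the third header field.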