git index file format

The index file is crucial to understanding how git staging works. It lists every file that git is tracking at the moment.

Contents of .git/index

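  • Let us consider the example of an empty git repo, i.e. one with nothing staged. (A freshly initialised repo has no .git/index until git first writes one, for example after a file is staged and then unstaged.)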
  • If we attempt to cat out the contents of .git/index we are greeted with DIRC9ÿêûÂ5l~ır!lÎÕ'™A˘fl
  • This is a binary file

ELI5 version of .git/index

  • Check out https://git-scm.com/docs/index-format for more details.
  • The index file starts with a 12-byte header
  • The first 4 bytes are occupied by DIRC
  • The next 4 bytes indicate the version number
  • The next 4 bytes (32 bits) indicate the number of index entries
    • This means that a single index can hold at most $2^{32} - 1 = 4{,}294{,}967{,}295$ entries
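
The header layout described above can be sketched in a few lines of Python. This is a minimal illustration, not git's own code: the function name `parse_index_header` is made up for this example, and the sample bytes are a version 2 header with zero entries. All three fields are read as big-endian values.

```python
import struct

def parse_index_header(data):
    """Split the first 12 bytes of an index file into its three fields."""
    # ">4sII": big-endian, a 4-byte signature, then two unsigned 32-bit integers.
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError(f"not a git index: signature {signature!r}")
    return signature, version, entry_count

# Sample header: "DIRC", version 2, zero entries.
empty_header = bytes.fromhex("444952430000000200000000")
print(parse_index_header(empty_header))  # (b'DIRC', 2, 0)
```

To try it on a real repo, pass in the contents of the file, e.g. `parse_index_header(open(".git/index", "rb").read())`.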

That means that for an empty repo, the header of .git/index looks like this:

% xxd -groupsize 4 -len 12 .git/index
00000000: 44495243 00000002 00000000           DIRC........

The entire file looks as follows:

% xxd   .git/index
00000000: 4449 5243 0000 0002 0000 0000 39d8 9013  DIRC........9...
00000010: 9ee5 356c 7ef5 7221 6ceb cd27 aa41 f9df  ..5l~.r!l..'.A..
  • What comes after our header? We don’t have any files staged, so what are those extra 20 bytes?
  • They are the SHA-1 checksum of everything that precedes them in the file, which here is just the 12-byte header

Let us extract the stored checksum (everything after byte 12) and compare it with the SHA-1 of the 12 header bytes:

% xxd -s 12  -p .git/index
39d890139ee5356c7ef572216cebcd27aa41f9df
% echo -n "0x444952430000000200000000" | xxd -r | sha1
39d890139ee5356c7ef572216cebcd27aa41f9df

We see that the checksum matches.
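
The same check can be written for an index of any size: hash everything except the last 20 bytes and compare the result with those 20 bytes. The sketch below is only an illustration; `index_checksum_ok` is a hypothetical helper name, and the empty index is built in memory rather than read from disk.

```python
import hashlib

def index_checksum_ok(data):
    """Return True if the trailing 20 bytes are the SHA-1 of everything before them."""
    body, stored = data[:-20], data[-20:]
    return hashlib.sha1(body).digest() == stored

# Build an empty version 2 index in memory: the header followed by its SHA-1.
header = bytes.fromhex("444952430000000200000000")
empty_index = header + hashlib.sha1(header).digest()

print(len(empty_index))                # 32 bytes: 12 for the header, 20 for the checksum
print(index_checksum_ok(empty_index))  # True by construction
```

For a real repo, `index_checksum_ok(open(".git/index", "rb").read())` performs the same comparison as the two shell commands.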

What happens if we stage a file?

% xxd .git/index
00000000: 4449 5243 0000 0002 0000 0001 6952 d929  DIRC........iR.)
00000010: 37c5 4aba 6952 d929 37c5 4aba 0100 0012  7.J.iR.)7.J.....
00000020: 0584 e351 0000 81a4 0000 01f5 0000 0014  ...Q............
00000030: 0000 0000 e69d e29b b2d1 d643 4b8b 29ae  ...........CK.).
00000040: 775a d8c2 e48c 5391 0007 666f 6f2e 7478  wZ....S...foo.tx
00000050: 7400 0000 e6d3 1019 bd29 d667 1061 e851  t........).g.a.Q
00000060: 07dd 0acc d2f1 6a9e                      ......j.
% git ls-files --stage
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	foo.txt
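  • The mode, object ID, stage number, and path printed by the second command are all stored in the index entry that follows the header, alongside the file's stat data (timestamps, device, inode, uid, gid, size).
  • We also see that the third field of the header, the entry count, is now 1.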
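
To see where the `git ls-files --stage` fields live, the entry can be decoded by hand. The sketch below uses the bytes from the xxd dump of the index with foo.txt staged. It assumes a version 2 index and a path shorter than 0xFFF bytes; `parse_first_entry` and `ENTRY_FIXED` are names invented for this example.

```python
import struct

# Bytes copied from the xxd dump of the index with foo.txt staged.
INDEX_HEX = (
    "4449524300000002000000016952d929"
    "37c54aba6952d92937c54aba01000012"
    "0584e351000081a4000001f500000014"
    "00000000e69de29bb2d1d6434b8b29ae"
    "775ad8c2e48c53910007666f6f2e7478"
    "74000000e6d31019bd29d6671061e851"
    "07dd0accd2f16a9e"
)

# Fixed part of an entry: ten 32-bit stat fields, a 20-byte SHA-1, 16-bit flags.
ENTRY_FIXED = struct.Struct(">10I20sH")

def parse_first_entry(data):
    """Decode the entry that starts right after the 12-byte header."""
    (ctime_s, ctime_ns, mtime_s, mtime_ns,
     dev, ino, mode, uid, gid, size, oid, flags) = ENTRY_FIXED.unpack_from(data, 12)
    name_len = flags & 0x0FFF      # low 12 bits: length of the path
    stage = (flags >> 12) & 0x3    # bits 12-13: merge stage
    start = 12 + ENTRY_FIXED.size  # the path begins after the 62 fixed bytes
    path = data[start:start + name_len].decode("utf-8")
    return {"mode": mode, "oid": oid.hex(), "stage": stage,
            "path": path, "size": size, "uid": uid, "gid": gid}

entry = parse_first_entry(bytes.fromhex(INDEX_HEX))
# Format the fields the way `git ls-files --stage` prints them.
print(f"{entry['mode']:o} {entry['oid']} {entry['stage']}\t{entry['path']}")
```

The printed line reproduces the `git ls-files --stage` output: the mode `0x81a4` is `100644` in octal, and the flags value `0x0007` encodes stage 0 and a 7-byte path. The entry is then padded with NUL bytes to a multiple of 8 bytes, and the file again ends with a 20-byte SHA-1 checksum.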