Git and GitHub - Full Course
Notes from this video: https://www.youtube.com/watch?v=rH3zE7VlIMs
Types of git commands
If we read the README from git's very first commit, we see the following excerpt:
GIT - the stupid content tracker
"git" can mean anything, depending on your mood.
- random three-letter combination that is pronounceable, and not
actually used by any common UNIX command. The fact that it is a
mispronounciation of "get" may or may not be relevant.
- stupid. contemptible and despicable. simple. Take your pick from the
dictionary of slang.
- "global information tracker": you're in a good mood, and it actually
works for you. Angels sing, and a light suddenly fills the room.
- "goddamn idiotic truckload of sh*t": when it breaks
It is from the last bullet point that we derive the naming convention for git commands:
- Porcelain
  - This is the outer, high-level, polished stuff
- Plumbing
  - This is the low-level, nitty-gritty stuff
We will mostly work with the Porcelain commands.
git init
git init creates an empty .git folder in the current working directory
.git/
├── config
├── description
├── HEAD
├── hooks/
│ ├── applypatch-msg.sample*
│ ├── commit-msg.sample*
│ ├── fsmonitor-watchman.sample*
│ ├── post-update.sample*
│ ├── pre-applypatch.sample*
│ ├── pre-commit.sample*
│ ├── pre-merge-commit.sample*
│ ├── pre-push.sample*
│ ├── pre-rebase.sample*
│ ├── pre-receive.sample*
│ ├── prepare-commit-msg.sample*
│ ├── push-to-checkout.sample*
│ ├── sendemail-validate.sample*
│ └── update.sample*
├── info/
│ └── exclude
├── objects/
│ ├── info/
│ └── pack/
└── refs/
├── heads/
└── tags/
9 directories, 18 files
- We see that this directory is pretty empty.
- We have a bunch of sample hook files.
- objects/ and refs/ contain only empty subdirectories because we haven't added or committed anything yet.
git hashing mechanism
- git by default uses sha1
  - But it doesn't just sha1 the file contents
  - It first prepends a header: the type (blob for a file), a space, the size of the contents in bytes, and a null terminator \0. The file contents come after that.
    - For example, an empty file called foo.txt has the hash e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    - But when you run sha1 foo.txt you will get da39a3ee5e6b4b0d3255bfef95601890afd80709
    - Running echo -n "blob 0\0" | sha1 - will return e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, which is the hash ID used by git for foo.txt
  - Notice that the calculation of the hash never involved the filename. If we create a new empty file called bar.txt, it will also hash to e69de29bb2d1d6434b8b29ae775ad8c2e48c5391. git only uses the contents of the file to generate the object ID.
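To make the header concrete, here is a minimal Python sketch of the blob hash (the helper name git_blob_hash is just for illustration). It puts blob <size>\0 in front of the contents and hashes the result with SHA-1, next to a plain SHA-1 of the same contents for comparison. Note that the filename is never passed in. You can cross-check any real file against git itself with git hash-object foo.txt.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Object ID git gives a file with these contents (SHA-1 repos)."""
    # Header: object type, a space, the size in bytes, then a NUL byte.
    header = f"blob {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    empty = b""  # contents of foo.txt (and of bar.txt): nothing at all
    # Plain SHA-1 of the contents, like `sha1 foo.txt`
    print(hashlib.sha1(empty).hexdigest())  # da39a3ee5e6b4b0d3255bfef95601890afd80709
    # SHA-1 of header + contents, the ID git actually stores
    print(git_blob_hash(empty))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```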
index file
- We saw that git only uses the contents of the file to compute the hash
- What happens if we rename the file?
- Consider the following scenario:
touch foo.txt
git add foo.txt
mv foo.txt bar.txt
git status will output:
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: foo.txt
Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
deleted: foo.txt
Untracked files:
(use "git add <file>..." to include in what will be committed)
bar.txt
- Our .git/objects folder still looks like this:
.git/objects
├── e6
│ └── 9de29bb2d1d6434b8b29ae775ad8c2e48c5391
├── info
└── pack
4 directories, 1 file
- How did git know that we changed the file name?
  - It used the .git/index file
  - When we say that we are staging a file, we are technically adding it to the index file
  - When we moved foo.txt to bar.txt, git status did the following:
    - First, it listed all the staged changes
    - Second, it noticed that one of the staged files is missing from the working directory and hence reported its deletion as a change not staged for commit
    - Third, it saw that bar.txt is not in the index file and reported it as untracked
- See 20251229T142012-git_index_file_format for more details
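The three steps above boil down to two comparisons: the last commit against the index (staged changes) and the index against the working directory (unstaged changes), with anything in the working directory but not in the index reported as untracked. Below is a simplified Python model of that logic. The dict-based "snapshots" and the function name status are hypothetical; real git status also uses cached file metadata, rename detection and .gitignore, all of which this sketch ignores.

```python
EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"


def status(head: dict[str, str], index: dict[str, str], worktree: dict[str, str]):
    """Each argument maps a path to the blob ID of its contents."""
    # 1. last commit vs index -> "Changes to be committed"
    staged = []
    for path in sorted(head.keys() | index.keys()):
        if path not in head:
            staged.append(("new file", path))
        elif path not in index:
            staged.append(("deleted", path))
        elif head[path] != index[path]:
            staged.append(("modified", path))

    # 2. index vs working directory -> "Changes not staged for commit"
    unstaged = []
    for path in sorted(index):
        if path not in worktree:
            unstaged.append(("deleted", path))
        elif worktree[path] != index[path]:
            unstaged.append(("modified", path))

    # 3. working-directory files the index doesn't know -> "Untracked files"
    untracked = sorted(path for path in worktree if path not in index)
    return staged, unstaged, untracked


# The scenario above: no commits yet, foo.txt staged, then renamed to bar.txt
staged, unstaged, untracked = status(
    head={}, index={"foo.txt": EMPTY_BLOB}, worktree={"bar.txt": EMPTY_BLOB}
)
print(staged)     # [('new file', 'foo.txt')]
print(unstaged)   # [('deleted', 'foo.txt')]
print(untracked)  # ['bar.txt']
```

The three printed lists line up with the three sections of the git status output above.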
File status
git at its core is a file tracker. It tracks how a file evolves over its lifetime. A file can be in one of three states:
- untracked: .git/index doesn't have the file and there are no corresponding objects
- staged: .git/index has the file (and git add has already written its blob object) but it is not committed yet
- committed: .git/index has the file and there is a commit object and a blob object corresponding to that file
commit
- Refer to 20240130191938-git-moc for a primer on how a COMMIT is designed under the hood
- If we run git cat-file -p 3c82b84b3db4de6139871ef7c49609d26b410d13 we get
tree 09a13b897d3d0f528d487c704da540cb952d7606
author Deebakkarthi C R <dbk@deebakkarthi.com> 1767037689 -0500
committer Deebakkarthi C R <dbk@deebakkarthi.com> 1767037689 -0500
Add foo.txt
- This is literally what a commit is. It has:
  - the ID of a tree object
  - an author (name, email, Unix timestamp and timezone offset)
  - a committer (same fields)
  - the commit message (COMMIT_MSG)
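A commit is itself just another object. git hashes its text with the same type-size header as a blob, only with commit as the type. Here is a minimal Python sketch that rebuilds the commit above; the helper names are made up. It assumes the stored text has a blank line between the committer line and the message, and that the message ends with a newline (git cat-file -p prints that blank line, and git commit adds the trailing newline). If those bytes match exactly what git stored, the printed ID should be 3c82b84b3db4de6139871ef7c49609d26b410d13. Change a single byte, for example the timestamp, and you get an unrelated ID.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """SHA-1 of '<type> <size>' + a NUL byte + body, as for blobs."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, author: str, committer: str, message: str,
                parents: tuple[str, ...] = ()) -> bytes:
    """Serialize a commit: header lines, a blank line, then the message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # the first commit has none
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message).encode()


ident = "Deebakkarthi C R <dbk@deebakkarthi.com> 1767037689 -0500"
body = commit_body(
    tree="09a13b897d3d0f528d487c704da540cb952d7606",
    author=ident,
    committer=ident,
    message="Add foo.txt\n",
)
print(hash_object("commit", body))
```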
- What is a tree object? We can run git cat-file -p 09a13b897d3d0f528d487c704da540cb952d7606 to find out
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 foo.txt
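- A tree represents a snapshot of a directory; the tree a commit points to is the snapshot of the root directory. Each line of the output gives an entry's mode, object type, object ID and name.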
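The raw tree object is not the text that git cat-file -p prints. Each entry is stored as the mode, a space, the name, a NUL byte and then the 20 raw bytes of the object ID (not 40 hex characters), and the whole body is hashed behind a tree <size>\0 header. Here is a minimal Python sketch that assumes a flat directory of regular files only; git's real sort order treats subdirectory names as if they ended in /, which this skips. The function names are hypothetical.

```python
import hashlib


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """entries: (mode, name, hex object ID) for regular files only."""
    out = b""
    for mode, name, oid in sorted(entries, key=lambda e: e[1].encode()):
        # "<mode> <name>" + NUL + 20-byte binary object ID
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return out


def tree_id(entries: list[tuple[str, str, str]]) -> str:
    body = tree_body(entries)
    return hashlib.sha1(f"tree {len(body)}".encode() + b"\0" + body).hexdigest()


print(tree_id([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (an empty tree)
print(tree_id([("100644", "foo.txt", "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")]))
```

If the serialization is right, the second line should reproduce the tree ID from the commit above.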
Why store hashes like 09/a13b897d3d0f528d487c704da540cb952d7606
- You may notice that instead of storing objects directly under .git/objects, git stores them in a directory named after the first two characters of the hash. Why is that?
  - It is due to a phenomenon known as inode busting. You have a limited number of inodes on your system. You can check them using df -i.
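The layout itself is easy to reproduce: the first two hex characters become the directory name (so there are at most 256 such directories) and the remaining 38 become the file name. The file holds the zlib-compressed header plus contents. Here is a minimal Python sketch with hypothetical helper names. decode_loose_object should also be able to read real files under .git/objects, but the bytes encode_loose_object produces are not guaranteed to match git's own compressed output byte for byte.

```python
import hashlib
import zlib
from pathlib import PurePosixPath


def loose_object_path(oid: str, git_dir: str = ".git") -> PurePosixPath:
    """First 2 hex chars -> directory, remaining 38 -> file name."""
    return PurePosixPath(git_dir, "objects", oid[:2], oid[2:])


def encode_loose_object(obj_type: str, body: bytes) -> tuple[str, bytes]:
    """Return (object ID, zlib-compressed '<type> <size>' + NUL + body)."""
    raw = f"{obj_type} {len(body)}".encode() + b"\0" + body
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def decode_loose_object(data: bytes) -> tuple[str, bytes]:
    """Inverse of encode_loose_object: return (type, body)."""
    raw = zlib.decompress(data)
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body


oid, data = encode_loose_object("blob", b"")
print(loose_object_path(oid))     # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(decode_loose_object(data))  # ('blob', b'')
```

For example, reading .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 from the repository above in binary mode and passing the bytes to decode_loose_object should give ('blob', b'').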