Git Internals: Branches, Logs, Refs & The Multiverse of Madness

Branching out your own timeline...

In the previous articles, we saw how git stores our data inside the .git directory. Now that we are familiar with it, we can start time-travelling.

But, wait! Did I mention that we can traverse through different timelines of code in the grand scheme of the multiverse of your code? My bad, I didn't! So let us get started with it!

Note: You might already be familiar with git branches and the related operations. The intent of this article is not to teach the basics of branching, but to show how git internally maintains these branches!

git init, The start of everything...

When git is initialized in the working directory, it sets up a default timeline whose history git maintains for you. In this series the default timeline is called main; older setups call it master, and the name can be configured through the init.defaultBranch setting. All the snapshots of your code are saved in this timeline, and you can move back in time along it. So what does this timeline do?

Now, enter the fascinating world of Git branches. A branch, in essence, is a divergent timeline in the vast multiverse of your code. It allows developers to create alternate realities where they can experiment, innovate, and make changes without affecting the main timeline. Picture each branch as a parallel universe, coexisting independently until the developer decides to merge them back into the main timeline.

To put this in perspective: when you initialize git in your working directory, git sets up a default branch called main. This branch holds all your commits (code snapshots). From main, you can cut off other timelines for development and later merge them into other timelines. When you merge two timelines, git brings together the commits from both; you may need to step in when there are conflicts!

So how does git maintain the branches internally?

refs: Where timelines are defined

A branch is nothing but a reference to the tip of a sequence of commits. Since you can work on multiple features simultaneously, you may have many branches. The branch itself is not a code snapshot, just a reference to the latest commit on that branch. Then how can we move back in time on that branch? Well, if you remember, each commit object records its parent commit, which lets git walk backwards through that specific timeline.
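To make the parent-pointer idea concrete, here is a minimal Python sketch. It is a toy model and not git's implementation: the commits live in a plain dictionary (the hypothetical COMMITS), the IDs are abbreviated, and each commit is assumed to have at most one parent (merge commits have more).

```python
# Toy model of a commit chain: each commit records its parent's ID.
# The dictionaries and the abbreviated IDs are illustrative assumptions;
# real git stores commits as objects under .git/objects.
COMMITS = {
    "cc3a96b": {"parent": None, "message": "First stop in time-travel"},
    "b4f560a": {"parent": "cc3a96b", "message": "Second commit to time travel!"},
}

# A branch is only a name that maps to the ID of its latest commit.
BRANCHES = {"main": "b4f560a"}


def history(branch):
    """Return the commit messages reachable from a branch tip, newest first."""
    messages = []
    commit_id = BRANCHES[branch]
    while commit_id is not None:
        commit = COMMITS[commit_id]
        messages.append(commit["message"])
        commit_id = commit["parent"]  # step back one commit in time
    return messages


print(history("main"))
```

The branch contributes nothing except the starting point; everything older is found by following parents.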

So where does git store these references? Yes, you guessed it right, it stores them in the .git/refs folder. Let us inspect the contents of this folder.

➜  .git git:(main) cd refs 
➜  refs git:(main) tree
.
├── heads
│   └── main
└── tags

3 directories, 1 file

Continuing the example from our previous article, we have only one branch, i.e. the main branch. As shown above, branch references are stored in the .git/refs/heads folder. Each file is named after a branch and contains a reference to the tip (latest) commit of that branch. Let us inspect the contents of the main file in the folder mentioned above.

➜  refs git:(main) cat heads/main
b4f560a353a2202e9bb18939c7c3ccbf6e479639
➜  refs git:(main) git cat-file -p b4f560a353a2202e9bb18939c7c3ccbf6e479639
tree 408c5d6e7750f028dd67dd5b25cf98076e400638
parent cc3a96b5a05c666fca63d7b4acd29ce779ad0ce5
author Aman <dummy-email@gmail.com> 1703971554 +0530
committer Aman <dummy-email@gmail.com> 1703971554 +0530

Second commit to time travel!

As seen above, the main file contains a single commit ID, which on inspection turns out to be the latest commit on the main branch. Now, let us create a new branch.
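The lookup we just did by hand can be sketched in Python. This is only an illustration under stated assumptions: read_branch_tip is a hypothetical helper, not a git API, and it handles just two storage forms, a loose file under refs/heads and an entry in the .git/packed-refs file that git may use instead.

```python
from pathlib import Path


def read_branch_tip(git_dir, branch):
    """Return the commit ID a branch points to, or None if it is unknown."""
    git_dir = Path(git_dir)
    ref_name = f"refs/heads/{branch}"

    # Case 1: a "loose" ref, i.e. one small file per branch.
    loose = git_dir / ref_name
    if loose.is_file():
        return loose.read_text().strip()

    # Case 2: git may pack refs into the single packed-refs file,
    # where each entry is "<commit-id> <ref-name>".
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            if line.startswith(("#", "^")):  # header and peeled-tag lines
                continue
            commit_id, _, name = line.partition(" ")
            if name == ref_name:
                return commit_id
    return None
```

Run against the example repository, read_branch_tip(".git", "main") should return the same ID that cat heads/main printed above.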

➜  time-travel-with-git git:(main) git checkout -b timeline1 main
Switched to a new branch 'timeline1'

Now let us examine the contents of the refs folder.

➜  time-travel-with-git git:(timeline1) ✗ cd .git/refs                             
➜  refs git:(timeline1) tree
.
├── heads
│   ├── main
│   └── timeline1
└── tags

3 directories, 2 files

As seen above, a new file named after the new branch gets added to the heads folder. Now let us have a look at timeline1's content.

➜  refs git:(timeline1) cat heads/timeline1
b4f560a353a2202e9bb18939c7c3ccbf6e479639
➜  refs git:(timeline1) git cat-file -p b4f560a353a2202e9bb18939c7c3ccbf6e479639
tree 408c5d6e7750f028dd67dd5b25cf98076e400638
parent cc3a96b5a05c666fca63d7b4acd29ce779ad0ce5
author Aman <dummy-email@gmail.com> 1703971554 +0530
committer Aman <dummy-email@gmail.com> 1703971554 +0530

Second commit to time travel!

The above commit is the very same commit as the latest one on the main branch. Why is that so?
Look carefully at how we created our branch: git checkout -b timeline1 main. We created it from the main branch, so our new timeline starts at the latest commit of main and shares all of main's commits up to the point where timeline1 was cut off. From this point onwards, the timelines are separate and can collect separate commits until they are merged.
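Seen from the refs folder, cutting a new timeline is therefore very cheap. The Python sketch below imitates only the ref-writing part of the step. It rests on several assumptions: create_branch is a hypothetical helper, refs are stored as loose files, and the branch name contains no slashes. Real git also validates names, records the event in its logs and, with checkout -b, switches HEAD, so use git itself on a real repository.

```python
from pathlib import Path


def create_branch(git_dir, new_branch, start_branch):
    """Create refs/heads/<new_branch> pointing at start_branch's tip."""
    heads = Path(git_dir) / "refs" / "heads"
    tip = (heads / start_branch).read_text().strip()

    new_ref = heads / new_branch
    if new_ref.exists():
        raise FileExistsError(f"branch {new_branch!r} already exists")

    # No commits are copied: the new branch is one small file
    # holding the same commit ID as the branch it was cut from.
    new_ref.write_text(tip + "\n")
    return tip
```

This is why both files showed the same ID above: right after creation, the two branches are two names for one commit.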

Let us add a few more commits to the timeline1 branch.

➜  time-travel-with-git git:(timeline1) touch timeline1-file.txt
➜  time-travel-with-git git:(timeline1) ✗ echo "Hello from timeline1! " >> timeline1-file.txt
➜  time-travel-with-git git:(timeline1) ✗ git add .
➜  time-travel-with-git git:(timeline1) ✗ git commit -m "First commit on timeline1"
[timeline1 01b3b24] First commit on timeline1
 1 file changed, 1 insertion(+)
 create mode 100644 timeline1-file.txt
➜  time-travel-with-git git:(timeline1) touch file2-timeline1.txt
➜  time-travel-with-git git:(timeline1) ✗ echo "Hello from file2 timeline1! " >> file2-timeline1.txt 
➜  time-travel-with-git git:(timeline1) ✗ git add .
➜  time-travel-with-git git:(timeline1) ✗ git commit -m "Second commit on timeline1"          
[timeline1 0c8e8d9] Second commit on timeline1
 1 file changed, 1 insertion(+)
 create mode 100644 file2-timeline1.txt

Let us inspect the contents of the timeline1 branch file in the refs folder.

➜  time-travel-with-git git:(timeline1) cat .git/refs/heads/timeline1 
0c8e8d91dce2222d669776aea7af55a8c27685fa
➜  time-travel-with-git git:(timeline1) git cat-file -p 0c8e8d91dce2222d669776aea7af55a8c27685fa
tree 71fa913d1f22a71e1718b0df4dae7ca63759a088
parent 01b3b247a53a68b399494b392003c81d0ea63087
author Aman <dummy-email@gmail.com> 1704010558 +0530
committer Aman <dummy-email@gmail.com> 1704010558 +0530

Second commit on timeline1

As seen here, the timeline1 ref now points to the latest commit on the timeline1 branch. When you switch to another branch, git looks up the branch name in this refs folder and moves to the commit ID stored in that file. (Git may also pack these files into a single .git/packed-refs file, but the idea stays the same.)

Good progress! Down goes the theory of timelines in git!

logs: The registry of every ref update

What if you want to look at all the commits made to date in a single go? This is useful in a variety of cases. With our current model, git starts at the current commit and traverses backwards to the first commit, following the parent stored in each commit object. That is exactly what git log does, and the walk is sequential by nature. But there is a second kind of history that the commit chain cannot answer: where has each branch pointed over time?

For that, git keeps a journal called the reflog. Every time a branch (or HEAD) moves to a new commit, whether through a commit, a checkout or a reset, git appends one line recording the old commit ID, the new commit ID, who made the change, and why. This journal is local to your repository, and it is what lets you find a commit again after a branch has moved away from it. So where does Git store this information?

It resides inside the .git/logs folder. Let us examine the contents of the logs folder.

➜  time-travel-with-git git:(timeline1) cd .git/logs 
➜  logs git:(timeline1) tree
.
├── HEAD
└── refs
    └── heads
        ├── main
        └── timeline1

3 directories, 3 files

As seen above, we have the HEAD file and the refs folder. Let us first have a look at the heads folder. As the file names suggest, there is one log file per branch. Let us inspect the contents of these files.

I have abbreviated the content for readability; the real entries carry full 40-character commit IDs, plus the identity of whoever made the change and a timestamp:

➜  logs git:(timeline1) cat refs/heads/main
0000 cc3a commit (initial): First stop in time-travel
cc3a b4f5 commit: Second commit to time travel!
➜  logs git:(timeline1) cat refs/heads/timeline1 
0000 b4f5 branch: Created from main
b4f5 01b3 commit: First commit on timeline1
01b3 0c8e commit: Second commit on timeline1

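As seen from the above excerpt, each file records every move of the given timeline (branch), one line per update: the old commit ID, the new commit ID and the reason. The main branch has only two entries, whereas the timeline1 log clearly shows that the branch was cut off from main, after which two commits were made. You can print this journal with git reflog show followed by the branch name. To check the commit history of the current timeline, git provides a separate command, git log, which walks the parent pointers starting from the current commit.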
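Before moving on to git log, the format of these log files can be made concrete with a small Python parser. Treat it as a sketch: parse_reflog_line and ReflogEntry are hypothetical names, and SAMPLE is an unabbreviated entry reconstructed from values shown elsewhere in this article, so the exact line in your own repository may differ. The layout assumed here is the old ID, the new ID, the identity, a timestamp and a timezone, then a tab and the message.

```python
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    old_id: str
    new_id: str
    who: str
    timestamp: int
    tz: str
    message: str


def parse_reflog_line(line):
    """Split one reflog line into its fields."""
    # A tab separates the machine-readable header from the message.
    header, _, message = line.rstrip("\n").partition("\t")
    old_id, new_id, rest = header.split(" ", 2)
    # The identity may contain spaces, so peel the last two fields off the right.
    who, timestamp, tz = rest.rsplit(" ", 2)
    return ReflogEntry(old_id, new_id, who, int(timestamp), tz, message)


# Reconstructed sample entry (an assumption, see the note above).
SAMPLE = (
    "cc3a96b5a05c666fca63d7b4acd29ce779ad0ce5 "
    "b4f560a353a2202e9bb18939c7c3ccbf6e479639 "
    "Aman <dummy-email@gmail.com> 1703971554 +0530"
    "\tcommit: Second commit to time travel!"
)

entry = parse_reflog_line(SAMPLE)
print(entry.old_id[:4], entry.new_id[:4], entry.message)
```

The three printed fields correspond to the three columns of the abbreviated excerpt above. With the file format out of the way, here is what git log prints on timeline1: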

➜  logs git:(timeline1) git log 

commit 0c8e8d91dce2222d669776aea7af55a8c27685fa (HEAD -> timeline1)
Author: Aman <dummy-email@gmail.com>
Date:   Sun Dec 31 13:45:58 2023 +0530

    Second commit on timeline1

commit 01b3b247a53a68b399494b392003c81d0ea63087
Author: Aman <dummy-email@gmail.com>
Date:   Sun Dec 31 13:45:09 2023 +0530

    First commit on timeline1

commit b4f560a353a2202e9bb18939c7c3ccbf6e479639 (main)
Author: Aman <dummy-email@gmail.com>
Date:   Sun Dec 31 02:55:54 2023 +0530

    Second commit to time travel!

commit cc3a96b5a05c666fca63d7b4acd29ce779ad0ce5
Author: Aman <dummy-email@gmail.com>
Date:   Sun Dec 31 02:26:23 2023 +0530

    First stop in time-travel

Now let us move back to the main timeline, i.e. the main branch and see what git log has in store for us.

➜  logs git:(main) git log

commit b4f560a353a2202e9bb18939c7c3ccbf6e479639 (HEAD -> main)
Author: Aman <dummy-email@gmail.com>
Date:   Sun Dec 31 02:55:54 2023 +0530

    Second commit to time travel!

commit cc3a96b5a05c666fca63d7b4acd29ce779ad0ce5
Author: Aman <dummy-email@gmail.com>
Date:   Sun Dec 31 02:26:23 2023 +0530

    First stop in time-travel

As seen here, git shows the history of the current timeline. In the first line, you see that HEAD is pointing to the main branch. HEAD is git's reference to the current branch and, through it, to the latest commit of the current timeline. This reference is maintained in the .git/HEAD file. Let us examine the contents of the HEAD file:

➜  time-travel-with-git git:(main) cat .git/HEAD
ref: refs/heads/main
➜  time-travel-with-git git:(main) git checkout timeline1
Switched to branch 'timeline1'
➜  time-travel-with-git git:(timeline1) cat .git/HEAD
ref: refs/heads/timeline1

As seen from the snippet above, the HEAD file normally contains a reference to the current branch (in a detached state, HEAD holds a commit ID directly instead). Combining HEAD, the branch refs and the parent pointer in each commit, along with some other optimizations, git can look up the history of any timeline efficiently.
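The two forms of HEAD can be told apart with a few lines of Python. This is a sketch under assumptions rather than git's own logic: resolve_head is a hypothetical helper, it expects the branch to exist as a loose file under refs/heads (so no packed refs), and it expects the branch to have at least one commit.

```python
from pathlib import Path


def resolve_head(git_dir):
    """Return (branch name or None, commit ID) for the current HEAD."""
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()

    if head.startswith("ref: "):
        # Symbolic HEAD: follow the pointer to the branch file.
        ref_name = head[len("ref: "):]
        commit_id = (git_dir / ref_name).read_text().strip()
        prefix = "refs/heads/"
        branch = ref_name[len(prefix):] if ref_name.startswith(prefix) else ref_name
        return branch, commit_id

    # Detached HEAD: the file holds a commit ID directly.
    return None, head
```

On the example repository, resolve_head(".git") should name timeline1 after the checkout shown above, together with the commit ID stored in refs/heads/timeline1.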

Summary

Ahhh! Finally! We have understood how timelines aka branches are maintained internally in Git, and how Git looks up the history of any given timeline. This is particularly useful as you get into situations where basic git commands are no longer able to save your day!

With this knowledge in the arsenal, in the upcoming articles, we will focus on how the timelines are merged, resulting in the creation of something beautiful, or sometimes, something very very terrible!
