Branches

Digging Deep into Git with 1 file in a single repository and 11 commits

Recently, I was conducting a session on git and branching and trying to explain the mental model. I didn’t do a very good job of it. I found an excellent post by Neil Atkinson called “Git branches – Is your mental model wrong?”. I tried to use the post, but it didn’t give my participants the kind of clarity that I wanted them to have, mostly because it is written for readers who are more familiar with git’s internals than most of us are, even after using git for a long time. To quote from the post:

When we start out as developers, using git for the first time, we are shown diagrams of git commits forming chains that “branch” outwards (see Scott Chacon’s excellent Pro Git Book: http://git-scm.com/doc). When we look into the underlying implementation of git we find data structures known as “trees”. Then we hear that there is such a thing as a git branch, and suddenly we instinctively feel we know what that is: it must surely bear a similarity to the branch of a tree, else why would they have named it a “branch”.

And that must mean…

  • Branches are made up of a load of commits joined together, right?
  • I can move up and down a branch, right?
  • Two people can be at different points on the same branch, right?
  • I can delete a branch by deleting the commits that make up the branch, right?
  • I guess I can move a branch if I break the links between commits or something, right?
  • When two branches merge they become one single branch, right?

No! No! No! No! No! No!
Git branches – Is your mental model wrong?

That’s been bothering me quite a bit. By writing this post (and hopefully rewriting it multiple times), I hope to give myself and you, my reader, a very clear mental model of how git branches work. I’ll do that by going into more detail than the post mentioned above and through a practical exercise that we’ll do alongside this post.

My understanding is that once a developer has a clear understanding of how git works, every git command suddenly becomes accessible and they can do a much better job of maintaining their code and related workflows. Let’s give it a shot then.

What we need to learn

  1. There’s no tree. A commit is a snapshot of your code, as you write it in a linear timeline. Commits are added in a straight line.
  2. Everything is a commit. A branch is a commit. A tag is a commit.
  3. Every branch and tag is just a name (label) for a single commit or, in git terminology, a ref (reference) to a single commit. There are no tree-like branches.
  4. You don’t have to be scared of the You are in 'detached HEAD' state... error. There’s nothing wrong with your code.

To understand these statements, we need to dig into git. To really and deeply understand what I’m saying in this post, it is necessary that you open a terminal right away and run the commands as instructed from time to time in the tutorial. If you don’t do that, some things will not make sense.

Exercise Requirements

A git repository for this tutorial

To start off, let’s create an example repository for this tutorial anywhere on your machine. Make sure you have git installed.

  1. Create a directory for our repo.
  2. Initialise git
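A minimal sketch of these two steps, assuming the directory is named git-model-tut (the name used later in this tutorial) and that you run it from wherever you keep your projects:

```shell
# Step 1: create a directory for the repo and move into it.
mkdir -p git-model-tut
cd git-model-tut

# Step 2: initialise git; this creates the hidden .git directory.
git init
```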

With this command (git init), you have asked git to start tracking this directory as a repository.

A spreadsheet or text file to track commit history

Simply make a copy of this one: https://docs.google.com/spreadsheets/d/1ty6jDXHVuNvVoiYQyxVANXf7o6nhCJXhtKccsko9KBU/edit?usp=sharing. It has five columns: History.md Content, Commit Message, Commit ID, Tag and Branch.

We’ll track whatever changes we make to the repository in this sheet.

How does git work? or What is a commit?

You write code along a linear timeline, whether you use an SCM like git, svn, mercurial or any other. With git, you take a snapshot of your code at a given time. It’s like a photograph, a very detailed one.

Just look at your surroundings right now. You are reading this post on a device. There are things and probably people around you. Imagine you could take a magical photograph of everything around you. It’s magical because say 20 years later, you can sit in a different city, probably on Mars, pull out this photograph and say some magic words. Instantly, you’ll be back in this exact moment, when you took the photograph. You could time travel in your past, in your history using these snapshots.

That’s what a commit is for a code repository. It is a magical snapshot. Later on, you can write a simple command in your terminal and all your code would be exactly like how it was when you took the snapshot or in git terms, added a commit. You can restore your code to any commit, anytime in your code’s past history or, in git terms, the commit history.

Remember that; it is very important. Your code changes in time, like your life. It has a history of commits (snapshots) taken at various points in its linear history. It is a straight line.

Adding a commit (taking a snapshot), or committing, is easy. Whenever we want to commit, we run git commit with a short message describing the change.

Let’s make some changes to our repo. While inside the git-model-tut directory, create a new file called History.md (for example, with touch History.md).

Open this file in your favourite editor. I’m going to use vim.

Inside the file, just paste or write the text First Change and save it.

We’re now going to take a snapshot, or commit. To do that, we first have to add this new file to the staging area with git add History.md and then commit it with git commit -m "First Commit".
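As a rough sketch, the whole sequence for this first commit could look like the following. To keep it runnable on its own, it works in a throwaway directory created with mktemp and sets a placeholder name and email for the commit; both are assumptions for illustration, not part of the tutorial repository. It also names the first branch master explicitly, since newer git installations may default to a different name.

```shell
# Throwaway repository (assumption: keeps this sketch separate from git-model-tut).
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master       # call the first branch master, as this tutorial does
git config user.name "Tutorial User"          # placeholder identity (assumption)
git config user.email "tutorial@example.com"  # placeholder identity (assumption)

# Write the first change, stage it, then take the snapshot.
echo "First Change" > History.md
git add History.md                            # add the new file to the staging area
git commit --quiet -m "First Commit"          # record the snapshot with its message

git log --oneline                             # one line per commit: short ID and message
```

Inside git-model-tut, where History.md already exists, only the git add and git commit lines are needed.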

The commit message says First Commit; the text inside the History.md file says First Change.

Add this information to your spreadsheet so that it looks like this:

Spreadsheet after first commit

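Next, let’s make the second change to History.md, so that its contents reflect a second change (for example, Second Change).

Then, commit it the same way, this time with the message Second Commit.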

Finally, add this information to the spreadsheet too:

Spreadsheet Contents after second commit

Now, let’s make a third change in the same way, so that the contents of History.md reflect a third change.

Commit it again, this time with the message Third Commit.

Again, add all the information to our spreadsheet:

Spreadsheet Contents after third commit

Commit ID, because commits need a name, just like everything else

Whether human or machine, we need to have names, or in developer speak, identifiers (ID) for everything. It’s also true for commits. We need to have a name or ID for every commit.

Every git commit does have a name. You can call it the commit key, commit ID or commit hash. It is generated automatically by git when you run the git commit command and is a long alphanumeric string (by default, 40 hexadecimal characters). Here’s an example: dbbd418c23e59c702e09e511abff6b75a0b8f9b6.
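To see such an ID without reading it off a log, git rev-parse can print the ID of the current commit. The sketch below is only an illustration: it uses a throwaway directory and a placeholder identity (both assumptions), and the length it prints is 40 only with git's default SHA-1 object format.

```shell
# Throwaway repository with a single commit (assumption: separate from git-model-tut).
repo="$(mktemp -d)" && cd "$repo" && git init --quiet
git config user.name "Tutorial User"          # placeholder identity (assumption)
git config user.email "tutorial@example.com"  # placeholder identity (assumption)
echo "First Change" > History.md
git add History.md
git commit --quiet -m "First Commit"

id="$(git rev-parse HEAD)"    # full ID of the commit we are currently on
echo "$id"                    # the long alphanumeric string
echo "${#id}"                 # its length in characters
git rev-parse --short HEAD    # abbreviated form of the same ID
```

The abbreviated form is what git shows in many of its messages; it refers to the same commit as the full ID.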

You can see these IDs in your commit history. If you have been following the instructions till now, run git log.

You should see an output similar to this:

Git Commit History

Of course, your commit IDs will be different from what is displayed here. Remember, git generates these automatically. At this stage, for each commit, copy the ID that follows the word commit in the log output and add it to your spreadsheet:

Spreadsheet Contents with Commit IDs after third commit

This is the commit history. Each commit has an ID (your IDs will be different from what you see here). Using these IDs, you can travel up and down the straight timeline of your commit history. For example, to go back to the first change, I can simply run git checkout followed by the ID of the first commit.

To come back to the latest commit, I can run git checkout again, this time with the ID of the latest commit or with master.

Try this yourself. Ignore any errors that you come across; we’ll get to those in just a bit.

  1. Copy the commit ID of the First Commit and run git checkout with it.
  2. Open the History.md file. Its contents should be First Change.
  3. Copy the commit ID of the Second Commit and run git checkout with it.
  4. Open the History.md file. Its contents should be what you committed in the second change.
  5. Copy the commit ID of the Third Commit (latest code) and run git checkout with it.
  6. Open the History.md file. Its contents should be back to what you committed in the third change.
  7. Finally, get back to our master branch by running git checkout master.
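The steps above can be sketched as a small script. It is only an illustration under a few assumptions: it runs in a throwaway directory, uses a placeholder identity, and assumes each change overwrites History.md with a single line such as Second Change. It also looks up the commit IDs with git rev-parse instead of copying them by hand.

```shell
# Throwaway repository with three commits (assumption: separate from git-model-tut).
repo="$(mktemp -d)" && cd "$repo" && git init --quiet
git symbolic-ref HEAD refs/heads/master       # call the first branch master, as this tutorial does
git config user.name "Tutorial User"          # placeholder identity (assumption)
git config user.email "tutorial@example.com"  # placeholder identity (assumption)

for n in First Second Third; do
  echo "$n Change" > History.md               # assumed contents: one line per snapshot
  git add History.md
  git commit --quiet -m "$n Commit"
done

first="$(git rev-parse HEAD~2)"               # ID of the First Commit
second="$(git rev-parse HEAD~1)"              # ID of the Second Commit

git checkout --quiet "$first"
cat History.md                                # prints: First Change
git checkout --quiet "$second"
cat History.md                                # prints: Second Change
git checkout --quiet master                   # back to the latest commit
cat History.md                                # prints: Third Change
```

The --quiet flag hides the messages git prints on checkout, including the one this tutorial asks you to ignore for now.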

You just time travelled in a way! 🙂

Tags, because humans can’t remember long alphanumeric names

If you and I had a conversation about commit number dbbd418c23e59c702e09e511abff6b75a0b8f9b6, it’ll quickly get awkward.

That’s why we give a human readable name to some commits that are important to our workflow so that we can remember the names and have non-awkward conversations about the snapshot (commit). If dbbd418c23e59c702e09e511abff6b75a0b8f9b6 had a simple name, our lives would become potentially awesome.

That’s what tags are. They are human readable names or labels for a particular commit. Together with branches (that we’ll come to in a moment), they are called refs (references). This means that a particular tag just refers to a particular commit. In other words, when we talk about a tag, say 1.0, we are just talking about a commit whose real name is something like dbbd418c23e59c702e09e511abff6b75a0b8f9b6.

(Right now, because we have created a repository and added commits in a certain way, we’re able to refer to commits by the commit message, as well. So, commit ID dbbd418c23e59c702e09e511abff6b75a0b8f9b6 has the commit message “First Commit” and so on. That’s only for our understanding. That’s not how git repositories are in real world work. Commit messages aren’t names or references to a commit, but rather a summary of changes.)

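You can give any commit a name, or tag, by running git tag followed by the name you want. By default this tags the commit you currently have checked out; to tag a different commit, add its commit ID after the name.

Now, instead of having to remember long alphanumeric commit IDs, we can simply remember the name or tag. It is now easier to travel up and down the straight timeline of your commit history, by running git checkout followed by the tag name.

Let’s do that for our repository. Let’s give our third commit, currently the latest one, the tag 1.0 by running git tag 1.0.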
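A minimal sketch of tagging, to show that a tag is nothing more than a second name for a commit ID. It rebuilds three commits in a throwaway directory with a placeholder identity and assumed file contents; none of these are part of the tutorial repository.

```shell
# Throwaway repository with three commits (assumption: separate from git-model-tut).
repo="$(mktemp -d)" && cd "$repo" && git init --quiet
git symbolic-ref HEAD refs/heads/master       # call the first branch master, as this tutorial does
git config user.name "Tutorial User"          # placeholder identity (assumption)
git config user.email "tutorial@example.com"  # placeholder identity (assumption)
for n in First Second Third; do
  echo "$n Change" > History.md               # assumed contents
  git add History.md
  git commit --quiet -m "$n Commit"
done

git tag 1.0               # label the commit we are on (the third one)
git rev-parse 1.0         # the commit ID the tag refers to
git rev-parse HEAD        # the same ID: both names point at one commit
git tag                   # lists tag names; prints: 1.0
```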

Update the tag column of our spreadsheet for this commit:

Spreadsheet Contents with first tag after third commit

Now, let’s make our fourth change. Edit History.md so that its contents reflect a fourth change, in the same way as before.

Commit it again, this time with the message Fourth Commit.

Now, run git log. The output should be similar to this:

Git log output

Remember, your commit IDs will be different from mine. Make sure to update the spreadsheet with this information:

Spreadsheet Contents after fourth commit

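Now, if I want to go back to the snapshot that we call the Third Commit, I can do one of two things: check out its commit ID, or check out its tag, 1.0.

Go ahead, try it. First check out the third commit by copying its commit ID and running git checkout with it.

You’d get an error that includes the line You are in 'detached HEAD' state... and a line starting with HEAD is now at.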

In fact, you might have got this error, earlier, as well. I asked you to ignore it then. I’m going to ask you to just pay attention to the highlighted lines. Beyond that, I want you to ignore it for now, we’ll get back to it later. I promise! 🙂

Your code will revert to that commit. If you view History.md, its contents should be what you committed in the third change.

Now go back to the latest commit by running git checkout with the commit ID of the Fourth Commit.

Confirm this by viewing the contents of History.md. They should be what you committed in the fourth change.

This time, there should be no error.

Now, check out the tag 1.0 by running git checkout 1.0.

You may again get the error that we saw above. Other than that, if you view History.md, the contents should be the same as in the Third Commit.

Finally, go back to the Fourth Commit (latest commit again) by running git checkout with its ID.

This time the error would disappear again.

Tags are strict. Tags are stubborn. A tag will always refer to a single commit. After you tag a commit, you’ll obviously make some more changes and commit to get a new commit ID but no name. You’ll keep making new changes and committing them, but they will not have the tag.
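To see this stubbornness in action, here is a small sketch: a commit is tagged, one more commit is added, and the tag still resolves to the commit it was created on. The throwaway directory, the placeholder identity and the file contents are assumptions made for illustration.

```shell
# Throwaway repository (assumption: separate from git-model-tut).
repo="$(mktemp -d)" && cd "$repo" && git init --quiet
git symbolic-ref HEAD refs/heads/master       # call the first branch master, as this tutorial does
git config user.name "Tutorial User"          # placeholder identity (assumption)
git config user.email "tutorial@example.com"  # placeholder identity (assumption)
for n in First Second Third; do
  echo "$n Change" > History.md               # assumed contents
  git add History.md
  git commit --quiet -m "$n Commit"
done
git tag 1.0                                   # tag the third commit

# Add a fourth commit; the tag does not follow along.
echo "Fourth Change" > History.md
git add History.md
git commit --quiet -m "Fourth Commit"

git rev-parse --short 1.0                     # still the ID of the third commit
git rev-parse --short master                  # the ID of the fourth commit, which has no tag
git checkout --quiet 1.0
cat History.md                                # prints: Third Change
git checkout --quiet master                   # back to the latest commit
```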

Let’s make a fifth and sixth change in the same fashion as above. Our git log should look like:

Commit History after Sixth Commit

Update the spreadsheet as usual:

Spreadsheet Contents after sixth commit

Recall that we’re still writing our code in a linear timeline, frequently taking snapshots (committing) along a straight line (commit history). All these snapshots have a long alphanumeric string as a name. So, if some of these snapshots have a special meaning for us, we give them a simple, human readable name (tag).

We are still on a straight line. There is no tree.

What is the HEAD?

Let’s look back at our error message from earlier:

Line #3 says that we’re in a detached HEAD state. We’ll understand that later. Right now, our focus is on Line #12, which says HEAD is now at 704e480 …. What that means is that git, or more precisely the HEAD of your repository, is now at that particular commit ID.

The HEAD is like a pointer on your repository’s linear timeline (or commit history). When you checkout a particular commit, you are just moving the pointer (the HEAD) to that commit (snapshot) of your code.

Git is like an observer tracking your code, in a linear timeline, maintaining a record of all the snapshots (commits with IDs), as you go about your business of writing code.

If you ask git for a particular snapshot (commit) by giving it the ID of the commit, it can take you back in the commit history to that particular snapshot by moving the HEAD to that snapshot.

When you move up and down your repository’s timeline, you’re actually asking git to move the HEAD up and down your commit history.
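One way to watch the HEAD move is to ask git which commit it currently points at, before and after a checkout. The sketch below does that with git rev-parse HEAD; the throwaway directory, placeholder identity and file contents are assumptions for illustration only.

```shell
# Throwaway repository with two commits (assumption: separate from git-model-tut).
repo="$(mktemp -d)" && cd "$repo" && git init --quiet
git symbolic-ref HEAD refs/heads/master       # call the first branch master, as this tutorial does
git config user.name "Tutorial User"          # placeholder identity (assumption)
git config user.email "tutorial@example.com"  # placeholder identity (assumption)
for n in First Second; do
  echo "$n Change" > History.md               # assumed contents
  git add History.md
  git commit --quiet -m "$n Commit"
done

first="$(git rev-parse HEAD~1)"               # ID of the First Commit
second="$(git rev-parse HEAD)"                # ID of the Second Commit (where HEAD is now)

git checkout --quiet "$first"                 # ask git to move the HEAD back in history
git rev-parse HEAD                            # prints the same ID as $first

git checkout --quiet master                   # move the HEAD forward to the latest commit
git rev-parse HEAD                            # prints the same ID as $second
```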

Before we go further, let’s tag this sixth commit as 2.0 by running git tag 2.0.

And update our spreadsheet:

Spreadsheet Contents after sixth commit and second tag

You can list all your tags and the commit IDs that they refer to by running the following command:
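One command that does this is git show-ref --tags; whether it is the exact command used in the original walkthrough is an assumption. The sketch below rebuilds six commits and both tags in a throwaway directory, with a placeholder identity and assumed file contents, and then lists the tags.

```shell
# Throwaway repository with six commits (assumption: separate from git-model-tut).
repo="$(mktemp -d)" && cd "$repo" && git init --quiet
git symbolic-ref HEAD refs/heads/master       # call the first branch master, as this tutorial does
git config user.name "Tutorial User"          # placeholder identity (assumption)
git config user.email "tutorial@example.com"  # placeholder identity (assumption)
for n in First Second Third Fourth Fifth Sixth; do
  echo "$n Change" > History.md               # assumed contents
  git add History.md
  git commit --quiet -m "$n Commit"
done

git tag 1.0 HEAD~3        # tag the third commit (three steps behind the latest)
git tag 2.0               # tag the sixth commit (the one we are on)

git show-ref --tags       # one line per tag: <commit ID> refs/tags/<name>
```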

This should give you an output similar to:

List of tags with Commit IDs

Summary

  1. git tracks your code with a pointer called HEAD.
  2. When you commit, you create a snapshot of your code with a unique commit ID, which is a long alphanumeric string.
  3. When you checkout a particular commit, you move the HEAD (git’s pointer) to that commit. You can move up and down your repository’s history (or commit history), which is a straight line.
  4. Tags are just human readable refs (references = names = labels) for a single commit, because the commit ID is difficult for humans to work with.

In the next post, we’ll take this tutorial further and try to understand how branching works.

What do you think?