Essential Git – Part 2

Be sure to check out the last post. In the intro I explain that over the coming weeks we’ll be experimenting with some cool technologies to, for example, set up a webserver from scratch. But first we need a good grasp of the basics, and the basis of all this is git. Git is not only the basis for this project of ours, but for all your future work. I can’t imagine any serious IT work without a form of version control. But don’t take my word for it, just look at the most used tools in the latest Stack Overflow Developer Survey.

Recap

A small and fast recap of last week.

After installing Git (version 2.28 and up), apply the essential configuration:

$ git config --global init.defaultBranch main
$ git config --global user.name "Henk Batelaan"
$ git config --global user.email [email protected]
$ git config --global pull.rebase false

Start a new project locally:

$ mkdir fast-test && cd fast-test
$ git init
$ git status

Set a remote after the necessary steps at GitLab and SSH keys/access token setup:

$ git remote add origin [email protected]:iohenkies/fast-test.git

Add, commit and push a new file:

$ echo This is going to be my first file > test.txt
$ git add .
$ git commit -am "This is our first commit"
$ git push -u origin main
$ git status

That’s it! You can trash this fast-test locally and at GitLab or leave it for further testing.

Branching

The recap is a nice enough stepping stone to this section: branching. Like every VCS, git has something called branches. This is the concept where you diverge from the main branch and continue developing on a separate branch that does not interfere with the main one.

As the simplest example (but not the most correct one per se) you can imagine that your main branch holds your production code that is stable and can be deployed at any time, and from that branch you create feature, release and bug-fix branches that are eventually merged back to main. There are probably a couple of concepts there that I need to expand on some more.

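First, by default a git clone fetches all remote branches, but it only creates a local branch for the default one. The others are stored locally as remote-tracking branches. Still with our Terraform example from last week, let’s look at these branches: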

$ git branch
* main

Hmm, so this only shows the main branch. Aren’t there any other branches then? Look at the Terraform GitHub page and click the branch name ‘main’ at the top left. This shows there are other branches, so why don’t they show up locally? Well, they need to be ‘checked out’, which is what it is called when you switch to another branch.

$ git checkout v0.15
Branch 'v0.15' set up to track remote branch 'v0.15' from 'origin'.
Switched to a new branch 'v0.15'
$ git pull
Already up to date.

We’ve switched to the branch of the previous major release. The git pull is normally not necessary, but I ran it to show that the data in that branch was already local. You can see all remote branches with

$ git branch -r

and all local and remote branches with

$ git branch -a
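
If you want to see the difference between local and remote-tracking branches without cloning something as large as Terraform, the following is a minimal sketch that runs entirely in a temporary directory. A bare repository stands in for GitHub/GitLab; the directory names, file name and user details are made up for the example.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; all names below are assumptions for this sketch.
sandbox=$(mktemp -d)
cd "$sandbox"

# A bare repository plays the role of the remote (GitHub/GitLab).
git init --quiet --bare -b main origin.git

# Seed the "remote" with two branches: main and v0.15.
git init --quiet -b main seed
cd seed
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin ../origin.git
echo "hello" > README
git add README
git commit --quiet -m "Initial commit"
git push --quiet origin main
git checkout --quiet -b v0.15
git push --quiet origin v0.15
cd ..

# A fresh clone has one local branch, but knows about both remote branches.
git clone --quiet origin.git work
cd work
local_before=$(git branch --format='%(refname:short)')
remote_all=$(git branch -r --format='%(refname:short)')
echo "local branches after clone:"
echo "$local_before"
echo "remote-tracking branches:"
echo "$remote_all"

# Checking out v0.15 creates a local branch that tracks origin/v0.15.
git checkout --quiet v0.15
local_after=$(git branch --format='%(refname:short)')
echo "local branches after checkout:"
echo "$local_after"
```

After the clone only main is listed as a local branch. The v0.15 branch appears in the local list once it has been checked out.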

For larger projects or use in CI/CD pipelines for instance, it comes in handy to clone just one specific branch instead of pulling all data locally. For example:

$ cd
$ git clone https://github.com/hashicorp/terraform --branch v0.15 --single-branch
$ cd terraform
$ git branch -a

Only the v0.15 branch will be cloned locally. We can take this one step further, which is certainly done a lot in CI/CD pipelines, and grab only the last commit:

$ git clone https://github.com/hashicorp/terraform --branch v0.15 --single-branch --depth 1

Think ‘snapshots’ as touched upon in the previous article. It grabs the last snapshot and gives you a 100% working copy of the code at the smallest possible size. This gets a little bit murky when dealing with merge commits, but that’s outside the scope of this post.
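
To check what a shallow, single-branch clone actually contains, here is a small sketch against a local throwaway repository instead of the real Terraform repo. The repository, file and user names are assumptions for the example. Note the file:// URL: as far as I know git ignores --depth when you clone from a plain local path.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; all names below are assumptions for this sketch.
sandbox=$(mktemp -d)
cd "$sandbox"

# Build a source repository with three commits and a v0.15 branch.
git init --quiet -b main source
cd source
git config user.name "Example User"
git config user.email "user@example.com"
for i in 1 2 3; do
  echo "change $i" > file.txt
  git add file.txt
  git commit --quiet -m "Commit $i"
done
git branch v0.15
cd ..

# Clone only the v0.15 branch, and only its most recent commit.
git clone --quiet --branch v0.15 --single-branch --depth 1 \
  "file://$sandbox/source" shallow
cd shallow

commit_count=$(git rev-list --count HEAD)
is_shallow=$(git rev-parse --is-shallow-repository)
echo "commits available locally: $commit_count"
echo "shallow repository: $is_shallow"
git branch -a
```

The source repository has three commits, but the shallow clone only knows about the last one, and the working tree is still complete.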

Branching strategy

With branching comes the seemingly endless debate on branching strategy: when are we branching, when are we merging branches, when are we tagging, when are we merging to main, etc. I will not give you the answers, but I can give you a little guidance. It really depends on the software being developed, the company, the team, etc.

The consensus is that your branching strategy should be built on 3 key concepts:

  • Use feature branches for all new features and bug fixes
  • Merge feature branches into the main branch using merge/pull requests
  • Keep an up-to-date main branch that is always ready to be deployed

This is a good starting point for sure, but there are a lot of deviations, especially on the first bullet. See below.

I believe the first real thought-out strategy was the Git Flow strategy. I’ve used it at a couple of companies and although it really is very well thought out, it is complex and probably too complex for most use cases.

Then GitHub came up with GitHub flow. Its main goal was to simplify the branching strategy while still guaranteeing that there is high quality, deployable code in the main branch at all times. It was thought out around 10 years ago and is still recommended at GitHub. At the companies I’ve worked for, this is the most used strategy by far.

Then there is GitLab. I can tell you that I like most parts of the GitLab flow strategy. With GitLab flow, all features and fixes go to the main branch while enabling production and stable branches. In my experience not a lot of companies seem to have adopted it, but I like the use of a production branch and/or environment branches a lot.

Pull/merge requests

Now that we know a bit more about branching, I think it is a good time to talk about pull requests (PR) and merge requests (MR). You’ve probably heard about one of these, or both. Most asked question: what is the difference between the two?

Glad to tell you that there is no difference. Both are requests to merge a certain branch into another branch, usually the main branch. GitHub uses the term pull request, while GitLab uses merge request. In my opinion merge request is the better name looking at what is actually happening, but it probably depends on who you are asking.

In any case, below is an example of the workflow when dealing with PRs/MRs.

  • You’re working in a team, and you are responsible for a new Terraform feature where you’ll be deploying a container in Azure blob storage
  • You’ve cloned the appropriate code repo locally. In small and personal projects, you edit the code, commit and push to the main branch and are done
  • In more and more organizations though, the main branch is protected:
    • This means that you cannot make changes to the main branch directly
    • This protection has been set up by the repo administrator and is policy in most larger organizations and teams
    • So how can we merge our code? This is where PRs/MRs come in
  • Locally you create a new feature branch:
$ git branch feature/add-azureblob-container
$ git checkout feature/add-azureblob-container

or in one go

$ git checkout -b feature/add-azureblob-container
  • You create and test your work and add, commit and push your changes
$ git add .
$ git commit -am "Last edits to README. Finally done with Azure blob"
$ git push -u origin feature/add-azureblob-container
  • Almost there. This pushes to the remote repo, but only to the feature/add-azureblob-container branch
    • Now we create a PR/MR from the GitHub/GitLab UI with your feature branch as the source and the main branch as the target
    • One or more teammates will look at the proposed code changes and, when all discussions are out of the way and any pipelines have succeeded, approve your PR/MR
    • You can now merge your changes to main
    • Main can be deployed to production (maybe it should be tagged first, depending on e.g. branching strategy as discussed earlier)
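
The steps above can be sketched end to end in a local sandbox. This is only an approximation: a bare repository stands in for GitHub/GitLab, and a plain git merge --no-ff stands in for pressing the merge button on an approved PR/MR (on a protected main branch you would not be allowed to push that merge yourself). The file main.tf, its content and the user details are placeholders invented for the example.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; all names below are assumptions for this sketch.
sandbox=$(mktemp -d)
cd "$sandbox"

# A bare repository plays the role of the GitHub/GitLab remote.
git init --quiet --bare -b main origin.git

git init --quiet -b main work
cd work
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin ../origin.git
echo "# Project" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git push --quiet -u origin main

# Create the feature branch and do the work there.
git checkout --quiet -b feature/add-azureblob-container
echo "# placeholder for the Azure blob container resource" > main.tf
git add main.tf
git commit --quiet -m "Add Azure blob container"
git push --quiet -u origin feature/add-azureblob-container

# At this point main on the remote is untouched; only the feature branch moved.
# Stand-in for the approved PR/MR being merged in the UI:
git checkout --quiet main
git merge --quiet --no-ff \
  -m "Merge branch 'feature/add-azureblob-container'" \
  feature/add-azureblob-container
git push --quiet origin main

merged=$(git branch --merged main --format='%(refname:short)')
echo "branches merged into main:"
echo "$merged"
```

The --no-ff option forces a merge commit even when a fast-forward would be possible, which is close to the default merge behaviour of a PR/MR in the UI.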

When working with pipelines you can do all sorts of cool stuff with all of this (e.g. use certain operations as triggers for certain pipeline stages), but that is too far out of scope to cover here.

Tags

Another often used trigger in pipelines is git tags. A tag marks a specific point in a repo’s history as being important. Unlike a branch, a tag does not move. It points directly to a specific commit in the history and will not change unless explicitly updated.

Of course, tagging is most used for marking a software release, like v2.0.1 and 12.5.1 (both using semantic versioning). To be honest, I mostly tag releases in the GitLab UI, since a pipeline often follows and I like to check its progress, but you can tag from the CLI as well:

$ git tag -a v1.7.1 -m "Updated to the latest Kubernetes version (1.21)"

This will create an annotated tag with version and release notes. Besides annotated tags you can also create lightweight tags (which I have never used). A lightweight tag is created by leaving out the -a, -s and -m options.
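
To make the difference between the two kinds of tags a bit more tangible, here is a small sketch in a throwaway repository. The repository, file name and the extra tag name v1.7.1-light are made up for the example. An annotated tag is stored as its own object with a tagger, date and message, while a lightweight tag is just a name that points straight at a commit.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; all names below are assumptions for this sketch.
sandbox=$(mktemp -d)
cd "$sandbox"
git init --quiet -b main tags-demo
cd tags-demo
git config user.name "Example User"
git config user.email "user@example.com"
echo "release" > app.txt
git add app.txt
git commit --quiet -m "First release"

# Annotated tag: a separate tag object with its own message.
git tag -a v1.7.1 -m "Updated to the latest Kubernetes version (1.21)"
# Lightweight tag: no -a, -s or -m, just a pointer to the commit.
git tag v1.7.1-light

annotated_type=$(git cat-file -t v1.7.1)
lightweight_type=$(git cat-file -t v1.7.1-light)
echo "v1.7.1 is stored as:       $annotated_type"
echo "v1.7.1-light is stored as: $lightweight_type"
```

Both tags still resolve to the same commit. The annotated one simply carries extra metadata, which is why it is the usual choice for releases.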

More info on your tags:

$ git tag -l
$ git tag -l "v1.7*"
$ git show v1.7.1

Delete a tag:

$ git tag -d v1.7.1
$ git push origin --delete v1.7.1

Add a tag to an older commit via the commit checksum:

$ git tag -a v1.2.4 af15ecb
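
The following sketch puts tagging an older commit and deleting a tag together in a local sandbox, with a bare repository standing in for the remote. All names and commit messages are invented for the example. I pass -m here so that git does not open an editor for the tag message.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; all names below are assumptions for this sketch.
sandbox=$(mktemp -d)
cd "$sandbox"
git init --quiet --bare -b main origin.git
git init --quiet -b main work
cd work
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin ../origin.git

echo "one" > file.txt
git add file.txt
git commit --quiet -m "First commit"
old_commit=$(git rev-parse --short HEAD)
echo "two" > file.txt
git commit --quiet -am "Second commit"
git push --quiet -u origin main

# Tag the older commit by its checksum and publish the tag.
git tag -a v1.2.4 -m "Retroactive release tag" "$old_commit"
git push --quiet origin v1.2.4
tagged_commit=$(git rev-parse --short 'v1.2.4^{commit}')
remote_before=$(git ls-remote --tags origin v1.2.4)
echo "v1.2.4 points at $tagged_commit (older commit was $old_commit)"

# Deleting is two steps: locally, and then on the remote.
git tag -d v1.2.4
git push --quiet origin --delete v1.2.4
local_after=$(git tag -l v1.2.4)
remote_after=$(git ls-remote --tags origin v1.2.4)
```

Deleting the tag locally does not touch the remote, so the second step is needed if the tag was ever pushed.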

I think there is nothing else worth mentioning about tags. They are used for versioning and often as pipeline triggers. It is also good to know that there are release strategies where no tags are used at all.

Final words

Proofreading the Git posts, I’m sure I covered a lot, but I also discovered that I might have missed a few bits and pieces that I wanted to address but forgot along the way. The concept of HEAD comes to mind, and basic merging as well (which we probably don’t need when working with PRs/MRs). But also merge conflicts and troubleshooting. Git submodules are also a nice one to dive into. I added some links there for your convenience. Maybe these are topics for a next time. Anyway, I’m convinced this will lay a solid base for your future git powered projects.

Starting out these posts, the general plan was to set up a webserver with some Infrastructure as Code tools. Along the way I came up with all sorts of other stuff to cover, so let’s just see how it goes in the coming weeks.

For now the plan is that next time I’ll introduce you to a bit of cloud computing. I’ve been using Digital Ocean and Hetzner cloud often myself, but for real cloud native purposes I’ll probably need to switch to AWS, Azure or GCP. So, let’s take a look at what the options are, where we are going to run this webserver of ours and why, etcetera.