Essential Git – Part 1

In the coming weeks we’ll be experimenting with some cool technologies to, for example, set up a webserver from scratch: a couple of Infrastructure as Code and configuration management tools, websites served by Docker containers, a reverse proxy based on Traefik, and a couple of different VPS and cloud providers. The basis of all this is Git.

Introduction

Git is a source control management (SCM) system developed by Linus Torvalds, the creator of Linux. He needed something better than the SCM system he was using at the time, so he wrote one himself. Initially he was not too proud of his creation: he called it ‘git’, which is British slang for an unpleasant person, and he left a nice message in the first line of the man page. Proud or not, it’s now the most used SCM system in the world.

So, what is source control, also called version control? Put simply, it is a way to keep track of your code. You put your code in a local repository, you keep track of your changes, you commit your code into source control and eventually push your code to a remote repository. From there, other people from your team or the world can access the code and all sorts of other magic can happen automatically, but that is for future posts.

Get started

Getting started is easy. Just set up a GitLab or GitHub account so you can get your code online and share it with yourself or others. As soon as you’ve got an account, let’s install Git on your local machine.

Mac users (we’re using Homebrew):

$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
$ brew install git tree

Enterprise Linux 8 users:

$ sudo dnf install git tree vim

Debian/Ubuntu Linux users:

$ sudo apt install git tree vim

From here all commands should be the same.

$ git version

Initialize

Let’s set up our example project.

These days, many people prefer to call the first created branch main instead of master:

$ git config --global init.defaultBranch main

You’ll need Git version 2.28 or later for this setting to work. On older versions you’ll need to rename the branch afterwards with:

$ git checkout master
$ git branch -m main

Alternatively, leave the branch name as master. More on config settings in the next section.
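A minimal sketch of that rename, assuming you do it after your first commit. The scratch directory, the demo identity and README.txt are made up for illustration, and the sketch forces the initial branch name to master only so the rename has something to work on:

```shell
set -eu

# Scratch repo in a temporary directory (hypothetical location)
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .

# Force the old default name so this behaves like an older Git
git symbolic-ref HEAD refs/heads/master

# Demo identity for this repo only (an assumption, not from the post)
git config user.name "Demo User"
git config user.email "demo@example.com"

# A first commit, so the branch actually exists
echo "hello" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# Rename master to main
git branch -m master main

current_branch="$(git rev-parse --abbrev-ref HEAD)"
echo "Current branch: $current_branch"
```

The commit history is untouched by the rename; only the name of the branch changes.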

Set up your first local repo:

$ mkdir webserver && cd webserver
$ git init
$ tree .git

So, we’ve created a directory and initialized it as a new git repo. With tree, which we installed earlier, we can have a look at what exactly gets created. Although we won’t touch this stuff directly, git uses it as its database: which files it keeps track of, which files are committed, what our branches are, where HEAD is, etc.

The .git directory: know it’s there, what it does and what it’s for.
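If you’re curious, here is a small read-only peek inside, sketched in a throwaway directory (the temporary location is an assumption for the demo). It only looks at HEAD and asks git where its database lives; nothing in .git is edited by hand:

```shell
set -eu

# Throwaway location so the real project is left alone
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q webserver
cd webserver

# HEAD is a plain text file that names the branch you are on
head_ref="$(cat .git/HEAD)"
echo "HEAD contains: $head_ref"

# Ask git where its database lives, relative to the current directory
git_dir="$(git rev-parse --git-dir)"
echo "Git directory: $git_dir"

# The top-level entries git created for us
ls .git
```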

Config

I’ve been using git for years and although there are quite a few config options to set, I only ever set a handful (including the most recent ‘main branch’ one explained above):

$ git config --global user.name "Henk Batelaan"
$ git config --global user.email "[email protected]"
$ git config --global pull.rebase false

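This is the bare minimum to set: your name and email address are recorded in every commit you make, and pull.rebase false makes git pull merge remote changes instead of rebasing.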

The --global flag sets these options for every repo of the logged-in user. Simply omit it to set an option only for the repo you’re currently working in. On rare occasions --system can be used to set it for every repo of every user on the system.
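To illustrate how the scopes interact, here is a rough sketch that points HOME at a throwaway directory so your real ~/.gitconfig stays untouched. The names "Global Name" and "Repo Name" are placeholders, not values from this post:

```shell
set -eu

# Use a throwaway HOME so the real global config is never modified
HOME="$(mktemp -d)"
export HOME
unset XDG_CONFIG_HOME

# Global scope: applies to every repo of this (pretend) user
git config --global user.name "Global Name"

git init -q "$HOME/webserver"
cd "$HOME/webserver"

# Repo scope (no flag): applies to this repo only and wins over global
git config user.name "Repo Name"

global_name="$(git config --global user.name)"
effective_name="$(git config user.name)"
echo "global:    $global_name"
echo "effective: $effective_name"
```

Inside the repo the repo-level value takes precedence; everywhere else the global value still applies.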

Check out your settings:

$ git config --list

Edit your settings manually:

$ git config --global --edit

I honestly think this is all you need to know 98% of the time, but to check out more:

$ man git-config

Basics

OK, we’ve set up an empty repo and some config, but how does git work exactly? Essentially, Git stores content as snapshots and knows how to apply or roll back the changes between two snapshots. Each commit creates a snapshot. Let’s visualize this.

Working in our local webserver repo created earlier, first see what is going on:

$ git status
On branch main
nothing to commit, working tree clean

To be expected. Let’s add a file and check the status again:

$ echo This is going to be my first file > test.txt
$ git status
On branch main
Untracked files:
    (use "git add <file>..." to include in what will be committed)
        test.txt

What you can see here is that git knows about the new file but won’t do anything with it yet. You must add it to the git staging area, also called the index:

$ git add test.txt
$ git status
On branch main
Changes to be committed:
     (use "git restore --staged <file>..." to unstage)
        new file:   test.txt

The file is staged now, added to the .git/index file. Let’s speed things up:

$ touch configuration_0{1..4}.conf
$ git add .
$ git status
On branch main
Changes to be committed:
     (use "git restore --staged <file>..." to unstage)
        new file:   configuration_01.conf
        new file:   configuration_02.conf
        new file:   configuration_03.conf
        new file:   configuration_04.conf
        new file:   test.txt

We’ve got 5 new files in total in the staging area. With git add . (the dot representing the current directory) we made sure all new files were staged at once. Explicitly adding files like this is only strictly needed for new (untracked) files, provided you commit with the -a flag as we do below: -a automatically stages modifications and deletions of files git already tracks. Without -a, every change has to be staged with git add before it ends up in a commit.
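A quick way to double-check what is staged, sketched in a scratch repo (the temporary directory is an assumption for the demo). It stages four files, takes one back out of the index and shows that the file itself stays on disk as an untracked file:

```shell
set -eu

# Scratch repo so we do not disturb the webserver project
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .

# Same files as above, created with a portable loop
for i in 1 2 3 4; do
  touch "configuration_0$i.conf"
done

# Stage everything in the current directory
git add .
staged_before="$(git diff --cached --name-only | wc -l | tr -d ' ')"

# Remove one file from the index only; the file on disk is kept
git rm --cached -q configuration_04.conf
staged_after="$(git diff --cached --name-only | wc -l | tr -d ' ')"
untracked="$(git ls-files --others --exclude-standard)"

echo "staged before: $staged_before, staged after: $staged_after"
echo "untracked again: $untracked"
```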

Let’s commit our changes to the local repository and create our first snapshot as explained above:

$ git commit -am "This is our first commit"
$ git status
On branch main
nothing to commit, working tree clean

There are a lot of ways to commit changes, but the above is simple and in most cases sufficient. I like to write my commit message immediately (the -m flag), but there are people who prefer to be prompted in an editor (omit the -m flag). In any case, all your commits need a message.
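What the -a in -am does is easiest to see in a small experiment. This is a sketch in a scratch repo with a made-up identity and file names: one tracked file is modified, one brand-new file is created, and neither is passed to git add before committing:

```shell
set -eu

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .
git config user.name "Demo User"          # demo identity (assumption)
git config user.email "demo@example.com"

# First commit: test.txt becomes a tracked file
echo "first line" > test.txt
git add test.txt
git commit -q -m "Add test.txt"

# Change the tracked file and create a new, untracked one
echo "second line" >> test.txt
echo "brand new" > untracked.txt

# -a stages changes to tracked files only, -m supplies the message
git commit -q -am "Update test.txt"

committed="$(git diff-tree --no-commit-id --name-only -r HEAD)"
remaining="$(git status --porcelain)"
echo "in the commit:  $committed"
echo "still pending:  $remaining"
```

The modified test.txt is committed, while untracked.txt is left behind until it is added explicitly.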

It’s a bit out of scope for this post, but make these messages count. Don’t simply call them ‘changes’ or ‘troubleshooting’ or whatever. Make sure your teammates (or you, a few weeks from now) can reasonably assess what was committed where when checking the git log.

Check out all options for adding and committing files:

$ man git-add
$ man git-commit

Anyway, back on track. Finally, check the git log:

$ git log
commit d2cf522c350a9f05a5b501b64c75d65f01d6b432 (HEAD -> main)
Author: Henk Batelaan <[email protected]>
Date:   Mon Aug 16 20:20:41 2021 +0200

     This is our first commit

You’ve committed your first files.
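To make the snapshot idea a bit more tangible, here is a sketch with two commits in a scratch repo (identity and file names are placeholders). It lists the files recorded in each commit, which shows that every commit describes the complete project at that moment, not only the file that changed:

```shell
set -eu

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .
git config user.name "Demo User"          # demo identity (assumption)
git config user.email "demo@example.com"

# Snapshot 1: only test.txt exists
echo "This is going to be my first file" > test.txt
git add test.txt
git commit -q -m "This is our first commit"

# Snapshot 2: a configuration file is added next to it
touch configuration_01.conf
git add configuration_01.conf
git commit -q -m "Add first configuration file"

# List the files recorded in each snapshot
first_snapshot="$(git ls-tree -r --name-only HEAD~1)"
second_snapshot="$(git ls-tree -r --name-only HEAD)"
commit_count="$(git rev-list --count HEAD)"

echo "commits: $commit_count"
echo "snapshot 1:"; echo "$first_snapshot"
echo "snapshot 2:"; echo "$second_snapshot"
```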

Remotes

This is all well and good, but when working in a team it’s still of little use: you’ll need a remote place to store your code, not only the local repository we have now.
In come remotes and the earlier mentioned GitLab or GitHub accounts. GitHub is the most widely used, but I’m more of a GitLab guy so I’ll be using that in the examples.

You’ll need repository access from the command line, and this is best done in one of two ways:

  1. SSH key authentication
  2. Access token authentication

The first is probably the simplest and most secure. If you don’t have an SSH key already, let’s quickly create one:

$ ssh-keygen -C "Henk Batelaan" -b 8192 -t rsa

Follow the prompts; if this is your first and only key you can simply press Enter to accept the defaults. The above command creates a private and public key pair. Grab the public key from

~/.ssh/id_rsa.pub

We need it in the GitLab portal:

  • Log in with your account
  • On the top right click your profile picture
  • In the left bar click ‘SSH Keys’
  • Paste your public key and click save

Alternatively, you can set up access token authentication. Since this changes all future git URLs in this post, I won’t lay out how to do so. You can find a couple of easy-to-follow steps here if you’re interested.

Now back to remotes. Check out the current remote:

$ git remote -v

You’ll notice it turns up empty because we don’t have any remotes yet. Create a new project at gitlab.com to serve as one:

  • Log in with your account
  • At the top right click ‘New project’
  • Choose ‘Create blank project’
  • Fill in the details; these should be self-explanatory
  • Disable ‘Initialize repository with a README’ and create the project

When you disable the creation of the README, the next screen gives you excellent pointers on how to push your existing local repo to this new project (and gives you other options as well).

For our example I simply called the project webserver and used the following to get the code online:

$ cd webserver
$ git remote add origin [email protected]:iohenkies/webserver.git
$ git push -u origin main

If your SSH keys are set up correctly, these steps will push your local repo data to your remote repo. Check it out at the GitLab portal.
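If you’d like to practise the remote workflow without touching GitLab, the same steps can be rehearsed offline. In this sketch a bare repository in a temporary directory plays the role of the hosted project; that stand-in, the demo identity and the paths are all assumptions for illustration:

```shell
set -eu

demo_dir="$(mktemp -d)"

# A bare repo has no working tree; it stands in for the hosted project
git init -q --bare "$demo_dir/remote.git"

# Our local repo with one commit on main
git init -q "$demo_dir/webserver"
cd "$demo_dir/webserver"
git symbolic-ref HEAD refs/heads/main     # name the first branch main on any Git version
git config user.name "Demo User"          # demo identity (assumption)
git config user.email "demo@example.com"
echo "This is going to be my first file" > test.txt
git add test.txt
git commit -q -m "This is our first commit"

# Same two steps as above, with a local path instead of an SSH URL
git remote add origin "$demo_dir/remote.git"
git push -q -u origin main

upstream="$(git rev-parse --abbrev-ref '@{upstream}')"
local_head="$(git rev-parse HEAD)"
remote_head="$(git --git-dir="$demo_dir/remote.git" rev-parse refs/heads/main)"

git remote -v
echo "main tracks: $upstream"
```

The -u flag records origin/main as the upstream of main, which is why a plain git push or git pull works afterwards.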

Getting remote content

Often, you’ll need to pull data from remote Git repositories instead of creating them on your own like we did until this point. We can get our hands on this data in a couple of ways.

Let’s take the Terraform repo on GitHub and get the code. As you can see in the screenshot, there is a button in the top right called ‘Code’; for GitLab repos this button is called ‘Clone’. For public repos you’ll find the HTTPS address that you need there. From your terminal, simply clone it locally with:

$ cd
$ git clone https://github.com/hashicorp/terraform
$ cd terraform

You now have the source code of the Terraform main branch on your computer, and this is basically the way to get remote content locally. When you’ve already cloned a repo before and want to get the latest remote changes, use

$ git pull

in the folder where you have the repo locally. You can also

$ git fetch

to download the remote changes without merging them into your local branch (git pull is essentially a git fetch followed by a merge). I believe I’ve never used this feature myself, but know it’s there.
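The difference between the two is easy to see in a small offline experiment. This is only a sketch: the bare repo standing in for the remote, the "author" and "teammate" clones and the demo identity are all made up for the occasion:

```shell
set -eu

demo_dir="$(mktemp -d)"
remote="$demo_dir/remote.git"
author="$demo_dir/author"
teammate="$demo_dir/teammate"

# Stand-in remote plus an author repo that pushes a first commit to it
git init -q --bare "$remote"
git init -q "$author"
git -C "$author" symbolic-ref HEAD refs/heads/main
git -C "$author" config user.name "Demo User"       # demo identity (assumption)
git -C "$author" config user.email "demo@example.com"
echo "v1" > "$author/test.txt"
git -C "$author" add test.txt
git -C "$author" commit -q -m "First commit"
git -C "$author" remote add origin "$remote"
git -C "$author" push -q -u origin main

# The teammate clones, then the author pushes a second commit
git clone -q -b main "$remote" "$teammate"
echo "v2" >> "$author/test.txt"
git -C "$author" commit -q -am "Second commit"
git -C "$author" push -q origin main

# fetch: origin/main moves forward, the local branch does not
before="$(git -C "$teammate" rev-parse HEAD)"
git -C "$teammate" fetch -q
after_fetch="$(git -C "$teammate" rev-parse HEAD)"
fetched="$(git -C "$teammate" rev-parse origin/main)"

# pull: the local branch is brought up to date as well
git -C "$teammate" -c pull.rebase=false pull -q
after_pull="$(git -C "$teammate" rev-parse HEAD)"

echo "local before fetch: $before"
echo "local after fetch:  $after_fetch"
echo "origin/main:        $fetched"
echo "local after pull:   $after_pull"
```

After the fetch you could inspect the incoming work with git log main..origin/main before deciding to merge it.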

The above commands all keep using the original repo as their base. You clone the code locally and branch, commit, push and merge your hard work back to the repo. There is also an alternative, and that is ‘forking’ the repo. This is done from the GitLab/GitHub UI and creates a completely independent code repository that you have full ownership of. Changes are not automatically merged back into the original repo you forked from. Within organizations/teams this is not used a lot, but in the open-source space it’s very common.

Famous open-source forks include Ubuntu, which was once a fork of Debian; MariaDB, which was a fork of MySQL; and Nextcloud, which was a fork of ownCloud.

Wrapping up

This post set us up with a repository that is both local and remote and explained a few things in the process. Next time I’ll elaborate quite a bit and try to get more practical with branches, Git branching strategies, merge requests and whatnot. Stay tuned.