In the coming weeks we’ll be experimenting with some cool technologies. For example, we’ll set up a web server from scratch with a couple of Infrastructure as Code and configuration management tools, serve websites from Docker containers, put a Traefik-based reverse proxy in front of them and try a couple of different VPS and cloud providers. The basis of all this is git.
Introduction
Git is a source control management (SCM) system developed by Linus Torvalds, the creator of Linux. He needed something better than the SCM system he was using at the time, so he wrote one himself. Initially he was not too proud of his creation: he called it ‘git’, which is slang for an unpleasant person, and he left a nice message in the first line of the man page. Proud or not, it’s now the most used SCM system in the world.
So, what is source control, also called version control? Put most simply, it is a way to keep track of your code. You put your code in a local repository, you keep track of your changes, you commit your code into source control and eventually push your code to a remote repository. From there other people on your team or in the world can access the code and all sorts of other magic can happen automatically, but that is for future posts.
Get started
Getting started is easy. Just set up a GitLab or GitHub account so you can get your code online and share it with yourself or others. As soon as you’ve got an account, let’s install git on your local machine.
Mac users (we’re using Brew):
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
$ brew install git tree
Enterprise (8) Linux users:
$ sudo dnf install git tree vim
Debian/Ubuntu Linux users:
$ sudo apt install git tree vim
From here all commands should be the same.
$ git version
Initialize
Let’s set up our example project.
Recently, people have started calling the first created branch main instead of master:
$ git config --global init.defaultBranch main
You’ll need git 2.28 or later for this setting. On older versions, rename the branch afterwards with:
$ git checkout master
$ git branch -m main
Alternatively, leave the branch name as master. More on config settings in the next section.
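To see the rename in isolation, here is a minimal sketch that runs in a throwaway directory. The scratch path, the "Demo User" identity and the empty commit are placeholders made up for the demo, and symbolic-ref is only used to mimic an older git that starts on master.

```shell
# Scratch repo, so the demo never touches a real project.
repo="$(mktemp -d)"
git init -q "$repo"

# Mimic a pre-2.28 git: make HEAD point at a branch called master.
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" config user.name "Demo User"          # placeholder identity
git -C "$repo" config user.email "demo@example.com"  # placeholder identity
git -C "$repo" commit -q --allow-empty -m "Initial commit"

# The rename itself: -m moves the current branch to the new name.
git -C "$repo" branch -m main
git -C "$repo" rev-parse --abbrev-ref HEAD           # prints: main
```

The commit before the rename is deliberate: with at least one commit in place, the rename behaves the same on older and newer git versions.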
Set up your first local repo:
$ mkdir webserver && cd webserver
$ git init
$ tree .git
So, we’ve created a directory and initialized it as a new git repo. With tree, which we installed earlier, we can have a look at what exactly gets created. Although we won’t touch this stuff directly, git uses it as its database: which files it keeps track of, which files are committed, what our branches are, where HEAD is, etc.
The .git directory: know it’s there, what it does and what it’s for.
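If you don’t have tree at hand, a few plain commands give a rough idea of what lives in there. This is a small sketch in a scratch directory (the path is a placeholder), and symbolic-ref is only there so the result does not depend on your init.defaultBranch setting.

```shell
repo="$(mktemp -d)"                       # scratch repo for the demo
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main

ls "$repo/.git"                           # HEAD, config, objects, refs, ...
cat "$repo/.git/HEAD"                     # prints: ref: refs/heads/main
git -C "$repo" rev-parse --git-dir        # where git thinks its database is
```

HEAD is just a small text file that names the branch you are on, which is a nice reminder that there is no magic in .git, only files.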
Config
I’ve been using git for years and although there are quite a few config options, it seems I only ever set a handful (including the most recent ‘main branch’ one explained above):
$ git config --global user.name "Henk Batelaan"
$ git config --global user.email "[email protected]"
$ git config --global pull.rebase false
This is the bare minimum to set to make your git commits useful: the name and email are recorded as the author of every commit.
The --global flag sets these options for every repo of the logged-in user. Simply omit it to set an option only for the repo you’re currently working in. On rare occasions --system can be used to set it for every repo of every user on the system.
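How the levels stack up is easiest to see by trying it. Below is a minimal sketch under a few assumptions: demo_git is a hypothetical helper I made up that runs git with HOME pointed at a scratch directory, so --global in this demo cannot touch your real ~/.gitconfig, and the names are placeholder values.

```shell
demo_home="$(mktemp -d)"   # stand-in home directory for the demo

# Hypothetical helper: run git in a subshell with a scratch HOME.
demo_git() (
  export HOME="$demo_home"
  unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
  git "$@"
)

repo="$demo_home/webserver-demo"
demo_git init -q "$repo"

demo_git config --global user.name "Global Name"   # placeholder value
demo_git -C "$repo" config user.name               # Global Name (inherited)

demo_git -C "$repo" config user.name "Repo Name"   # no flag: this repo only
demo_git -C "$repo" config user.name               # Repo Name (local wins)
demo_git config --global user.name                 # Global Name (unchanged)
```

The more specific level wins: a value set in the repo overrides the global one, which in turn overrides a system-wide one.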
Check out your settings:
$ git config --list
Edit your settings manually:
$ git config --global --edit
I honestly think this is all you need to know 98% of the time, but to check out more:
$ man git-config
Basics
OK, we’ve set up an empty repo and some config, but how does git work exactly? Essentially, Git handles content in snapshots and knows how to apply or roll back the changesets between two snapshots. Each commit creates a snapshot. Let’s visualize this.
Working in our local webserver repo created earlier, first see what is going on:
$ git status
On branch main
nothing to commit, working tree clean
To be expected. Let’s add a file and check the status again:
$ echo This is going to be my first file > test.txt
$ git status
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        test.txt
What you can see here is that git knows about the new file, but won’t do anything with it yet. You must add it to the git staging area, also called the index:
$ git add test.txt
$ git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   test.txt
The file is staged now, added to the .git/index file. Let’s speed things up:
$ touch configuration_0{1..4}.conf
$ git add .
$ git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   configuration_01.conf
        new file:   configuration_02.conf
        new file:   configuration_03.conf
        new file:   configuration_04.conf
        new file:   test.txt
We’ve got 5 new files in total in the staging area. With git add . (the dot representing the current directory) we staged all new files at once. Explicitly adding files like this is only required for new, untracked files. Changes to and deletions of already tracked files have to be staged as well, but git commit -a does that for you at commit time, because those files are already on git’s radar.
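The different states a file moves through are easy to follow with the compact status output. Here is a small sketch in a scratch repo; the path and the "Demo User" identity are placeholders for the demo. In the short format the first column is the staging area and the second column is the working tree.

```shell
repo="$(mktemp -d)"                                   # scratch repo
git init -q "$repo"
git -C "$repo" config user.name "Demo User"           # placeholder identity
git -C "$repo" config user.email "demo@example.com"   # placeholder identity

echo "This is going to be my first file" > "$repo/test.txt"
git -C "$repo" status --short     # ?? test.txt   (untracked)

git -C "$repo" add test.txt
git -C "$repo" status --short     # A  test.txt   (new file, staged)

git -C "$repo" commit -q -m "Add test.txt"
echo "A second line" >> "$repo/test.txt"
git -C "$repo" status --short     #  M test.txt   (tracked, changed, not staged)

git -C "$repo" add test.txt
git -C "$repo" status --short     # M  test.txt   (change staged)
```

Note the third status: a change to a tracked file is noticed by git right away, but it still has to be staged before it ends up in a commit.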
Let’s commit our changes to the local repository and create our first snapshot as explained above:
$ git commit -am "This is our first commit"
$ git status
On branch main
nothing to commit, working tree clean
There are a lot of ways to commit changes, but the above is simple and in most cases sufficient. The -a flag stages all changes to already tracked files and the -m flag supplies the commit message right away. I like to write my message immediately, but there are people who prefer to get prompted by their editor (omit the -m flag). In any case, all your commits need a message.
It’s a bit out of scope for this post, but make these messages count. Don’t simply call them ‘changes’ or ‘troubleshooting’ or whatever. Make sure your teammates (or you, a few weeks from now) can reasonably assess what has been committed where when checking the git log.
Check out all options for adding and committing files:
$ man git-add
$ man git-commit
Anyway, back on track, finally check the git log:
$ git log
commit d2cf522c350a9f05a5b501b64c75d65f01d6b432 (HEAD -> main)
Author: Henk Batelaan <[email protected]>
Date:   Mon Aug 16 20:20:41 2021 +0200

    This is our first commit
You’ve committed your first files.
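To convince yourself that a commit really is a snapshot, you can ask git what it recorded. A minimal sketch in a scratch repo follows; the path, the identity and the commit message are placeholders for the demo.

```shell
repo="$(mktemp -d)"                                   # scratch repo
git init -q "$repo"
git -C "$repo" config user.name "Demo User"           # placeholder identity
git -C "$repo" config user.email "demo@example.com"   # placeholder identity

touch "$repo/configuration_01.conf" "$repo/configuration_02.conf"
git -C "$repo" add .
git -C "$repo" commit -q -m "Add two empty configuration files"

git -C "$repo" log --oneline              # one line per commit: short hash + subject
git -C "$repo" ls-tree --name-only HEAD   # files recorded in the snapshot
git -C "$repo" rev-list --count HEAD      # number of commits so far: 1
```

The one-line log is also a quick way to judge your own commit messages: if the subject lines alone don’t tell the story, they probably need work.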
Remotes
This is all well and good, but when working in a team it is still of little use: you’ll need a remote place to store your code, not only the local repository we have now.
In come remotes and the earlier mentioned GitLab or GitHub accounts. GitHub is the most widely used, but I’m more of a GitLab guy, so I’ll be using GitLab in the examples.
You’ll need repository access from the command line, and this is best done in one of two ways:
- SSH key authentication
- Access token authentication
The first is probably simplest and most secure. If you don’t have an SSH key already, let’s quickly create one:
$ ssh-keygen -C "Henk Batelaan" -b 8192 -t rsa
If this is your first and only key, press Enter at each prompt to accept the defaults. The above command creates a private and public key pair. Grab the public key from
~/.ssh/id_rsa.pub
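If you’d rather see what gets generated before touching your real ~/.ssh, here is a small sketch that writes a key pair to a scratch directory. It assumes ssh-keygen from OpenSSH is installed. The path and the "Demo User" comment are placeholders, the empty passphrase is for the demo only, and 4096 bits instead of 8192 is just to keep the demo quick.

```shell
key_dir="$(mktemp -d)"     # scratch location, not ~/.ssh

# -f sets the output file, -N "" sets an empty passphrase (demo only).
ssh-keygen -q -t rsa -b 4096 -C "Demo User" -f "$key_dir/id_rsa" -N ""

ls "$key_dir"                             # id_rsa (private) and id_rsa.pub (public)
cat "$key_dir/id_rsa.pub"                 # one line: key type, key data, comment
ssh-keygen -l -f "$key_dir/id_rsa.pub"    # size and fingerprint of the key
```

Only the .pub file ever leaves your machine. For real use, stick to the default location so the public key ends up at the path mentioned above.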
We need it in the GitLab portal:
- Login with your account
- On the top right click your profile picture
- In the left bar click ‘SSH Keys’
- Paste your public key and click save
Alternatively, you can set up access token authentication. Since this changes all future git URLs in this post, I won’t lay out how to do so. You can find a couple of easy-to-follow steps here if you’re interested.
Now back to remotes. Check out the current remote:
$ git remote -v
You’ll notice it turns up empty because we don’t have any remotes yet. Create a new project at gitlab.com:
- Login with your account
- At the top right click ‘New project’
- Choose ‘Create blank project’
- Fill in the details, these should be self-explanatory
- Disable ‘Initialize repository with a README’ and create the project
When you disable the creation of the README, the next screen gives you excellent pointers on how to push your existing local repo to this new project (and gives you other options as well).
For our example I simply called the project webserver and used this to get the code online:
$ cd webserver
$ git remote add origin git@gitlab.com:iohenkies/webserver.git
$ git push -u origin main
If your SSH keys are set up correctly, these steps will push your local repo data to your remote repo. Check it out in the GitLab portal.
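The same add-and-push flow can be rehearsed completely offline. In this sketch a bare repository in a scratch directory stands in for the GitLab project; that stand-in, the paths and the "Demo User" identity are assumptions for the demo, not part of the real setup.

```shell
base="$(mktemp -d)"
remote="$base/webserver.git"      # bare repo standing in for the GitLab project
repo="$base/webserver"            # our local repo

git init -q --bare "$remote"
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main
git -C "$repo" config user.name "Demo User"           # placeholder identity
git -C "$repo" config user.email "demo@example.com"   # placeholder identity
git -C "$repo" commit -q --allow-empty -m "This is our first commit"

git -C "$repo" remote add origin "$remote"
git -C "$repo" remote -v                  # origin now shows up for fetch and push
git -C "$repo" push -q -u origin main     # -u remembers origin/main as upstream

git -C "$repo" rev-parse --abbrev-ref "main@{upstream}"   # prints: origin/main
```

Because of the -u flag, later pushes and pulls on this branch work with a plain git push or git pull, without naming the remote and branch again.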
Getting remote content
Often, you’ll need to pull data from existing remote Git repositories instead of creating them yourself like we did up to this point. We can get our hands on this data in a couple of ways.
Let’s take the Terraform repo at GitHub and get the code. On the repository page there is a button in the top right called ‘Code’; for GitLab repos this button is called ‘Clone’. For public repos you’ll find the HTTPS address that you need there. From your terminal simply clone it locally with:
$ cd
$ git clone https://github.com/hashicorp/terraform
$ cd terraform
You now have the source code of the Terraform main branch on your computer, and this is basically the way to get remote content locally. If you’ve already cloned a repo before and want to get the latest remote changes, use
$ git pull
in the folder where you have the repo locally. You can also
$ git fetch
to download the remote changes without merging them into your local branch. I believe I’ve never used this feature myself, but know it’s there.
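The difference between the two is easiest to see side by side. This is a minimal offline sketch: a bare repo in a scratch directory plays the remote, and the "writer" and "reader" clones, the paths and the identity are all placeholders made up for the demo.

```shell
base="$(mktemp -d)"

# Stand-in remote whose default branch is main.
git init -q --bare "$base/remote.git"
git -C "$base/remote.git" symbolic-ref HEAD refs/heads/main

# The writer publishes a first commit.
git init -q "$base/writer"
git -C "$base/writer" symbolic-ref HEAD refs/heads/main
git -C "$base/writer" config user.name "Demo User"          # placeholder
git -C "$base/writer" config user.email "demo@example.com"  # placeholder
git -C "$base/writer" remote add origin "$base/remote.git"
git -C "$base/writer" commit -q --allow-empty -m "First commit"
git -C "$base/writer" push -q -u origin main

# The reader clones, then the writer publishes one more commit.
git clone -q "$base/remote.git" "$base/reader"
git -C "$base/writer" commit -q --allow-empty -m "Second commit"
git -C "$base/writer" push -q origin main

# fetch downloads the new commit but leaves the local branch alone.
git -C "$base/reader" fetch -q
behind="$(git -C "$base/reader" rev-list --count main..origin/main)"
echo "commits waiting after fetch: $behind"                  # 1

# pull (fetch plus merge) moves the local branch forward as well.
git -C "$base/reader" pull -q --ff-only
git -C "$base/reader" rev-list --count main..origin/main     # 0
```

So a fetch is a safe way to look at what changed remotely before you decide to bring it into your own branch.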
The above commands all keep using the original repo as their base. You clone the code locally and branch, commit, push and merge your hard work back to the repo. There is also an alternative, and that is ‘forking’ the repo. This is done from the GitLab/GitHub UI and creates an independent copy of the repository that you have full ownership of. Changes are not merged back into the original repo you forked from automatically. Within organizations/teams this is not used a lot, but in the open-source space it’s very common.
Famous open-source forks include Ubuntu, which was once a fork of Debian, MariaDB, which was a fork of MySQL, and Nextcloud, which was a fork of ownCloud.
Wrapping up
This post set us up with a repository that is both local and remote and explained a few things in the process. Next time I’ll elaborate quite a bit and try to get more practical with branches, Git branching strategies, merge requests and whatnot. Stay tuned.