Written 07 August 2023 · Last edited 28 March 2026 ~ 13 min read
Managing GitHub with Terraform (and GitHub!)
Benjamin Clark
Part 2 of 3 in Infrastructure as Code (IaC)
This post was originally published in August 2023. It’s been rewritten from scratch in March 2026 to reflect three years of evolution — the three-tier data pattern, domain-grouped repository files, and the separation of repository management from organisation-level governance. The core pattern remains the same; the maturity and lessons learned are new.
Introduction
I built this pattern for clients before I built it for Sudoblark. When setting up GitHub for an enterprise organisation, the requirements are familiar: consistent branch protection across dozens of repositories, CODEOWNERS from day one, automated drift detection. After implementing it a few times the approach became second nature. So when I started expanding Sudoblark’s public presence — more conference talks, more open-source projects, a proper portfolio to show clients in pre-sales — I knew what to reach for.
Manual repository setup was the obstacle. Every time I had an idea for a demo or a new project, the first step was clicking through the same GitHub settings: configure branch protection, write CODEOWNERS, decide on licensing. After a few rounds of that, I stopped. Ideas sat in a notebook and never made it to GitHub because the bootstrap effort drained the excitement out of starting the actual work.
Putting the same Terraform pattern into Sudoblark changed that. New repositories went from thirty minutes of clicking to a few lines in a domain-grouped .tf file. The cognitive load dropped, and the demos started getting built again.
This pattern has changed significantly since I first published this post in August 2023. What was a single locals.tf file with four repositories is now a three-tier pattern with domain-grouped data files, a dedicated data module, and a separate repository for organisation-level governance. This rewrite reflects where the implementation actually is today.
The goal
- How to bootstrap Terraform GitHub management when the repository being managed doesn’t exist yet
- The three-tier pattern for organising repository definitions — and why domain-grouped files matter whether building solo or at enterprise scale (covered in depth in the three-tier Terraform data pattern post)
- Why drift detection is worth setting up even if it never catches anything
- When to split repository management from organisation-level governance into separate Terraform repositories
Prerequisites
- Working knowledge of Terraform (modules, state, providers)
- Familiarity with GitHub Actions workflows
- A general understanding of infrastructure-as-code concepts
- The setting up Terraform from scratch for AWS post covers the foundational CI/CD patterns used here
The Bootstrapping Problem
Before Terraform can manage a GitHub repository, that repository has to exist. Before a branch can be protected, it has to exist — and a branch only exists once at least one commit has been pushed to it. So there’s no way around a manual first step.
The first few commits to sudoblark.terraform.github were deliberate groundwork rather than Terraform: a detailed README — enough context to pick the project back up after a month away — a .gitignore, CODEOWNERS to lock down merges, and a LICENSE. This established a working main branch with the basic repository hygiene that every other Sudoblark repository would eventually get automatically.
With the repository in place, I wrote the Terraform locally and ran terraform apply against a dummy repository to confirm it worked before touching any real infrastructure. Only once that was verified did I open a feature branch for CI/CD, then a separate PR to bring the repository under its own management.
That PR is where the title’s parenthetical comes in — importing sudoblark.terraform.github into its own Terraform state:
terraform import 'module.repositories["sudoblark.terraform.github"].github_repository.repository' "sudoblark.terraform.github"
After that, the repository managing all Sudoblark repositories is itself a Sudoblark repository managed by Terraform. Changes to its own configuration go through the same pull request workflow as any other. Branch protection applies to the branch protection configuration. CODEOWNERS governs the file that governs CODEOWNERS.
Running imports before local applies had been verified, or before CI/CD was in place, would have made state harder to debug. Doing it in order — manual hygiene, local validation, CI/CD, self-management — meant each step built on a clean foundation.
The Three-Tier Pattern
The structure has three distinct layers. The data layer (modules/data/) defines what repositories exist. The implementation layer (root repositories.tf) orchestrates their creation. The core module (modules/repository/) contains the actual GitHub resources. Each layer has a single job, and changes to one don’t require touching the others.
The data layer
modules/data/defaults.tf sets the organisation-wide baseline:
locals {
  repository_prefix = "sudoblark."
  append_prefix     = true
  visibility        = "private"
  open_source       = false
  archived          = false

  codeowners_entries = [
    {
      pattern = "*"
      owners  = ["@benjaminlukeclark"]
    }
  ]
}
Individual domain files then define repository lists against that baseline. Each file maps to a discrete project or concern — repositories_bookshelf.tf for the Bookshelf product, repositories_monsternames.tf for the monsternames API, repositories_core_platform.tf for foundational Sudoblark infrastructure:
# modules/data/repositories_bookshelf.tf
locals {
  repositories_bookshelf = [
    {
      name        = "bookshelf.data-lake"
      description = "Terraform module to manage the Bookshelf data lake..."
      open_source = false
      visibility  = "private"
    },
    {
      name        = "bookshelf.frontend"
      description = "Frontend application for the Bookshelf project..."
      topics      = ["flutter", "frontend", "bookshelf"]
    }
  ]
}
modules/data/repositories.tf concatenates all domain lists and applies the prefix logic:
locals {
  repositories = concat(
    local.repositories_bookshelf,
    local.repositories_ci_cd,
    local.repositories_core_platform,
    # ...
  )

  repositories_with_names = [
    for repository in local.repositories : merge(repository, {
      full_name = format(
        "%s%s",
        try(repository.append_prefix, local.append_prefix) ? local.repository_prefix : "",
        repository.name
      )
    })
  ]
}
outputs.tf then exposes the assembled list alongside the defaults, so the implementation layer can reference them without reimplementing the logic.
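The post doesn’t reproduce outputs.tf itself, but a minimal sketch of its shape might look like this — the exact output names are assumptions, so check the repository for the real interface:

```hcl
# modules/data/outputs.tf — illustrative sketch; actual names may differ.
output "repositories" {
  description = "All repository definitions, with full_name already applied"
  value       = local.repositories_with_names
}

output "visibility" {
  description = "Organisation-wide default visibility"
  value       = local.visibility
}

output "codeowners_entries" {
  description = "Baseline CODEOWNERS entries applied to every repository"
  value       = local.codeowners_entries
}

# ...plus matching outputs for open_source, archived, and the other defaults.
```

The point of the sketch is that the data module’s outputs are the only contract the implementation layer depends on — the concat and prefix logic stays entirely inside the module.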
The implementation layer
The root repositories.tf is deliberately thin — it calls the data module, then loops over the result:
module "data" {
  source = "./modules/data"
}

module "repositories" {
  for_each = { for repo in module.data.repositories : repo.full_name => repo }
  source   = "./modules/repository"

  name               = each.value.full_name
  description        = each.value.description
  visibility         = try(each.value.visibility, module.data.visibility)
  archived           = try(each.value.archived, module.data.archived)
  codeowners_entries = concat(try(each.value.codeowners_entries, []), module.data.codeowners_entries)
  open_source        = try(each.value.open_source, module.data.open_source)

  providers = { github = github }
}
try() handles optional fields — if a repository doesn’t specify visibility, it inherits from module.data.visibility. Adding a new repository never requires touching this file.
The codeowners_entries line shows how per-repository configuration layers onto the organisation defaults rather than replacing them. concat merges any repo-specific entries (defaulting to an empty list if none are defined) with module.data.codeowners_entries, so the global default ownership always applies and individual repositories can extend it.
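As a concrete illustration of that layering — the repository and owner below are hypothetical, not real Sudoblark configuration:

```hcl
# Hypothetical domain-file entry that adds a docs-specific owner
# on top of the organisation-wide default.
{
  name        = "bookshelf.docs"
  description = "Documentation for the Bookshelf project"

  codeowners_entries = [
    {
      pattern = "docs/*"
      owners  = ["@hypothetical-docs-owner"]
    }
  ]
}

# concat() in the root repositories.tf yields both entries, so the
# generated CODEOWNERS carries the docs/* rule alongside the default
# "* @benjaminlukeclark" entry.
```

A repository that defines no codeowners_entries at all simply gets the organisation default, because the try() falls back to an empty list and concat leaves the baseline untouched.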
The core module
modules/repository/ contains the actual GitHub resources — github_repository, CODEOWNERS management, LICENSE management. variables.tf defines the full interface: name, description, visibility, topics, archived, open_source, and codeowners_entries. The typed list for codeowners_entries is worth calling out as an example of how the module handles the more structured inputs:
variable "codeowners_entries" {
  type = list(object({
    pattern = string,
    owners  = list(string)
  }))
}
The type constraint means malformed entries fail at plan time rather than producing a silently broken CODEOWNERS file.
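The post doesn’t show how those entries become a file. One plausible sketch — an assumption on my part, not necessarily how the real module is implemented — uses a github_repository_file resource to render and commit the CODEOWNERS content:

```hcl
# Illustrative sketch only — the actual module implementation may differ.
resource "github_repository_file" "codeowners" {
  repository          = github_repository.repository.name
  branch              = "main"
  file                = ".github/CODEOWNERS"
  overwrite_on_create = true

  # Render each entry as "<pattern> <owner> [<owner> ...]", one per line.
  content = join("\n", [
    for entry in var.codeowners_entries :
    format("%s %s", entry.pattern, join(" ", entry.owners))
  ])
}
```

Whatever the exact mechanism, committing the file through Terraform is what makes the daily reconciliation described later possible — the declared entries are the source of truth, and any manual edit to the file becomes detectable drift.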
Why domain grouping
In 2023, all repository definitions lived in a single locals.tf. As the count grew past twenty, navigating that file became tedious — scrolling through unrelated definitions to find the one being changed. Splitting by domain brings each file down to a manageable size.
At enterprise scale the benefit shifts. When different teams own different parts of the repository estate, domain-grouped files create natural ownership boundaries. A platform team owns repositories_core_platform.tf. A product team owns repositories_bookshelf.tf. Pull request conflicts largely disappear because changes rarely touch the same file.
The split that works for cognitive load at small scale is the same split that works for ownership at large scale.
Evolution: 2023→2026
The 2023 version of sudoblark.terraform.github was structurally simple. A single locals.tf defined four repositories in a flat map. Branch protection lived in a github_branch_protection resource inside the core module, applied per-repository. For four repositories, that was enough.
By mid-2025 the repository count had grown significantly and the single-file structure had become a navigation problem. The locals.tf was pushing past two hundred lines, with no logical grouping between an AI demo repository and a Terraform module. Finding a specific configuration meant scrolling through unrelated definitions. The three-tier refactor split that into domain-grouped files — repositories_bookshelf.tf, repositories_monsternames.tf, repositories_core_platform.tf — each small enough to be immediately readable.
The bigger change came in 2026. During a client engagement migrating an organisation from Bitbucket to GitHub, I worked with GitHub’s organisation-level rulesets for the first time. The breadth of what they could enforce — commit message conventions, required signatures, branch naming, bypass policies — went well beyond what per-repository branch protection rules could offer. I built it out for the client and then immediately wanted the same for Sudoblark.
That work lives in a separate repository: sudoblark.terraform.github.organisation. Keeping organisation-level governance separate from repository management means two different concerns have two different change histories, two different state files, and two different deployment cycles. sudoblark.terraform.github still manages what repositories exist and how they’re configured. The organisation repository manages the rules that apply across all of them.
With rulesets in place, the per-repository github_branch_protection resource was removed from the core module — it was redundant. The Governing GitHub at Organisation Scale with Terraform post covers that side of the architecture in full.
Starting from scratch today, the three-tier pattern would be the starting point — not something to grow into. The overhead of creating modules/data/ with a handful of files is minimal compared to the refactor required when a single locals.tf becomes unwieldy. The structure that scales to fifty repositories is the right structure for five.
Drift Detection in Practice
The apply workflow runs on three triggers: push to main, workflow_dispatch for manual runs, and a daily cron at midnight UTC:
on:
  workflow_dispatch:
  schedule:
    - cron: '0 0 * * *'
  push:
    branches:
      - main
    paths-ignore:
      - '.github/**'
      - 'LICENSE.txt'
The paths-ignore block prevents the push trigger from firing on changes to files the apply itself may commit back — CODEOWNERS and LICENSE files are written to repositories on each apply, so without it each apply’s commit would kick off another run, indefinitely.
The primary concern is CODEOWNERS. In an enterprise setting, CODEOWNERS files are how responsibility is demarcated between teams — they control who must review changes to which parts of a codebase, and by extension which organisational entities are accountable for what. If those files can be modified ad hoc via the UI, or simply drift from the intended state, compliance and governance guarantees start to erode. Managing them centrally through Terraform and reconciling daily ensures the configuration is always what it’s declared to be.
There’s a security dimension to that cadence as well. If someone manages to gain push access to a repository’s main branch when they shouldn’t have it, the drift detection acts as a bounded recovery window. Within 24 hours, the configuration — CODEOWNERS, rulesets, visibility, everything managed by Terraform — is restored to its declared state. That limits the blast radius of a breach rather than leaving incorrect access in place indefinitely.
In practice this hasn’t occurred on Sudoblark, but for enterprise clients this pattern has been a compliance requirement rather than a nice-to-have. The cost is a few GitHub Actions minutes per day.
The one place the cron is not permitted to intervene is repository deletion. prevent_destroy in github_repository blocks Terraform from removing repositories even if they disappear from the configuration:
lifecycle {
  prevent_destroy = true
}
Removing a repository from a domain file and running apply produces an error rather than a deletion. The correct workflow is to archive the repository first, verify nothing depends on it, then remove it from state manually. It’s a deliberate speed bump — the cost of accidental deletion is high enough to justify the friction.
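A sketch of that workflow, reusing the bookshelf.data-lake entry from earlier as the example (the state address follows the same format as the import command shown in the bootstrapping section):

```hcl
# Step 1: archive via the domain file, open a PR, and apply.
{
  name        = "bookshelf.data-lake"
  description = "Terraform module to manage the Bookshelf data lake..."
  archived    = true
}

# Step 2: once nothing depends on the repository, delete the entry
# above and remove the resource from state manually, so that
# prevent_destroy is never triggered:
#
#   terraform state rm 'module.repositories["sudoblark.bookshelf.data-lake"]'
```

After the state rm, the archived repository still exists on GitHub — it’s simply no longer Terraform-managed, which is exactly the outcome the speed bump is designed to force you to choose deliberately.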
Production Considerations
The bot account
All CI/CD actions run as sudoblark-bot, an organisation admin account with a dedicated personal access token stored as SUDOBLARK_GITHUB_TOKEN. The separation keeps automated changes clearly attributed — commits, file writes, and API calls appear as bot actions rather than under a personal account, which matters for audit trails.
The longer-term intention is to migrate from a PAT to a GitHub App with OIDC-based authentication, eliminating long-lived tokens in favour of time-limited credentials. That work is covered in the Governing GitHub at Organisation Scale with Terraform post.
State management
Terraform state is stored in S3. The workflows authenticate to AWS via OIDC role assumption rather than stored access keys — the configure-aws-credentials action assumes a dedicated IAM role scoped to what the workflow needs, and credentials are valid only for the duration of the run. No long-lived AWS credentials are stored as secrets.
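A backend block along those lines looks something like this — the bucket, key, and region are placeholders, not the real Sudoblark configuration:

```hcl
terraform {
  backend "s3" {
    bucket = "example-terraform-state"   # placeholder bucket name
    key    = "github/terraform.tfstate"  # placeholder state key
    region = "eu-west-2"                 # placeholder region
    # No access keys here: in CI, credentials come from the
    # OIDC-assumed IAM role and expire when the run ends.
  }
}
```

Because the backend block carries no credentials, the same configuration works locally (via an assumed role or SSO session) and in GitHub Actions without any secret management for AWS.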
Token scoping
Two tokens are in use: SUDOBLARK_GITHUB_TOKEN (the bot’s PAT with organisation-level access) and the auto-generated GITHUB_TOKEN that GitHub creates per workflow execution, scoped to the repository.
The split came from a practical problem encountered during a client implementation: using a single organisation-level token for every operation — checkout, plan, apply, PR comments — hit GitHub’s API rate limits. Reserving SUDOBLARK_GITHUB_TOKEN for operations that require organisation access (Terraform init, plan, apply, validate) and using the auto-generated GITHUB_TOKEN for repository-scoped operations where possible keeps the number of API calls against the organisation token manageable.
When to split domain groups
There’s no hard rule on when to create a new domain file, but roughly twenty to thirty repositories in a single file is where navigation starts to suffer. The more useful signal is ownership: if a group of repositories has a natural owner — a team, a product, a concern — that’s a domain file. Creating the file before it’s needed costs almost nothing.
Conclusion
The pattern started as something I built for clients. After delivering it a few times, it made obvious sense to bring it to Sudoblark — the same consistency, the same automation, the same single source of truth. What changed over three years wasn’t the core idea but the maturity of the implementation.
The three-tier structure replaced a flat locals.tf because the flat file stopped being navigable. The separation of repository management from organisation-level governance happened because rulesets could enforce things branch protection couldn’t. Each change was a response to a real constraint.
The bootstrapping paradox hasn’t gone away — there’s still a manual first step, and there always will be. But the import dance is a one-time cost, and once the system is managing itself the ongoing overhead is close to zero. A new repository becomes a few lines in a domain file and a pull request.
Centrally managed CODEOWNERS, reconciled daily, is a compliance and security control rather than an operational convenience. At enterprise scale the 24-hour recovery window is a meaningful constraint on the blast radius of any access incident.
The sudoblark.terraform.github repository is public and reflects the current state of the implementation. The companion Governing GitHub at Organisation Scale with Terraform post covers the organisation-level ruleset side of the architecture.
Further Reading:
- sudoblark.terraform.github — the repository this post is based on
- Governing GitHub at Organisation Scale with Terraform — organisation-level rulesets, GitHub Apps, and OIDC authentication
- The Three-Tier Terraform Data Pattern — a deeper look at the data pattern used here
- Setting Up Terraform from Scratch for AWS — the foundational CI/CD patterns referenced in the Prerequisites