Written 07 August 2023 · Last edited 28 March 2026 ~ 13 min read
Managing GitHub with Terraform (and GitHub!)
Benjamin Clark
Part 2 of 3 in Infrastructure as Code (IaC)
This post was originally published in August 2023. It’s been rewritten from scratch in March 2026 to reflect three years of evolution — the three-tier data pattern, domain-grouped repository files, and the separation of repository management from organisation-level governance. The core pattern remains the same; the maturity and lessons learned are new.
Introduction
I built this pattern for clients before I built it for Sudoblark. When setting up GitHub for an enterprise organisation, the requirements are familiar: consistent branch protection across dozens of repositories, CODEOWNERS from day one, automated drift detection. After implementing it a few times the approach became second nature. So when I started expanding Sudoblark’s public presence — more conference talks, more open-source projects, a proper portfolio to show clients in pre-sales — I knew what to reach for.
Manual repository setup was the obstacle. Every time I had an idea for a demo or a new project, the first step was clicking through the same GitHub settings: configure branch protection, write CODEOWNERS, decide on licensing. After a few rounds of that, I stopped. Ideas sat in a notebook and never made it to GitHub because the bootstrap effort drained the excitement out of starting the actual work.
Putting the same Terraform pattern into Sudoblark changed that. New repositories went from thirty minutes of clicking to a few lines in a domain-grouped .tf file. The cognitive load dropped, and the demos started getting built again.
This pattern has changed significantly since I first published this post in August 2023. What was a single locals.tf file with four repositories is now a three-tier pattern with domain-grouped data files, a dedicated data module, and a separate repository for organisation-level governance. This rewrite reflects where the implementation actually is today.
The goal
- How to bootstrap Terraform GitHub management when the repository being managed doesn’t exist yet
- The three-tier pattern for organising repository definitions — and why domain-grouped files matter whether building solo or at enterprise scale (covered in depth in the three-tier Terraform data pattern post)
- Why drift detection is worth setting up even if it never catches anything
- When to split repository management from organisation-level governance into separate Terraform repositories
Prerequisites
- Working knowledge of Terraform (modules, state, providers)
- Familiarity with GitHub Actions workflows
- A general understanding of infrastructure-as-code concepts
- The setting up Terraform from scratch for AWS post covers the foundational CI/CD patterns used here
The Bootstrapping Problem
Before Terraform can manage a GitHub repository, that repository has to exist. Before a branch can be protected, it has to exist — and a branch only exists once at least one commit has been pushed to it. So there’s no way around a manual first step.
The first few commits to sudoblark.terraform.github were deliberate groundwork rather than Terraform: a detailed README — enough context to pick the project back up after a month away — a .gitignore, CODEOWNERS to lock down merges, and a LICENSE. This established a working main branch with the basic repository hygiene that every other Sudoblark repository would eventually get automatically.
With the repository in place, I wrote the Terraform locally and ran terraform apply against a dummy repository to confirm it worked before touching any real infrastructure. Only once that was verified did I open a feature branch for CI/CD, then a separate PR to bring the repository under its own management.
That PR is where the title’s parenthetical comes in — importing sudoblark.terraform.github into its own Terraform state:
terraform import 'module.repositories["sudoblark.terraform.github"].github_repository.repository' "sudoblark.terraform.github"
After that, the repository managing all Sudoblark repositories is itself a Sudoblark repository managed by Terraform. Changes to its own configuration go through the same pull request workflow as any other. Branch protection applies to the branch protection configuration. CODEOWNERS governs the file that governs CODEOWNERS.
Running imports before local applies had been verified, or before CI/CD was in place, would have made state harder to debug. Doing it in order — manual hygiene, local validation, CI/CD, self-management — meant each step built on a clean foundation.
The Three-Tier Pattern
The structure has three distinct layers. The data layer (modules/data/) defines what repositories exist. The implementation layer (root repositories.tf) orchestrates their creation. The core module (modules/repository/) contains the actual GitHub resources. Each layer has a single job, and changes to one don’t require touching the others.
The data layer
modules/data/defaults.tf sets the organisation-wide baseline:
locals {
  repository_prefix = "sudoblark."
  append_prefix     = true
  visibility        = "private"
  open_source       = false
  archived          = false

  codeowners_entries = [
    {
      pattern = "*"
      owners  = ["@benjaminlukeclark"]
    }
  ]
}
Individual domain files then define repository lists against that baseline. Each file maps to a discrete project or concern — repositories_bookshelf.tf for the Bookshelf product, repositories_monsternames.tf for the monsternames API, repositories_core_platform.tf for foundational Sudoblark infrastructure:
# modules/data/repositories_bookshelf.tf
locals {
  repositories_bookshelf = [
    {
      name        = "bookshelf.data-lake"
      description = "Terraform module to manage the Bookshelf data lake..."
      open_source = false
      visibility  = "private"
    },
    {
      name        = "bookshelf.frontend"
      description = "Frontend application for the Bookshelf project..."
      topics      = ["flutter", "frontend", "bookshelf"]
    }
  ]
}
modules/data/repositories.tf concatenates all domain lists and applies the prefix logic:
locals {
  repositories = concat(
    local.repositories_bookshelf,
    local.repositories_ci_cd,
    local.repositories_core_platform,
    # ...
  )

  repositories_with_names = [
    for repository in local.repositories : merge(repository, {
      full_name = format(
        "%s%s",
        try(repository.append_prefix, local.append_prefix) ? local.repository_prefix : "",
        repository.name
      )
    })
  ]
}
outputs.tf then exposes the assembled list alongside the defaults, so the implementation layer can reference them without reimplementing the logic.
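The post doesn’t reproduce outputs.tf itself, but a minimal sketch of its shape might look like this — the exact output names are assumptions, so check the repository for the real interface:

```hcl
# modules/data/outputs.tf — illustrative sketch; actual names may differ.
output "repositories" {
  description = "All repository definitions, with full_name already applied"
  value       = local.repositories_with_names
}

output "visibility" {
  description = "Organisation-wide default visibility"
  value       = local.visibility
}

output "codeowners_entries" {
  description = "Baseline CODEOWNERS entries applied to every repository"
  value       = local.codeowners_entries
}

# ...plus matching outputs for open_source, archived, and the other defaults.
```

The point of the sketch is that the data module’s outputs are the only contract the implementation layer depends on — the concat and prefix logic stays entirely inside the module.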
The implementation layer
The root repositories.tf is deliberately thin — it calls the data module, then loops over the result:
module "data" {
  source = "./modules/data"
}

module "repositories" {
  for_each = { for repo in module.data.repositories : repo.full_name => repo }
  source   = "./modules/repository"

  name               = each.value.full_name
  description        = each.value.description
  visibility         = try(each.value.visibility, module.data.visibility)
  archived           = try(each.value.archived, module.data.archived)
  codeowners_entries = concat(try(each.value.codeowners_entries, []), module.data.codeowners_entries)
  open_source        = try(each.value.open_source, module.data.open_source)

  providers = { github = github }
}
try() handles optional fields — if a repository doesn’t specify visibility, it inherits from module.data.visibility. Adding a new repository never requires touching this file.
The codeowners_entries line shows how per-repository configuration layers onto the organisation defaults rather than replacing them. concat merges any repo-specific entries (defaulting to an empty list if none are defined) with module.data.codeowners_entries, so the global default ownership always applies and individual repositories can extend it.
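As a concrete illustration of that layering — the repository and owner below are hypothetical, not real Sudoblark configuration:

```hcl
# Hypothetical domain-file entry that adds a docs-specific owner
# on top of the organisation-wide default.
{
  name        = "bookshelf.docs"
  description = "Documentation for the Bookshelf project"

  codeowners_entries = [
    {
      pattern = "docs/*"
      owners  = ["@hypothetical-docs-owner"]
    }
  ]
}

# concat() in the root repositories.tf yields both entries, so the
# generated CODEOWNERS carries the docs/* rule alongside the default
# "* @benjaminlukeclark" entry.
```

A repository that defines no codeowners_entries at all simply gets the organisation default, because the try() falls back to an empty list and concat leaves the baseline untouched.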
The core module
modules/repository/ contains the actual GitHub resources — github_repository, CODEOWNERS management, LICENSE management. variables.tf defines the full interface: name, description, visibility, topics, archived, open_source, and codeowners_entries. The typed list for codeowners_entries is worth calling out as an example of how the module handles the more structured inputs:
variable "codeowners_entries" {
  type = list(object({
    pattern = string,
    owners  = list(string)
  }))
}
The type constraint means malformed entries fail at plan time rather than producing a silently broken CODEOWNERS file.
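The post doesn’t show how those entries become a file. One plausible sketch — an assumption on my part, not necessarily how the real module is implemented — uses a github_repository_file resource to render and commit the CODEOWNERS content:

```hcl
# Illustrative sketch only — the actual module implementation may differ.
resource "github_repository_file" "codeowners" {
  repository          = github_repository.repository.name
  branch              = "main"
  file                = ".github/CODEOWNERS"
  overwrite_on_create = true

  # Render each entry as "<pattern> <owner> [<owner> ...]", one per line.
  content = join("\n", [
    for entry in var.codeowners_entries :
    format("%s %s", entry.pattern, join(" ", entry.owners))
  ])
}
```

Whatever the exact mechanism, committing the file through Terraform is what makes the daily reconciliation described later possible — the declared entries are the source of truth, and any manual edit to the file becomes detectable drift.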
Why domain grouping
In 2023, all repository definitions lived in a single locals.tf. As the count grew past twenty, navigating that file became tedious — scrolling through unrelated definitions to find the one being changed. Splitting by domain brings each file down to a manageable size.
At enterprise scale the benefit shifts. When different teams own different parts of the repository estate, domain-grouped files create natural ownership boundaries. A platform team owns repositories_core_platform.tf. A product team owns repositories_bookshelf.tf. Pull request conflicts largely disappear because changes rarely touch the same file.
The split that works for cognitive load at small scale is the same split that works for ownership at large scale.
Evolution: 2023→2026
The 2023 version of sudoblark.terraform.github was structurally simple. A single locals.tf defined four repositories in a flat map. Branch protection lived in a github_branch_protection resource inside the core module, applied per-repository. For four repositories, that was enough.
By mid-2025 the repository count had grown significantly and the single-file structure had become a navigation problem. The locals.tf was pushing past two hundred lines, with no logical grouping between an AI demo repository and a Terraform module. Finding a specific configuration meant scrolling through unrelated definitions. The three-tier refactor split that into domain-grouped files — repositories_bookshelf.tf, repositories_monsternames.tf, repositories_core_platform.tf — each small enough to be immediately readable.
The bigger change came in 2026. During a client engagement migrating an organisation from Bitbucket to GitHub, I worked with GitHub’s organisation-level rulesets for the first time. The breadth of what they could enforce — commit message conventions, required signatures, branch naming, bypass policies — went well beyond what per-repository branch protection rules could offer. I built it out for the client and then immediately wanted the same for Sudoblark.
That work lives in a separate repository: sudoblark.terraform.github.organisation. Keeping organisation-level governance separate from repository management means two different concerns have two different change histories, two different state files, and two different deployment cycles. sudoblark.terraform.github still manages what repositories exist and how they’re configured. The organisation repository manages the rules that apply across all of them.
With rulesets in place, the per-repository github_branch_protection resource was removed from the core module — it was redundant. The Governing GitHub at Organisation Scale with Terraform post covers that side of the architecture in full.
Starting from scratch today, the three-tier pattern would be the starting point — not something to grow into. The overhead of creating modules/data/ with a handful of files is minimal compared to the refactor required when a single locals.tf becomes unwieldy. The structure that scales to fifty repositories is the right structure for five.
Drift Detection in Practice
The apply workflow runs on three triggers: push to main, workflow_dispatch for manual runs, and a daily cron at midnight UTC:
on:
  workflow_dispatch:
  schedule:
    - cron: '0 0 * * *'
  push:
    branches:
      - main
    paths-ignore:
      - '.github/**'
      - 'LICENSE.txt'
The paths-ignore block prevents the push trigger from firing on changes to files the apply itself may commit back — CODEOWNERS and LICENSE files are written to repositories on each apply, so without it each apply’s commit would kick off another run, indefinitely.
The primary concern is CODEOWNERS. In an enterprise setting, CODEOWNERS files are how responsibility is demarcated between teams — they control who must review changes to which parts of a codebase, and by extension which organisational entities are accountable for what. If those files can be modified ad hoc via the UI, or simply drift from the intended state, compliance and governance guarantees start to erode. Managing them centrally through Terraform and reconciling daily ensures the configuration is always what it’s declared to be.
There’s a security dimension to that cadence as well. If someone manages to gain push access to a repository’s main branch when they shouldn’t have it, the drift detection acts as a bounded recovery window. Within 24 hours, the configuration — CODEOWNERS, rulesets, visibility, everything managed by Terraform — is restored to its declared state. That limits the blast radius of a breach rather than leaving incorrect access in place indefinitely.
In practice this hasn’t occurred on Sudoblark, but for enterprise clients this pattern has been a compliance requirement rather than a nice-to-have. The cost is a few GitHub Actions minutes per day.
The one place the cron is not permitted to intervene is repository deletion. prevent_destroy in github_repository blocks Terraform from removing repositories even if they disappear from the configuration:
lifecycle {
  prevent_destroy = true
}
Removing a repository from a domain file and running apply produces an error rather than a deletion. The correct workflow is to archive the repository first, verify nothing depends on it, then remove it from state manually. It’s a deliberate speed bump — the cost of accidental deletion is high enough to justify the friction.
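A sketch of that workflow, reusing the bookshelf.data-lake entry from earlier as the example (the state address follows the same format as the import command shown in the bootstrapping section):

```hcl
# Step 1: archive via the domain file, open a PR, and apply.
{
  name        = "bookshelf.data-lake"
  description = "Terraform module to manage the Bookshelf data lake..."
  archived    = true
}

# Step 2: once nothing depends on the repository, delete the entry
# above and remove the resource from state manually, so that
# prevent_destroy is never triggered:
#
#   terraform state rm 'module.repositories["sudoblark.bookshelf.data-lake"]'
```

After the state rm, the archived repository still exists on GitHub — it’s simply no longer Terraform-managed, which is exactly the outcome the speed bump is designed to force you to choose deliberately.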
Production Considerations
The bot account
All CI/CD actions run as sudoblark-bot, an organisation admin account with a dedicated personal access token stored as SUDOBLARK_GITHUB_TOKEN. The separation keeps automated changes clearly attributed — commits, file writes, and API calls appear as bot actions rather than under a personal account, which matters for audit trails.
The longer-term intention is to migrate from a PAT to a GitHub App with OIDC-based authentication, eliminating long-lived tokens in favour of time-limited credentials. That work is covered in the Governing GitHub at Organisation Scale with Terraform post.
State management
Terraform state is stored in S3. The workflows authenticate to AWS via OIDC role assumption rather than stored access keys — the configure-aws-credentials action assumes a dedicated IAM role scoped to what the workflow needs, and credentials are valid only for the duration of the run. No long-lived AWS credentials are stored as secrets.
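A backend block along those lines looks something like this — the bucket, key, and region are placeholders, not the real Sudoblark configuration:

```hcl
terraform {
  backend "s3" {
    bucket = "example-terraform-state"   # placeholder bucket name
    key    = "github/terraform.tfstate"  # placeholder state key
    region = "eu-west-2"                 # placeholder region
    # No access keys here: in CI, credentials come from the
    # OIDC-assumed IAM role and expire when the run ends.
  }
}
```

Because the backend block carries no credentials, the same configuration works locally (via an assumed role or SSO session) and in GitHub Actions without any secret management for AWS.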
Token scoping
Two tokens are in use: SUDOBLARK_GITHUB_TOKEN (the bot’s PAT with organisation-level access) and the auto-generated GITHUB_TOKEN that GitHub creates per workflow execution, scoped to the repository.
The split came from a practical problem encountered during a client implementation: using a single organisation-level token for every operation — checkout, plan, apply, PR comments — hit GitHub’s API rate limits. Reserving SUDOBLARK_GITHUB_TOKEN for operations that require organisation access (Terraform init, plan, apply, validate) and using the auto-generated GITHUB_TOKEN for repository-scoped operations where possible keeps the number of API calls against the organisation token manageable.
When to split domain groups
There’s no hard rule on when to create a new domain file, but roughly twenty to thirty repositories in a single file is where navigation starts to suffer. The more useful signal is ownership: if a group of repositories has a natural owner — a team, a product, a concern — that’s a domain file. Creating the file before it’s needed costs almost nothing.
Conclusion
The pattern started as something I built for clients. After delivering it a few times, it made obvious sense to bring it to Sudoblark — the same consistency, the same automation, the same single source of truth. What changed over three years wasn’t the core idea but the maturity of the implementation.
The three-tier structure replaced a flat locals.tf because the flat file stopped being navigable. The separation of repository management from organisation-level governance happened because rulesets could enforce things branch protection couldn’t. Each change was a response to a real constraint.
The bootstrapping paradox hasn’t gone away — there’s still a manual first step, and there always will be. But the import dance is a one-time cost, and once the system is managing itself the ongoing overhead is close to zero. A new repository becomes a few lines in a domain file and a pull request.
Centrally managed CODEOWNERS, reconciled daily, is a compliance and security control rather than an operational convenience. At enterprise scale the 24-hour recovery window is a meaningful constraint on the blast radius of any access incident.
The sudoblark.terraform.github repository is public and reflects the current state of the implementation. The companion Governing GitHub at Organisation Scale with Terraform post covers the organisation-level ruleset side of the architecture.
Further Reading:
- sudoblark.terraform.github — the repository this post is based on
- Governing GitHub at Organisation Scale with Terraform — organisation-level rulesets, GitHub Apps, and OIDC authentication
- The Three-Tier Terraform Data Pattern — a deeper look at the data pattern used here
- Setting Up Terraform from Scratch for AWS — the foundational CI/CD patterns referenced in the Prerequisites