Written 30 March 2026 ~ 18 min read
The Three-Tier Terraform Data Pattern
Benjamin Clark
Part 3 of 3 in Sudoblark Best Practices
Introduction
This post is the third in the Sudoblark Best Practices series. It assumes you’ve read the first post on Terraform with Data Structures and Algorithms, or are at least familiar with the idea of data-structure-driven Terraform. If not, start there.
About three years into running Sudoblark, I picked up a strategic technical partner engagement — I can’t name the client, but the brief was a supply-chain agentic AI platform. I was the sole technical resource for solution architecture, systems engineering, backend API development, and the agentic AI frameworks. There was one contractor covering data and one covering the frontend — but the platform engineering, the infrastructure, the glue holding it all together: that was mine.
The platform engineering scope was significant. I built an AWS organisation from scratch — SSO, Identity Centre, a dev/staging/prod/management account split, all in IaC. I built a GitHub Enterprise and organisation from scratch — users, teams, repository permissions, all managed through Terraform. Then I started building the actual SaaS system: 14 repositories covering agentic AI pipelines, backend APIs, a frontend, a data warehouse. One engineer. Client timelines.
That’s the context in which the versioned-module approach — the pattern I’d used successfully at larger clients with dedicated infrastructure teams — started to crack.
Once I’d worked out what to do differently, I went back and applied it to Sudoblark’s own repos: the modularised-demo first, then the bookshelf-demo. The value was immediately obvious, and not just for solo work — it’s also the right call for smaller clients who have a DevOps function but can’t justify the overhead of a full module registry operation. This post is that pattern written up.
The goal
- Understand why the DSA pattern’s overhead becomes a problem at solo or small-team scale
- See how the three-tier structure maintains separation of concerns without release engineering
- Follow a real implementation from the bookshelf-demo with working HCL examples
- Know where the pattern breaks down and when to reach back for versioned modules
Prerequisites
Familiarity with Terraform basics and the data structures and algorithms pattern is assumed. You don’t need prior experience with a Terraform module registry.
- sudoblark.ai.bookshelf-demo — the reference implementation used throughout this post
- sudoblark.terraform.modularised-demo PR #35 — a before/after migration from DSA to three-tier
Why versioned modules struggle at solo scale
The DSA pattern works best with two teams: one that maintains modules as a product, one that consumes them. The consuming team interacts with data structures. The module team controls the implementation. Each moves at its own pace. The version contract protects both sides.
At a larger client with a dedicated infrastructure engineering team and multiple consuming application teams, that protection is genuine value. I’ve shipped that setup, and it works well.
The problem is when you’re both teams. The “protection” is protecting a boundary that doesn’t exist. And you’re still paying the full cost of maintaining it.
What that actually looks like in practice:
- You make a change to a module. You tag it. You publish it. You wait for the registry to reflect it. You go to the consuming repo, find the version pin, bump it, commit, push, run plan. The change was two lines. The ceremony was twenty minutes.
- You’re building something new and you need three modules. Each one is in a separate repository. Each one needs to be tagged and published before the downstream repo can reference it. You find a bug in module B while wiring up module C. Back to module B, fix, tag, publish, update the pin in C. By the time you’re back in C, you’ve lost the thread.
- An AI coding assistant can’t fetch your remote module at git::https://github.com/your-org/terraform-modules.git//lambda?ref=v2.3.1. It can’t verify which version you’re pinned to. So it guesses, and it guesses wrong — suggesting inputs that existed in v1 but not v2, or hallucinating interface fields that never existed at all.
By the time I was building the supply-chain platform — with 14 repositories to stand up and a single pair of hands to do it — the versioned approach had become the bottleneck. I was spending more time on module release engineering than on the application infrastructure I was supposed to be building.
The three-tier pattern is what I reached for instead.
The three-tier pattern
The three-tier pattern reorganises things into — predictably — three tiers, all living in the same repository:
- Data layer (modules/data/) — defines what should exist
- Infrastructure modules (modules/infrastructure/*/) — defines how each resource concern is created
- Environment layer (infrastructure/{account}/) — defines where resources are deployed
No remote module registry. No version tags to bump. Everything is local, everything is readable, and the separation of concerns is maintained through directory structure rather than release engineering.
All of the examples in this post come from sudoblark.ai.bookshelf-demo — an AI data pipeline that AV scans uploads, extracts book metadata using Bedrock, and stores results in Parquet for querying via Athena. The implementation structure looks like this — the specific infrastructure modules will vary per project, but the three-tier shape stays the same:
modules/
  data/
    buckets.tf            # S3 bucket definitions
    lambdas.tf            # Lambda definitions (with IAM inline)
    glue_crawlers.tf      # Glue crawler definitions (with IAM inline)
    notifications.tf      # S3 notification trigger definitions
    athena_workgroups.tf  # Athena workgroup definitions
    defaults.tf           # Project constants and defaults
    variables.tf          # Required inputs: account, project, application, environment
    infrastructure.tf     # Enrichment: computes full names and resolved references
    outputs.tf            # Exports enriched configurations
  infrastructure/
    s3/             # Creates S3 buckets and folder structures
    lambda/         # Creates Lambda functions, roles, and policies
    notifications/  # Creates S3 event notifications
    glue/           # Creates Glue databases, crawlers, and security config
    athena/         # Creates Athena workgroups
infrastructure/
  aws-sudoblark-development/
    data.tf               # Instantiates the data module
    s3.tf                 # module "s3" — four lines
    lambda.tf             # module "lambda" — four lines
    notifications.tf      # module "notifications" — six lines
    glue_databases.tf     # module "glue" — six lines
    athena_workgroups.tf  # module "athena" — four lines
Tier 1: The data layer
The data layer is where you define your infrastructure as plain Terraform data structures. The key discipline here is that it should contain no environment-specific values — it describes the application’s logical resources, not where or how they run.
Each file in the data layer starts with a comment block that documents the data structure — required fields, optional fields, constraints, and a concrete example. This is the interface contract for anyone adding or changing infrastructure; no Terraform knowledge required to use it.
# modules/data/buckets.tf
/*
S3 Buckets data structure definition:
Each bucket object requires:
- name (string): The bucket identifier (will be prefixed with account-project-application)
Optional fields:
- folder_paths (list(string)): List of folder paths to pre-create in the bucket (default: [])
Constraints:
- Bucket names must be unique within the configuration
- Final bucket name will be: account-project-application-name (all lowercase)
- Folder paths should not start or end with slashes
Example:
{
name = "landing"
folder_paths = ["uploads", "archive"]
}
*/
locals {
buckets = [
{
name = "landing"
folder_paths = ["uploads", "archive"]
},
{ name = "raw" },
{ name = "processed" },
{ name = "athena-results" },
]
}
# modules/data/lambdas.tf
locals {
lambdas = [
{
name = "av-scanner"
description = "AV scans uploads from landing and passes clean files to raw bucket"
zip_file_path = "../../lambda-packages/av-scanner.zip"
handler = "lambda_function.handler"
environment_variables = {
RAW_BUCKET = "raw"
LOG_LEVEL = "INFO"
}
iam_policy_statements = [
{
sid = "LandingBucketRead"
effect = "Allow"
actions = ["s3:GetObject"]
resources = ["arn:aws:s3:::${var.account}-${local.project}-${local.application}-landing/*"]
},
{
sid = "LandingBucketDelete"
effect = "Allow"
actions = ["s3:DeleteObject"]
resources = ["arn:aws:s3:::${var.account}-${local.project}-${local.application}-landing/*"]
},
{
sid = "RawBucketWrite"
effect = "Allow"
actions = ["s3:PutObject"]
resources = ["arn:aws:s3:::${var.account}-${local.project}-${local.application}-raw/*"]
},
]
},
{
name = "metadata-extractor"
description = "Extracts book metadata from images using Bedrock and writes to Parquet"
zip_file_path = "../../lambda-packages/metadata-extractor.zip"
handler = "lambda_function.handler"
environment_variables = {
PROCESSED_BUCKET = "processed"
LOG_LEVEL = "INFO"
}
iam_policy_statements = [
{
sid = "RawBucketRead"
effect = "Allow"
actions = ["s3:GetObject"]
resources = ["arn:aws:s3:::${var.account}-${local.project}-${local.application}-raw/*"]
},
{
sid = "ProcessedBucketWrite"
effect = "Allow"
actions = ["s3:PutObject"]
resources = ["arn:aws:s3:::${var.account}-${local.project}-${local.application}-processed/*"]
},
{
sid = "BedrockInvokeModel"
effect = "Allow"
actions = ["bedrock:InvokeModel"]
resources = ["*"]
},
]
},
]
}
The lambdas.tf file follows the same docstring convention — required fields, optional fields, constraints, and an example entry at the top — before the locals block. Three things are worth pointing out here.
First, the IAM policy statements live with the Lambda that needs them. There’s no separate iam_roles.tf you need to cross-reference to understand what a function can do. If you want to know what permissions av-scanner has, you look at the av-scanner block. That’s it.
Second, the ARN patterns use var.account and local.project — module inputs — rather than hardcoded account IDs or bucket names. The same definition deploys correctly into any account, as long as the right variable values are provided.
Third, the data layer accepts four required inputs: account, project, application, and environment. These four strings are everything that changes between environments. Everything else is derived from them.
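As a sketch of what that input surface might look like — the variable names follow the conventions above, but the demo’s actual variables.tf and defaults.tf may differ in detail:

```hcl
# modules/data/variables.tf -- illustrative sketch of the four required inputs
variable "account" {
  description = "Target AWS account alias, e.g. aws-sudoblark-development"
  type        = string
}

variable "project" {
  description = "Project name, used in the account-project-application prefix"
  type        = string
}

variable "application" {
  description = "Application name within the project"
  type        = string
}

variable "environment" {
  description = "Deployment environment, e.g. development or production"
  type        = string
}

# modules/data/defaults.tf -- hypothetical constants and defaults; the
# demo's actual values may differ
locals {
  account     = var.account
  project     = var.project
  application = var.application

  lambda_defaults = {
    runtime               = "python3.12"
    timeout               = 30
    memory_size           = 128
    layers                = []
    environment_variables = {}
  }
}
```

Everything downstream — enrichment, infrastructure modules, environment wiring — is a pure function of these four strings plus the data structures.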
Tier 2: Enrichment
The data layer describes the logical shape of things. The enrichment layer, in modules/data/infrastructure.tf, computes the concrete resource names and resolves references that infrastructure modules need.
# modules/data/infrastructure.tf (excerpt)
locals {
lambdas_enriched = [
for lambda in local.lambdas : merge(
{
# Apply defaults for any optional fields not specified
runtime = local.lambda_defaults.runtime
timeout = local.lambda_defaults.timeout
memory_size = local.lambda_defaults.memory_size
layers = local.lambda_defaults.layers
environment_variables = local.lambda_defaults.environment_variables
},
lambda,
{
# Computed full names following account-project-application-name convention
full_name = lower("${local.account}-${local.project}-${local.application}-${lambda.name}")
role_name = lower("${local.account}-${local.project}-${local.application}-${lambda.name}")
# IAM policy statements pass through unchanged — already fully resolved in the data layer
}
)
]
}
The enrichment layer applies defaults, computes full resource names, and passes iam_policy_statements through untouched. The IAM policy resources are already written with full ARN patterns in the data layer — there’s nothing to resolve at enrichment time.
This separation keeps things honest. The data layer stays focused on what you want; enrichment stays focused on what Terraform needs to make it real. You can read one without needing to understand the other.
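For comparison, the bucket enrichment is even simpler. A sketch of what it might look like under the same naming convention (the field names here are assumed, not taken from the demo):

```hcl
# modules/data/infrastructure.tf -- hypothetical bucket enrichment, following
# the same account-project-application-name convention as lambdas_enriched
locals {
  buckets_enriched = [
    for bucket in local.buckets : merge(
      {
        # Default for the optional field when an entry omits it
        folder_paths = []
      },
      bucket,
      {
        # Computed concrete bucket name
        full_name = lower("${local.account}-${local.project}-${local.application}-${bucket.name}")
      }
    )
  ]
}

# modules/data/outputs.tf -- exports the enriched list for tier 3 to consume
output "buckets" {
  value = local.buckets_enriched
}
```

The pattern is the same every time: apply defaults, overlay the user-supplied entry, then overlay the computed fields.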
Tier 3: Infrastructure modules and the environment layer
Each infrastructure module lives in modules/infrastructure/{concern}/ and does one thing: accept an enriched list and create the corresponding AWS resources using for_each. Here’s the Lambda module in full:
# modules/infrastructure/lambda/function.tf
locals {
lambdas_map = { for lambda in var.lambdas : lambda.name => lambda }
}
data "aws_iam_policy_document" "assume_role" {
for_each = local.lambdas_map
statement {
effect = "Allow"
principals {
type = "Service"
identifiers = ["lambda.amazonaws.com"]
}
actions = ["sts:AssumeRole"]
}
}
resource "aws_iam_role" "lambda" {
for_each = local.lambdas_map
name = each.value.role_name
assume_role_policy = data.aws_iam_policy_document.assume_role[each.key].json
}
resource "aws_iam_role_policy_attachment" "basic_execution" {
for_each = local.lambdas_map
role = aws_iam_role.lambda[each.key].name
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}
data "aws_iam_policy_document" "lambda_policy" {
for_each = local.lambdas_map
dynamic "statement" {
for_each = each.value.iam_policy_statements
content {
sid = statement.value.sid
effect = statement.value.effect
actions = statement.value.actions
resources = statement.value.resources
}
}
}
resource "aws_iam_role_policy" "lambda_policy" {
for_each = local.lambdas_map
name = "${each.value.name}-policy"
role = aws_iam_role.lambda[each.key].id
policy = data.aws_iam_policy_document.lambda_policy[each.key].json
}
resource "aws_cloudwatch_log_group" "lambda" {
for_each = local.lambdas_map
name = "/aws/lambda/${each.value.full_name}"
retention_in_days = 30
}
resource "aws_lambda_function" "function" {
for_each = local.lambdas_map
function_name = each.value.full_name
role = aws_iam_role.lambda[each.key].arn
handler = each.value.handler
runtime = each.value.runtime
timeout = each.value.timeout
memory_size = each.value.memory_size
filename = each.value.zip_file_path
environment {
variables = each.value.environment_variables
}
depends_on = [
aws_iam_role.lambda,
aws_iam_role_policy.lambda_policy,
aws_cloudwatch_log_group.lambda,
]
}
The module has no knowledge of which specific lambdas it’s creating. It takes the enriched list and creates a role, trust policy, managed policy attachment, inline policy, log group, and function for each entry. Adding a new Lambda to lambdas.tf gives you all six resources on the next terraform apply, with no changes needed in the infrastructure module.
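The S3 module follows the same shape. Here’s a sketch of what it might look like — the demo’s actual module may differ, and this assumes the enriched bucket objects carry full_name and folder_paths:

```hcl
# modules/infrastructure/s3/bucket.tf -- illustrative sketch, not the demo's
# actual implementation
variable "buckets" {
  description = "Enriched bucket objects from the data module"
  type = list(object({
    name         = string
    full_name    = string
    folder_paths = list(string)
  }))
}

locals {
  buckets_map = { for bucket in var.buckets : bucket.name => bucket }

  # Flatten (bucket, folder) pairs so folders can be created with for_each
  folders = {
    for pair in flatten([
      for bucket in var.buckets : [
        for path in bucket.folder_paths : {
          key    = "${bucket.name}/${path}"
          bucket = bucket.name
          path   = path
        }
      ]
    ]) : pair.key => pair
  }
}

resource "aws_s3_bucket" "bucket" {
  for_each = local.buckets_map
  bucket   = each.value.full_name
}

# S3 has no real folders; a zero-byte object with a trailing slash makes
# the prefix visible in the console
resource "aws_s3_object" "folder" {
  for_each = local.folders
  bucket   = aws_s3_bucket.bucket[each.value.bucket].id
  key      = "${each.value.path}/"
}
```

As with the Lambda module, adding a bucket to buckets.tf is all it takes — the module fans out over whatever the enriched list contains.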
The environment layer — the bit that actually runs — ends up genuinely thin:
# infrastructure/aws-sudoblark-development/data.tf
module "data" {
source = "../../modules/data"
account = var.account
project = var.project
application = var.application
environment = var.environment
}
# infrastructure/aws-sudoblark-development/lambda.tf
module "lambda" {
source = "../../modules/infrastructure/lambda"
lambdas = module.data.lambdas
}
# infrastructure/aws-sudoblark-development/s3.tf
module "s3" {
source = "../../modules/infrastructure/s3"
buckets = module.data.buckets
}
# infrastructure/aws-sudoblark-development/glue_databases.tf
module "glue" {
source = "../../modules/infrastructure/glue"
databases = module.data.glue_databases
crawlers = module.data.glue_crawlers
security_config_name = module.data.glue_security_config_name
}
A handful of lines per infrastructure concern. No logic. No conditional expressions. No count tricks. The environment layer’s job is purely wiring — connecting the data module outputs to the infrastructure module inputs. That should be all it ever needs to do.
If you need a staging environment, you create infrastructure/aws-sudoblark-staging/ and do exactly the same thing with different variable values. The data module, the infrastructure modules, and all of your definitions stay unchanged.
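Concretely, the staging directory is the same wiring with different values — something like this, where the values are illustrative rather than taken from the demo:

```hcl
# infrastructure/aws-sudoblark-staging/terraform.tfvars -- illustrative values
account     = "aws-sudoblark-staging"
project     = "sudoblark"
application = "bookshelf"
environment = "staging"
```

The .tf files in the staging directory are byte-for-byte copies of the development ones; only the variable values (and the backend state key) change.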
Where it clicked
There were three moments that told me this was the right call.
The first was on the client engagement itself. I showed a non-technical stakeholder the infrastructure — just the modules/data/ directory and a quick explanation of what each file does. They immediately grasped it. “So if I want a new Lambda, I add an entry here?” Yes. “And the permissions for it are right there in the same block?” Yes. That kind of clarity — where someone with no Terraform experience can immediately understand what exists and why — is something I’d never managed to achieve with a more complex module structure. The data layer’s docstring convention and the colocation of IAM with the resource that needs it are what make it readable to non-engineers.
The second was AI assistance. As mentioned earlier, Claude can’t fetch your remote module at git::https://github.com/your-org/terraform-modules.git//lambda?ref=v2.3.1. With versioned modules I’d spent time correcting interface hallucinations — suggestions that were valid in v1 but not v2, or inputs that simply didn’t exist. With local modules, everything is in the repository. Claude reads modules/data/lambdas.tf, understands what a Lambda entry looks like, reads modules/infrastructure/lambda/function.tf, understands what gets created. The suggestions became correct because the context was there to make them correct. More practically: AI-assisted infrastructure work decomposes into very small, well-scoped operations — edit lambdas.tf, add one entry. These are the kinds of atomic, clearly-bounded tasks that AI assistants handle well.
The third was the bookshelf-demo itself. I built v1 in a day. I hadn’t been able to say that about a new demo project in a long time — I’d been quietly dreading building new demos precisely because standing up the versioned-module structure from scratch was its own half-day project before I’d written a line of actual application infrastructure. That friction had been discouraging me from building things. The three-tier approach removed it entirely.
Production considerations
The three-tier pattern is straightforward to get started with, but there are a few areas where it needs deliberate management as a project grows.
Module testing
Infrastructure modules in this pattern are local Terraform modules — they don’t go through a release process, which means the automated validation gate that semantic versioning provides (even if imperfect) is absent. In practice this means the terraform validate and terraform plan outputs in your CI/CD pipeline are doing more work. I’d recommend running Checkov against the plan output as a baseline policy check; the CI/CD at scale post covers how to wire that into a pipeline as a standardised step.
For more comprehensive module testing, Terratest is the standard approach — but honestly, for application-scoped infrastructure with a handful of modules, the cost of writing and maintaining Terratest suites often outweighs the benefit. My current position: invest in good terraform plan output review in CI, and save Terratest for modules that genuinely behave like shared library code.
State management
With everything in a single repository and a single environment directory per account, state management is refreshingly simple: one state file per environment, stored in an S3 backend with DynamoDB locking. No cross-state references, no partial applies to coordinate. That simplicity is one of the pattern’s genuine wins.
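That setup is the standard Terraform S3 backend. A sketch — the bucket, table, and key names here are assumptions, not the demo’s actual values:

```hcl
# infrastructure/aws-sudoblark-development/backend.tf -- illustrative names
terraform {
  backend "s3" {
    bucket         = "sudoblark-terraform-state"          # assumed bucket name
    key            = "bookshelf-demo/development.tfstate" # one key per environment
    region         = "eu-west-2"                          # assumed region
    dynamodb_table = "sudoblark-terraform-locks"          # assumed lock table
    encrypt        = true
  }
}
```

Each environment directory gets its own key, so a plan or apply in development can never touch staging state.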
The state file size concern that often gets raised here is, in practice, a systems architecture problem rather than a Terraform one. If an application’s infrastructure has grown to hundreds of resources, the right answer is usually to split it into multiple repositories with clear boundaries — the mono-repo vs multi-repo question. That decomposition solves the state problem as a side effect.
Where I have genuinely seen terraform plan buckle under resource count is at a different scale entirely: large enterprises managing GitHub itself through a data-driven Terraform approach and self-serving repository creation. Two thousand repositories, each with ten-odd resources, lands you at roughly twenty thousand resources in a single plan. At that point you have a real problem — but it’s a systems engineering and architecture problem, not a problem with the three-tier pattern specifically. The solution is better separation of concerns in the data layer, not a different Terraform approach.
When this pattern starts to break down
The three-tier pattern assumes a single application, a single repository, and a small number of engineers. Scaling beyond that usually means each service or bounded concern gets its own three-tier repo — that’s healthy growth, not a breakdown. The breakdown conditions are different:
Multiple teams consuming the same infrastructure modules
Once a second team wants to reuse your Lambda or S3 module, the “local module” assumption breaks. At that point, extracting those modules into a versioned registry and returning to the DSA pattern is the right call. The data layer pattern stays useful either way — it’s the module distribution mechanism that changes.
The data layer becomes a coordination bottleneck
If five engineers are all editing lambdas.tf on different branches, the merge conflict surface area grows. The fix is usually clearer ownership — each team has its own data files — but if the data layer itself becomes a shared file that everyone edits, it’s a signal that the project has outgrown a single application boundary.
IAM sprawl in the data layer
Colocating IAM policy statements with resource definitions works well when each resource has a small, clear set of permissions. I’ve seen this become hard to audit when a Lambda accumulates 15-20 IAM statements across multiple iterations. At that point, a dedicated IAM review pass and possibly a separate iam_statements.tf per resource file is worth considering — though the statements should still reference the resource they belong to.
When to use each pattern
The data structures and algorithms pattern is not superseded by this one. I’ve used both on real engagements, and they solve genuinely different problems.
Use versioned, remote Terraform modules when:
- You have a dedicated infrastructure engineering team maintaining modules as a product
- Multiple teams or repositories consume the same modules
- You need stable, auditable contracts between module consumers and implementors
- Consumers should never need to understand or see the implementation
Use the three-tier pattern when:
- You’re a small team (or solo) maintaining both the modules and the data structures
- You want AI coding assistants to be genuinely useful for infrastructure work
- You value a single repository of truth over release engineering overhead
- Your infrastructure concerns are scoped to a single application or project
The DSA pattern is absolutely still what I’d recommend at a larger client with a mature infrastructure engineering function. The three-tier pattern is the right tool for different circumstances — specifically, the circumstances that come up most often in small-team and solo work.
Conclusion
This came about because I needed to move fast on a client engagement with significant infrastructure scope and no one to share the load with. The versioned-module approach would have worked eventually, but it would have added weeks of setup overhead before I could start building the actual system. The three-tier pattern was the pragmatic adaptation to that constraint.
What I didn’t expect was how much it would change the day-to-day experience of working on infrastructure. The bookshelf-demo took a day to get to v1 — I hadn’t been able to say that about a new project in years. The non-technical stakeholder who immediately understood the data layer wasn’t a fluke; the structure genuinely makes infrastructure legible to people without Terraform expertise. And AI assistance improved qualitatively the moment the module code was local and readable.
The motivation is all three things at once: reclaiming time from ceremony, making AI assistance actually usable for infrastructure, and recognising that solo practitioner reality doesn’t require the same structural protections as a multi-team organisation. When you’re the module team and the consumer team, versioned modules are overhead without benefit. The three-tier pattern is the right answer for that context — which, in my experience, is also a very common context to be in.
The reference implementation is sudoblark.ai.bookshelf-demo. If you want the conceptual foundations that underpin all of this, the original DSA post is still where to start.
Further Reading:
- Terraform with Data Structures and Algorithms — the DSA pattern that this post builds on; start here if the versioned-module approach applies to your context
- Strategies for CI/CD at Scale — how to wire plan, validate, and Checkov into a repeatable pipeline for this kind of infrastructure
- sudoblark.ai.bookshelf-demo — the reference implementation used throughout this post
- sudoblark.terraform.modularised-demo PR #35 — a concrete before/after migration from the DSA pattern to three-tier