Context

I am currently supporting a customer in setting up and operating an AWS platform. The central component of this platform is a Kubernetes cluster on which a number of applications are hosted and operated. In addition to the cluster, several other AWS services are used. To build all of this efficiently and consistently across several environments, automation with Terraform plays a particularly important role. However, the team entrusted with operation and further development has a very strong administrative background. Development know-how, as well as concepts such as infrastructure as code (IaC) and its advantages, is available only to a very limited extent. This is exactly where I try to help out as an expert, to relieve the platform’s operations team and drive development forward.

When I joined the project, there were a few GitLab repositories for Terraform modules. The Terraform code they contained was rolled out to the respective stages via (very simple) pipelines. Each stage (dev, test, prod) had its own branch with the infrastructure code for that environment. This approach is also called environment branching. Environment branching certainly has its raison d’être, but in our case (and often in infrastructure work generally) it brings a lot of complexity and other problems. In this article I go into these problems and then explain what our solution looked like.

What is environment branching?

In short, environment branching means having a separate branch for each environment and evolving each branch continuously. With this approach, complex processes and dependencies can be implemented very well. However, it takes some discipline (or the right technical measures) to keep the different code bases from drifting apart. Especially when Git newcomers are on the project, the complexity can be overwhelming at first. In infrastructure work, things are even harder because you want to keep the environments identical, to allow meaningful testing and prevent problems in production. Different code bases are needed only while the infrastructure itself is being developed further, which usually affects just the environment used for testing infrastructure changes. All subsequent environments should be identical. So the differences between environments, which are precisely the flexibility that environment branching offers, are what you actually want to avoid.

Remedy: Trunk-based development

With trunk-based development, the approach is exactly the opposite. Instead of continuously developing on separate long-lived branches, you always develop directly on the master/main branch (the trunk). The advantage: there is only one code base. Coupled with a corresponding pipeline, it is therefore always clear which infrastructure has been rolled out in each environment. To still be able to exercise control over development and deployment, we opted for the so-called scaled trunk-based development approach. Here, development no longer happens directly on the trunk but in short-lived branches (called feature branches), which are merged back into master/main within a short period of time (a few days at most). A review can take place at merge time; see also this excellent article for a more detailed look. This simplified working with Git, and the current status is now much more transparent. However, introducing this paradigm also led to a more complex pipeline, since a mechanism was now needed to deploy to all environments from this one branch.

Pipelining

As already mentioned, the trunk-based approach made the pipeline somewhat more complex. To keep the environments as identical as possible and to ensure consistent quality, every change now has to pass through each environment before it is finally deployed to production. However, it is often necessary to set up separate infrastructure for development purposes, which can result in naming conflicts. To avoid these and to decouple the life cycle of the existing infrastructure from that of the infrastructure created for development, we built in a number of mechanisms.

1. Separate Terraform workspaces

A separate Terraform workspace is used for each newly created feature branch during deployment to the development environment (called devIaC in this case). This means that the same infrastructure can exist multiple times in the same environment and at the same time their life cycles are decoupled from one another.

2. Suffix

For each deployment that takes place via a feature branch, an automatically generated suffix is attached to the created resources. This prevents naming conflicts in the development environment. The suffix is derived from the branch name: the name is hashed with SHA-256, and the first 8 characters of the hex digest are used, prefixed with the letter f. For “standard deployments”, i.e. deployments that do not take place via a feature branch, the suffix is empty. This is achieved with a Terraform variable (feature_suffix in the pipeline below) that defaults to “”; the feature branch pipeline passes a value for that variable and thereby overrides the default.
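The derivation of the suffix and of the matching workspace name can be sketched in a few lines of shell. This is only an illustration of the scheme described above, not the project's actual helper: the function names branch_hash, feature_suffix and workspace_name are hypothetical, and the check for the feature/ prefix is an assumption (in the pipeline, the feature jobs simply always pass the suffix).

```shell
#!/usr/bin/env bash
# Sketch: derive the resource suffix and the workspace name from a branch name.
# Function names are hypothetical; the hashing mirrors the pipeline's
# "sha256sum | cut -c1-8" step.

# Print the first 8 hex characters of the SHA-256 hash of the given string.
branch_hash() {
  printf '%s' "$1" | sha256sum | cut -c1-8
}

# Print the suffix for a branch: "f" + hash for feature branches,
# an empty line for everything else (assumption: prefix "feature/").
feature_suffix() {
  case "$1" in
    feature/*) printf 'f%s\n' "$(branch_hash "$1")" ;;
    *)         printf '\n' ;;
  esac
}

# Print the Terraform workspace name for a branch. Non-feature branches
# stay in Terraform's built-in "default" workspace.
workspace_name() {
  case "$1" in
    feature/*) printf 'feature%s\n' "$(branch_hash "$1")" ;;
    *)         printf 'default\n' ;;
  esac
}

# Example: the same branch name always yields the same suffix and workspace.
feature_suffix "feature/add-s3-bucket"
workspace_name "feature/add-s3-bucket"
```

Because the hash depends only on the branch name, every pipeline run for the same feature branch lands in the same workspace and reuses the same resource names, while two different feature branches do not collide.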

3. Dynamic backend configuration

Another necessary adjustment was the introduction of environment-specific configuration files. This first affects the variables within Terraform: a file following the naming scheme <Environment>.tfvars was introduced for each environment and is read in automatically when the pipeline runs. The backend configuration also differs between environments. Fortunately, Terraform provides a mechanism for this case as well, called partial configuration: the backend block defines only the basic structure, and the remaining settings are supplied from additional files when terraform init runs. A file called backend.config was introduced for this purpose. In addition, analogous to the tfvars files, there is a backend configuration file for each environment (e.g. devIaC.backend.config), which the pipeline also reads in automatically.
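To make the file naming scheme concrete, the following sketch assembles the arguments that the pipeline passes to terraform init and terraform plan for a given environment. The function names init_args and plan_args are hypothetical, and the plan file name planout is taken from the pipeline's default; the functions only print the arguments and do not call Terraform.

```shell
#!/usr/bin/env bash
# Sketch: build the per-environment Terraform arguments from the naming scheme
# <Environment>.backend.config and <Environment>.tfvars.
# Function names are hypothetical; nothing here invokes terraform itself.

# Print the "terraform init" arguments for one environment, in the same order
# as the pipeline: environment-specific file first, then backend.config.
init_args() {
  local env="$1"
  printf -- '-backend-config=%s.backend.config -backend-config=backend.config\n' "$env"
}

# Print the "terraform plan" arguments for one environment.
# An optional second argument is passed on as the feature_suffix variable.
plan_args() {
  local env="$1" suffix="${2:-}"
  local args="-var-file=${env}.tfvars"
  if [ -n "$suffix" ]; then
    args="${args} -var feature_suffix=${suffix}"
  fi
  printf '%s -out=planout\n' "$args"
}

# Example: arguments for the development environment.
init_args devIaC
plan_args devIaC
```

Keeping the environment name as the only input means that adding a new environment requires just two new files, with no change to the pipeline logic.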

4. Separate pipelines for feature branches, merge requests and master/main branch

The pipeline for deploying feature branches to the development environment differs from the standard pipeline in three ways: it includes a step for validating the Terraform code, it deploys only to the development environment, and it includes a destroy step to clean up the resources created for development purposes. GitLab knows which pipeline to execute because we introduced a naming convention: each feature branch name has to start with feature/, so that GitLab can recognize a feature branch deployment rather than a regular one.

There is also a separate pipeline for merge requests. This validates the code and executes a terraform plan for the development environment (devIaC). The code is thus checked for its executability before it can be merged into master/main.
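The selection between the three pipelines can be summarized as a small decision function. This is a sketch of the rule order only, under the assumption that a merge request takes precedence over the branch name and that the trunk is called master or main; the function name pipeline_kind and its return values are hypothetical and do not appear in the GitLab configuration.

```shell
#!/usr/bin/env bash
# Sketch: decide which of the three pipelines applies.
# $1 = merge request ID (empty if the pipeline is not for a merge request)
# $2 = branch (ref) name
# The function name and the printed labels are hypothetical.
pipeline_kind() {
  local mr_id="$1" ref="$2"
  if [ -n "$mr_id" ]; then
    # Merge requests are checked first: validate + plan against devIaC.
    echo "merge_request"
  else
    case "$ref" in
      feature/*)   echo "feature" ;;   # validate, plan, apply, destroy in devIaC
      master|main) echo "standard" ;;  # plan/apply through all environments
      *)           echo "none" ;;      # no job matches other branch names
    esac
  fi
}

# Example: a push to a feature branch without an open merge request pipeline.
pipeline_kind "" "feature/add-s3-bucket"
```

Checking the merge request first mirrors the "when: never" entries in the feature jobs, which keep feature jobs out of merge request pipelines.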

The code of the GitLab pipeline can be found below (.gitlab-ci.yaml). Keep in mind that it serves as a master pipeline that is used in multiple repositories. Some of the environment variables can be overridden to adapt it to your repository's needs.

image: <imgname>

stages:
  - validate
  - plan
  - applydeviac
  - destroydeviac
  - applydev
  - applytest
  - applyprod

variables:
  TERRAFORM_DIR: "."  # Can be overwritten
  TERRAFORM_PLAN_FILE: "planout"


.before_script_global: &before-script-global
  - terraform --version
  - echo "Terraform dir is ${TERRAFORM_DIR}"
  - cd "${TERRAFORM_DIR}"
  - ls


before_script:
  - *before-script-global


.validate_template: &validate-script
  script:
    - rm -rf .terraform
    - echo "${CI_JOB_STAGE} - ${CI_JOB_NAME}"
    - terraform init -backend=false
    - terraform validate  # validates the current directory; before_script already changed into ${TERRAFORM_DIR}

.plan_feature_template: &plan-feature-script
  script:
    - terraform init --backend-config=${CI_ENVIRONMENT_NAME}.backend.config --backend-config=backend.config
    - echo "Creating plan file for Environment ${CI_ENVIRONMENT_NAME}"
    - FEATURE_SHA=$(echo -n $CI_COMMIT_BRANCH | sha256sum |cut -c1-8)
    - terraform workspace select feature${FEATURE_SHA} || terraform workspace new feature${FEATURE_SHA}
    #- terraform workspace new feature${CI_COMMIT_SHORT_SHA}
    - echo "Current workspace is $(terraform workspace show)"
    - terraform plan ${TF_ADDITIONAL_OPTS} -var-file ${TERRAFORM_COMMON_TFVARS} -var "feature_suffix=f${FEATURE_SHA}" -out=${TERRAFORM_PLAN_FILE}
  artifacts:
    paths:
      - ${TERRAFORM_DIR}/.terraform
      - ${TERRAFORM_DIR}/${TERRAFORM_PLAN_FILE}
      - ${TERRAFORM_DIR}/.terraform.lock.hcl

.plan_template: &plan-script
  script:
    - echo "Terraform DIR VARIABLE ${TERRAFORM_DIR}"
    - terraform init --backend-config=${CI_ENVIRONMENT_NAME}.backend.config --backend-config=backend.config
    - echo "Creating plan file for Environment ${CI_ENVIRONMENT_NAME}"
    - echo "Current workspace is $(terraform workspace show)"
    - terraform plan ${TF_ADDITIONAL_OPTS} -var-file ${TERRAFORM_COMMON_TFVARS} -out=${TERRAFORM_PLAN_FILE}
  artifacts:
    paths:
      - ${TERRAFORM_DIR}/.terraform
      - ${TERRAFORM_DIR}/${TERRAFORM_PLAN_FILE}
      - ${TERRAFORM_DIR}/.terraform.lock.hcl


.apply_template: &apply-script
  script:
    - echo "ENVIRONMENT=${CI_ENVIRONMENT_NAME}"
    - terraform apply --auto-approve ${TERRAFORM_PLAN_FILE}  # plan file path is relative to ${TERRAFORM_DIR}, the current directory
  artifacts:
    paths:
      - ${TERRAFORM_DIR}/.terraform
      - ${TERRAFORM_DIR}/${TERRAFORM_PLAN_FILE}
      - ${TERRAFORM_DIR}/.terraform.lock.hcl

.destroy_template: &destroy-script
  script:
    - terraform init --backend-config=${CI_ENVIRONMENT_NAME}.backend.config --backend-config=backend.config
    - echo "Creating destroy plan file for Environment ${CI_ENVIRONMENT_NAME}"
    - FEATURE_SHA=$(echo -n $CI_COMMIT_BRANCH | sha256sum |cut -c1-8)
    - terraform workspace select feature${FEATURE_SHA} || terraform workspace new feature${FEATURE_SHA}
    - echo "Current workspace is $(terraform workspace show)"
    # Plan in destroy mode so that the saved plan file contains the destroy actions.
    - terraform plan -destroy ${TF_ADDITIONAL_OPTS} -var-file ${TERRAFORM_COMMON_TFVARS} -var "feature_suffix=f${FEATURE_SHA}" -out=${TERRAFORM_PLAN_FILE}
    # Applying the saved plan executes exactly what was planned above.
    - terraform apply --auto-approve ${TERRAFORM_PLAN_FILE}

# ------ Terraform Plan Jobs

mr:validate:
  stage: validate
  rules:
    - if: $CI_MERGE_REQUEST_ID
  variables:
    TERRAFORM_COMMON_TFVARS: devIaC.tfvars
  <<: *validate-script
  environment:
    name: devIaC

mr:plan:
  stage: plan
  rules:
    - if: $CI_MERGE_REQUEST_ID
      when: on_success
  before_script:
    - *before-script-global
  variables:
    TERRAFORM_COMMON_TFVARS: devIaC.tfvars
  <<: *plan-script
  environment:
    name: devIaC

feature:validate:
  stage: validate
  rules:
    - if: $CI_MERGE_REQUEST_ID
      when: never
    - if: "$CI_COMMIT_REF_NAME =~ /^feature/"
  variables:
    TERRAFORM_COMMON_TFVARS: devIaC.tfvars
  <<: *validate-script
  environment:
    name: devIaC

feature:plan:
  stage: plan
  rules:
    - if: $CI_MERGE_REQUEST_ID
      when: never
    - if: "$CI_COMMIT_REF_NAME =~ /^feature/"
      when: on_success
  before_script:
    - *before-script-global
  variables:
    TERRAFORM_COMMON_TFVARS: devIaC.tfvars
  <<: *plan-feature-script
  environment:
    name: devIaC

feature:apply:
  stage: applydeviac
  rules:
    - if: $CI_MERGE_REQUEST_ID
      when: never
    - if: "$CI_COMMIT_REF_NAME =~ /^feature/"
      when: manual
  before_script:
    - *before-script-global
  variables:
    TERRAFORM_COMMON_TFVARS: devIaC.tfvars
  <<: *apply-script
  environment:
    name: devIaC

feature:destroy:
  stage: destroydeviac
  rules:
    - if: $CI_MERGE_REQUEST_ID
      when: never
    - if: "$CI_COMMIT_REF_NAME =~ /^feature/"
      when: manual
  before_script:
    - *before-script-global
  variables:
    TERRAFORM_COMMON_TFVARS: devIaC.tfvars
  <<: *destroy-script
  environment:
    name: devIaC



1-plan-deviac:
  stage: plan
  before_script:
    - *before-script-global
  variables:
    TERRAFORM_COMMON_TFVARS: ${CI_ENVIRONMENT_NAME}.tfvars
  <<: *plan-script
  only:
    - master
  environment:
    name: devIaC


2-plan-dev:
  stage: plan
  before_script:
    - *before-script-global
  variables:
    TERRAFORM_COMMON_TFVARS: ${CI_ENVIRONMENT_NAME}.tfvars
  <<: *plan-script
  when: manual
  only:
    - master
  environment:
    name: dev

3-plan-test1:
  stage: plan
  before_script:
    - *before-script-global
  variables:
    TERRAFORM_COMMON_TFVARS: ${CI_ENVIRONMENT_NAME}.tfvars
  <<: *plan-script
  when: manual
  only:
    - master
  environment:
    name: test1


4-plan-prod:
  stage: plan
  before_script:
    - *before-script-global
  variables:
    TERRAFORM_COMMON_TFVARS: ${CI_ENVIRONMENT_NAME}.tfvars
  <<: *plan-script
  when: manual
  only:
    - master
  environment:
    name: prod


# ------ Terraform Apply Jobs

1-apply-deviac:
  stage: applydeviac
  before_script:
    - *before-script-global
  variables:
    TERRAFORM_COMMON_TFVARS: ${CI_ENVIRONMENT_NAME}.tfvars
  <<: *apply-script
  when: manual
  only:
    - master
  needs:
    - job: 1-plan-deviac
  environment:
    name: devIaC

2-apply-dev:
  stage: applydev
  before_script:
    - *before-script-global
  variables:
    TERRAFORM_COMMON_TFVARS: ${CI_ENVIRONMENT_NAME}.tfvars
  <<: *apply-script
  when: manual
  needs:
    - job: 2-plan-dev
      optional: true
    - job: 1-apply-deviac
      optional: true
  only:
    - master
  environment:
    name: dev

3-apply-test1:
  stage: applytest
  before_script:
    - *before-script-global
  variables:
    TERRAFORM_COMMON_TFVARS: ${CI_ENVIRONMENT_NAME}.tfvars
  <<: *apply-script
  when: manual
  needs:
    - job: 3-plan-test1
    - job: 2-apply-dev
      optional: true
  only:
    - master
  environment:
    name: test1

4-apply-prod:
  stage: applyprod
  before_script:
    - *before-script-global
  variables:
    TERRAFORM_COMMON_TFVARS: ${CI_ENVIRONMENT_NAME}.tfvars
  <<: *apply-script
  when: manual
  needs:
    - job: 4-plan-prod
    - job: 3-apply-test1
      optional: true
  only:
    - master
  environment:
    name: prod