Simplifying Infrastructure Deployment on AWS with GitOps and GitHub Actions - Part 2

In the first part of this blog series, we learned how to create infrastructure as code with Terraform. Now that we have defined the infrastructure in Terraform, we will automate its deployment through a CI platform. While numerous CI platforms are available, we will go with GitHub Actions: it is closely integrated with the GitHub version control system, and the GitHub Marketplace offers a good collection of community-created actions that we can use within our workflows.

To get started with GitHub Actions, you can follow the tutorials here: https://docs.github.com/en/actions/quickstart

When developing the workflow, we aim to deploy the infrastructure in a secure and scalable manner.

We also follow the same principles as in the previous post while defining the CI pipeline. To reiterate, they are:

  • Keep it simple

  • Keep it generic and follow the principle of 'do not repeat yourself' (DRY)

  • Keep it scalable

Here is the directory structure within our .github folder where action workflows are defined.

├── .github
│   ├── actions
│   │   └── terraform_action
│   │       └── action.yml
│   └── workflows
│       └── terraform.yml
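
The file terraform.yml under workflows is the entry point of the pipeline. Here is a minimal sketch of its trigger section, assuming the pipeline runs on pushes to the main branch (the trigger and branch filter are assumptions; adjust them to your branching model):

# .github/workflows/terraform.yml -- trigger section (illustrative)
name: Terraform
on:
  push:
    branches:
      - main    # assumed branch; production deployments are gated later via environments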

Creating Reusable Actions

We create a reusable action within the folder .github/actions/terraform_action. The GitHub documentation at https://docs.github.com/en/actions/using-workflows/reusing-workflows describes the advantages of reuse and how to achieve it.

In the reusable action, we define the following steps:

  • Configure AWS Credentials

  • Set up a specific version of Terraform on the runner

  • Create a terraform plan

  • Based on the input, one can apply the plan, destroy the infrastructure, or take no action.

Authenticating with AWS

The recommended way to authenticate with AWS within GitHub Actions is to use OIDC. OIDC, or OpenID Connect, is a protocol that lets a third-party application authenticate and verify the end user's identity. This article explains how to configure your IAM role to use OIDC. An important thing to note here is to construct your IAM policy following the principle of least privilege, being as specific as possible. Even in your IAM trust policy, providing the full path to your repository and branch (if applicable) is always recommended.
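
On the GitHub side, the calling workflow must also be allowed to request an OIDC token. Here is a minimal sketch of the required workflow-level permissions; the trust-policy condition in the comment is illustrative, so substitute your own repository path:

# Workflow-level permissions required for OIDC: the job must be allowed to
# request an ID token from GitHub's OIDC provider.
permissions:
  id-token: write   # needed by aws-actions/configure-aws-credentials
  contents: read    # needed by actions/checkout

# On the AWS side, the IAM role's trust policy should pin the token's "sub"
# claim to the repository and branch, e.g. repo:<owner>/<repo>:ref:refs/heads/main
# (the <owner>/<repo> placeholder is illustrative).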

Create Terraform Plan

The Terraform plan provides a preview of upcoming infrastructure changes, serving as a crucial step to anticipate and avoid any unwanted or destructive alterations to our systems.

Deploy the infrastructure

In this step, the infrastructure gets deployed in AWS. This step is invoked based on the value provided to the input variable operation.

Destroy the infrastructure

This step is invoked when the value provided for the operation input variable is destroy. It is valuable when creating infrastructure for testing purposes, where there is a need to tidy up and remove resources once the tests are concluded.

Here are the contents of .github/actions/terraform_action/action.yml:

name: "Terraform Action"
description: "Terraform operations"
inputs:
  operation:
    description: "Operation to perform: plan, apply, or destroy"
    required: true
    default: 'plan'
  iam_role_arn:
    description: "IAM Role ARN for OIDC authentication"
    required: true
  domain_path:
    description: "name of the domain who records needs to be deployed"
    required: true
  aws_region:
    description: "Region where the state bucket and DynamoDB table exist"
    required: true


runs:
  using: "composite"
  steps:
  - name: Configure AWS credentials
    uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: ${{inputs.iam_role_arn}}
      aws-region: ${{inputs.aws_region}}
  - uses: hashicorp/setup-terraform@v2
    with:
      terraform_version: "1.6.4"
  - name: Terraform Plan
    working-directory: ${{inputs.domain_path}}
    run: |
      set -x
      if [ "${{ inputs.operation }}" = "plan" ]; then
        terraform init -reconfigure
        terraform validate
        terraform plan -lock=false -input=false
      fi
    shell: bash
  - name: Terraform apply
    working-directory: ${{inputs.domain_path}}
    run: |
      if [ "${{ inputs.operation }}" = "apply" ]; then
        echo "Running terraform apply..."
        terraform --version
        terraform init -reconfigure
        terraform apply -auto-approve -input=false
      else
        echo "Bypassing Terraform apply..."
      fi
    shell: bash
  - name: Terraform destroy
    working-directory: ${{inputs.domain_path}}
    run: |
      if [ "${{ inputs.operation }}" = "destroy" ]; then
        echo "Running terraform destroy..."
        terraform --version
        terraform init -reconfigure
        terraform apply -auto-approve -input=false -destroy
      else
        echo "Bypassing Terraform destroy..."
      fi
    shell: bash

Detecting Changes in the repository

Deploying every piece of infrastructure defined in the repository each time a change occurs wouldn't be ideal, especially when dealing with many Route 53 hosted zones and records. Hence, we define our workflow so that it can determine exactly what has changed and apply the changes only to that infrastructure. This makes it easier to scale.

We define a job changed_files to determine which files have changed. As discussed in the first segment of this blog series, our Infrastructure as Code (IaC) comprises two main components: modules and templates, each accompanied by variables. Changes to modules will impact all Route 53 hosted zones and their associated records, whereas changes to individual templates will only affect the specific hosted zone they are associated with. Here we are using a third-party action called tj-actions/changed-files. The documentation for this action can be found at https://github.com/marketplace/actions/changed-files. The code snippet to determine which files have changed is shown below:

#! /bin/bash -x
# Module or workflow changes affect every hosted zone, so plan all of them.
if [[ "${{steps.changed-files.outputs.modules_any_changed}}" == 'true' || "${{steps.changed-files.outputs.workflows_any_changed}}" == 'true' ]]; then
    echo "Adding all hosted zones"
    domain_zones=()
    for file in records/*; do
        domain_zones+=("$file")
    done
    domain_paths=$(jq -c -n '$ARGS.positional' --args "${domain_zones[@]}")
    echo "$domain_paths"
    echo "domain_paths={\"domain\": $domain_paths}" >> "$GITHUB_OUTPUT"
# Otherwise, only plan the hosted zones whose template files changed.
elif [[ "${{steps.changed-files.outputs.domainpaths_any_changed}}" == 'true' ]]; then
    changed_domain_zones=()
    # Word-splitting on the space-separated file list is intentional here.
    for file in ${{steps.changed-files.outputs.domainpaths_all_changed_files}}; do
        echo "$file was changed"
        echo "Directory is $(dirname "$file")"
        changed_domain_zones+=("$(dirname "$file")")
    done
    echo "changed_domain_zones ${changed_domain_zones[@]}"
    # De-duplicate: several changed files may live in the same domain directory.
    unique_changed_domains=($(printf "%s\n" "${changed_domain_zones[@]}" | sort -u))
    echo "Unique changed domains ${unique_changed_domains[@]}"
    domain_paths=$(jq -c -n '$ARGS.positional' --args "${unique_changed_domains[@]}")
    echo "domain_paths={\"domain\": $domain_paths}" >> "$GITHUB_OUTPUT"
    echo "$domain_paths"
else
    echo "domain_paths={\"domain\": []}" >> "$GITHUB_OUTPUT"
fi

As evident in this snippet, we capture the output in JSON format. This proves beneficial in the subsequent job responsible for deploying the infrastructure.
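
For completeness, here is a sketch of how the changed_files job can wire the tj-actions/changed-files step to the script above and expose the outputs consumed by the downstream jobs. The files_yaml groups and file patterns are assumptions inferred from the outputs referenced in the script; adjust them to your repository layout:

changed_files:
        runs-on: ubuntu-latest
        outputs:
            # Consumed by downstream jobs as needs.changed_files.outputs.*
            domains: ${{ steps.domains.outputs.domain_paths }}
            if_changed_files: ${{ steps.domains.outputs.if_changed_files }}
        steps:
            - name: Checkout
              uses: actions/checkout@v4
              with:
                fetch-depth: 0
            - name: Get changed files
              id: changed-files
              uses: tj-actions/changed-files@v41
              with:
                # Each key yields <key>_any_changed and <key>_all_changed_files
                # outputs; these patterns are illustrative.
                files_yaml: |
                  modules:
                    - modules/**
                  workflows:
                    - .github/**
                  domainpaths:
                    - records/**
            - name: Determine domain paths
              id: domains
              shell: bash
              run: |
                # ... the change-detection script shown above, extended to also
                # write if_changed_files=true or false to "$GITHUB_OUTPUT" ...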

Deploying the Infrastructure to AWS

Having identified the domains requiring updates, we proceed by generating a Terraform plan in the next step. Depending on the targeted environment, we then initiate the deployment of the infrastructure. As we've opted for Route 53 domains and records in our infrastructure, we can straightforwardly designate the environment as 'production.' However, when dealing with other infrastructure components such as VPCs, EC2 instances, and more, we may need to configure multiple environments, such as development, staging, and so on.

As the initial step, we generate individual Terraform plans for each of the domains requiring updates. Since these domains are independent of each other, we have the flexibility to execute these update jobs in parallel. We use a matrix strategy to define this job, as the number of changed domains determined in the previous job sets the degree of parallelism here.

Here is the definition of the job:

terraform_plan:
        runs-on: ubuntu-latest
        needs: changed_files
        if: needs.changed_files.outputs.if_changed_files == 'true'
        strategy:
            max-parallel: 1  # increase this to plan multiple domains in parallel
            matrix: ${{fromJSON(needs.changed_files.outputs.domains)}} 
        steps:
            - name: Checkout
              uses: actions/checkout@v4
              with:
                fetch-depth: 0
            - name: terraform-plan
              id: terraform-plan
              uses: ./.github/actions/terraform_action
              if: needs.changed_files.outputs.if_changed_files == 'true'
              with:
                operation: plan
                iam_role_arn: arn:aws:iam::195368226277:role/gitops-terraform-demo-role
                domain_path: ${{matrix.domain}}
                aws_region: "us-east-1"

As we can see in the job definition we generate a Terraform plan for each of the changed domains.

The next step is to deploy the changes to AWS. Here we use the 'Environment' feature provided by GitHub Actions. Environments in GitHub Actions give us a way to define different rules for different deployment targets. This helps us prevent untested changes from being deployed to the production environment. GitHub also provides deployment protection rules; one can use these to add a manual approval step and to restrict deployments to certain branches. For example, we might want to allow deployments to the production environment only from the main branch of the repository.

Here is a screenshot of the environment setting for the repository used in this blog post.

The job definition of 'deployment' is similar to that of terraform_plan, except that:

  1. We specify the environment as production

  2. We pass the value apply to the variable operation when invoking the reusable action.

Here is the code snippet for the deployment job.

deployment:
        runs-on: ubuntu-latest
        needs: [changed_files,terraform_plan]
        if: needs.changed_files.outputs.if_changed_files == 'true'
        strategy:
            max-parallel: 3  # configure this to your convenience
            matrix: ${{fromJSON(needs.changed_files.outputs.domains)}} 
        environment: 
          name: production
          url: https://github.com
        steps:
            - name: Checkout
              uses: actions/checkout@v4
              with:
                fetch-depth: 0
            - name: terraform-deploy
              id: terraform-deploy
              uses: ./.github/actions/terraform_action
              if: needs.changed_files.outputs.if_changed_files == 'true'
              with:
                operation: apply
                iam_role_arn: arn:aws:iam::195368226277:role/gitops-terraform-demo-role
                domain_path: ${{matrix.domain}}
                aws_region: "us-east-1"

Cleaning up the infrastructure (Optional)

The last step is the clean-up of the infrastructure. This step is optional, and we can control it through environment variables. For example, one might want to clean up the infrastructure after testing in a non-production environment to save costs. The cleanup job is similar to deployment, except that operation will have the value destroy.
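
Here is a minimal sketch of such a cleanup job, assuming a repository variable CLEANUP_ENABLED serves as the toggle (the variable name and the gating condition are assumptions, not part of the demo repository):

cleanup:
        runs-on: ubuntu-latest
        needs: [changed_files, deployment]
        # CLEANUP_ENABLED is an assumed repository variable acting as the toggle.
        if: needs.changed_files.outputs.if_changed_files == 'true' && vars.CLEANUP_ENABLED == 'true'
        strategy:
            max-parallel: 3
            matrix: ${{fromJSON(needs.changed_files.outputs.domains)}}
        steps:
            - name: Checkout
              uses: actions/checkout@v4
              with:
                fetch-depth: 0
            - name: terraform-destroy
              id: terraform-destroy
              uses: ./.github/actions/terraform_action
              with:
                operation: destroy
                iam_role_arn: arn:aws:iam::195368226277:role/gitops-terraform-demo-role
                domain_path: ${{matrix.domain}}
                aws_region: "us-east-1"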

A typical workflow run is shown below:

The reference code can be found at https://github.com/sampritavh/terraform-deployment-demo.

This is a demo workflow, and it can easily be extended to other infrastructure or other deployment patterns.