Service Control Policies and BDD

Introduction

Service control policies (SCPs) are a type of organization policy that you can use to centrally manage the maximum available permissions for the IAM users and IAM roles in your organization. Since they are usually applied to a group of accounts, it is best practice to test changes in a sandbox environment or account first. SCPs are often used to restrict actions in line with the standards a company applies.

Behaviour-Driven Development (BDD) is a software development philosophy that builds on Test-Driven Development (TDD). TDD generally involves writing tests before writing the code. In BDD, tests are written in natural language to validate the expected outcome of a system. A domain-specific language built from natural-language constructs makes the tests easy to read for software developers and business users alike.

I had previously written about using BDD to make reliable deployments to Fastly. In this post, we will use BDD to reliably deploy Organization policies (service control and tag policies) to an entire AWS Organization or group of AWS accounts.

Problem

With growing complexity in AWS environments, SCPs are becoming increasingly important as guardrails that ensure all member accounts adhere to best practices and established standards. As they are applied to a group of accounts at a time, an unintended side effect of a change can have a massive impact. This makes testing a crucial part of deploying any SCP. AWS recommends testing these policies thoroughly in a sandbox environment before applying them to live accounts.

Running a suite of tests for every change in the policy, however minor, is time consuming if done manually. While it may be easy to verify changes by hand when the policies are short, this does not scale as the size (and complexity) of the policies increases.

Example

Let’s consider the following requirements:

1. Ensure all instances have 3 tags: dept, cost and proj.
2. The cost tag value is one of 001, 002 or 003.
3. Block any EC2 instance from using a public IP unless it has a tag exempt: public-ip-control.
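Before writing any scenarios, it can help to pin down the expected outcomes in one place. The following is a minimal sketch of the three requirements as a local Go function. It does not call AWS and is not how the policies are enforced. The function name launchAllowed and its return values are hypothetical, chosen only for illustration.

```go
package main

import "fmt"

// Hypothetical local model of the three requirements. It only encodes
// the expected outcomes; the real enforcement is done by AWS.
var requiredTags = []string{"dept", "cost", "proj"}
var allowedCost = map[string]bool{"001": true, "002": true, "003": true}

// launchAllowed reports whether a launch request with the given tags
// should be allowed, and gives a short reason when it should not.
func launchAllowed(tags map[string]string, publicIP bool) (bool, string) {
	// Requirement 1: all three tags must be present.
	for _, key := range requiredTags {
		if _, ok := tags[key]; !ok {
			return false, "missing tag " + key
		}
	}
	// Requirement 2: cost must be one of the allowed values.
	if !allowedCost[tags["cost"]] {
		return false, "invalid cost value " + tags["cost"]
	}
	// Requirement 3: a public IP needs the exemption tag.
	if publicIP && tags["exempt"] != "public-ip-control" {
		return false, "public IP without exemption"
	}
	return true, ""
}

func main() {
	tags := map[string]string{"dept": "blue", "cost": "001", "proj": "one"}
	ok, reason := launchAllowed(tags, false)
	fmt.Println(ok, reason)
}
```

A table of inputs run through a function like this gives a quick checklist of the scenarios the feature files should cover.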

In the next few sections, we will develop tests, and the policies to achieve this goal.

Solution

BDD tests (or specifications) are written in plain text and follow a basic set of syntax rules called Gherkin. Cucumber reads and executes these specs, and validates that the outcome is as expected.

Test Scenarios

For the first two example requirements (1 and 2) defined above, let’s write a basic feature test with some scenarios.

Feature: EC2 instances must have basic tags

  Scenario: Block ec2 with no tags
    Given I want to launch 1 ec2 instance 
    And use subnet "private-subnet-1" in vpc "my-vpc"
    And add tags:
        | key | value | add_to_resources |
    When I launch the ec2 instance
    Then the response is "UnauthorizedOperation"

  Scenario: Allow ec2 with basic tags
    Given I want to launch 1 ec2 instance
    And use subnet "private-subnet-1" in vpc "my-vpc"
    And add tags:
        | key  | value | add_to_resources |
        | Name | first | instance, network-interface, volume |
        | dept | blue  | instance, network-interface, volume |
        | cost | 001   | instance, network-interface, volume |
        | proj | one   | instance, network-interface, volume |
    When I launch the ec2 instance
    Then the response is "OK"

The first scenario checks that you are unable to launch an instance if you don’t specify any tags. The second scenario checks that you are able to launch the instance if you do specify the basic tags. Full feature files are here.

Test Specifications

With the scenarios written, we need to implement their steps, which will make AWS API calls to validate the behaviour.

In this example, we use godog, the official Cucumber BDD framework for Golang. It merges specification and test documentation into one cohesive whole, using Gherkin formatted scenarios. Their example section also provides an easy to follow quickstart guide.

godog helpfully creates the scaffolding around your tests, so you can easily fill in the gaps. Create a minimal Go module like the one below, and execute go test to get the initial scaffolding.

// ec2_test.go
package main

import (
	"testing"

	"github.com/cucumber/godog"
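)

// InitializeScenario is where step definitions get registered.
// It is empty for now; the scaffolding fills in the gaps.
func InitializeScenario(ctx *godog.ScenarioContext) {}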

func TestFeatures(t *testing.T) {
	suite := godog.TestSuite{
		ScenarioInitializer: InitializeScenario,
		Options: &godog.Options{
			Format:   "pretty",
			Paths:    []string{"features"},
			TestingT: t, // Testing instance that will run subtests.
		},
	}

	if suite.Run() != 0 {
		t.Fatal("non-zero status returned, failed to run feature tests")
	}
}

For reference, full specifications are here.
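One thing a step implementation has to do is turn the tags table from the scenarios into per-resource tag lists, since RunInstances accepts tag specifications grouped by resource type. The following is a minimal sketch of that grouping using only the standard library. The tagRow and tag types and the groupTags function are hypothetical stand-ins, not the types used in the linked specifications or in the AWS SDK.

```go
package main

import (
	"fmt"
	"strings"
)

// tagRow mirrors one row of the Gherkin table:
// | key | value | add_to_resources |
type tagRow struct {
	Key, Value, Resources string
}

// tag is a plain key/value pair (hypothetical stand-in for an SDK tag type).
type tag struct {
	Key, Value string
}

// groupTags splits the comma-separated add_to_resources column and
// returns the tags grouped by resource type.
func groupTags(rows []tagRow) map[string][]tag {
	grouped := make(map[string][]tag)
	for _, row := range rows {
		for _, res := range strings.Split(row.Resources, ",") {
			res = strings.TrimSpace(res)
			if res == "" {
				continue // ignore empty entries such as a trailing comma
			}
			grouped[res] = append(grouped[res], tag{Key: row.Key, Value: row.Value})
		}
	}
	return grouped
}

func main() {
	rows := []tagRow{
		{Key: "dept", Value: "blue", Resources: "instance, network-interface, volume"},
		{Key: "cost", Value: "001", Resources: "instance, network-interface, volume"},
	}
	for res, tags := range groupTags(rows) {
		fmt.Println(res, tags)
	}
}
```

An empty table, as in the first scenario, produces an empty map, which corresponds to a launch request with no tag specifications at all.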

Develop Organization Policies

With our tests in place, we can now create the policies to meet the requirements. For the example requirements defined above, we need both a service control policy and a tag policy. Example policies are here.
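To give a feel for the shape of such a policy, here is a minimal sketch that generates a service control policy denying ec2:RunInstances when a required tag is missing from the request. It follows the common pattern of a Deny statement with a Null condition on aws:RequestTag. It is not the example policy linked above: it covers only requirement 1 and only the instance resource, and the Go type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statement and policy model just enough of the policy JSON for this sketch.
type statement struct {
	Sid       string                       `json:"Sid"`
	Effect    string                       `json:"Effect"`
	Action    string                       `json:"Action"`
	Resource  []string                     `json:"Resource"`
	Condition map[string]map[string]string `json:"Condition"`
}

type policy struct {
	Version   string      `json:"Version"`
	Statement []statement `json:"Statement"`
}

// requireTagsPolicy builds one Deny statement per required tag key.
// Each statement matches when the tag is absent from the request.
// The keys are assumed to be alphanumeric, as they are used in the Sid.
func requireTagsPolicy(keys []string) policy {
	p := policy{Version: "2012-10-17"}
	for _, key := range keys {
		p.Statement = append(p.Statement, statement{
			Sid:      "DenyRunInstancesWithout" + key,
			Effect:   "Deny",
			Action:   "ec2:RunInstances",
			Resource: []string{"arn:aws:ec2:*:*:instance/*"},
			Condition: map[string]map[string]string{
				"Null": {"aws:RequestTag/" + key: "true"},
			},
		})
	}
	return p
}

func main() {
	out, err := json.MarshalIndent(requireTagsPolicy([]string{"dept", "cost", "proj"}), "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

Restricting the cost tag to a fixed set of values is the part that a tag policy is intended to handle, so it is left out of this sketch.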

Execute Tests

Once you are happy with the specifications, you can run through the scenarios the same way you would execute any Go tests: go test. Remember to provide AWS credentials as explained here. The easiest method, while testing, is to simply export the credentials as environment variables. It is best to use a sandbox environment during the testing phase.
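If you do rely on environment variables, a small preflight check can give a clearer failure than an SDK error halfway through a scenario. The following is a minimal sketch, assuming the standard AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_REGION variables are the ones in use. The missingEnv helper is hypothetical, and the check does not apply if you use another credential source such as a shared profile.

```go
package main

import (
	"fmt"
	"os"
)

// missingEnv returns the names that are unset or empty according to lookup.
// Passing the lookup function in keeps the helper easy to test.
func missingEnv(lookup func(string) (string, bool), names ...string) []string {
	var missing []string
	for _, name := range names {
		if value, ok := lookup(name); !ok || value == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// AWS_SESSION_TOKEN is only needed for temporary credentials,
	// so it is not treated as required here.
	missing := missingEnv(os.LookupEnv,
		"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")
	if len(missing) > 0 {
		fmt.Println("missing environment variables:", missing)
		return
	}
	fmt.Println("credentials present in environment")
}
```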

It’s good to execute these tests before applying any organization policy, to establish a baseline and tweak the tests accordingly. Depending on the state of your sandbox environment (and its permissions), there is a possibility that some of the negative tests succeed even without the policy in place. This is because missing IAM permissions will also result in the AWS API returning an error. Therefore, it’s important to distinguish between the types of error. See this function, which checks for different types of errors and fails the test if it encounters a permission error.
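As an illustration of that kind of check, here is a minimal sketch that maps an error from a dry-run RunInstances call to the response strings used in the scenarios. It is not the function linked above. It assumes that a dry run which would have succeeded returns the error code DryRunOperation, that a denied request returns UnauthorizedOperation, and that any other error should fail the test. The codedError interface mirrors only the ErrorCode method that AWS SDK for Go v2 errors expose, and fakeAPIError is a hypothetical stand-in so the example runs without the SDK.

```go
package main

import (
	"errors"
	"fmt"
)

// codedError matches any error that exposes an AWS-style error code.
type codedError interface {
	error
	ErrorCode() string
}

// fakeAPIError is a hypothetical stand-in for an SDK error.
type fakeAPIError struct{ code string }

func (e fakeAPIError) Error() string     { return "api error " + e.code }
func (e fakeAPIError) ErrorCode() string { return e.code }

// responseFor converts the result of a dry-run call into the response
// string a scenario expects, or returns an error that should fail the test.
func responseFor(err error) (string, error) {
	if err == nil {
		return "OK", nil
	}
	var coded codedError
	if errors.As(err, &coded) {
		switch coded.ErrorCode() {
		case "DryRunOperation":
			// The request would have succeeded without the DryRun flag.
			return "OK", nil
		case "UnauthorizedOperation":
			return "UnauthorizedOperation", nil
		}
	}
	// Anything else (bad subnet, throttling, network) is not a policy outcome.
	return "", fmt.Errorf("unexpected error, failing the test: %w", err)
}

func main() {
	for _, code := range []string{"DryRunOperation", "UnauthorizedOperation", "InvalidSubnetID.NotFound"} {
		resp, err := responseFor(fakeAPIError{code: code})
		fmt.Printf("%s -> response=%q err=%v\n", code, resp, err)
	}
}
```

Note that the error code alone does not say whether a denial came from the organization policy or from a missing IAM permission, which is why the baseline run before deployment matters.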

After an initial test, deploy the organization policies to a sandbox environment via your usual method. Execute these tests again post deployment, to validate whether the policy is working as expected. If not, tweak the policy as necessary. This is an iterative process - you will need to repeat these steps a few times to get the desired result.

Continuous Integration

Test runs can (rather, should) be incorporated into your continuous integration pipeline, to run as part of pull-request checks. This will ensure changes do not have any unintended consequences. Once reviewed and approved, a pull-request merge can trigger deployment of the policies across larger groups of AWS accounts, or even the entire AWS Organization.

To take it a step further, you can even run these tests in parallel against all accounts the policies are applied to. If these fail for any reason, trigger a rollback.
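The fan-out part of that idea can be sketched with goroutines. The example below runs a check function against every account concurrently and collects the accounts that failed, so the caller can decide whether to roll back. The runAll function, the errScenarioFailed value and the account IDs are hypothetical. A real check would assume a role in each account and run the scenarios there, and the rollback itself is left to whatever deploys the policies.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// errScenarioFailed is a placeholder error for this sketch.
var errScenarioFailed = errors.New("scenario failed")

// runAll runs check for every account concurrently and returns the
// sorted list of accounts for which it returned an error.
func runAll(accounts []string, check func(account string) error) []string {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		failed []string
	)
	for _, account := range accounts {
		wg.Add(1)
		go func(acct string) {
			defer wg.Done()
			if err := check(acct); err != nil {
				mu.Lock() // protect the shared slice
				failed = append(failed, acct)
				mu.Unlock()
			}
		}(account)
	}
	wg.Wait()
	sort.Strings(failed)
	return failed
}

func main() {
	accounts := []string{"111111111111", "222222222222", "333333333333"}
	failed := runAll(accounts, func(acct string) error {
		if acct == "222222222222" {
			return errScenarioFailed // simulate a failing account
		}
		return nil
	})
	if len(failed) > 0 {
		fmt.Println("rollback needed, failing accounts:", failed)
		return
	}
	fmt.Println("all accounts passed")
}
```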

Source Code

You can refer to this repo, which contains a complete working example: a service control policy and a tag policy, as well as some BDD tests. It does not contain the continuous integration pipeline to deploy the policies or run the tests, but it should be relatively straightforward to configure one based on the steps discussed in the previous sections.

Caveats

As with most solutions, this isn’t a silver bullet, but rather a starting point to make changes more reliable and predictable. It’s good to be mindful of the following caveats when working with this pattern.

Tests can succeed or fail due to a lack of permissions. Ensure you run baseline tests on existing infrastructure, and distinguish between the different errors that can be returned by the AWS API. For example, here is a list of errors returned by the EC2 API.
This example uses the DryRun flag available for the RunInstances action. This may not be available for every resource. If it isn’t, tests will have to create actual resources, and that means managing the whole lifecycle of such resources.
Tests need AWS account credentials, which will need to be set up in every account you want to run the tests in.

In light of the above caveats, it may not always be possible to encode and automate such tests to cover the service control or tag policy in their entirety. There might be an element of manual testing that is still required.

Closing Thoughts

Service control (and tag) policies are a great way to establish guardrails across an AWS Organization and drive best practices. However, it is essential to test the changes before deploying them to live accounts. Introducing a BDD methodology helps make these changes more reliable, more predictable and less likely to have unintended side effects.

Hope this post has given you some ideas on how to adapt this into your own environment. Please comment below if you want to add something or have any questions.
