I recently ran into an interesting problem and would like to share the solution with you here. I had trouble finding a suitable solution on the Internet, which is why I wrote this article.

tl;dr

Use the YAML code below to create a StepFunction that is triggered by a CloudWatch/EventBridge event and starts an arbitrary ECS task by passing a start command to the target task.

Background

I had the task of migrating a cluster of applications from the customer's own data center to AWS. It turned out that many of these applications were in fact the same application called with different parameters. Our customer had already done preliminary work and containerized the application. The result was a single image whose containers were to be started with different start commands. The applications were to be started at fixed times (batch processing). The infrastructure was to be made available to the application team in the form of CloudFormation templates.

The idea

The first thing that came to mind was ECS as the service for running the containers. Coupled with CloudWatch Events, this service is an easy way to start containers on a schedule. However, it quickly became apparent that a task definition and its associated resources would have had to be created for each of these containers, even though they differed from one another only in their start command. That would have bloated the code considerably and made it hard to maintain. Unfortunately, CloudFormation also offers very few possibilities (if any) to iterate over arrays or the like. Further considerations then led me to the idea of storing the commands in the CloudWatch events and handing them to the containers at startup. Unfortunately, this turned out to be a dead end, as it is not technically supported.

So I thought about an alternative solution and came up with the following idea:

Instead of starting the containers (tasks) directly via CloudWatch Events, I put a StepFunction in between. It can receive parameters from CloudWatch and also supports passing them on to the ECS task. In addition, you have the option to chain several calls in a row, should that be necessary. With a single StepFunction I can now start all tasks with different start commands, as long as they use the same image. An initial search on the Internet was disappointing, however: there was hardly any suitable documentation to be found. So I had to figure out how to do it myself.

The implementation

First, two roles need to be created. One role is required for the CloudWatch event, the other for the execution of the actual StepFunction:

  InvokeStatemachineRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - events.amazonaws.com
            Action:
              - "sts:AssumeRole"
      Path: /
      Policies:
        - PolicyName: CloudWatchLogsDeliveryFullAccessPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - "logs:CreateLogDelivery"
                  - "logs:GetLogDelivery"
                  - "logs:UpdateLogDelivery"
                  - "logs:DeleteLogDelivery"
                  - "logs:ListLogDeliveries"
                  - "logs:PutResourcePolicy"
                  - "logs:DescribeResourcePolicies"
                  - "logs:DescribeLogGroups"
                Resource: "*"
        - PolicyName: StatemachineInvokePolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - "states:DescribeExecution"
                  - "states:StartExecution"
                  - "states:StopExecution"
                Resource: "*"

  StatemachineRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - states.amazonaws.com
            Action:
              - "sts:AssumeRole"
      Path: /
      Policies:
        - PolicyName: StatemachineExecutionPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - states:DescribeExecution
                  - states:StartExecution
                  - states:StopExecution
                  - ecs:DescribeTaskDefinition
                  - ecs:DescribeTasks
                  - ecs:ListTaskDefinitions
                  - ecs:ListTasks
                  - ecs:RunTask
                  - ecs:StartTask
                  - ecs:StopTask
                  - logs:CreateLogGroup
                  - logs:CreateLogDelivery
                  - logs:CreateLogStream
                  - logs:DescribeResourcePolicies
                  - logs:DescribeLogGroups
                  - logs:DescribeLogStreams
                  - logs:DeleteDestination
                  - logs:GetLogDelivery
                  - logs:ListTagsLogGroup
                  - logs:PutMetricFilter
                  - logs:TagLogGroup
                  - logs:DescribeSubscriptionFilters
                  - logs:FilterLogEvents
                  - logs:PutSubscriptionFilter
                  - logs:PutResourcePolicy
                  - logs:PutDestination
                  - logs:PutDestinationPolicy
                  - logs:UpdateLogDelivery
                  - logs:DeleteLogDelivery
                  - logs:ListLogDeliveries
                  - events:PutTargets
                  - events:PutRule
                  - events:DescribeRule
                  - iam:PassRole
                Resource: "*"

To find out what happens when the StepFunction is executed, logs should be collected in CloudWatch. A so-called LogGroup is required for this:

  CloudwatchLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      RetentionInDays: 7
      LogGroupName: my/loggroup

Now that the roles exist, the EventRules can be created. An example is shown here:

  ScheduledRule:
    Type: AWS::Events::Rule
    Properties:
      Description: "ScheduledRule"
      ScheduleExpression: "cron(45 21 * * ? *)" # daily at 21:45 UTC (23:45 CEST / 22:45 CET)
      State: "ENABLED"
      Targets:
        - Arn: !Ref MYStateMachine
          Id: MYStateMachine
          Input: |
            {
              "commands": [
                "put-your-command-here"
              ]
            }
          RoleArn: !GetAtt InvokeStatemachineRole.Arn
      RoleArn: !GetAtt InvokeStatemachineRole.Arn

The EventRule transfers the start commands to the StepFunction in the form of a JSON object.
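
Because the command travels in the event input, an additional scheduled job only needs another rule, not another task definition or state machine. The following is a minimal sketch of such a second rule. The resource name NightlyReportRule, the schedule, and the command are hypothetical examples; the state machine and the role are the ones from this article.

```yaml
  # Hypothetical second rule: same state machine, different start command.
  NightlyReportRule:
    Type: AWS::Events::Rule
    Properties:
      Description: "Starts the shared task with a different command"
      ScheduleExpression: "cron(0 3 * * ? *)" # daily at 03:00 UTC
      State: "ENABLED"
      Targets:
        - Arn: !Ref MYStateMachine
          Id: NightlyReport
          # Each array element becomes one argument of the container command.
          Input: |
            {
              "commands": ["python", "report.py", "--daily"]
            }
          RoleArn: !GetAtt InvokeStatemachineRole.Arn
```

Each further job then costs roughly a dozen lines of template code.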

Now comes the interesting part: The StepFunction.

  MYStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: MYStateMachine
      DefinitionString: !Sub |-
        {
          "Comment": "This is your state machine",
          "StartAt": "ECS RunTask",
          "States": {
            "ECS RunTask": {
              "Type": "Task",
              "Resource": "arn:aws:states:::ecs:runTask.sync",
              "Parameters": {
                "LaunchType": "FARGATE",
                "Cluster": "your-ecs-cluster-arn",
                "TaskDefinition": "your-ecs-taskdefinition-arn",
                "NetworkConfiguration": {
                  "AwsvpcConfiguration": {
                    "Subnets": [
                      "your-subnet-id",
                      "your-subnet-id"
                    ],
                    "SecurityGroups": ["your-security-group"],
                    "AssignPublicIp": "DISABLED"
                  }
                },
                "Overrides": {
                  "ContainerOverrides": [{
                      "Command.$": "$.commands",
                      "Name": "your-container-name"
                  }]
                }
              },
              "End": true
            }
          }
        }
      RoleArn: !GetAtt StatemachineRole.Arn
      LoggingConfiguration:
        Destinations:
          - CloudWatchLogsLogGroup:
              LogGroupArn: !GetAtt CloudwatchLogGroup.Arn
        Level: ERROR

The so-called DefinitionString defines how the StepFunction behaves - i.e. which states it comprises, in which order they are executed and what the input and output variables are. The following line is particularly interesting:

"Command.$": "$.commands",

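This line ensures that the commands section of the provided input JSON object is passed to the ECS task as is. The ".$" suffix on the key tells the StepFunction to treat the value as a path into the state input rather than as a literal string.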
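
To make the path resolution concrete, here is an illustrative before/after view. It is not a CloudFormation fragment. The keys input and resolved_overrides exist only for this illustration, and the command itself is a hypothetical example.

```yaml
# Execution input as sent by the event rule (the "Input" of the target):
input:
  commands: ["python", "report.py", "--daily"]

# What the "Overrides" block of the state resolves to for this input:
resolved_overrides:
  ContainerOverrides:
    - Name: your-container-name
      # "Command.$": "$.commands" becomes a plain "Command" key
      # holding the array found at that path in the input.
      Command: ["python", "report.py", "--daily"]
```

Note that the path is case-sensitive: it has to match the key in the event input exactly.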

Putting it all together

Here's the whole YAML file including the roles, the event rule, the log group and the state machine:

Resources:
  InvokeStatemachineRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - events.amazonaws.com
            Action:
              - "sts:AssumeRole"
      Path: /
      Policies:
        - PolicyName: CloudWatchLogsDeliveryFullAccessPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - "logs:CreateLogDelivery"
                  - "logs:GetLogDelivery"
                  - "logs:UpdateLogDelivery"
                  - "logs:DeleteLogDelivery"
                  - "logs:ListLogDeliveries"
                  - "logs:PutResourcePolicy"
                  - "logs:DescribeResourcePolicies"
                  - "logs:DescribeLogGroups"
                Resource: "*"
        - PolicyName: StatemachineInvokePolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - "states:DescribeExecution"
                  - "states:StartExecution"
                  - "states:StopExecution"
                Resource: "*"

  StatemachineRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - states.amazonaws.com
            Action:
              - "sts:AssumeRole"
      Path: /
      Policies:
        - PolicyName: StatemachineExecutionPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - states:DescribeExecution
                  - states:StartExecution
                  - states:StopExecution
                  - ecs:DescribeTaskDefinition
                  - ecs:DescribeTasks
                  - ecs:ListTaskDefinitions
                  - ecs:ListTasks
                  - ecs:RunTask
                  - ecs:StartTask
                  - ecs:StopTask
                  - logs:CreateLogGroup
                  - logs:CreateLogDelivery
                  - logs:CreateLogStream
                  - logs:DescribeResourcePolicies
                  - logs:DescribeLogGroups
                  - logs:DescribeLogStreams
                  - logs:DeleteDestination
                  - logs:GetLogDelivery
                  - logs:ListTagsLogGroup
                  - logs:PutMetricFilter
                  - logs:TagLogGroup
                  - logs:DescribeSubscriptionFilters
                  - logs:FilterLogEvents
                  - logs:PutSubscriptionFilter
                  - logs:PutResourcePolicy
                  - logs:PutDestination
                  - logs:PutDestinationPolicy
                  - logs:UpdateLogDelivery
                  - logs:DeleteLogDelivery
                  - logs:ListLogDeliveries
                  - events:PutTargets
                  - events:PutRule
                  - events:DescribeRule
                  - iam:PassRole
                Resource: "*"


  ScheduledRule:
    Type: AWS::Events::Rule
    Properties:
      Description: "ScheduledRule"
      ScheduleExpression: "cron(45 21 * * ? *)" # daily at 21:45 UTC (23:45 CEST / 22:45 CET)
      State: "ENABLED"
      Targets:
        - Arn: !Ref MYStateMachine
          Id: MYStateMachine
          Input: |
            {
              "commands": [
                "put-your-command-here"
              ]
            }
          RoleArn: !GetAtt InvokeStatemachineRole.Arn
      RoleArn: !GetAtt InvokeStatemachineRole.Arn

  CloudwatchLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      RetentionInDays: 7
      LogGroupName: my/loggroup

  MYStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: MYStateMachine
      DefinitionString: !Sub |-
        {
          "Comment": "This is your state machine",
          "StartAt": "ECS RunTask",
          "States": {
            "ECS RunTask": {
              "Type": "Task",
              "Resource": "arn:aws:states:::ecs:runTask.sync",
              "Parameters": {
                "LaunchType": "FARGATE",
                "Cluster": "your-ecs-cluster-arn",
                "TaskDefinition": "your-ecs-taskdefinition-arn",
                "NetworkConfiguration": {
                  "AwsvpcConfiguration": {
                    "Subnets": [
                      "your-subnet-id",
                      "your-subnet-id"
                    ],
                    "SecurityGroups": ["your-security-group"],
                    "AssignPublicIp": "DISABLED"
                  }
                },
                "Overrides": {
                  "ContainerOverrides": [{
                      "Command.$": "$.commands",
                      "Name": "your-container-name"
                  }]
                }
              },
              "End": true
            }
          }
        }
      RoleArn: !GetAtt StatemachineRole.Arn
      LoggingConfiguration:
        Destinations:
          - CloudWatchLogsLogGroup:
              LogGroupArn: !GetAtt CloudwatchLogGroup.Arn
        Level: ERROR

Looking beyond

When thinking about StepFunctions, you usually have a use case that requires sequential or parallel execution of multiple tasks. This is of course also possible with the solution shown above. However, one problem I faced was that the output of the first task was passed as input to the second task, so the initial commands were lost. To solve this, specify the following line in the state definition of your first task: "ResultPath": null. This discards the result of the first task and passes its original input on to the subsequent task. See this SO article for more information on this.
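
A minimal sketch of such a chain could look like the following. The resource name ChainedStateMachine, the state names and the input key followUpCommands are hypothetical; the second state simply reads a different key from the same, unchanged input.

```yaml
  # Hypothetical two-step variant of the state machine from this article.
  ChainedStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      RoleArn: !GetAtt StatemachineRole.Arn
      DefinitionString: !Sub |-
        {
          "StartAt": "First Task",
          "States": {
            "First Task": {
              "Type": "Task",
              "Resource": "arn:aws:states:::ecs:runTask.sync",
              "Parameters": {
                "LaunchType": "FARGATE",
                "Cluster": "your-ecs-cluster-arn",
                "TaskDefinition": "your-ecs-taskdefinition-arn",
                "NetworkConfiguration": {
                  "AwsvpcConfiguration": {
                    "Subnets": ["your-subnet-id"],
                    "SecurityGroups": ["your-security-group"],
                    "AssignPublicIp": "DISABLED"
                  }
                },
                "Overrides": {
                  "ContainerOverrides": [{
                    "Command.$": "$.commands",
                    "Name": "your-container-name"
                  }]
                }
              },
              "ResultPath": null,
              "Next": "Second Task"
            },
            "Second Task": {
              "Type": "Task",
              "Resource": "arn:aws:states:::ecs:runTask.sync",
              "Parameters": {
                "LaunchType": "FARGATE",
                "Cluster": "your-ecs-cluster-arn",
                "TaskDefinition": "your-ecs-taskdefinition-arn",
                "NetworkConfiguration": {
                  "AwsvpcConfiguration": {
                    "Subnets": ["your-subnet-id"],
                    "SecurityGroups": ["your-security-group"],
                    "AssignPublicIp": "DISABLED"
                  }
                },
                "Overrides": {
                  "ContainerOverrides": [{
                    "Command.$": "$.followUpCommands",
                    "Name": "your-container-name"
                  }]
                }
              },
              "End": true
            }
          }
        }
```

With this definition the event rule would send an input such as {"commands": [...], "followUpCommands": [...]}. Without the "ResultPath": null line, the second state would receive the RunTask result of the first state instead, and the path $.followUpCommands could not be resolved.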