Amazon ECS Essentials

Essential Amazon Elastic Container Service (ECS) concepts for container orchestration

March 10, 2024
Updated regularly

Quick reference guide for Amazon Elastic Container Service (ECS).

What is Amazon ECS?

Amazon ECS is a fully managed container orchestration service that:

  • Runs Docker (OCI-compatible) containers on AWS infrastructure
  • Scales services automatically when you configure Service Auto Scaling
  • Integrates with AWS services (ALB, CloudWatch, IAM, etc.)
  • Supports Fargate (serverless) and EC2 (self-managed) launch types, plus External for on-premises servers (ECS Anywhere)
  • Provides high availability when tasks are spread across multiple Availability Zones

    Core Concepts

    Architecture Components

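    ECS Cluster
    ├── Services
    │   └── Tasks (running containers)
    ├── Standalone tasks (started with run-task)
    └── Capacity
        ├── Container Instances (EC2 launch type)
        └── Fargate (serverless launch type)

    Task Definitions (container blueprints; registered per account and Region, not per cluster)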
    

    Key Components

  • Cluster: Logical grouping of tasks, services, and the capacity they run on
  • Task Definition: JSON blueprint for your application (images, CPU/memory, ports, roles)
  • Task: Running instance of a task definition (one or more containers)
  • Service: Keeps the desired number of tasks running and replaces tasks that fail
  • Container Instance: EC2 instance running the ECS container agent, registered to a cluster
  • Fargate: Serverless compute engine for containers

    Launch Types

    Fargate vs EC2

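    Feature          Fargate                                   EC2
    Infrastructure   Serverless (AWS managed)                  Self-managed EC2 instances
    Pricing          Pay per task (requested vCPU + memory)    Pay for EC2 instances, even when idle
    Scaling          Tasks only (Service Auto Scaling)         Tasks (Service Auto Scaling) + instances (ASG / capacity provider)
    Use case         Microservices, batch jobs                 Cost optimization, custom needs (e.g., GPUs, custom AMIs)
    Maintenance      None (AWS manages the host)               Patch OS, manage instances
    Startup time     Typically slower (~1 min)                 Typically faster (~10 sec) with spare capacity and a cached image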

    Task Definitions

    Basic Task Definition

    {
      "family": "my-app",
      "executionRoleArn": "arn:aws:iam::account:role/ecsTaskExecutionRole",
      "networkMode": "awsvpc",
      "requiresCompatibilities": ["FARGATE"],
      "cpu": "256",
      "memory": "512",
      "containerDefinitions": [
        {
          "name": "app",
          "image": "nginx:latest",
          "portMappings": [
            {
              "containerPort": 80,
              "protocol": "tcp"
            }
          ],
          "essential": true,
          "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
              "awslogs-group": "/ecs/my-app",
              "awslogs-region": "us-east-1",
              "awslogs-stream-prefix": "ecs"
            }
          }
        }
      ]
    }
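
Fargate only accepts specific task-level cpu/memory pairs, and registering a Fargate task definition with any other combination fails. A minimal Python sketch (the helper name `is_valid_fargate_size` is hypothetical) that checks a pair against the documented combinations for 0.25 to 4 vCPU; the larger 8 and 16 vCPU sizes exist but are deliberately left out.

```python
# Valid Fargate task sizes for 0.25 to 4 vCPU: CPU units -> allowed memory (MiB).
# Larger sizes (8 and 16 vCPU) exist but are left out of this sketch.
FARGATE_SIZES = {
    256: {512, 1024, 2048},
    512: set(range(1024, 4096 + 1, 1024)),
    1024: set(range(2048, 8192 + 1, 1024)),
    2048: set(range(4096, 16384 + 1, 1024)),
    4096: set(range(8192, 30720 + 1, 1024)),
}


def is_valid_fargate_size(cpu: str, memory: str) -> bool:
    """Return True if task-level cpu/memory strings form a valid Fargate pair."""
    try:
        cpu_units, memory_mib = int(cpu), int(memory)
    except ValueError:
        # Forms such as "1 vCPU" or "2 GB" are also accepted by ECS;
        # this sketch only handles plain integer strings.
        return False
    return memory_mib in FARGATE_SIZES.get(cpu_units, set())


print(is_valid_fargate_size("256", "512"))   # True
print(is_valid_fargate_size("256", "4096"))  # False: 0.25 vCPU allows at most 2 GB
```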
    

    Multi-Container Task

    {
      "family": "web-app-with-sidecar",
      "executionRoleArn": "arn:aws:iam::account:role/ecsTaskExecutionRole",
      "networkMode": "awsvpc",
      "requiresCompatibilities": ["FARGATE"],
      "cpu": "512",
      "memory": "1024",
      "containerDefinitions": [
        {
          "name": "web-app",
          "image": "myapp:latest",
          "portMappings": [
            {
              "containerPort": 8080,
              "protocol": "tcp"
            }
          ],
          "essential": true,
          "environment": [
            {
              "name": "DATABASE_URL",
              "value": "postgres://db:5432/myapp"
            }
          ],
          "secrets": [
            {
              "name": "DB_PASSWORD",
              "valueFrom": "arn:aws:secretsmanager:region:account:secret:db-password"
            }
          ],
          "dependsOn": [
            {
              "containerName": "log-router",
              "condition": "START"
            }
          ]
        },
        {
          "name": "log-router",
          "image": "fluent/fluentd:latest",
          "essential": false,
          "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
              "awslogs-group": "/ecs/fluentd",
              "awslogs-region": "us-east-1",
              "awslogs-stream-prefix": "logs"
            }
          }
        }
      ]
    }
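
`dependsOn` controls container start order inside a task: here `web-app` waits until `log-router` has started. The ECS agent enforces this itself; the Python sketch below only illustrates the idea by deriving a valid start order from a `containerDefinitions` list with the standard library's `graphlib` (Python 3.9+). The helper name `start_order` is hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def start_order(container_definitions: list[dict]) -> list[str]:
    """Return container names in an order that satisfies every dependsOn entry."""
    sorter = TopologicalSorter()
    for container in container_definitions:
        # Each dependsOn entry names a container that must reach its condition first.
        deps = [d["containerName"] for d in container.get("dependsOn", [])]
        sorter.add(container["name"], *deps)
    # static_order() raises graphlib.CycleError for circular dependencies.
    return list(sorter.static_order())


containers = [
    {"name": "web-app", "dependsOn": [{"containerName": "log-router", "condition": "START"}]},
    {"name": "log-router"},
]
print(start_order(containers))  # ['log-router', 'web-app']
```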
    

    Task Definition with Health Check (excerpt; the image must include curl)

    {
      "containerDefinitions": [
        {
          "name": "app",
          "image": "myapp:latest",
          "healthCheck": {
            "command": [
              "CMD-SHELL",
              "curl -f http://localhost/health || exit 1"
            ],
            "interval": 30,
            "timeout": 5,
            "retries": 3,
            "startPeriod": 60
          },
          "portMappings": [
            {
              "containerPort": 80
            }
          ]
        }
      ]
    }
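
ECS documents fixed ranges for the health check fields: interval 5 to 300 seconds, timeout 2 to 60 seconds, retries 1 to 10, and startPeriod 0 to 300 seconds. A small Python sketch (the helper `check_health_check` is hypothetical) that flags out-of-range values and a command that does not start with `CMD` or `CMD-SHELL` before you register the task definition:

```python
# Documented bounds for ECS container health check fields.
HEALTH_CHECK_BOUNDS = {
    "interval": (5, 300),     # seconds
    "timeout": (2, 60),       # seconds
    "retries": (1, 10),       # consecutive failures
    "startPeriod": (0, 300),  # seconds
}


def check_health_check(health_check: dict) -> list[str]:
    """Return a list of problems; an empty list means every present field is in range."""
    problems = []
    command = health_check.get("command", [])
    if not command or command[0] not in ("CMD", "CMD-SHELL"):
        problems.append("command must start with CMD or CMD-SHELL")
    for field, (low, high) in HEALTH_CHECK_BOUNDS.items():
        value = health_check.get(field)
        # Missing fields fall back to ECS defaults, so only present values are checked.
        if value is not None and not low <= value <= high:
            problems.append(f"{field}={value} is outside {low}-{high}")
    return problems


hc = {
    "command": ["CMD-SHELL", "curl -f http://localhost/health || exit 1"],
    "interval": 30, "timeout": 5, "retries": 3, "startPeriod": 60,
}
print(check_health_check(hc))  # []
```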
    

    AWS CLI Commands

    Cluster Management

    # ========== Create Cluster ==========
    aws ecs create-cluster --cluster-name my-cluster
    
    # ========== List Clusters ==========
    aws ecs list-clusters
    
    # ========== Describe Cluster ==========
    aws ecs describe-clusters --clusters my-cluster
    
    # ========== Delete Cluster ==========
    aws ecs delete-cluster --cluster my-cluster
    

    Task Definitions

    # ========== Register Task Definition ==========
    aws ecs register-task-definition \
      --cli-input-json file://task-definition.json
    
    # ========== List Task Definitions ==========
    aws ecs list-task-definitions
    
    # ========== Describe Task Definition ==========
    aws ecs describe-task-definition \
      --task-definition my-app:1
    
    # ========== Deregister Task Definition ==========
    aws ecs deregister-task-definition \
      --task-definition my-app:1
    

    Running Tasks

    # ========== Run Task (One-Time) ==========
    aws ecs run-task \
      --cluster my-cluster \
      --task-definition my-app:1 \
      --launch-type FARGATE \
      --network-configuration "awsvpcConfiguration={subnets=[subnet-12345],securityGroups=[sg-12345],assignPublicIp=ENABLED}" \
      --count 1
    
    # ========== List Tasks ==========
    aws ecs list-tasks --cluster my-cluster
    
    # ========== Describe Tasks ==========
    aws ecs describe-tasks \
      --cluster my-cluster \
      --tasks task-id-12345
    
    # ========== Stop Task ==========
    aws ecs stop-task \
      --cluster my-cluster \
      --task task-id-12345
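
The `--network-configuration` value in the run-task and create-service examples uses AWS CLI shorthand syntax, which is easy to mistype with several subnets or security groups. A minimal Python sketch (the helper name `awsvpc_shorthand` is hypothetical) that assembles the same string:

```python
from typing import Optional


def awsvpc_shorthand(subnets: list[str], security_groups: list[str],
                     public_ip: Optional[bool] = None) -> str:
    """Build the AWS CLI shorthand value for --network-configuration."""
    if not subnets:
        raise ValueError("awsvpc tasks need at least one subnet")
    parts = [f"subnets=[{','.join(subnets)}]"]
    if security_groups:
        parts.append(f"securityGroups=[{','.join(security_groups)}]")
    # Leave assignPublicIp out to keep the default (DISABLED).
    if public_ip is not None:
        parts.append("assignPublicIp=" + ("ENABLED" if public_ip else "DISABLED"))
    return "awsvpcConfiguration={" + ",".join(parts) + "}"


print(awsvpc_shorthand(["subnet-12345"], ["sg-12345"], public_ip=True))
# awsvpcConfiguration={subnets=[subnet-12345],securityGroups=[sg-12345],assignPublicIp=ENABLED}
```

Passing the result as a single element of an argument list (for example with `subprocess.run`) avoids shell quoting problems with the braces and brackets.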
    

    Services

    # ========== Create Service ==========
    aws ecs create-service \
      --cluster my-cluster \
      --service-name my-service \
      --task-definition my-app:1 \
      --desired-count 2 \
      --launch-type FARGATE \
      --network-configuration "awsvpcConfiguration={subnets=[subnet-12345],securityGroups=[sg-12345],assignPublicIp=ENABLED}"
    
    # ========== Update Service ==========
    aws ecs update-service \
      --cluster my-cluster \
      --service my-service \
      --desired-count 5
    
    # ========== Update Service with New Task Definition ==========
    aws ecs update-service \
      --cluster my-cluster \
      --service my-service \
      --task-definition my-app:2
    
    # ========== Delete Service ==========
    aws ecs delete-service \
      --cluster my-cluster \
      --service my-service \
      --force
    

    Service with Load Balancer

    Application Load Balancer Integration (input for aws ecs create-service --cli-input-json file://service.json)

    {
      "cluster": "my-cluster",
      "serviceName": "web-service",
      "taskDefinition": "web-app:1",
      "desiredCount": 3,
      "launchType": "FARGATE",
      "networkConfiguration": {
        "awsvpcConfiguration": {
          "subnets": ["subnet-12345", "subnet-67890"],
          "securityGroups": ["sg-12345"],
          "assignPublicIp": "ENABLED"
        }
      },
      "loadBalancers": [
        {
          "targetGroupArn": "arn:aws:elasticloadbalancing:region:account:targetgroup/my-targets",
          "containerName": "web-app",
          "containerPort": 80
        }
      ],
      "healthCheckGracePeriodSeconds": 60
    }
    

    Create Service with ALB (CLI)

    aws ecs create-service \
      --cluster my-cluster \
      --service-name web-service \
      --task-definition web-app:1 \
      --desired-count 3 \
      --launch-type FARGATE \
      --network-configuration "awsvpcConfiguration={subnets=[subnet-1,subnet-2],securityGroups=[sg-12345]}" \
      --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:region:account:targetgroup/my-tg,containerName=web-app,containerPort=80" \
      --health-check-grace-period-seconds 60
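
The `containerName` and `containerPort` in a `loadBalancers` entry must match a container and one of its port mappings in the service's task definition; ECS rejects the service otherwise. A hedged Python sketch (the helper `find_lb_mismatches` is hypothetical) that catches this locally, before calling create-service:

```python
def find_lb_mismatches(task_definition: dict, load_balancers: list[dict]) -> list[str]:
    """Report loadBalancers entries whose container/port is not in the task definition."""
    ports_by_container = {
        c["name"]: {p["containerPort"] for p in c.get("portMappings", [])}
        for c in task_definition.get("containerDefinitions", [])
    }
    problems = []
    for lb in load_balancers:
        name, port = lb["containerName"], lb["containerPort"]
        if name not in ports_by_container:
            problems.append(f"no container named {name!r}")
        elif port not in ports_by_container[name]:
            problems.append(f"container {name!r} has no portMapping for {port}")
    return problems


task_def = {"containerDefinitions": [
    {"name": "web-app", "portMappings": [{"containerPort": 80}]},
]}
print(find_lb_mismatches(task_def, [{"containerName": "web", "containerPort": 80}]))
# ["no container named 'web'"]
```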
    

    Auto Scaling

    Target Tracking Scaling

    # ========== Register Scalable Target ==========
    aws application-autoscaling register-scalable-target \
      --service-namespace ecs \
      --resource-id service/my-cluster/my-service \
      --scalable-dimension ecs:service:DesiredCount \
      --min-capacity 2 \
      --max-capacity 10
    
    # ========== Create Scaling Policy (CPU) ==========
    aws application-autoscaling put-scaling-policy \
      --service-namespace ecs \
      --resource-id service/my-cluster/my-service \
      --scalable-dimension ecs:service:DesiredCount \
      --policy-name cpu-scaling-policy \
      --policy-type TargetTrackingScaling \
      --target-tracking-scaling-policy-configuration '{
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60
      }'
    
    # ========== Create Scaling Policy (Memory) ==========
    aws application-autoscaling put-scaling-policy \
      --service-namespace ecs \
      --resource-id service/my-cluster/my-service \
      --scalable-dimension ecs:service:DesiredCount \
      --policy-name memory-scaling-policy \
      --policy-type TargetTrackingScaling \
      --target-tracking-scaling-policy-configuration '{
        "TargetValue": 80.0,
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "ECSServiceAverageMemoryUtilization"
        }
      }'
    
    # ========== Create Scaling Policy (ALB Request Count) ==========
    # ResourceLabel format: app/<alb-name>/<alb-id>/targetgroup/<tg-name>/<tg-id>
    aws application-autoscaling put-scaling-policy \
      --service-namespace ecs \
      --resource-id service/my-cluster/my-service \
      --scalable-dimension ecs:service:DesiredCount \
      --policy-name request-count-policy \
      --policy-type TargetTrackingScaling \
      --target-tracking-scaling-policy-configuration '{
        "TargetValue": 1000.0,
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "ALBRequestCountPerTarget",
          "ResourceLabel": "app/my-alb/xxx/targetgroup/my-tg/yyy"
        }
      }'
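
Target tracking changes DesiredCount so that the metric moves back toward TargetValue. Application Auto Scaling does not publish its exact algorithm, so the Python sketch below assumes a simple proportional model (load shared evenly across tasks, result rounded up and clamped to the registered min/max). The function name is hypothetical, and the sketch ignores the cooldowns (ScaleOutCooldown 60, ScaleInCooldown 300 above) and any extra caution the service applies when scaling in.

```python
import math


def proportional_desired_count(current_tasks: int, metric_value: float, target_value: float,
                               min_capacity: int, max_capacity: int) -> int:
    """Approximate the task count that would bring the metric back to the target.

    Assumes the load is shared evenly, so metric * tasks stays roughly constant.
    """
    if current_tasks <= 0 or target_value <= 0:
        raise ValueError("current_tasks and target_value must be positive")
    wanted = math.ceil(current_tasks * metric_value / target_value)
    # Clamp to the scalable target registered with --min-capacity / --max-capacity.
    return max(min_capacity, min(max_capacity, wanted))


# 4 tasks averaging 91% CPU with a 70% target: 4 * 91 / 70 = 5.2, rounded up to 6.
print(proportional_desired_count(4, 91.0, 70.0, min_capacity=2, max_capacity=10))  # 6
```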
    

    Task Placement Strategies (EC2 Launch Type)

    Placement Strategies

    {
      "placementStrategy": [
        {
          "type": "spread",
          "field": "attribute:ecs.availability-zone"
        },
        {
          "type": "binpack",
          "field": "memory"
        }
      ],
      "placementConstraints": [
        {
          "type": "memberOf",
          "expression": "attribute:ecs.instance-type =~ t3.*"
        }
      ]
    }
    

    Strategy Types:

  • spread: Distribute tasks evenly based on a field (e.g., Availability Zone or instance ID)
  • binpack: Place tasks on the instances with the least available CPU or memory (minimizes the number of instances in use)
  • random: Place tasks randomly
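
To make the combination in the JSON above concrete, here is a toy Python model: `spread` on `attribute:ecs.availability-zone` picks the zone running the fewest tasks, then `binpack` on `memory` picks the instance in that zone with the least free memory that still fits. The instance data is hypothetical and this is not the ECS scheduler, which also evaluates placement constraints and CPU.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical container instance: ID, Availability Zone, free memory (MiB).
    instance_id: str
    az: str
    free_memory: int


def place(task_memory: int, instances: list[Instance], placed_az: Counter) -> Instance:
    """Spread across AZs first, then binpack on memory inside the chosen AZ."""
    fits = [i for i in instances if i.free_memory >= task_memory]
    if not fits:
        raise RuntimeError("no instance has enough free memory")
    # spread: prefer the AZ with the fewest tasks so far (ties broken by name).
    target_az = min({i.az for i in fits}, key=lambda az: (placed_az[az], az))
    # binpack: within that AZ, use the instance with the least memory left.
    chosen = min((i for i in fits if i.az == target_az),
                 key=lambda i: (i.free_memory, i.instance_id))
    chosen.free_memory -= task_memory
    placed_az[target_az] += 1
    return chosen


fleet = [
    Instance("i-a1", "us-east-1a", 2048),
    Instance("i-a2", "us-east-1a", 1024),
    Instance("i-b1", "us-east-1b", 4096),
]
placed = Counter()
print([place(512, fleet, placed).instance_id for _ in range(4)])
# ['i-a2', 'i-b1', 'i-a2', 'i-b1']
```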

    Service Discovery

    Cloud Map Integration

    # ========== Create Private DNS Namespace ==========
    aws servicediscovery create-private-dns-namespace \
      --name local \
      --vpc vpc-12345
    
    # ========== Create Service Discovery Service ==========
    aws servicediscovery create-service \
      --name my-app \
      --namespace-id ns-12345 \
      --dns-config '{
        "DnsRecords": [
          {
            "Type": "A",
            "TTL": 60
          }
        ]
      }'
    
    # ========== Create ECS Service with Service Discovery ==========
    # Tasks register A records and resolve inside the VPC as my-app.local
    aws ecs create-service \
      --cluster my-cluster \
      --service-name my-service \
      --task-definition my-app:1 \
      --desired-count 2 \
      --launch-type FARGATE \
      --network-configuration "awsvpcConfiguration={subnets=[subnet-12345],securityGroups=[sg-12345]}" \
      --service-registries "registryArn=arn:aws:servicediscovery:region:account:service/srv-12345"
    

    Secrets Management

    Using AWS Secrets Manager

    {
      "containerDefinitions": [
        {
          "name": "app",
          "image": "myapp:latest",
          "secrets": [
            {
              "name": "DB_PASSWORD",
              "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:db-password-AbCdEf"
            },
            {
              "name": "API_KEY",
              "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:api-key-XyZ123:key::"
            }
          ]
        }
      ]
    }
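
When a secret stores JSON, `valueFrom` can select a single key by appending `:json-key:version-stage:version-id` to the secret ARN; the `API_KEY` entry above does this with `:key::`, where empty stage and ID fields mean the current version. A small Python sketch (the helper `secret_value_from` is hypothetical) that builds the reference:

```python
def secret_value_from(secret_arn: str, json_key: str = "",
                      version_stage: str = "", version_id: str = "") -> str:
    """Build a Secrets Manager valueFrom reference for an ECS container secret."""
    if not (json_key or version_stage or version_id):
        # Plain ARN: the whole secret string is injected.
        return secret_arn
    # Format: <secret-arn>:<json-key>:<version-stage>:<version-id>
    # Empty stage and ID fields select the current (AWSCURRENT) version.
    return f"{secret_arn}:{json_key}:{version_stage}:{version_id}"


arn = "arn:aws:secretsmanager:us-east-1:123456789012:secret:api-key-XyZ123"
print(secret_value_from(arn, json_key="key"))
# arn:aws:secretsmanager:us-east-1:123456789012:secret:api-key-XyZ123:key::
```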
    

    Using Systems Manager Parameter Store

    {
      "containerDefinitions": [
        {
          "name": "app",
          "image": "myapp:latest",
          "secrets": [
            {
              "name": "DATABASE_URL",
              "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/database-url"
            }
          ]
        }
      ]
    }
    

    Logging and Monitoring

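    CloudWatch Logs

    Setting "awslogs-create-group" to "true" also requires the logs:CreateLogGroup permission on the task execution role.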

    {
      "containerDefinitions": [
        {
          "name": "app",
          "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
              "awslogs-group": "/ecs/my-app",
              "awslogs-region": "us-east-1",
              "awslogs-stream-prefix": "ecs",
              "awslogs-create-group": "true"
            }
          }
        }
      ]
    }
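
With `awslogs-stream-prefix` set, each container writes to a log stream named `prefix/container-name/task-id`, so a task's logs can be located from its ARN. A minimal Python sketch (the helper `log_stream_name` and the example ARN are hypothetical):

```python
def log_stream_name(stream_prefix: str, container_name: str, task_arn: str) -> str:
    """Return the awslogs stream name for one container of one task."""
    # The task ID is the last path segment in both the old and the long task ARN formats.
    task_id = task_arn.rsplit("/", 1)[-1]
    return f"{stream_prefix}/{container_name}/{task_id}"


task_arn = "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/0123456789abcdef0123456789abcdef"
print(log_stream_name("ecs", "app", task_arn))
# ecs/app/0123456789abcdef0123456789abcdef
```

The result can be passed to `aws logs get-log-events --log-group-name /ecs/my-app --log-stream-name <stream>`.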
    

    View Logs with CLI

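    # ========== Tail Logs ==========
    # aws logs tail requires AWS CLI v2
    aws logs tail /ecs/my-app --follow

    # ========== Filter Logs ==========
    # --start-time is in epoch milliseconds; "date -d" is GNU date (on macOS: date -v-1H +%s)
    aws logs filter-log-events \
      --log-group-name /ecs/my-app \
      --filter-pattern "ERROR" \
      --start-time $(date -d '1 hour ago' +%s)000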
    

    Container Insights

    # ========== Enable Container Insights ==========
    aws ecs update-cluster-settings \
      --cluster my-cluster \
      --settings name=containerInsights,value=enabled
    

    Deployment Strategies

    Rolling Update

    {
      "deploymentConfiguration": {
        "maximumPercent": 200,
        "minimumHealthyPercent": 100,
        "deploymentCircuitBreaker": {
          "enable": true,
          "rollback": true
        }
      }
    }
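
During a rolling update, minimumHealthyPercent sets the lower limit on RUNNING tasks (rounded up) and maximumPercent the upper limit on RUNNING plus PENDING tasks (rounded down), both as a percentage of desiredCount. A short Python sketch of that arithmetic; the function name is hypothetical.

```python
def rolling_update_bounds(desired_count: int, minimum_healthy_percent: int,
                          maximum_percent: int) -> tuple[int, int]:
    """Return (minimum RUNNING tasks, maximum RUNNING + PENDING tasks) during a deployment."""
    # minimumHealthyPercent is rounded up; integer ceil division avoids float error.
    lower = (desired_count * minimum_healthy_percent + 99) // 100
    # maximumPercent is rounded down.
    upper = desired_count * maximum_percent // 100
    return lower, upper


# Settings above with desiredCount 4: keep at least 4 running, allow up to 8,
# so ECS can start 4 new tasks before stopping old ones.
print(rolling_update_bounds(4, 100, 200))  # (4, 8)
print(rolling_update_bounds(3, 50, 150))   # (2, 4)
```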
    

    Blue/Green Deployment

    # Using CodeDeploy for Blue/Green (the ECS service must use the CODE_DEPLOY deployment controller)
    # "content" must be the AppSpec as a single-line, JSON-escaped string
    aws deploy create-deployment \
      --application-name my-app \
      --deployment-group-name my-deployment-group \
      --revision '{
        "revisionType": "AppSpecContent",
        "appSpecContent": {
          "content": "{\"version\":0.0,\"Resources\":[{\"TargetService\":{\"Type\":\"AWS::ECS::Service\",\"Properties\":{\"TaskDefinition\":\"arn:aws:ecs:region:account:task-definition/my-app:2\",\"LoadBalancerInfo\":{\"ContainerName\":\"web\",\"ContainerPort\":80}}}}]}"
        }
      }'
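
Escaping the AppSpec by hand is error-prone because `content` must be a JSON document serialized as a string inside the outer JSON. A hedged Python sketch that builds the `--revision` value with `json.dumps`, so the escaping is done for you; the task definition ARN, container name, and port below are placeholders.

```python
import json


def codedeploy_ecs_revision(task_definition_arn: str, container_name: str, container_port: int) -> str:
    """Return the JSON for `aws deploy create-deployment --revision`."""
    appspec = {
        "version": 0.0,
        "Resources": [{
            "TargetService": {
                "Type": "AWS::ECS::Service",
                "Properties": {
                    "TaskDefinition": task_definition_arn,
                    "LoadBalancerInfo": {
                        "ContainerName": container_name,
                        "ContainerPort": container_port,
                    },
                },
            },
        }],
    }
    revision = {
        "revisionType": "AppSpecContent",
        # The AppSpec is embedded as a string, so it is serialized twice.
        "appSpecContent": {"content": json.dumps(appspec)},
    }
    return json.dumps(revision)


rev = codedeploy_ecs_revision("arn:aws:ecs:region:account:task-definition/my-app:2", "web", 80)
# Round-trip check: the nested content parses back to the original AppSpec.
inner = json.loads(json.loads(rev)["appSpecContent"]["content"])
print(inner["Resources"][0]["TargetService"]["Properties"]["LoadBalancerInfo"])
# {'ContainerName': 'web', 'ContainerPort': 80}
```

Writing `rev` to a file (for example `revision.json`, a hypothetical name) lets you pass `--revision file://revision.json` instead of quoting it inline.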
    

    IAM Roles

    Task Execution Role (used by ECS to pull images, send logs, and retrieve secrets)

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "ecr:GetAuthorizationToken",
            "ecr:BatchCheckLayerAvailability",
            "ecr:GetDownloadUrlForLayer",
            "ecr:BatchGetImage",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
          ],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "secretsmanager:GetSecretValue",
            "ssm:GetParameters"
          ],
          "Resource": [
            "arn:aws:secretsmanager:region:account:secret:my-secret-*",
            "arn:aws:ssm:region:account:parameter/prod/*"
          ]
        }
      ]
    }
    

    Task Role (assumed by your application code running in the container)

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetObject",
            "s3:PutObject"
          ],
          "Resource": "arn:aws:s3:::my-bucket/*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "dynamodb:GetItem",
            "dynamodb:PutItem",
            "dynamodb:Query"
          ],
          "Resource": "arn:aws:dynamodb:region:account:table/my-table"
        }
      ]
    }
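
Both roles also need a trust policy that allows ECS tasks to assume them; the permission policies above only define what the role may do once assumed. A small Python sketch that prints the standard `ecs-tasks.amazonaws.com` trust policy for use with `aws iam create-role --assume-role-policy-document`:

```python
import json

# Trust policy that lets ECS tasks assume a role; the same document works for
# both the task execution role and the task role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ecs-tasks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def trust_policy_document() -> str:
    """Serialize the trust policy for --assume-role-policy-document."""
    return json.dumps(TRUST_POLICY, indent=2)


print(trust_policy_document())
```

For example, redirect the output to `ecs-tasks-trust.json` and run `aws iam create-role --role-name my-task-role --assume-role-policy-document file://ecs-tasks-trust.json` (role and file names are hypothetical), then attach the permission policy.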
    

    Capacity Providers

    Fargate Capacity Provider

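    # ========== Associate Capacity Providers and Set Default Strategy ==========
    # base: minimum tasks on a provider; weight: relative share of the remaining tasks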
    aws ecs put-cluster-capacity-providers \
      --cluster my-cluster \
      --capacity-providers FARGATE FARGATE_SPOT \
      --default-capacity-provider-strategy \
        capacityProvider=FARGATE,weight=1,base=2 \
        capacityProvider=FARGATE_SPOT,weight=4
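
`base` reserves a minimum number of tasks on one provider, and `weight` splits the remaining tasks proportionally. A Python sketch of that arithmetic for the strategy above (FARGATE base 2, weight 1; FARGATE_SPOT weight 4). The function name is hypothetical, and the largest-remainder rounding for leftover tasks is an assumption; ECS may place uneven remainders differently.

```python
def split_tasks(total: int, strategy: list[dict]) -> dict[str, int]:
    """Approximate how many tasks each capacity provider receives."""
    counts = {s["capacityProvider"]: 0 for s in strategy}
    remaining = total
    # 1) Satisfy base first (ECS allows a base on only one provider in a strategy).
    for s in strategy:
        take = min(s.get("base", 0), remaining)
        counts[s["capacityProvider"]] += take
        remaining -= take
    # 2) Split the rest by weight, using largest-remainder rounding (an assumption).
    total_weight = sum(s.get("weight", 0) for s in strategy)
    if remaining and total_weight:
        shares = {s["capacityProvider"]: remaining * s.get("weight", 0) / total_weight
                  for s in strategy}
        floors = {name: int(share) for name, share in shares.items()}
        leftover = remaining - sum(floors.values())
        by_fraction = sorted(shares, key=lambda n: shares[n] - floors[n], reverse=True)
        for name in by_fraction[:leftover]:
            floors[name] += 1
        for name, n in floors.items():
            counts[name] += n
    return counts


strategy = [
    {"capacityProvider": "FARGATE", "weight": 1, "base": 2},
    {"capacityProvider": "FARGATE_SPOT", "weight": 4},
]
print(split_tasks(12, strategy))  # {'FARGATE': 4, 'FARGATE_SPOT': 8}
```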
    

    EC2 Auto Scaling Group Capacity Provider

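    # ========== Create Capacity Provider ==========
    # managedTerminationProtection also needs managed scaling and instance scale-in protection on the ASG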
    aws ecs create-capacity-provider \
      --name my-capacity-provider \
      --auto-scaling-group-provider '{
        "autoScalingGroupArn": "arn:aws:autoscaling:region:account:autoScalingGroup:id:autoScalingGroupName/my-asg",
        "managedScaling": {
          "status": "ENABLED",
          "targetCapacity": 80,
          "minimumScalingStepSize": 1,
          "maximumScalingStepSize": 10
        },
        "managedTerminationProtection": "ENABLED"
      }'
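
With managed scaling, ECS publishes a CapacityProviderReservation metric, computed as instances needed divided by instances running, multiplied by 100, and target-tracks it at targetCapacity. A Python sketch of what targetCapacity 80 implies for instance count; the inputs are hypothetical, the sketch does not model scaling from zero instances, and ECS derives "instances needed" from task sizes and placement rather than taking it as an input.

```python
import math


def capacity_provider_reservation(instances_needed: int, instances_running: int) -> float:
    """CapacityProviderReservation as a percentage: needed / running * 100."""
    if instances_running <= 0:
        raise ValueError("this sketch does not model scaling from zero instances")
    return instances_needed * 100 / instances_running


def instances_for_target(instances_needed: int, target_capacity: int) -> int:
    """Smallest instance count that keeps the reservation at or below target_capacity."""
    return math.ceil(instances_needed * 100 / target_capacity)


# If placed tasks need 8 instances, targetCapacity 80 aims for 10 (20% headroom).
print(instances_for_target(8, 80))           # 10
print(capacity_provider_reservation(8, 10))  # 80.0
```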
    

    Best Practices

    Task Definition Best Practices

    {
      "family": "production-app",
      "taskRoleArn": "arn:aws:iam::account:role/task-role",
      "executionRoleArn": "arn:aws:iam::account:role/execution-role",
      "networkMode": "awsvpc",
      "requiresCompatibilities": ["FARGATE"],
      "cpu": "512",
      "memory": "1024",
      "containerDefinitions": [
        {
          "name": "app",
          "image": "account.dkr.ecr.region.amazonaws.com/my-app:v1.0.0",
          "essential": true,
          "readonlyRootFilesystem": true,
          "user": "1000:1000",
          "healthCheck": {
            "command": ["CMD-SHELL", "curl -f http://localhost/health || exit 1"],
            "interval": 30,
            "timeout": 5,
            "retries": 3,
            "startPeriod": 60
          },
          "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
              "awslogs-group": "/ecs/production/my-app",
              "awslogs-region": "us-east-1",
              "awslogs-stream-prefix": "ecs"
            }
          },
          "environment": [
            {
              "name": "ENVIRONMENT",
              "value": "production"
            }
          ],
          "secrets": [
            {
              "name": "DB_PASSWORD",
              "valueFrom": "arn:aws:secretsmanager:region:account:secret:prod/db-password"
            }
          ]
        }
      ]
    }
    

    Service Configuration Best Practices

  • Use an Application Load Balancer for HTTP/HTTPS traffic
  • Enable the deployment circuit breaker with rollback to recover automatically from failed deployments
  • Set a health check grace period for slow-starting containers
  • Use service discovery (Cloud Map) for service-to-service communication
  • Enable Container Insights for monitoring
  • Configure Service Auto Scaling on CPU, memory, or ALB request count
  • Spread tasks across multiple Availability Zones for high availability
  • Define both container health checks and load balancer target group health checks

    Security Best Practices

  • Use task roles instead of embedding AWS credentials in images or environment variables
  • Store secrets in Secrets Manager or Parameter Store
  • Run tasks in private subnets, with a NAT gateway or VPC endpoints for outbound access
  • Restrict security groups to the minimum required ports and sources
  • Enable VPC Flow Logs for network monitoring
  • Use ECR image scanning to find vulnerabilities
  • Apply least-privilege IAM policies to task and execution roles
  • Enable CloudTrail for audit logging

    Troubleshooting

    Common Issues

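    # ========== Task Fails to Start ==========
    # List recently stopped tasks to get their IDs (stopped tasks stay visible only for a limited time)
    aws ecs list-tasks --cluster my-cluster --desired-status STOPPED

    # Check task stopped reason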
    aws ecs describe-tasks \
      --cluster my-cluster \
      --tasks task-id \
      --query 'tasks[0].stoppedReason'
    
    # Check container exit code
    aws ecs describe-tasks \
      --cluster my-cluster \
      --tasks task-id \
      --query 'tasks[0].containers[0].exitCode'
    
    # ========== Service Deployment Stuck ==========
    # Check service events
    aws ecs describe-services \
      --cluster my-cluster \
      --services my-service \
      --query 'services[0].events[:10]'
    
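    # ========== Cannot Pull Image ==========
    # Verify the image is pullable with your own credentials (this does not test the task's role)
    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin account.dkr.ecr.us-east-1.amazonaws.com

    # Check the task execution role has ECR permissions
    # Tasks in private subnets need a NAT gateway or VPC endpoints (ecr.api, ecr.dkr, and an S3 gateway endpoint)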
    
    # ========== Connection Issues ==========
    # Verify security groups allow traffic
    # Check VPC route tables
    # Verify target group health checks
    

    Cost Optimization

    Fargate Cost Optimization

  • Right-size tasks (don't over-provision CPU or memory)
  • Use Fargate Spot for fault-tolerant workloads (up to 70% discount; tasks can be interrupted with a two-minute warning)
  • Run periodic or non-critical jobs as scheduled tasks instead of always-on services
  • Use a capacity provider strategy to mix Fargate and Fargate Spot

    EC2 Cost Optimization

  • Use Reserved Instances for predictable workloads
  • Use Spot Instances with capacity providers
  • Right-size instance types based on task requirements
  • Use Auto Scaling to match demand
  • Consolidate tasks on fewer instances with binpack strategy