Terraform SCP Management

Last Updated: 2026-03-06
Source: https://github.com/co-cddo/ndx-try-aws-scp
Captured SHA: 3443cac

Executive Summary

The ndx-try-aws-scp repository implements a 4-layer defense-in-depth cost control and observability system using Terraform modules to protect Innovation Sandbox leases from cost abuse and monitor pool health. The four layers are Service Control Policies (prevention), AWS Budgets with per-account isolation and service-specific tracking (detection), a DynamoDB Billing Enforcer Lambda (auto-remediation), and OU Metrics CloudWatch Alarms (observability). The system originated from a need to override and extend the default ISB SCPs to support NDX scenarios (Textract async operations, Bedrock cross-region inference) while simultaneously introducing comprehensive cost guardrails and operational monitoring that the upstream ISB platform lacks.

Design Context

The PROPOSAL.md in this repository documents the original problem statement: the Innovation Sandbox default SCPs were too restrictive for NDX scenarios (blocking Textract async operations and Bedrock cross-region inference) while simultaneously lacking cost controls. Investigation in January 2025 found that some issues were already resolved (Bedrock cross-region had an existing exception) while others required SCP modifications (Textract async operations) and entirely new SCPs (cost avoidance). The Terraform approach was chosen to take ownership of existing ISB-managed SCPs via terraform import and create new policies, avoiding conflicts with the LZA SCP revert mechanism.


Architecture Overview


Module: scp-manager

Location: modules/scp-manager/

The SCP manager creates and attaches up to 5 Service Control Policies to the Innovation Sandbox pool OU. All policies exempt a standard set of administrative role ARN patterns from restrictions:

arn:aws:iam::*:role/InnovationSandbox-ndx*
arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/*AWSReservedSSO_ndx_IsbAdmins*
arn:aws:iam::*:role/stacksets-exec-*
arn:aws:iam::*:role/AWSControlTowerExecution
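
As a rough illustration of how such an exemption is commonly written in an SCP statement, the sketch below builds a Deny statement that skips principals matching the patterns above. The `ArnNotLike` operator on `aws:PrincipalARN`, the helper name `deny_unless_admin`, and the example Sid and action are assumptions for illustration and are not taken from the repository.

```python
import json

# Administrative role ARN patterns listed above.
EXEMPT_ROLE_PATTERNS = [
    "arn:aws:iam::*:role/InnovationSandbox-ndx*",
    "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/*AWSReservedSSO_ndx_IsbAdmins*",
    "arn:aws:iam::*:role/stacksets-exec-*",
    "arn:aws:iam::*:role/AWSControlTowerExecution",
]


def deny_unless_admin(sid, actions):
    """Build a Deny statement that does not apply to the exempt roles.

    Hypothetical helper: ArnNotLike on aws:PrincipalARN is a common way to
    exempt roles in an SCP and is assumed here, not confirmed from the module.
    """
    return {
        "Sid": sid,
        "Effect": "Deny",
        "Action": list(actions),
        "Resource": "*",
        "Condition": {"ArnNotLike": {"aws:PrincipalARN": EXEMPT_ROLE_PATTERNS}},
    }


# Example Sid and action chosen for illustration only.
statement = deny_unless_admin("DenyRegionEnablement", ["account:EnableRegion"])
print(json.dumps(statement, indent=2))
```

Keeping the pattern list in one place means every statement shares the same exemptions, which matches the "standard set" described above.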

SCPs Created

| SCP Name | Always Created | Attached To | Purpose |
|---|---|---|---|
| InnovationSandboxAwsNukeSupportedServicesScp | Yes | sandbox_ou_id | Allowlist of ~130 services via NotAction deny |
| InnovationSandboxRestrictionsScp | Yes | sandbox_ou_id | Region lock, Bedrock model deny, security isolation, cost implications, operational restrictions |
| InnovationSandboxCostAvoidanceComputeScp | When enable_cost_avoidance = true | cost_avoidance_ou_id or sandbox_ou_id | EC2, EBS, RDS, ElastiCache, EKS, ASG limits |
| InnovationSandboxCostAvoidanceServicesScp | When enable_cost_avoidance = true | cost_avoidance_ou_id or sandbox_ou_id | Block expensive ML/data/misc services |
| InnovationSandboxIamWorkloadIdentityScp | When enable_iam_workload_identity = true | sandbox_ou_id | Controlled IAM role/user creation with privilege escalation prevention |

The cost avoidance SCP is split into two policies (Compute and Services) due to the AWS 5,120 character limit per SCP.
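
A quick way to see when a split becomes necessary is to measure the serialized policy against the limit. The sketch below assumes the policy is submitted as minified JSON; the helper names and the padded example policy are hypothetical.

```python
import json

SCP_MAX_CHARS = 5120  # AWS limit per SCP, as cited above


def scp_size(policy):
    """Return the policy length once serialized without extra whitespace."""
    return len(json.dumps(policy, separators=(",", ":")))


def fits_in_one_scp(policy):
    """True when the minified policy is within the per-SCP limit."""
    return scp_size(policy) <= SCP_MAX_CHARS


# Illustrative policy only: placeholder action names pad it past the limit.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyExpensiveActions",
            "Effect": "Deny",
            "Action": [f"example:Action{i:04d}" for i in range(300)],
            "Resource": "*",
        }
    ],
}

print(scp_size(policy), fits_in_one_scp(policy))
```

A check like this could run in CI before `terraform apply`, so an oversized policy fails early rather than at the Organizations API.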

Service Allowlist (Nuke Supported Services)

The InnovationSandboxAwsNukeSupportedServicesScp uses a NotAction deny pattern to restrict sandbox accounts to approximately 130 services that AWS Nuke can clean up. Notable additions beyond the ISB defaults include Textract async operations:

  • textract:StartDocumentAnalysis, textract:StartDocumentTextDetection
  • textract:StartExpenseAnalysis, textract:StartLendingAnalysis
  • textract:GetDocumentAnalysis, textract:GetDocumentTextDetection
  • textract:GetExpenseAnalysis, textract:GetLendingAnalysis, textract:GetLendingAnalysisSummary
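
The NotAction deny pattern can be read as "deny everything that is not on the list". The sketch below models that behaviour with a small, illustrative subset of patterns; the list contents and the function name are assumptions, and `fnmatchcase` only approximates IAM wildcard matching.

```python
from fnmatch import fnmatchcase

# Illustrative subset only; the real allowlist covers roughly 130 services.
ALLOWED_ACTION_PATTERNS = [
    "s3:*",
    "lambda:*",
    "textract:StartDocumentAnalysis",
    "textract:GetDocumentAnalysis",
]


def denied_by_allowlist(action, allowed_patterns=ALLOWED_ACTION_PATTERNS):
    """A Deny with NotAction blocks every action matching none of the patterns.

    IAM action names are case-insensitive, so both sides are lowercased.
    """
    action = action.lower()
    return not any(fnmatchcase(action, p.lower()) for p in allowed_patterns)


print(denied_by_allowlist("textract:StartDocumentAnalysis"))  # False
print(denied_by_allowlist("sagemaker:CreateEndpoint"))        # True
```

This is why adding the Textract async actions to the list is enough to unblock them: anything listed under NotAction falls outside the deny.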

Restrictions SCP

The restrictions SCP implements five categories of controls:

Region Lock: Denies all actions (except Bedrock) outside us-east-1 and us-west-2. Bedrock is excluded to allow cross-region inference profiles.

Expensive Bedrock Models: Denies invocation of Claude Opus and Claude Sonnet models via ARN pattern matching on anthropic.claude*opus* and anthropic.claude*sonnet*.

Security and Isolation: Blocks account portal access, CloudTrail service-linked channel modification, Transit Gateway peer association, RAM resource sharing, SSM document permission modification, and WAF Firewall Manager disassociation. Also blocks cloudtrail:LookupEvents to prevent event log access by sandbox users.

Cost Implications: Blocks billing modifications, Cost Explorer configuration, reserved instance purchases, Savings Plans creation, and Shield subscriptions across 16 services.

Operational Restrictions: Blocks 40+ potentially dangerous or expensive actions including region enablement, CloudHSM usage, Direct Connect, Migration Hub, RoboMaker fleet management, Route53 Domains, and storage gateway operations.
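
The first two categories lend themselves to a small model. The sketch below approximates the region lock and the Bedrock model deny as plain functions. The full ARN prefix in the patterns, the function names, and the example model IDs are assumptions; only the region list and the `anthropic.claude*opus*` / `anthropic.claude*sonnet*` fragments come from the description above.

```python
from fnmatch import fnmatchcase

ALLOWED_REGIONS = {"us-east-1", "us-west-2"}

# Model-ID patterns quoted above; the ARN prefix is an assumed resource form.
DENIED_MODEL_PATTERNS = (
    "arn:aws:bedrock:*::foundation-model/anthropic.claude*opus*",
    "arn:aws:bedrock:*::foundation-model/anthropic.claude*sonnet*",
)


def region_lock_denies(action, region):
    """Deny any non-Bedrock action requested outside the allowed regions."""
    service = action.split(":", 1)[0].lower()
    return service != "bedrock" and region not in ALLOWED_REGIONS


def model_deny_applies(model_arn):
    """Deny invocation when the model ARN matches an Opus or Sonnet pattern."""
    return any(fnmatchcase(model_arn, p) for p in DENIED_MODEL_PATTERNS)


print(region_lock_denies("ec2:RunInstances", "eu-west-1"))     # True
print(region_lock_denies("bedrock:InvokeModel", "eu-west-1"))  # False
print(region_lock_denies("ec2:RunInstances", "us-east-1"))     # False

base = "arn:aws:bedrock:us-east-1::foundation-model/"
print(model_deny_applies(base + "anthropic.claude-3-opus-20240229-v1:0"))   # True
print(model_deny_applies(base + "anthropic.claude-3-haiku-20240307-v1:0"))  # False
```

The two controls are independent: excluding Bedrock from the region lock permits cross-region inference, while the model deny still blocks the expensive models in every region.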

Compute Cost Controls

| Resource | Control | Default Values |
|---|---|---|
| EC2 Instance Types | Allowlist | t2.micro/small/medium, t3.micro-large, t3a.micro-large, m5.large/xlarge, m6i.large/xlarge |
| EC2 Denied Types | Explicit deny | p*, g*, inf*, trn*, dl*, u-*, *.metal, *.12xlarge and larger |
| EBS Volume Types | Deny io1/io2 | io1, io2 blocked |
| EBS Volume Size | Max limit | 500 GB |
| RDS Instance Classes | Allowlist | db.t3.*, db.t4g.*, db.m5.large/xlarge, db.m6g/m6i.large/xlarge |
| RDS Multi-AZ | Configurable | Allowed (default: true in production) |
| ElastiCache Node Types | Allowlist | cache.t3.*, cache.t4g.*, cache.m5.large, cache.m6g.large |
| EKS Nodegroup Size | Max limit | 5 nodes |
| ASG Max Size | Max limit | 10 instances |
| Lambda Provisioned Concurrency | Blocked | PutProvisionedConcurrencyConfig denied |

Expensive Services Blocked

SageMaker (endpoints, training jobs, tuning), EMR (RunJobFlow), Redshift (CreateCluster), GameLift (CreateFleet), plus 20+ additional services: Kafka, FSx, Kinesis streams, Dedicated Hosts, Reserved Instance purchases, Neptune, DocumentDB, MemoryDB, Elasticsearch/OpenSearch, Batch, Glue jobs/dev endpoints, Timestream, and QLDB.

IAM Workload Identity SCP (Optional)

When enabled, this SCP allows sandbox users to create IAM roles and users for workloads (EC2 instance profiles, Lambda execution roles) while preventing privilege escalation. Users are blocked from creating or modifying roles matching protected patterns (Admin*, OrganizationAccountAccessRole, AWSAccelerator*, AWSControlTower*, InnovationSandbox*) and from passing or assuming these privileged roles.


Module: budgets-manager

Location: modules/budgets-manager/

The budgets module implements dynamic per-account budget creation. The production environment's main.tf uses aws_organizations_organizational_unit_descendant_accounts to auto-discover all ACTIVE accounts in the sandbox pool OU, then creates individual budgets for each account. This eliminates manual account ID management and scales automatically as new pool accounts are added.

Budget Types

Per-Account Daily Budget: $50/day per sandbox account with notifications at 10%, 50%, and 100% of actual spend. Each account gets its own isolated budget to prevent one account consuming another's allocation.

Per-Account Monthly Budget: $1000/month per sandbox account with notifications at 85% and 100% actual plus 100% forecasted spend.

Consolidated Fallback: If sandbox_account_ids is not provided, a single consolidated budget is created instead.
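
The per-account fan-out can be pictured as a filter followed by a mapping. The sketch below is a plain-Python model of that logic, not the module's Terraform: the account dict shape, budget names, and helper names are hypothetical, while the limits and thresholds are the ones stated above.

```python
DAILY_LIMIT_USD = 50
MONTHLY_LIMIT_USD = 1000


def notification(threshold_pct, kind="ACTUAL"):
    """One alert rule: a percentage of the limit, on actual or forecasted spend."""
    return {"threshold_pct": threshold_pct, "type": kind}


def budgets_for(accounts):
    """Return one daily and one monthly budget per ACTIVE account.

    `accounts` is a list of {"id": ..., "status": ...} dicts standing in for
    the discovery output; that shape is an assumption of this sketch.
    """
    budgets = []
    for account in accounts:
        if account["status"] != "ACTIVE":
            continue  # only ACTIVE pool accounts receive budgets
        budgets.append({
            "name": f"sandbox-daily-{account['id']}",
            "account_id": account["id"],
            "time_unit": "DAILY",
            "limit_usd": DAILY_LIMIT_USD,
            "notifications": [notification(p) for p in (10, 50, 100)],
        })
        budgets.append({
            "name": f"sandbox-monthly-{account['id']}",
            "account_id": account["id"],
            "time_unit": "MONTHLY",
            "limit_usd": MONTHLY_LIMIT_USD,
            "notifications": [
                notification(85),
                notification(100),
                notification(100, "FORECASTED"),
            ],
        })
    return budgets


# Placeholder account IDs for illustration.
pool = [
    {"id": "111111111111", "status": "ACTIVE"},
    {"id": "222222222222", "status": "SUSPENDED"},
]
for budget in budgets_for(pool):
    print(budget["name"], budget["limit_usd"])
```

Because each budget is scoped to one account ID, spend in one sandbox never counts against another's limit.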

Service-Specific Budgets

Ten service-specific daily budgets provide granular visibility across all sandbox accounts:

| Service | Daily Limit | Alert Thresholds | Filter |
|---|---|---|---|
| EC2 Compute | $100 | 80%, 100% | Service filter |
| RDS | $30 | 80%, 100% | Service filter |
| Lambda | $50 | 80%, 100% | Service filter |
| DynamoDB | $50 | 80%, 100% | Service filter |
| Bedrock | $50 | 50%, 80%, 100% | Service filter |
| CloudWatch | configurable | 50%, 80%, 100% | Service filter |
| Step Functions | configurable | 80%, 100% | Service filter |
| S3 | configurable | 80%, 100% | Service filter |
| API Gateway | configurable | 80%, 100% | Service filter |
| Data Transfer | $20 | 80%, 100% | UsageType filter |

Bedrock and CloudWatch budgets include an additional 50% threshold for earlier detection due to their high abuse potential.

Automated Actions

When enable_automated_actions is true, an IAM role is created allowing AWS Budgets to:

  • Stop EC2 instances tagged with ManagedBy: InnovationSandbox
  • Stop RDS instances and clusters
  • Attach AWSDenyAll policy to users/roles (emergency lockdown)

Module: dynamodb-billing-enforcer

Location: modules/dynamodb-billing-enforcer/

This module closes a critical cost control gap: DynamoDB On-Demand billing mode bypasses all WCU/RCU service quotas, allowing potentially unlimited costs.

Architecture

Implementation Details

  • Trigger: EventBridge rule matching CloudTrail events for dynamodb.amazonaws.com with CreateTable or UpdateTable event names
  • Runtime: Python 3.11 Lambda with 30-second timeout
  • Action: Deletes On-Demand tables and publishes an SNS alert and EventBridge event
  • Exemptions: Tables with name prefixes matching exempt_table_prefixes are not enforced
  • Log Retention: 7 days (minimized for cost control)
  • Permissions: dynamodb:DescribeTable, dynamodb:DeleteTable, sns:Publish, events:PutEvents
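
The decision logic implied by these bullets can be sketched without AWS access. The code below is not the repository's `index.py`: the function names are hypothetical, the event shape is assumed to follow the usual CloudTrail-via-EventBridge layout, and `describe_table` is injected so a stub can stand in for the DynamoDB DescribeTable call.

```python
ON_DEMAND = "PAY_PER_REQUEST"  # DynamoDB's identifier for On-Demand billing
WATCHED_EVENTS = {"CreateTable", "UpdateTable"}


def table_name_from_event(event):
    """Return the table name if the event is a watched DynamoDB API call."""
    detail = event.get("detail", {})
    if detail.get("eventSource") != "dynamodb.amazonaws.com":
        return None
    if detail.get("eventName") not in WATCHED_EVENTS:
        return None
    return (detail.get("requestParameters") or {}).get("tableName")


def is_exempt(table_name, exempt_table_prefixes):
    """True when the table name starts with any exempt prefix."""
    return any(table_name.startswith(p) for p in exempt_table_prefixes)


def should_delete(event, describe_table, exempt_table_prefixes=()):
    """Decide whether the table named in the event should be deleted.

    `describe_table` takes a table name and returns a DescribeTable-shaped
    dict; in a Lambda it would wrap the real API call.
    """
    table_name = table_name_from_event(event)
    if table_name is None or is_exempt(table_name, exempt_table_prefixes):
        return False
    table = describe_table(table_name)["Table"]
    billing_mode = table.get("BillingModeSummary", {}).get("BillingMode")
    return billing_mode == ON_DEMAND


# Usage with a stubbed DescribeTable response and a made-up table name.
event = {
    "source": "aws.dynamodb",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {
        "eventSource": "dynamodb.amazonaws.com",
        "eventName": "CreateTable",
        "requestParameters": {"tableName": "demo-table"},
    },
}


def fake_describe(name):
    return {"Table": {"TableName": name,
                      "BillingModeSummary": {"BillingMode": ON_DEMAND}}}


print(should_delete(event, fake_describe))                                    # True
print(should_delete(event, fake_describe, exempt_table_prefixes=("demo-",)))  # False
```

Checking the live table state rather than trusting the request parameters covers both CreateTable and UpdateTable with one code path.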

Module: ou-metrics-alarms

Location: modules/ou-metrics-alarms/

This module consumes CloudWatch custom metrics published by the OU metrics stop-gap service (innovation-sandbox-on-aws-ou-metrics) and creates CloudWatch alarms for pool health monitoring. Metrics are published to the InnovationSandbox/OUMetrics namespace.

Alarms Created

| Alarm | Metric | Condition | Rationale |
|---|---|---|---|
| Low Available Accounts | AvailableAccounts | < threshold for 1 datapoint | Pool running low, users may not get a sandbox |
| Stuck Entry Accounts | EntryAccounts | > 0 for 4 datapoints (~1 hour) | Accounts stuck transitioning into the pool |
| Stuck Exit Accounts | ExitAccounts | > 0 for 4 datapoints (~1 hour) | Accounts stuck transitioning out of the pool |
| Metrics Stale | TotalManagedAccounts | INSUFFICIENT_DATA for ~30 min | Lambda may be failing (uses treat_missing_data = breaching) |

All alarms send notifications to the configured SNS topic. The metrics_stale alarm sets treat_missing_data = "breaching" so that missing datapoints count as breaching and the alarm moves to ALARM instead of resting in INSUFFICIENT_DATA, which catches Lambda failures.
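
The effect of the missing-data setting is easier to see in a toy evaluator. The sketch below is a deliberately simplified model of alarm evaluation (it ignores CloudWatch's M-out-of-N and evaluation-range rules); the function name and the sample datapoints are hypothetical.

```python
def alarm_state(datapoints, breaches, evaluation_periods, treat_missing="missing"):
    """Evaluate the most recent `evaluation_periods` datapoints.

    `datapoints` holds one value per period, with None for a missing period.
    `breaches` is a predicate telling whether a present value breaches.
    """
    window = datapoints[-evaluation_periods:]
    if treat_missing == "breaching":
        results = [True if v is None else breaches(v) for v in window]
    else:
        if all(v is None for v in window):
            return "INSUFFICIENT_DATA"
        results = [breaches(v) for v in window if v is not None]
    return "ALARM" if results and all(results) else "OK"


def never(value):
    """Predicate for the stale check: a present datapoint never breaches."""
    return False


def stuck(value):
    """Predicate for the stuck-account alarms: any account present breaches."""
    return value > 0


# Metrics stop arriving: two missing 15-minute periods (~30 min).
print(alarm_state([12, None, None], never, 2))                             # INSUFFICIENT_DATA
print(alarm_state([12, None, None], never, 2, treat_missing="breaching"))  # ALARM

# Stuck-account style check: > 0 for four consecutive periods (~1 hour).
print(alarm_state([0, 1, 1, 1, 1], stuck, 4))  # ALARM
print(alarm_state([1, 1, 0, 1], stuck, 4))     # OK
```

With the default 900-second period, two periods span about 30 minutes and four span about one hour, which matches the durations in the table.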

Configuration

| Variable | Description | Default |
|---|---|---|
| namespace | Alarm name prefix | required |
| sns_topic_arn | SNS topic for alarm notifications | required |
| available_accounts_threshold | Low available accounts threshold | 10 |
| metric_period_seconds | Metric evaluation period | 900 (15 min) |

Deployment

State Management

Terraform state is stored in S3 with DynamoDB locking:

Bucket: ndx-terraform-state-955063685555
Key: scp-overrides/terraform.tfstate
Region: eu-west-2
DynamoDB Lock Table: ndx-terraform-locks

Production Environment

environments/ndx-production/
main.tf - Module orchestration with dynamic account discovery
variables.tf - Input variable definitions
backend.tf - S3 state backend configuration
terraform.tfvars.example - Example configuration values

Deployment Process

cd environments/ndx-production
terraform init
terraform plan
terraform apply

For first-time deployment with existing ISB-managed SCPs:

# Import existing SCPs into Terraform state
terraform import 'module.scp_manager.aws_organizations_policy.nuke_supported_services' p-xxxxxxxxx
terraform import 'module.scp_manager.aws_organizations_policy.restrictions' p-yyyyyyyyy

GitHub Actions CI/CD

The terraform.yaml workflow provides automated plan on PR and apply on merge. The production environment in repository settings should be configured with required reviewers for approval gates before terraform apply.

LZA Conflict Resolution

The LZA scpRevertChangesConfig.enable: true setting in security-config.yaml can revert Terraform-managed SCP changes. The PROPOSAL.md recommends setting this to false in the LZA configuration. The InnovationSandboxRestrictionsScp uses lifecycle { prevent_destroy = true } to prevent accidental Terraform destruction.


Comparison with LZA SCPs

Complementary Design

LZA SCPs focus on security and compliance: protecting CloudTrail, Config, GuardDuty, Security Hub, IAM roles, networking, and encryption. They are attached to Infrastructure, Security, and Workloads OUs.

Terraform SCPs focus on cost control and scenario enablement: service allowlists, region restrictions, compute/service cost limits, and Bedrock model restrictions. They are attached exclusively to the InnovationSandbox pool OU.

Both can coexist on sandbox accounts without conflict. LZA SCPs inherited through the organizational hierarchy provide the security baseline, while Terraform SCPs layered at the sandbox OU provide cost controls.


Testing

The repository includes Python-based tests in tests/:

  • test_dynamodb_enforcer.py - Tests for the DynamoDB billing enforcer Lambda
  • conftest.py - pytest fixtures
  • requirements.txt - Test dependencies

Additional documentation in docs/:

  • EVENTBRIDGE_EVENTS.md - Event schemas emitted by the enforcer
  • GITHUB_ACTIONS_SETUP.md - CI/CD configuration guide
  • SCP_CONSOLIDATION_ANALYSIS.md - Analysis of SCP consolidation options

Cost Protection Summary

Maximum Bounded Daily Cost (All Defenses Active)

| Category | Protection Layer | Max Daily Cost |
|---|---|---|
| EC2 Compute | SCP (instance type limits) | ~$77 |
| EBS Storage | SCP (io1/io2 blocked, 500GB max) | ~$6 |
| RDS | SCP (instance class limits) | ~$22 |
| ElastiCache | SCP (node type limits) | ~$40 |
| Lambda | Budget ($50/day) | ~$50 |
| DynamoDB | Enforcer (table deletion) | ~$0 |
| Bedrock | Budget ($50/day) + model deny | ~$50 |
| CloudWatch | Budget (configurable) | ~$5+ |
| GPU/Expensive Services | SCP (blocked) | $0 |
| Total Bounded | | ~$250/day |
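
The total can be checked by summing the per-category figures. The snippet below simply restates the table's approximate values; CloudWatch is entered at its stated floor of $5, so the real bound rises if that budget is configured higher.

```python
# Per-category daily bounds from the table above, in USD (approximate).
MAX_DAILY_COST_USD = {
    "EC2 Compute": 77,
    "EBS Storage": 6,
    "RDS": 22,
    "ElastiCache": 40,
    "Lambda": 50,
    "DynamoDB": 0,
    "Bedrock": 50,
    "CloudWatch": 5,  # "~$5+" in the table: a floor, not a ceiling
    "GPU/Expensive Services": 0,
}

total = sum(MAX_DAILY_COST_USD.values())
print(f"Total bounded: ~${total}/day")  # Total bounded: ~$250/day
```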

Attack Vector Coverage

| Vector | Layer 1 (SCP) | Layer 2 (Budget) | Layer 3 (Enforcer) |
|---|---|---|---|
| GPU Instances | Blocked | $100/day EC2 | - |
| Large EC2 | Type limit | $100/day EC2 | - |
| EBS io1/io2 | Blocked | via EC2 budget | - |
| RDS Multi-AZ | Configurable | $30/day RDS | - |
| Lambda Memory | No SCP key | $50/day Lambda | - |
| DynamoDB On-Demand | No SCP key | $50/day DynamoDB | Table deletion |
| CloudWatch Logs | No SCP key | Configurable | - |
| SageMaker/EMR/Redshift | Blocked | - | - |


Source Files Referenced

| File Path | Purpose |
|---|---|
| repos/ndx-try-aws-scp/PROPOSAL.md | Design intent and investigation findings |
| repos/ndx-try-aws-scp/README.md | Module documentation and deployment guide |
| repos/ndx-try-aws-scp/modules/scp-manager/main.tf | SCP resource definitions (~720 lines) |
| repos/ndx-try-aws-scp/modules/scp-manager/variables.tf | SCP configuration variables |
| repos/ndx-try-aws-scp/modules/budgets-manager/main.tf | Budget resource definitions |
| repos/ndx-try-aws-scp/modules/dynamodb-billing-enforcer/main.tf | Lambda enforcer infrastructure |
| repos/ndx-try-aws-scp/modules/dynamodb-billing-enforcer/lambda/index.py | Enforcer Lambda code |
| repos/ndx-try-aws-scp/environments/ndx-production/main.tf | Production environment orchestration |
| repos/ndx-try-aws-scp/environments/ndx-production/variables.tf | Production variables |
| repos/ndx-try-aws-scp/environments/ndx-production/backend.tf | S3 state backend |
| repos/ndx-try-aws-scp/environments/ndx-production/terraform.tfvars.example | Example configuration |
| repos/ndx-try-aws-scp/tests/test_dynamodb_enforcer.py | Enforcer unit tests |
| repos/ndx-try-aws-scp/scripts/bootstrap-backend.sh | State backend bootstrap |
| repos/ndx-try-aws-scp/scripts/import-existing-scps.sh | SCP import automation |
| repos/ndx-try-aws-scp/docs/EVENTBRIDGE_EVENTS.md | Event schemas |
| repos/ndx-try-aws-scp/docs/GITHUB_ACTIONS_SETUP.md | CI/CD setup |
| repos/ndx-try-aws-scp/docs/SCP_CONSOLIDATION_ANALYSIS.md | SCP analysis |
| repos/ndx-try-aws-scp/modules/ou-metrics-alarms/main.tf | OU metrics alarm definitions |
| repos/ndx-try-aws-scp/modules/ou-metrics-alarms/variables.tf | Alarm configuration variables |
| repos/ndx-try-aws-scp/modules/ou-metrics-alarms/outputs.tf | Alarm ARN outputs |

Generated from source analysis. See 00-repo-inventory.md for full inventory.