Lease Lifecycle

Last Updated: 2026-03-02 | Source: co-cddo/innovation-sandbox-on-aws | Captured SHA: cf75b87

Executive Summary

A lease represents a user's temporary access to a sandboxed AWS account within the Innovation Sandbox ecosystem. Each lease passes through a well-defined state machine, from request through approval, active monitoring, and eventual termination with automated account cleanup. The lifecycle is orchestrated by the Leases Lambda (API-driven state changes), the Lease Monitoring Lambda (scheduled budget/duration checks), and the Account Lifecycle Manager Lambda (event-driven OU transitions and IDC assignments). Account cleanup is handled by a Step Functions state machine that invokes CodeBuild running AWS Nuke in a container.

Complete Lease State Machine

Lease States

| State | Schema | Account Allocated | OU | Monitoring | Terminal |
|---|---|---|---|---|---|
| PendingApproval | PendingLeaseSchema | No | -- | No | No |
| ApprovalDenied | ApprovalDeniedLeaseSchema | No | -- | No | Yes |
| Active | MonitoredLeaseSchema | Yes | Active | Yes | No |
| Frozen | MonitoredLeaseSchema | Yes | Frozen | Yes | No |
| Expired | ExpiredLeaseSchema | Yes (cleanup queued) | CleanUp | No | Yes |
| BudgetExceeded | ExpiredLeaseSchema | Yes (cleanup queued) | CleanUp | No | Yes |
| ManuallyTerminated | ExpiredLeaseSchema | Yes (cleanup queued) | CleanUp | No | Yes |
| AccountQuarantined | ExpiredLeaseSchema | Yes (quarantined) | Quarantine | No | Yes |
| Ejected | ExpiredLeaseSchema | Yes (ejected) | Exit | No | Yes |

Source: source/common/data/lease/lease.ts
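
The state table can be encoded as a small lookup, which makes the "monitored" and "terminal" columns easy to query. This is a minimal sketch: the status strings and column values come from the table, while `LeaseStateInfo`, `LEASE_STATES`, `isTerminal`, and `monitoredStatuses` are hypothetical names and are not the definitions in lease.ts.

```typescript
// Illustrative encoding of the state table. The status strings come from
// the table; every other name here is hypothetical.
type LeaseStatus =
  | "PendingApproval"
  | "ApprovalDenied"
  | "Active"
  | "Frozen"
  | "Expired"
  | "BudgetExceeded"
  | "ManuallyTerminated"
  | "AccountQuarantined"
  | "Ejected";

interface LeaseStateInfo {
  ou: string | null; // OU holding the account; null when none is allocated
  monitored: boolean; // evaluated by the Lease Monitoring Lambda
  terminal: boolean; // no further lease transitions expected
}

const LEASE_STATES: Record<LeaseStatus, LeaseStateInfo> = {
  PendingApproval: { ou: null, monitored: false, terminal: false },
  ApprovalDenied: { ou: null, monitored: false, terminal: true },
  Active: { ou: "Active", monitored: true, terminal: false },
  Frozen: { ou: "Frozen", monitored: true, terminal: false },
  Expired: { ou: "CleanUp", monitored: false, terminal: true },
  BudgetExceeded: { ou: "CleanUp", monitored: false, terminal: true },
  ManuallyTerminated: { ou: "CleanUp", monitored: false, terminal: true },
  AccountQuarantined: { ou: "Quarantine", monitored: false, terminal: true },
  Ejected: { ou: "Exit", monitored: false, terminal: true },
};

function isTerminal(status: LeaseStatus): boolean {
  return LEASE_STATES[status].terminal;
}

// Statuses the monitoring job would evaluate, in table order.
function monitoredStatuses(): LeaseStatus[] {
  return (Object.keys(LEASE_STATES) as LeaseStatus[]).filter(
    (status) => LEASE_STATES[status].monitored,
  );
}
```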


Phase 1: Lease Request

Sequence Diagram

DynamoDB Writes

Auto-approved path:

  1. LeaseTable INSERT: New MonitoredLease with status Active, awsAccountId, startDate, expirationDate, approvedBy: "AUTO_APPROVED"
  2. SandboxAccountTable UPDATE: Account status from Available to Active, set lease association

Pending approval path:

  1. LeaseTable INSERT: New PendingLease with status PendingApproval
  2. No account table changes

Validation rules (from global AppConfig):

  • Template must exist and be active
  • User's concurrent active lease count < maxLeasesPerUser (default 3)
  • If auto-approved, at least one account must be in Available status
  • If userEmail differs from requester, requester must be Manager or Admin

Source: source/lambdas/api/leases/src/leases-handler.ts
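
The four validation rules can be read as independent checks that each contribute an error. The sketch below assumes a flattened `LeaseRequestContext` input and a `requiresApproval` flag on the template; both are hypothetical and chosen for illustration, and the real handler's types and error messages may differ.

```typescript
// Hypothetical shapes for illustration only.
type Role = "User" | "Manager" | "Admin";

interface LeaseRequestContext {
  requesterEmail: string;
  requesterRoles: Role[];
  userEmail: string; // user the lease is requested for
  template: { active: boolean; requiresApproval: boolean } | undefined;
  activeLeaseCount: number; // user's concurrent active leases
  availableAccountCount: number; // accounts in Available status
  maxLeasesPerUser: number; // from global AppConfig (default 3)
}

// Returns one message per violated rule; an empty array means the request is valid.
function validateLeaseRequest(ctx: LeaseRequestContext): string[] {
  const errors: string[] = [];

  if (ctx.template === undefined || !ctx.template.active) {
    errors.push("Template must exist and be active");
  }
  if (ctx.activeLeaseCount >= ctx.maxLeasesPerUser) {
    errors.push("User has reached maxLeasesPerUser");
  }

  const autoApproved =
    ctx.template !== undefined && !ctx.template.requiresApproval;
  if (autoApproved && ctx.availableAccountCount < 1) {
    errors.push("No account in Available status");
  }

  const onBehalfOfOther = ctx.userEmail !== ctx.requesterEmail;
  const privileged = ctx.requesterRoles.some(
    (role) => role === "Manager" || role === "Admin",
  );
  if (onBehalfOfOther && !privileged) {
    errors.push("Only a Manager or Admin may request a lease for another user");
  }

  return errors;
}
```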


Phase 2: Lease Approval

Sequence Diagram

Account Lifecycle Manager Actions on LeaseApproved

The Account Lifecycle Manager Lambda handles the physical account provisioning:

  1. Move account to Active OU: organizations:MoveAccount from Available OU to Active OU
  2. Grant IDC access: sso:CreateAccountAssignment with the user's permission set (User, Manager, or Admin PS)
  3. Update DynamoDB: Record the IDC assignment state on the account

The Write Protection SCP stops applying once the account leaves the Available OU (it is attached only to the Available, CleanUp, Quarantine, Entry, and Exit OUs), which allows the user to create resources.

Source: source/lambdas/account-management/account-lifecycle-management/src/account-lifecycle-manager.ts
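
The three provisioning steps can be expressed as an ordered plan. This sketch only models the sequence: the operation names `organizations:MoveAccount` and `sso:CreateAccountAssignment` come from the text, while `ProvisioningStep`, `planLeaseApproved`, and the `RecordAssignment` step name are hypothetical and do not reflect how the Lambda is actually structured.

```typescript
// Hypothetical action model; it describes what would be called, not how.
type PermissionSet = "User" | "Manager" | "Admin";

type ProvisioningStep =
  | { kind: "MoveAccount"; accountId: string; fromOu: "Available"; toOu: "Active" }
  | {
      kind: "CreateAccountAssignment";
      accountId: string;
      userEmail: string;
      permissionSet: PermissionSet;
    }
  | { kind: "RecordAssignment"; accountId: string };

// Builds the ordered steps for a LeaseApproved event.
function planLeaseApproved(
  accountId: string,
  userEmail: string,
  permissionSet: PermissionSet,
): ProvisioningStep[] {
  return [
    // 1. organizations:MoveAccount from the Available OU to the Active OU
    { kind: "MoveAccount", accountId, fromOu: "Available", toOu: "Active" },
    // 2. sso:CreateAccountAssignment with the user's permission set
    { kind: "CreateAccountAssignment", accountId, userEmail, permissionSet },
    // 3. Record the IDC assignment state on the account in DynamoDB
    { kind: "RecordAssignment", accountId },
  ];
}
```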


Phase 3: Active Lease Monitoring

Monitoring Schedule

The LeaseMonitoringLambda runs on a scheduled EventBridge rule and evaluates all Active and Frozen leases.

Alert-to-Action Mapping

| Alert Event | Current State | Action | New State |
|---|---|---|---|
| LeaseBudgetExceeded | Active/Frozen | Terminate lease, queue cleanup | BudgetExceeded |
| LeaseExpired | Active/Frozen | Terminate lease, queue cleanup | Expired |
| LeaseBudgetThresholdAlert | Active | Send notification only | Active (unchanged) |
| LeaseDurationThresholdAlert | Active | Send notification only | Active (unchanged) |
| LeaseFreezingThresholdAlert | Active | Freeze account | Frozen |
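
The mapping above can be read as a transition function over the two monitored states. A minimal sketch, assuming that an event/state pair not listed in the table is simply ignored (returned as `null` here); `applyAlert` and `AlertOutcome` are hypothetical names.

```typescript
type MonitoredState = "Active" | "Frozen";
type AlertEvent =
  | "LeaseBudgetExceeded"
  | "LeaseExpired"
  | "LeaseBudgetThresholdAlert"
  | "LeaseDurationThresholdAlert"
  | "LeaseFreezingThresholdAlert";

interface AlertOutcome {
  action: string;
  newState: MonitoredState | "BudgetExceeded" | "Expired";
}

// Returns the table row for (event, current), or null when the table has no such row.
function applyAlert(event: AlertEvent, current: MonitoredState): AlertOutcome | null {
  switch (event) {
    case "LeaseBudgetExceeded":
      return { action: "Terminate lease, queue cleanup", newState: "BudgetExceeded" };
    case "LeaseExpired":
      return { action: "Terminate lease, queue cleanup", newState: "Expired" };
    case "LeaseBudgetThresholdAlert":
    case "LeaseDurationThresholdAlert":
      return current === "Active"
        ? { action: "Send notification only", newState: "Active" }
        : null;
    case "LeaseFreezingThresholdAlert":
      return current === "Active"
        ? { action: "Freeze account", newState: "Frozen" }
        : null;
  }
}
```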

Threshold Configuration

Thresholds are defined per lease template:

Budget thresholds: [{ dollarsSpent: number, action: "ALERT" | "FREEZE_ACCOUNT" }]

  • ALERT: Publishes LeaseBudgetThresholdAlert (notification only)
  • FREEZE_ACCOUNT: Publishes LeaseFreezingThresholdAlert (triggers freeze)

Duration thresholds: [{ hoursRemaining: number, action: "ALERT" | "FREEZE_ACCOUNT" }]

  • Same action types as budget thresholds

Source: source/lambdas/account-management/lease-monitoring/src/lease-monitoring-handler.ts
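
Threshold evaluation can be sketched as a pass over both threshold lists. The threshold shapes and event names come from this section; the comparison semantics are assumptions (a budget threshold fires when spend is at or above `dollarsSpent`, a duration threshold when the remaining time is at or below `hoursRemaining`), and the sketch does not track which thresholds have already fired. `evaluateThresholds` is a hypothetical name.

```typescript
type ThresholdAction = "ALERT" | "FREEZE_ACCOUNT";

interface BudgetThreshold {
  dollarsSpent: number;
  action: ThresholdAction;
}

interface DurationThreshold {
  hoursRemaining: number;
  action: ThresholdAction;
}

type ThresholdEvent =
  | "LeaseBudgetThresholdAlert"
  | "LeaseDurationThresholdAlert"
  | "LeaseFreezingThresholdAlert";

// Returns the events a single monitoring pass would publish (assumed semantics).
function evaluateThresholds(
  budget: BudgetThreshold[],
  duration: DurationThreshold[],
  dollarsSpent: number,
  hoursRemaining: number,
): ThresholdEvent[] {
  const events: ThresholdEvent[] = [];
  for (const threshold of budget) {
    if (dollarsSpent >= threshold.dollarsSpent) {
      events.push(
        threshold.action === "FREEZE_ACCOUNT"
          ? "LeaseFreezingThresholdAlert"
          : "LeaseBudgetThresholdAlert",
      );
    }
  }
  for (const threshold of duration) {
    if (hoursRemaining <= threshold.hoursRemaining) {
      events.push(
        threshold.action === "FREEZE_ACCOUNT"
          ? "LeaseFreezingThresholdAlert"
          : "LeaseDurationThresholdAlert",
      );
    }
  }
  return events;
}
```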


Phase 4: Lease Freeze and Unfreeze

Freeze Flow

Unfreeze Flow

Freezing preserves the account's existing resources while the account sits in the Frozen OU, where the user has no active sessions. Unfreezing restores full access. Both operations require the Manager or Admin role.

Source: source/common/events/lease-frozen-event.ts, lease-unfrozen-event.ts


Phase 5: Lease Termination and Cleanup

Termination Triggers

An Active or Frozen lease is terminated through one of three paths:

  1. Manual termination: POST /leases/{id}/terminate (Manager/Admin)
  2. Budget exceeded: Lease Monitoring detects totalCostAccrued > maxSpend
  3. Duration expired: Lease Monitoring detects now > expirationDate
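
The three triggers reduce to two comparisons plus an explicit request. A minimal sketch using the field names from the list above; `terminationReason` and `MonitoredLeaseSnapshot` are hypothetical, and the order in which the checks are applied when several hold at once is an assumption.

```typescript
interface MonitoredLeaseSnapshot {
  totalCostAccrued: number;
  maxSpend: number;
  expirationDate: Date;
}

type TerminalStatus = "ManuallyTerminated" | "BudgetExceeded" | "Expired";

// Returns the terminal status a lease should move to, or null to keep it running.
function terminationReason(
  lease: MonitoredLeaseSnapshot,
  now: Date,
  manualRequest: boolean,
): TerminalStatus | null {
  if (manualRequest) return "ManuallyTerminated"; // POST /leases/{id}/terminate
  if (lease.totalCostAccrued > lease.maxSpend) return "BudgetExceeded";
  if (now.getTime() > lease.expirationDate.getTime()) return "Expired";
  return null;
}
```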

Account Lifecycle Manager on Terminal Events

The Account Lifecycle Manager handles the LeaseBudgetExceeded and LeaseExpired events and processes the transition:

  1. Update lease record to terminal status (Expired / BudgetExceeded / ManuallyTerminated)
  2. Set endDate and ttl on the lease
  3. Revoke IDC access: sso:DeleteAccountAssignment
  4. Move account to CleanUp OU: organizations:MoveAccount
  5. Publish CleanAccountRequest event to trigger the cleanup Step Function

Account Cleaner Step Function

Key parameters (from Global AppConfig cleanup section):

  • numberOfSuccessfulAttemptsToFinishCleanup: Number of consecutive AWS Nuke successes required (default: 2)
  • waitBeforeRerunSuccessfulAttemptSeconds: Delay between successful runs (default: 30s)
  • numberOfFailedAttemptsToCancelCleanup: Max failures before quarantine (default: 3)
  • waitBeforeRetryFailedAttemptSeconds: Delay between failed retries (default: 5s)
  • Step Function total timeout: 12 hours
  • CodeBuild timeout: 60 minutes per run
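
The interaction between the success and failure counters can be sketched by replaying a sequence of AWS Nuke run results. The parameter names come from the list above. The sketch assumes that a failed run resets the consecutive-success count and that failures accumulate across the whole execution; neither detail is confirmed by this document. Wait intervals and timeouts are omitted, and `simulateCleanup` is a hypothetical name.

```typescript
interface CleanupConfig {
  numberOfSuccessfulAttemptsToFinishCleanup: number; // default 2
  numberOfFailedAttemptsToCancelCleanup: number; // default 3
}

type CleanupOutcome = "AccountCleanupSucceeded" | "AccountCleanupFailed" | "InProgress";

// Replays run results in order (true = AWS Nuke run succeeded).
function simulateCleanup(config: CleanupConfig, runResults: boolean[]): CleanupOutcome {
  let consecutiveSuccesses = 0;
  let failures = 0;
  for (const succeeded of runResults) {
    if (succeeded) {
      consecutiveSuccesses += 1;
      if (consecutiveSuccesses >= config.numberOfSuccessfulAttemptsToFinishCleanup) {
        return "AccountCleanupSucceeded";
      }
    } else {
      failures += 1;
      consecutiveSuccesses = 0; // assumption: a failure breaks the success streak
      if (failures >= config.numberOfFailedAttemptsToCancelCleanup) {
        return "AccountCleanupFailed";
      }
    }
  }
  return "InProgress";
}

const DEFAULT_CLEANUP_CONFIG: CleanupConfig = {
  numberOfSuccessfulAttemptsToFinishCleanup: 2,
  numberOfFailedAttemptsToCancelCleanup: 3,
};
```

With the defaults, two successful runs in a row finish the cleanup, and the third failure sends the account to quarantine.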

AWS Nuke Execution

CodeBuild runs an AWS Nuke container that:

  1. Assumes the IntermediateRole in the hub account
  2. Then assumes the {namespace}_IsbCleanupRole in the target sandbox account
  3. Loads nuke config from AppConfig (with placeholder substitution)
  4. Deletes all resources except those in the blocklist/filters
  5. Returns exit code to Step Functions

Protected resources (from nuke-config.yaml):

  • CloudFormation StackSet instances (StackSet-Isb-*)
  • AWS Control Tower resources (trails, rules, roles, functions, logs)
  • SSO-related roles (AWSReservedSSO_*)
  • OrganizationAccountAccessRole
  • StackSet execution roles (stacksets-exec-*)
  • SAML providers (AWSSSO)
  • Config Service recorders/channels

Source: source/infrastructure/lib/components/account-cleaner/step-function.ts, cleanup-buildspec.yaml, source/infrastructure/lib/components/config/nuke-config.yaml


Phase 6: Post-Cleanup

On AccountCleanupSucceeded

The Account Lifecycle Manager:

  1. Moves account from CleanUp OU to Available OU
  2. Resets the account record in SandboxAccountTable (clears lease association, sets status to Available)
  3. Account is now ready for the next lease

On AccountCleanupFailed

The Account Lifecycle Manager:

  1. Moves account from CleanUp OU to Quarantine OU
  2. Updates account status to Quarantine
  3. Updates lease status to AccountQuarantined
  4. Publishes AccountQuarantined event
  5. Sends admin notification for manual review

Admin Recovery Options

  • Retry cleanup: POST /accounts/{id}/retryCleanup -- moves account back to CleanUp OU and re-triggers cleanup
  • Eject account: POST /accounts/{id}/eject -- moves account to Exit OU, removes from pool permanently
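
The cleanup outcomes and admin recovery options together define where an account goes next. A minimal sketch, assuming that both recovery actions are applied to an account in the Quarantine OU (the eject endpoint may accept accounts in other OUs; this document does not say). `nextOu` and the trigger names `retryCleanup` and `eject` are hypothetical labels based on the endpoint paths.

```typescript
type AccountOu = "Available" | "Active" | "Frozen" | "CleanUp" | "Quarantine" | "Exit";
type AccountTrigger =
  | "AccountCleanupSucceeded"
  | "AccountCleanupFailed"
  | "retryCleanup"
  | "eject";

// Returns the destination OU, or null when the pair is not covered by this sketch.
function nextOu(current: AccountOu, trigger: AccountTrigger): AccountOu | null {
  if (current === "CleanUp" && trigger === "AccountCleanupSucceeded") return "Available";
  if (current === "CleanUp" && trigger === "AccountCleanupFailed") return "Quarantine";
  if (current === "Quarantine" && trigger === "retryCleanup") return "CleanUp";
  if (current === "Quarantine" && trigger === "eject") return "Exit";
  return null;
}
```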

Account OU Transition Diagram

Each OU has specific SCPs applied:

  • Available, CleanUp, Quarantine, Entry, Exit: Write Protection SCP (blocks create/modify)
  • Active: Full access within allowed services and regions
  • Frozen: Full access but practically limited (no active user sessions)
  • All OUs: AWS Nuke Supported Services SCP, Restrictions SCP, Protect ISB SCP, Limit Regions SCP
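
The SCP layout can be summarised as a base set attached everywhere plus the Write Protection SCP on five OUs. The SCP and OU names come from the list above; `scpsFor` and the constants are hypothetical helpers for illustration, not part of the deployed infrastructure code.

```typescript
type Ou = "Entry" | "Available" | "Active" | "Frozen" | "CleanUp" | "Quarantine" | "Exit";

const WRITE_PROTECTED_OUS: ReadonlySet<Ou> = new Set<Ou>([
  "Available",
  "CleanUp",
  "Quarantine",
  "Entry",
  "Exit",
]);

const SCPS_ON_ALL_OUS = [
  "AWS Nuke Supported Services SCP",
  "Restrictions SCP",
  "Protect ISB SCP",
  "Limit Regions SCP",
] as const;

// Lists the SCPs that apply to an account sitting in the given OU.
function scpsFor(ou: Ou): string[] {
  const scps: string[] = [...SCPS_ON_ALL_OUS];
  if (WRITE_PROTECTED_OUS.has(ou)) {
    scps.push("Write Protection SCP");
  }
  return scps;
}
```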

EventBridge Event Routing

Event-to-Lambda Routing

| Event | Rule Target | Delivery | Concurrency |
|---|---|---|---|
| LeaseApproved, LeaseBudgetExceeded, LeaseExpired, AccountCleanupSucceeded, AccountCleanupFailed, AccountDriftDetected, LeaseFreezingThresholdAlert | Account Lifecycle Manager | SQS -> Lambda | Reserved: 1 |
| CleanAccountRequest | Account Cleaner Step Function | Direct | -- |
| LeaseRequested, LeaseApproved, LeaseDenied, LeaseTerminated, LeaseFrozen, LeaseUnfrozen, alerts | Email Notification Lambda | SQS -> Lambda | -- |
| All events | CloudWatch Logs | Direct | -- |

The Account Lifecycle Manager uses reserved concurrency of 1 to ensure serialized processing of events, preventing race conditions in account OU transitions and DynamoDB updates.

Source: source/infrastructure/lib/components/events/isb-internal-core.ts, source/infrastructure/lib/components/account-management/account-lifecycle-management-lambda.ts
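
The routing table can be modelled as a lookup from event name to targets, which shows that a single event may fan out to several consumers. This is an illustrative sketch, not the EventBridge rule definitions: `ROUTES` and `targetsFor` are hypothetical, and the unspecified "alerts" entry for the Email Notification Lambda is left out because the table does not name those events.

```typescript
type Target =
  | "Account Lifecycle Manager"
  | "Account Cleaner Step Function"
  | "Email Notification Lambda"
  | "CloudWatch Logs";

// Explicitly named events per target; CloudWatch Logs receives everything.
const ROUTES: Record<Exclude<Target, "CloudWatch Logs">, string[]> = {
  "Account Lifecycle Manager": [
    "LeaseApproved",
    "LeaseBudgetExceeded",
    "LeaseExpired",
    "AccountCleanupSucceeded",
    "AccountCleanupFailed",
    "AccountDriftDetected",
    "LeaseFreezingThresholdAlert",
  ],
  "Account Cleaner Step Function": ["CleanAccountRequest"],
  "Email Notification Lambda": [
    "LeaseRequested",
    "LeaseApproved",
    "LeaseDenied",
    "LeaseTerminated",
    "LeaseFrozen",
    "LeaseUnfrozen",
  ],
};

function targetsFor(eventName: string): Target[] {
  const targets: Target[] = [];
  for (const target of Object.keys(ROUTES) as (keyof typeof ROUTES)[]) {
    if (ROUTES[target].indexOf(eventName) !== -1) {
      targets.push(target);
    }
  }
  targets.push("CloudWatch Logs"); // the "All events" row
  return targets;
}
```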


Error Handling and Recovery

SQS-based Retry Pattern

Events routed through SQS queues benefit from:

  • Visibility timeout: Prevents re-processing during Lambda execution
  • Max receive count: 3 retries before DLQ
  • Max event age: 4 hours for lifecycle events
  • DLQ: Dead letter queue for manual investigation

Step Function Error Handling

  • The InitializeCleanupLambda invoke has a catch-all that publishes AccountCleanupFailed
  • The CodeBuild step has a catch-all that increments the failure counter and retries
  • The entire state machine has a 12-hour timeout

Idempotency

  • The InitializeCleanupLambda checks if cleanup is already in progress (by querying the cleanupExecutionContext on the account record) and skips if so
  • Lease state transitions use DynamoDB conditional writes to prevent conflicting updates
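
A conditional write for a lease state transition might look like the parameters below. The field names follow the DynamoDB update input shape (`ConditionExpression`, `ExpressionAttributeNames`, and so on), and the key attributes come from the query patterns table; whether the repository builds its writes this way is an assumption, and `buildStatusTransition` is a hypothetical helper. `#status` is aliased because `status` is a DynamoDB reserved word.

```typescript
interface ConditionalUpdateParams {
  TableName: string;
  Key: Record<string, string>;
  UpdateExpression: string;
  ConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, string>;
}

// Builds an update that only succeeds if the lease is still in the expected status.
function buildStatusTransition(
  tableName: string,
  userEmail: string,
  uuid: string,
  expectedStatus: string,
  newStatus: string,
): ConditionalUpdateParams {
  return {
    TableName: tableName,
    Key: { userEmail, uuid },
    UpdateExpression: "SET #status = :newStatus",
    // A concurrent writer that already changed the status makes this condition fail.
    ConditionExpression: "#status = :expectedStatus",
    ExpressionAttributeNames: { "#status": "status" },
    ExpressionAttributeValues: {
      ":expectedStatus": expectedStatus,
      ":newStatus": newStatus,
    },
  };
}
```

If two handlers race to move the same lease out of Active, only the first write matches the condition; the second is rejected by DynamoDB and can be treated as already handled.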

DynamoDB Query Patterns

| Query | Method | Key/Index | Filter |
|---|---|---|---|
| Get lease by ID | Query | PK: userEmail, SK: uuid | -- |
| User's leases | Query | PK: userEmail | Optional status filter |
| Leases by status | Query | GSI StatusIndex PK: status | -- |
| Available accounts | Scan | -- | status = "Available" |
| Account by ID | GetItem | PK: awsAccountId | -- |
| Template by ID | GetItem | PK: uuid | -- |

Note: The StatusIndex GSI on LeaseTable uses status as partition key and originalLeaseTemplateUuid as sort key, enabling efficient queries for all leases in a given state.
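
A query against the StatusIndex GSI could be parameterised as below. The index name and key attributes come from the note above, and the field names follow the DynamoDB query input shape; `buildLeasesByStatusQuery` is a hypothetical helper, and narrowing by template through the sort key is shown only as one possible use of it.

```typescript
interface StatusQueryParams {
  TableName: string;
  IndexName: string;
  KeyConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, string>;
}

// Builds a Query on the GSI: partition key status, optional sort key match on template.
function buildLeasesByStatusQuery(
  tableName: string,
  status: string,
  templateUuid?: string,
): StatusQueryParams {
  const params: StatusQueryParams = {
    TableName: tableName,
    IndexName: "StatusIndex",
    KeyConditionExpression: "#status = :status",
    ExpressionAttributeNames: { "#status": "status" }, // status is a reserved word
    ExpressionAttributeValues: { ":status": status },
  };
  if (templateUuid !== undefined) {
    params.KeyConditionExpression += " AND originalLeaseTemplateUuid = :templateUuid";
    params.ExpressionAttributeValues[":templateUuid"] = templateUuid;
  }
  return params;
}
```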



Generated from source analysis. See 00-repo-inventory.md for full inventory.