ISB CDDO Customizations
Last Updated: 2026-03-06
Source: co-cddo/innovation-sandbox-on-aws
Captured SHA: cf75b87
Executive Summary
The UK Government's Central Digital and Data Office (CDDO) operates a fork of the AWS Innovation Sandbox on AWS solution (v1.1.4, solution ID SO0284) to support cross-government cloud experimentation and digital skills training. The fork follows a non-invasive extension pattern: the upstream codebase remains completely unmodified, with all CDDO-specific functionality delivered through six satellite repositories that integrate via EventBridge events and the ISB REST API. This architecture provides a clean merge path for upstream upgrades while enabling rapid iteration on government-specific features.
Key findings:
- Zero core code divergence -- the fork at SHA cf75b87 has no changes to upstream source code
- Seven satellite services -- Approver, Billing Separator, Costs, Deployer, Client library, OU Metrics, and Utils scripts
- Five releases behind upstream -- v1.1.4 vs v1.2.0 (10 upstream commits)
- Event-driven integration -- satellites subscribe to and publish on the ISB EventBridge bus
- UK government adaptations -- @dsit.gov.uk email domain, ndx namespace, us-east-1/us-west-2 regions, Slack-based approval workflow, 91-day billing quarantine
Customization Strategy
Non-Invasive Extension Architecture
Benefits of this approach:
- Clean merge path -- no conflicts when pulling upstream updates (zero code divergence confirmed by git diff)
- Independent lifecycles -- each satellite deploys and scales independently via its own CDK stack
- Modular enablement -- satellites can be disabled without affecting core ISB functionality
- Shared client library -- @co-cddo/isb-client provides authenticated ISB API access to all TypeScript satellites
- Contribute back -- generic improvements can be submitted upstream as pull requests
Fork Status
Version Comparison
| Aspect | Upstream (aws-solutions) | CDDO Fork (co-cddo) |
|---|---|---|
| Current Version | v1.2.0 | v1.1.4 |
| Releases Behind | -- | 5 (v1.1.5, v1.1.6, v1.1.7, v1.1.8, v1.2.0) |
| Upstream Commits Ahead | 10 | -- |
| Files Changed | 451 | 0 (clean fork) |
| Lines Added (upstream) | +50,515 | -- |
| Lines Removed (upstream) | -18,274 | -- |
| Fork SHA | -- | cf75b87 |
| License | Apache 2.0 | Apache 2.0 (unchanged) |
Missing Upstream Features
Since the CDDO fork was taken at v1.1.4, the following upstream releases have been made:
| Version | Key Changes |
|---|---|
| v1.1.5 | Security patch: qs library vulnerability fix |
| v1.1.6 | Security patches: @remix-run/router, glib2, libcap, python3 |
| v1.1.7 | AWS Nuke upgrade to v3.63.2 (fixes SCP-protected log group deletion) |
| v1.1.8 | Undisclosed changes (merge commit only) |
| v1.2.0 | Major release -- likely includes blueprints feature and significant refactoring (451 files changed, 50K+ insertions) |
Upgrade recommendation: Upgrade to at least v1.1.7 for security patches and Nuke fixes. Evaluate v1.2.0 carefully as the large changeset (50K+ lines) may introduce breaking changes requiring satellite service updates.
CDDO Satellite Services
1. Approver (innovation-sandbox-on-aws-approver)
Repository: co-cddo/innovation-sandbox-on-aws-approver
Technology: TypeScript, CDK, Node.js 20+, esbuild, Vitest
Purpose: Automated lease approval with 19-rule scoring engine, AI-enhanced risk assessment, business hours enforcement, and Slack-based manual review workflow.
Architecture:
The approver replaces ISB's built-in manual approval flow. ISB routes all lease requests with requiresManualApproval=true to EventBridge; the approver listens for LeaseRequested events and makes autonomous approval decisions via a state machine.
State Machine:
RECEIVED -> VALIDATING -> TIMING_CHECK -> ACCOUNT_AVAILABILITY_CHECK -> SCORING -> DECIDING -> [APPROVED | DENIED | ESCALATED | DELAYED | ERROR]
Terminal states:
- APPROVED -- score below threshold (default: 20), auto-approved via ISB API
- DENIED -- high-risk request, auto-denied
- ESCALATED -- borderline score, sent to Slack for manual review
- DELAYED -- outside business hours or no accounts available, queued via SQS
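The pipeline and its terminal states can be modelled as plain string unions. The following is a minimal sketch for illustration only: the state names come from the diagram above, while `PIPELINE_STATES`, `nextPipelineState`, and `isTerminalState` are hypothetical names, not identifiers from the approver source.

```typescript
// State names taken from the state machine above; the helper names are hypothetical.
const PIPELINE_STATES = [
  "RECEIVED",
  "VALIDATING",
  "TIMING_CHECK",
  "ACCOUNT_AVAILABILITY_CHECK",
  "SCORING",
  "DECIDING",
] as const;
type PipelineState = (typeof PIPELINE_STATES)[number];

const TERMINAL_STATES = ["APPROVED", "DENIED", "ESCALATED", "DELAYED", "ERROR"] as const;
type TerminalState = (typeof TERMINAL_STATES)[number];

// Returns the next non-terminal stage, or null once DECIDING is reached
// (from there the request moves to one of the terminal states).
function nextPipelineState(current: PipelineState): PipelineState | null {
  const index = PIPELINE_STATES.indexOf(current);
  return PIPELINE_STATES[index + 1] ?? null;
}

// Type guard: true when the given state ends processing.
function isTerminalState(state: string): state is TerminalState {
  return (TERMINAL_STATES as readonly string[]).includes(state);
}
```

Keeping the stages in an ordered array makes the linear progression explicit and lets the compiler reject misspelled state names.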
Scoring Engine (19 rules):
| Rule | Weight | Type | Description |
|---|---|---|---|
| allow_list_override | -100 | Bonus | Guarantees approval for allow-listed users |
| verified_gov_domain | -5 | Bonus | Domain in ukps-domains allowlist |
| familiar_template | -1 | Bonus | Previously used template successfully |
| manual_early_termination | -2 | Bonus | Responsible early termination history |
| org_clean_record | -2 | Bonus | Domain has 5+ leases with zero negatives in 90d |
| expired_leases | +2 | Penalty | Per expired lease in last 30 days |
| budget_exceeded | +5 | Penalty | Per budget exceeded lease in last 30 days |
| first_time_user | +5 | Penalty | No previous lease history |
| first_time_user_group_mailbox_compound | +20 | Penalty | First lease AND group mailbox detected (AI) |
| cooldown_violation | +10 | Penalty | Request within 1 hour of previous lease |
| outside_target_audience | +50 | Penalty | Domain NOT in local authority allowlist |
| group_mailbox_detected | +20 | Penalty | AI detected group email pattern (Bedrock) |
| org_recent_negative | +3 | Penalty | Same domain had negative outcomes in 30d |
| template_hopper | +2 | Penalty | 3+ leases never repeating a template |
| end_of_window | +2 | Penalty | Request in final 2 hours (5-7pm London) |
| budget_amount | +1/unit | Per-unit | +1 per $10 of budget requested |
| duration_requested | +1/unit | Per-unit | +1 per 8 hours of duration |
| user_rate_limit | +5/excess | Rate limit | Per request beyond 2/hour |
| org_rate_limit | +3 | Rate limit | Triggered if 5+ org users submit in 1 hour |
Auto-approve threshold: Score must be strictly less than 20 (configurable via AUTO_APPROVE_THRESHOLD env var).
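The table can be read as an additive score: fixed weights and per-unit weights are summed, and the total is compared against the threshold. The sketch below illustrates that reading with four of the rules. The `RuleHit` type and function names are hypothetical, and rounding partial units down with `Math.floor` is an assumption, since the source does not say how partial units are counted.

```typescript
// Hypothetical shapes for illustration; the approver's real types are not shown in this document.
type RuleHit =
  | { rule: string; kind: "fixed"; weight: number }
  | { rule: string; kind: "perUnit"; weightPerUnit: number; units: number };

// Sum fixed weights and per-unit weights into a single risk score.
function totalScore(hits: RuleHit[]): number {
  return hits.reduce(
    (sum, hit) => sum + (hit.kind === "fixed" ? hit.weight : hit.weightPerUnit * hit.units),
    0
  );
}

// Auto-approval requires the score to be strictly below the threshold (default 20).
function isAutoApproved(score: number, threshold: number = 20): boolean {
  return score < threshold;
}

// Example: a first-time user on a verified government domain requesting $50 for 24 hours.
// Rounding partial units down is an assumption.
const exampleHits: RuleHit[] = [
  { rule: "verified_gov_domain", kind: "fixed", weight: -5 },
  { rule: "first_time_user", kind: "fixed", weight: 5 },
  { rule: "budget_amount", kind: "perUnit", weightPerUnit: 1, units: Math.floor(50 / 10) },
  { rule: "duration_requested", kind: "perUnit", weightPerUnit: 1, units: Math.floor(24 / 8) },
];
const exampleScore = totalScore(exampleHits); // -5 + 5 + 5 + 3 = 8
```

With these inputs the score of 8 is below the default threshold, so the request would be auto-approved; a score of exactly 20 would not be.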
Infrastructure (CDK):
- DynamoDB: ApproverIdempotency table (idempotent processing), ApproverQueuePosition table (FIFO queue when no accounts available)
- S3: Domain allowlist bucket (populated from ukps-domains)
- SQS: Delay queue with DLQ (out-of-hours/no-account requests)
- SNS: isb-approval-notifications topic for Slack integration
- Amazon Bedrock: AI-based group mailbox detection
- AWS Chatbot: Slack channel configuration with Approve/Deny custom actions
- EventBridge Scheduler: 30-minute queue check for pending requests
- CloudWatch: Error rate, latency, DLQ depth, and Slack action alarms
Slack Integration:
- Notifications sent via SNS to Amazon Q Developer (Chatbot) Slack channel
- Custom actions: isb-approve and isb-deny buttons invoke separate Lambda functions
- SlackApproveLambda and SlackDenyLambda call the ISB API /leases/{id}/review endpoint
- CloudWatch dashboard: ISB-Approver-Slack-Actions
Source: /Users/CNesbittSmith/httpdocs/ndx-try-arch/repos/innovation-sandbox-on-aws-approver/
2. Billing Separator (innovation-sandbox-on-aws-billing-seperator)
Repository: co-cddo/innovation-sandbox-on-aws-billing-seperator
Technology: TypeScript, CDK, Node.js 22, Jest
Purpose: 91-day quarantine of sandbox accounts after cleanup to prevent billing data attribution leakage between successive leaseholders.
Problem: When a sandbox account is recycled, AWS Cost and Usage Report (CUR) data for the previous leaseholder may still be accumulating. If the account is immediately reassigned, the new leaseholder's billing view would include residual charges from the previous tenant.
Architecture:
The billing separator deploys across two stacks:
- OrgMgmtStack (Org Management account): EventBridge rule forwarding CloudTrail MoveAccount events to the Hub account's custom event bus
- HubStack (Hub account): Event bus, SQS queue, two Lambda functions, EventBridge Scheduler group
Flow:
Key Constants (from source/lambdas/shared/constants.ts):
- QUARANTINE_DURATION_HOURS: 2,184 (91 days)
- SCHEDULER_GROUP: isb-billing-separator
- BYPASS_QUARANTINE_TAG_KEY: do-not-separate (AWS Organizations tag to skip quarantine)
- MAX_SQS_RECORDS_PER_BATCH: 10
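To show how these constants relate, here is a small sketch that computes when a quarantined account becomes eligible for release and whether the bypass tag applies. The constant values are from the list above; the function names are hypothetical, and treating the mere presence of the tag key as a bypass (regardless of its value) is an assumption.

```typescript
// Values from the constants listed above; function names are hypothetical.
const QUARANTINE_DURATION_HOURS = 2184; // 91 days * 24 hours
const BYPASS_QUARANTINE_TAG_KEY = "do-not-separate";
const MS_PER_HOUR = 60 * 60 * 1000;

// Time at which an account quarantined at `quarantinedAt` can be released.
function quarantineReleaseTime(quarantinedAt: Date): Date {
  return new Date(quarantinedAt.getTime() + QUARANTINE_DURATION_HOURS * MS_PER_HOUR);
}

// Assumption: any value on the bypass tag key skips quarantine.
function shouldQuarantine(accountTags: Record<string, string>): boolean {
  return !Object.keys(accountTags).includes(BYPASS_QUARANTINE_TAG_KEY);
}

const exampleRelease = quarantineReleaseTime(new Date("2026-01-01T00:00:00Z"));
// exampleRelease.toISOString() === "2026-04-02T00:00:00.000Z"
```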
Features:
- Idempotent processing (skips accounts already in Quarantine status)
- Bypass tag (do-not-separate) for emergency account recycling
- Cross-account role chain using ISB's own fromTemporaryIsbOrgManagementCredentials utility
- SQS partial batch response pattern for reliable event processing
- CloudWatch alarms for DLQ depth, Lambda errors, and EventBridge rule delivery failures
- ISB Commons dependency via git submodule (deps/isb/source/common)
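The SQS partial batch response pattern mentioned above reports only the failed message IDs back to Lambda, so successfully processed messages are not redelivered. A minimal synchronous sketch follows. The `batchItemFailures` / `itemIdentifier` response shape is the one Lambda expects when `ReportBatchItemFailures` is enabled; the `QueueRecord` type and `processBatch` function are simplified, hypothetical stand-ins rather than the billing separator's actual handler.

```typescript
// Simplified record and response shapes; a real handler would use the AWS Lambda event types.
interface QueueRecord {
  messageId: string;
  body: string;
}
interface BatchResponse {
  batchItemFailures: { itemIdentifier: string }[];
}

// Process each record independently and collect the IDs of those that failed.
function processBatch(
  records: QueueRecord[],
  handleRecord: (record: QueueRecord) => void
): BatchResponse {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of records) {
    try {
      handleRecord(record);
    } catch {
      // Only this message returns to the queue; the rest of the batch is acknowledged.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}
```

Returning an empty `batchItemFailures` array signals that the whole batch succeeded.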
Source: /Users/CNesbittSmith/httpdocs/ndx-try-arch/repos/innovation-sandbox-on-aws-billing-seperator/
3. Costs (innovation-sandbox-on-aws-costs)
Repository: co-cddo/innovation-sandbox-on-aws-costs
Technology: TypeScript, CDK, Node.js, Vitest, Zod (v4), X-Ray tracing
Purpose: Automated cost collection for terminated leases with delayed execution, CSV reporting, and EventBridge event emission.
Architecture:
Three Lambda functions orchestrated by EventBridge Scheduler:
- Scheduler Handler -- triggered by LeaseTerminated events, creates a one-shot EventBridge Schedule that fires after a configurable delay (default 8 hours via BILLING_PADDING_HOURS) to allow billing data to settle
- Cost Collector Handler -- triggered by the scheduled event, performs the full cost collection pipeline:
  - Fetches lease details from ISB API (via @co-cddo/isb-client)
  - Assumes role in orgManagement account for Cost Explorer access
  - Calculates billing window with configurable padding
  - Queries Cost Explorer with pagination
  - Generates CSV report
  - Uploads to S3 with SHA-256 checksum integrity verification
  - Generates presigned URL (7-day expiry, configurable via PRESIGNED_URL_EXPIRY_DAYS)
  - Emits LeaseCostsGenerated event to EventBridge
  - Publishes CloudWatch business metrics (TotalCost, ResourceCount, ProcessingDuration)
  - Deletes the scheduler schedule (best-effort, auto-delete also configured)
- Cleanup Handler -- handles orphaned schedule cleanup
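The delayed execution and billing window steps reduce to date arithmetic. The sketch below is one plausible reading, not the service's actual code: the function names are hypothetical, and applying the padding to the end of the lease and rounding to whole UTC days are assumptions. Cost Explorer does take day-granularity date strings with an exclusive end date, which is why the end is pushed to the following day.

```typescript
const HOUR_IN_MS = 60 * 60 * 1000;

// When the one-shot schedule should fire after a lease terminates (default padding: 8 hours).
function collectionFireTime(terminatedAt: Date, paddingHours: number = 8): Date {
  return new Date(terminatedAt.getTime() + paddingHours * HOUR_IN_MS);
}

// Day-granularity window for a Cost Explorer query; the end date is exclusive.
// Assumption: padding extends the end of the lease, then the window rounds out to whole UTC days.
function billingWindow(
  leaseStart: Date,
  leaseEnd: Date,
  paddingHours: number = 8
): { start: string; end: string } {
  const paddedEnd = new Date(leaseEnd.getTime() + paddingHours * HOUR_IN_MS);
  const endExclusive = new Date(
    Date.UTC(paddedEnd.getUTCFullYear(), paddedEnd.getUTCMonth(), paddedEnd.getUTCDate() + 1)
  );
  return {
    start: leaseStart.toISOString().slice(0, 10),
    end: endExclusive.toISOString().slice(0, 10),
  };
}

const exampleWindow = billingWindow(
  new Date("2026-03-01T09:00:00Z"),
  new Date("2026-03-02T20:00:00Z")
);
// exampleWindow is { start: "2026-03-01", end: "2026-03-04" }
```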
CDK Infrastructure:
- CostCollectionStack (Hub account): S3 bucket, Lambda functions, EventBridge rules, Scheduler group
- CostExplorerRoleStack (OrgMgmt account): IAM role for cross-account Cost Explorer access
Source: /Users/CNesbittSmith/httpdocs/ndx-try-arch/repos/innovation-sandbox-on-aws-costs/
4. Deployer (innovation-sandbox-on-aws-deployer)
Repository: co-cddo/innovation-sandbox-on-aws-deployer
Technology: TypeScript, CDK, Node.js 22, esbuild, Vitest
Purpose: Automatically deploy CloudFormation templates and CDK applications to sandbox sub-accounts when leases are approved.
Deployment Flow:
- Event parsing -- validates incoming LeaseApproved EventBridge event
- Lease lookup -- fetches lease details from DynamoDB to get accountId and templateName
- Template handling -- detects scenario type (CDK vs CloudFormation), fetches from GitHub
- Template validation -- validates CloudFormation template structure
- Role assumption -- assumes role in target sub-account via STS
- CDK bootstrap -- ensures target account has CDKToolkit stack (CDK scenarios only)
- Stack deployment -- creates/updates CloudFormation stack with parameters mapped from lease data
- Event emission -- publishes DeploymentSucceeded or DeploymentFailed to EventBridge
CDK Scenario Support:
- Auto-detects CDK projects via cdk.json presence
- Sparse clones scenario from GitHub (only needed files)
- Installs dependencies securely (npm ci --ignore-scripts)
- Synthesizes CDK to CloudFormation (cdk synth)
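Detection by cdk.json presence can be sketched as a pure function over a scenario's file listing. This is an illustration under assumptions: `detectScenarioType` and the example paths are hypothetical, and the real deployer may inspect the repository differently (for example, only at the scenario root).

```typescript
type ScenarioType = "cdk" | "cloudformation";

// Treat a scenario as CDK when its file listing contains a cdk.json; otherwise assume CloudFormation.
function detectScenarioType(filePaths: string[]): ScenarioType {
  const hasCdkJson = filePaths.some(
    (filePath) => filePath === "cdk.json" || filePath.endsWith("/cdk.json")
  );
  return hasCdkJson ? "cdk" : "cloudformation";
}

// Hypothetical file listings for two scenarios.
const cdkScenario = detectScenarioType(["scenario/cdk.json", "scenario/bin/app.ts"]);
const cfnScenario = detectScenarioType(["scenario/template.yaml"]);
```

Passing the listing in as data keeps the check testable without touching GitHub or the filesystem.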
CDK Infrastructure:
- DeployerStack (Hub account): Lambda function, EventBridge rules, Secrets Manager integration
- GithubOidcStack: OIDC provider for GitHub Actions CI/CD
Environment Variables:
| Variable | Purpose |
|---|---|
| GITHUB_REPO | Scenario repository (e.g., co-cddo/ndx_try_aws_scenarios) |
| GITHUB_BRANCH | Branch to fetch from (e.g., main) |
| TARGET_ROLE_NAME | Assumable role in sub-accounts (e.g., ndx_IsbUsersPS) |
| DEPLOY_REGION | Target deployment region |
Source: /Users/CNesbittSmith/httpdocs/ndx-try-arch/repos/innovation-sandbox-on-aws-deployer/
5. ISB Client (@co-cddo/isb-client)
Repository: co-cddo/innovation-sandbox-on-aws-client
Technology: TypeScript, Node.js 20+, Jest, Yarn 4
Version: 2.0.1 (distributed as tarball via GitHub Releases)
Purpose: Shared authenticated API client for satellite services to interact with the ISB REST API.
Features:
- JWT token signing using HS256 with secret from Secrets Manager
- Automatic token caching with 60-second pre-expiry refresh
- Secret cache invalidation on 401/403 responses (handles secret rotation)
- JSend response format parsing
- Paginated list endpoint support
- Read operations (fetchLease, fetchLeaseByKey, fetchAccount, fetchTemplate, fetchAllAccounts)
- Write operations (reviewLease, registerAccount)
- Graceful degradation -- returns null on 404, 5xx, or network errors for read operations
- Configurable timeout (default 5 seconds)
- Correlation ID propagation via X-Correlation-Id header
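Token caching with a 60-second pre-expiry refresh can be sketched as follows. This is not the client's implementation: `TokenCache`, `IssuedToken`, and the injected `issue` and `now` callbacks are hypothetical, and the actual HS256 signing step is left to the caller. The `invalidate` method shows how a cache could be dropped after a 401/403, analogous to the secret cache invalidation described above.

```typescript
interface IssuedToken {
  token: string;
  expiresAtMs: number; // absolute expiry, epoch milliseconds
}

// Caches a token and re-issues it once it is within `refreshMarginMs` of expiry.
class TokenCache {
  private cached: IssuedToken | null = null;
  private readonly issue: () => IssuedToken;
  private readonly now: () => number;
  private readonly refreshMarginMs: number;

  constructor(
    issue: () => IssuedToken,
    now: () => number = () => Date.now(),
    refreshMarginMs: number = 60000
  ) {
    this.issue = issue;
    this.now = now;
    this.refreshMarginMs = refreshMarginMs;
  }

  getToken(): string {
    const current = this.cached;
    if (current === null || this.now() >= current.expiresAtMs - this.refreshMarginMs) {
      const fresh = this.issue();
      this.cached = fresh;
      return fresh.token;
    }
    return current.token;
  }

  // Drop the cached token, for example after the API rejects it.
  invalidate(): void {
    this.cached = null;
  }
}
```

Injecting the clock keeps the refresh logic deterministic under test.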
Usage by satellites:
| Satellite | ISB Client Version | Operations Used |
|---|---|---|
| Approver | v2.0.1 | fetchLease, fetchAccount, fetchAllAccounts, reviewLease |
| Costs | v2.0.0 | fetchLease (via getLeaseDetails) |
| Deployer | v2.0.0 | fetchLease (via lookupLease) |
Source: /Users/CNesbittSmith/httpdocs/ndx-try-arch/repos/innovation-sandbox-on-aws-client/
6. Utils (innovation-sandbox-on-aws-utils)
Repository: co-cddo/innovation-sandbox-on-aws-utils
Technology: Python 3, boto3
Purpose: CLI scripts for manual operational tasks that complement the ISB web interface.
Scripts:
| Script | Purpose |
|---|---|
| create_sandbox_pool_account.py | Create and register new pool accounts (Organizations + ISB API) |
| create_user.py | Create Identity Center users and add to ndx_IsbUsersGroup |
| assign_lease.py | Assign leases via API, optionally configure local SSO profiles |
| terminate_lease.py | Terminate all active leases for a user |
| force_release_account.py | Force-release stuck accounts |
| clean_console_state.py | Clean AWS Console state (recently visited services, favorites, theme) from recycled accounts via undocumented CCS API |
Key Constants (from create_sandbox_pool_account.py):
| Constant | Value |
|---|---|
| ENTRY_OU | ou-2laj-2by9v0sr |
| SANDBOX_READY_OU | ou-2laj-oihxgbtr |
| BILLING_VIEW_ARN | arn:aws:billing::955063685555:billingview/custom-466e2613-e09b-4787-a93a-736f0fb1564b |
| Account email pattern | ndx-try-provider+gds-ndx-try-aws-pool-NNN@dsit.gov.uk |
| Account name pattern | pool-NNN |
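The naming patterns in the table can be expressed as a small helper. The utils scripts themselves are Python; this sketch uses TypeScript for consistency with the other examples. It assumes NNN is a zero-padded three-digit sequence number, which the patterns suggest but the source does not state, and `poolAccountIdentity` is a hypothetical name.

```typescript
// Assumption: NNN is a zero-padded, three-digit sequence number.
function poolAccountIdentity(sequenceNumber: number): { accountName: string; email: string } {
  const suffix = String(sequenceNumber).padStart(3, "0");
  return {
    accountName: "pool-" + suffix,
    email: "ndx-try-provider+gds-ndx-try-aws-pool-" + suffix + "@dsit.gov.uk",
  };
}

const examplePoolAccount = poolAccountIdentity(7);
// examplePoolAccount.accountName === "pool-007"
```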
Authentication: All scripts use AWS SSO profiles (NDX/orgManagement, NDX/InnovationSandboxHub) and generate HS256 JWT tokens using the ISB signing secret from Secrets Manager.
Source: /Users/CNesbittSmith/httpdocs/ndx-try-arch/repos/innovation-sandbox-on-aws-utils/
Configuration Customizations
Global Configuration (global-config.yaml)
The ISB core is configured via AWS AppConfig profiles. Key CDDO-specific settings:
    maintenanceMode: true # Controlled rollout
    leases:
      requireMaxBudget: true
      maxBudget: 50 # USD (upstream default: 5000)
      requireMaxDuration: true
      maxDurationHours: 168 # 7 days
      maxLeasesPerUser: 3
      ttl: 30 # Days
    cleanup:
      numberOfFailedAttemptsToCancelCleanup: 3
      waitBeforeRetryFailedAttemptSeconds: 5
      numberOfSuccessfulAttemptsToFinishCleanup: 2
      waitBeforeRerunSuccessfulAttemptSeconds: 30
| Setting | CDDO | Upstream Default | Rationale |
|---|---|---|---|
| maintenanceMode | true | false | Controlled rollout for government users |
| maxBudget | $50 | $5,000 | Cost control for training/experimentation |
| maxDurationHours | 168 | 168 | Same (7 days) |
| maxLeasesPerUser | 3 | 3 | Same |
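To make the effect of these limits concrete, the sketch below checks a lease request against the CDDO values. ISB enforces these limits itself; the `LeaseLimits` type, `validateLeaseRequest` function, and request shape here are hypothetical and only illustrate the rules. Counting active leases against maxLeasesPerUser is an assumption.

```typescript
interface LeaseLimits {
  maxBudget: number; // USD
  maxDurationHours: number;
  maxLeasesPerUser: number;
}

// Values from the CDDO global configuration above.
const CDDO_LIMITS: LeaseLimits = { maxBudget: 50, maxDurationHours: 168, maxLeasesPerUser: 3 };

// Returns a list of human-readable violations; an empty list means the request is within limits.
function validateLeaseRequest(
  request: { budget: number; durationHours: number; activeLeases: number },
  limits: LeaseLimits = CDDO_LIMITS
): string[] {
  const violations: string[] = [];
  if (request.budget > limits.maxBudget) {
    violations.push("budget exceeds " + limits.maxBudget + " USD");
  }
  if (request.durationHours > limits.maxDurationHours) {
    violations.push("duration exceeds " + limits.maxDurationHours + " hours");
  }
  // Assumption: the per-user cap counts currently active leases.
  if (request.activeLeases >= limits.maxLeasesPerUser) {
    violations.push("user already has " + limits.maxLeasesPerUser + " leases");
  }
  return violations;
}
```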
Nuke Configuration (nuke-config.yaml)
Protected resources for AWS Nuke account cleanup:
    # Protected from deletion
    CloudFormationStack:
      - type: glob
        value: StackSet-Isb-* # ISB-managed StackSet stacks
    IAMRole:
      - type: exact
        value: OrganizationAccountAccessRole
      - type: glob
        value: AWSReservedSSO_* # SSO-provisioned roles
      - type: contains
        value: AWSControlTower # Control Tower roles
    # Excluded resource types
    S3Object # Bucket deletion handles objects
    ConfigServiceConfigurationRecorder # Preserved for audit
    ConfigServiceDeliveryChannel # Preserved for audit
These protections are consistent with the upstream defaults and ensure that ISB infrastructure, SSO roles, and Control Tower configurations survive account recycling.
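The three filter types used above (exact, glob, contains) can be illustrated with a small matcher. This is a simplified sketch of the matching semantics, not aws-nuke's implementation (which is written in Go): the glob handling here supports only `*`, and the type and function names are hypothetical.

```typescript
type NukeFilter = { type: "exact" | "contains" | "glob"; value: string };

// Convert a glob containing only `*` wildcards into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// A resource is protected (kept) when any filter matches its name.
function isProtected(resourceName: string, filters: NukeFilter[]): boolean {
  return filters.some((filter) => {
    switch (filter.type) {
      case "exact":
        return resourceName === filter.value;
      case "contains":
        return resourceName.includes(filter.value);
      case "glob":
        return globToRegExp(filter.value).test(resourceName);
    }
  });
}

// Filters from the IAMRole section of the configuration above.
const IAM_ROLE_FILTERS: NukeFilter[] = [
  { type: "exact", value: "OrganizationAccountAccessRole" },
  { type: "glob", value: "AWSReservedSSO_*" },
  { type: "contains", value: "AWSControlTower" },
];
```

For example, a role named `AWSReservedSSO_Admin_abc123` (a made-up name) matches the glob filter and would survive cleanup, while an ordinary application role would not match any filter.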
Deployment Configuration
| Parameter | CDDO Value | Upstream Default |
|---|---|---|
| NAMESPACE | ndx | myisb |
| HUB_ACCOUNT_ID | 955063685555 | Configurable |
| AWS_REGIONS | us-east-1, us-west-2 | Configurable |
| ADMIN_GROUP_NAME | ndx_IsbAdmins | Configurable |
| MANAGER_GROUP_NAME | ndx_IsbManagers | Configurable |
| USER_GROUP_NAME | ndx_IsbUsers | Configurable |
The ndx namespace prefixes all CloudFormation stack names, IAM roles, and resource identifiers, ensuring isolation from any other ISB deployments in the same AWS Organization.
UK Government Adaptations
Email Domain
All account and user emails use the @dsit.gov.uk domain (Department for Science, Innovation and Technology):
- Pool account emails: ndx-try-provider+gds-ndx-try-aws-pool-NNN@dsit.gov.uk
- User emails: {name}@dsit.gov.uk
Region Strategy
ISB is deployed exclusively in us-east-1 and us-west-2:
- us-east-1: Required for AWS Organizations API, IAM Identity Center
- us-west-2: Primary compute region for Lambda, DynamoDB, API Gateway, CloudFront
- Production UK government workloads would typically use eu-west-2 (London); sandbox accounts use US regions for cost optimization and broader service availability
Domain-Based Access Control
The approver service integrates with the ukps-domains dataset -- a curated list of UK public sector email domains. This enables:
- Automatic -5 scoring bonus for verified government domains
- Automatic +50 scoring penalty for non-local-authority domains
- AI-enhanced group mailbox detection via Amazon Bedrock
Business Hours Enforcement
The approver enforces UK business hours (London timezone):
- Requests outside business hours are delayed to the next processing window via SQS
- Requests in the final 2 hours (5-7pm London) receive a +2 scoring penalty
- A 30-minute EventBridge Scheduler polls the queue for delayed requests
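The timing check can be sketched with the standard `Intl.DateTimeFormat` API, which handles the GMT/BST switch. Several details are assumptions: the source gives only the end of the window (7pm, implied by the 5-7pm final two hours), so the opening hour is a required parameter here and the 9am used in the example is hypothetical. Weekends and public holidays are ignored, and the function names are not from the approver source.

```typescript
type TimingResult = "OUTSIDE_HOURS" | "END_OF_WINDOW" | "IN_HOURS";

// Hour of day (0-23) in London for the given instant, accounting for GMT/BST.
function londonHour(instant: Date): number {
  const formatted = new Intl.DateTimeFormat("en-GB", {
    timeZone: "Europe/London",
    hour: "numeric",
  }).format(instant);
  return parseInt(formatted, 10);
}

// Classify a request time. `openHour` is not given in the source and must be supplied by the caller.
function classifyRequestTime(
  instant: Date,
  openHour: number,
  closeHour: number = 19
): TimingResult {
  const hour = londonHour(instant);
  if (hour < openHour || hour >= closeHour) return "OUTSIDE_HOURS";
  // The final two hours before closing attract the end_of_window penalty.
  return hour >= closeHour - 2 ? "END_OF_WINDOW" : "IN_HOURS";
}

// 18:00 UTC in January is 18:00 in London; 18:30 UTC in July is 19:30 in London (BST).
const winterEvening = classifyRequestTime(new Date("2026-01-15T18:00:00Z"), 9);
const summerEvening = classifyRequestTime(new Date("2026-07-15T18:30:00Z"), 9);
```

Under these assumptions the winter request falls in the end-of-window period, while the summer request is outside business hours and would be delayed.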
Slack-Based Operations
Manual lease approval/denial is handled via Slack rather than the ISB web console:
- Amazon Q Developer (Chatbot) integration with configurable Slack workspace and channel
- Custom actions with "Approve" and "Deny" buttons on notification messages
- Dedicated Lambda functions for each action (ApproverSlackApprove, ApproverSlackDeny)
- CloudWatch dashboard for Slack action monitoring
Billing Data Isolation
The billing separator addresses a UK government requirement for clean billing attribution between successive sandbox leaseholders:
- 91-day quarantine period ensures all residual CUR data has been finalized
- Bypass mechanism via do-not-separate AWS Organizations tag for emergency recycling
- Custom billing view (ARN: arn:aws:billing::955063685555:billingview/custom-...) aggregates costs across all pool accounts
Console State Cleanup
The clean_console_state.py utility addresses an upstream limitation: AWS Nuke cannot clean AWS Management Console preferences (recently visited services, favorites, theme) because they are stored in the Console Control Service (CCS) -- an undocumented internal AWS service outside the account resource plane. The script calls the CCS UpdateCallerSettings and DeleteCallerDashboard APIs directly.
Integration Architecture
Event Flow Summary
Cross-Service Dependencies
| Satellite | Depends On | ISB API Endpoints Used | EventBridge Events Consumed | EventBridge Events Produced |
|---|---|---|---|---|
| Approver | ISB Client v2.0.1, Bedrock, ukps-domains S3 | GET /leases/{id}, GET /accounts, POST /leases/{id}/review | LeaseRequested, AccountCleanupSucceeded | -- |
| Billing Separator | ISB Commons (git submodule) | -- (uses DynamoDB directly) | CloudTrail MoveAccount | -- |
| Costs | ISB Client v2.0.0 | GET /leases/{id} | LeaseTerminated (via Scheduler) | LeaseCostsGenerated |
| Deployer | ISB Client v2.0.0, GitHub API | GET /leases/{id} (via DynamoDB direct) | LeaseApproved | DeploymentSucceeded, DeploymentFailed |
| Utils | ISB API (direct HTTP) | POST /accounts, GET /leaseTemplates, POST /leases, POST /leases/{id}/terminate | -- | -- |
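The consumed-events column of the table can be captured as a lookup from event type to consuming satellites, which is handy when checking what an upstream schema change would affect. The mapping below is transcribed from the table; the `EVENT_CONSUMERS` constant and `consumersFor` function are hypothetical names.

```typescript
// Event type -> satellites that consume it, transcribed from the table above.
const EVENT_CONSUMERS: Record<string, string[]> = {
  LeaseRequested: ["Approver"],
  AccountCleanupSucceeded: ["Approver"],
  MoveAccount: ["Billing Separator"], // CloudTrail event forwarded from the Org Management account
  LeaseTerminated: ["Costs"],
  LeaseApproved: ["Deployer"],
};

// Satellites affected by a change to the given event type (empty when none consume it).
function consumersFor(eventType: string): string[] {
  return EVENT_CONSUMERS[eventType] ?? [];
}
```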
Comparison Summary
| Aspect | Upstream AWS Solution | CDDO Fork + Satellites |
|---|---|---|
| Version | v1.2.0 | v1.1.4 (core) |
| Core Code Changes | N/A | None (clean fork) |
| Extension Services | None | 6 satellites |
| Lease Approval | Manual (web UI) | Automated (19-rule scoring + Slack escalation) |
| Cost Tracking | None | Automated CSV reports with presigned URLs |
| Scenario Deployment | Manual | Automated (CFn + CDK from GitHub) |
| Billing Isolation | None | 91-day quarantine |
| API Client | None | Shared @co-cddo/isb-client library |
| Admin Tooling | Web UI only | Web UI + 6 Python CLI scripts |
| Console Cleanup | aws-nuke only | aws-nuke + CCS API cleanup |
| Max Budget | $5,000 | $50 |
| Namespace | myisb | ndx |
| Email Domain | Configurable | @dsit.gov.uk |
| Regions | Configurable | us-east-1, us-west-2 only |
| AI Integration | None | Amazon Bedrock (email pattern analysis) |
| Chat Integration | None | Slack via Amazon Q Developer |
Upgrade Path
Recommended Strategy
The clean fork pattern means upgrading is straightforward:
- Fetch upstream changes: git fetch upstream && git merge upstream/main
- Resolve conflicts: Expect conflicts only in configuration files (.env, global-config.yaml) -- use CDDO values
- Test: npm ci && npm run build && npm test
- Validate satellite compatibility: Verify ISB API response schemas and EventBridge event schemas have not changed
- Deploy: Enable maintenance mode, deploy upgraded stacks, smoke test, disable maintenance mode
Risk assessment: Low for v1.1.5-v1.1.8 (security patches). Medium for v1.2.0 (major release with 50K+ line changes that may alter event schemas or API contracts used by satellites).
Estimated effort: 2-4 hours for v1.1.7, 1-2 days for v1.2.0 (including satellite service validation).
References
- ISB Core Architecture -- CDK stacks, Lambda catalog, API endpoints, EventBridge events
- Lease Lifecycle -- State machine, account OU transitions, cleanup workflow
- ISB Frontend -- React UI, authentication flow, CloudFront hosting
- Upstream Repository
- CDDO Fork
- ISB Client
- Approver
- Billing Separator
- Costs
- Deployer
- Utils