Report
Cost & Sustainability
How does your organization allocate capacity for production workloads in the cloud?
Peak Provisioning: Capacity is typically provisioned based on peak usage estimates, potentially leading to underutilization during off-peak times.
How to determine if this good enough?
TODO
How do I do better?
TODO
Manual Scaling Based on Average Consumption: Capacity is provisioned for average usage, with manual scaling adjustments made seasonally or as needed.
How to determine if this good enough?
TODO
How do I do better?
TODO
Basic Autoscaling for Certain Components: Autoscaling is enabled for some cloud components, primarily based on simple capacity or utilization metrics.
How to determine if this good enough?
TODO
How do I do better?
TODO
Widespread Autoscaling with Basic Metrics: Autoscaling is a common practice, although it mainly utilizes basic metrics, with limited use of log or application-specific metrics.
How to determine if this good enough?
TODO
How do I do better?
TODO
Advanced Autoscaling Using Detailed Metrics: Autoscaling is ubiquitously used, based on sophisticated log or application metrics, allowing for highly responsive and efficient capacity allocation.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization approach the use of compute services in the cloud?
Long-Running Homogeneous VMs: Workloads are consistently deployed on long-running, homogeneously sized Virtual Machines (VMs), without variation or optimization.
How to determine if this good enough?
TODO
How do I do better?
TODO
Primarily Long-Running VMs with Limited Experimentation: Most workloads are on long-running VMs, with some limited experimentation in containers or function-based services for non-critical tasks.
How to determine if this good enough?
TODO
How do I do better?
TODO
Mixed Use with Some Advanced Compute Options: Some production workloads are run in containers or function-based compute services. Ad-hoc use of short-lived VMs is practiced, with efforts to right-size based on workload needs.
How to determine if this good enough?
TODO
How do I do better?
TODO
Regular Use of Short-Lived VMs and Containers: There is regular use of short-lived VMs and containers, along with some function-based compute services. This indicates a move towards more flexible and scalable compute options.
How to determine if this good enough?
TODO
How do I do better?
TODO
‘Fit for Purpose’ Approach with Rigorous Right-Sizing: Cloud services selection is driven by a strict ‘fit for purpose’ approach. This includes a rigorous continual right-sizing process and a solution evaluation hierarchy favoring SaaS > FaaS > Containers as a Service > Platform/Orchestrator as a Service > Infrastructure as a Service.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization plan, measure, and optimize the environmental sustainability and carbon footprint of its cloud compute resources?
Basic Vendor Reliance: Sustainability isn’t actively measured internally; reliance is placed on cloud vendors who are contractually obligated to work towards carbon neutrality, likely through offsetting.
How to determine if this good enough?
TODO
How do I do better?
TODO
Initial Awareness and Basic Policies: Some basic policies and goals for sustainability are set. Efforts are primarily focused on awareness and selecting vendors with better environmental records.
How to determine if this good enough?
TODO
How do I do better?
TODO
Active Measurement and Target Setting: The organization actively measures its cloud compute carbon footprint and sets specific targets for reduction. This includes choosing cloud services based on their sustainability metrics.
How to determine if this good enough?
TODO
How do I do better?
TODO
Integrated Sustainability Practices: Sustainability is integrated into cloud resource planning and usage. This includes regular monitoring and reporting on sustainability metrics and making adjustments to improve environmental impact.
How to determine if this good enough?
TODO
How do I do better?
TODO
Advanced Optimization and Dynamic Management: Advanced strategies are in place, like automatic time and location shifting of workloads to minimize impact. Data retention and cloud product selection are deeply aligned with sustainability goals and carbon footprint metrics.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
What approaches does your organization use to plan, measure, and optimize cloud spending?
Restricted Billing Visibility: Billing details are only accessible to management and finance teams, with limited transparency across the organization.
How to determine if this good enough?
TODO
How do I do better?
TODO
Proactive Spend Commitment by Finance: The finance team uses billing information to make informed decisions about pre-committed cloud spending where it’s deemed beneficial.
How to determine if this good enough?
TODO
How do I do better?
TODO
Cost-Effective Resource Management: Cloud environments and applications are configured for cost-efficiency, such as automatically shutting down or scaling down non-production environments during off-hours.
How to determine if this good enough?
TODO
How do I do better?
TODO
Cost-Aware Development Practices: Developers and engineers have daily visibility into cloud costs and are encouraged to consider the financial impact of their choices in the development phase.
How to determine if this good enough?
TODO
How do I do better?
TODO
Comprehensive Cost Management and Optimization: Multi-tier spend alerts are configured to notify various levels of the business for immediate action. Developers and engineers regularly review and prioritize changes to improve cost-effectiveness significantly.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
What strategies guide your decisions on geographical distribution and operational management of cloud workloads and data storage?
Single Zone, Constant Operation: All data and workloads are confined to a single availability zone within an approved region, with workloads typically running continuously.
How to determine if this is good enough?
Many cloud vendors do not offer a Service Level Agreement (SLA) for compute and storage services when workloads exist only in a single availability zone.
If a loss of availability or a loss of data would matter for your services, you should consider at least an Intra-Region distribution strategy.
If you are consuming server-based services, such as virtual machines or database servers, where you pay for the time they are running, there are likely to be times when they do not need to be available. For example, non-production environments may not be needed outside working hours or at weekends.
How do I do better?
A relatively quick win for the resilience concern may be to create a backup process that stores a snapshot of your data at regular intervals in another availability zone. This at least gives you the ability to manually recover data to the point of the last backup in the event of a failure.
For the cost concern, if you were to schedule your non-production environments to shut down at 19:00 and start back up at 07:00 on weekdays, they would run for only 60 of the 168 hours in a week, so savings of up to roughly 64% of the time-based cost of your non-production workloads may be achievable. You will also be continually restarting these environments, which tests their resilience and surfaces potential startup issues close to when they were introduced.
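As a rough check on the scale of that saving, the sketch below counts running hours for the weekday schedule described above. It assumes cost is proportional to running hours and ignores storage and other charges that continue while an environment is stopped, so the result is an upper bound. The function names are illustrative only.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_running_hours(start_hour: int, stop_hour: int, days_per_week: int) -> int:
    """Hours per week an environment runs when started and stopped once per active day."""
    return (stop_hour - start_hour) * days_per_week


def saving_fraction(start_hour: int, stop_hour: int, days_per_week: int) -> float:
    """Fraction of always-on hours avoided by the schedule (assumes cost scales with hours)."""
    running = weekly_running_hours(start_hour, stop_hour, days_per_week)
    return 1 - running / HOURS_PER_WEEK


if __name__ == "__main__":
    # Start at 07:00, stop at 19:00, Monday to Friday only.
    print(f"{saving_fraction(start_hour=7, stop_hour=19, days_per_week=5):.1%}")  # 64.3%
```

Changing the hours or the number of active days shows how sensitive the saving is to the schedule you choose.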
Backup based solutions to consider:
- AWS Backup
- Azure Backup
- GCP Backup and DR Service
- OCI Backup
Also to consider:
While a solid backup solution is a great way to lay the foundations by establishing a Recovery Point Objective (RPO), it does not inherently provide a Recovery Time Objective (RTO), which is the time it takes you to restore availability of the service. Restoring could be quite a manual process, especially if you have not deployed the services recently or no longer have access to the people who did.
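To make the distinction concrete, here is a minimal sketch that separates the two measurements for a single incident. The timestamps are hypothetical and the function names are illustrative; the point is only that the data loss window depends on when the last backup ran, while the recovery time depends on how quickly you can restore service.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Data written after the last backup is lost; compare this against your RPO."""
    return failure - last_backup


def recovery_time(failure: datetime, service_restored: datetime) -> timedelta:
    """Time the service was unavailable; compare this against your RTO."""
    return service_restored - failure


# Hypothetical incident: nightly backup at 02:00, failure at 14:30, service back at 20:30.
last_backup = datetime(2024, 1, 10, 2, 0)
failure = datetime(2024, 1, 10, 14, 30)
restored = datetime(2024, 1, 10, 20, 30)

print(data_loss_window(last_backup, failure))  # 12:30:00
print(recovery_time(failure, restored))        # 6:00:00
```

More frequent backups shrink the first figure but do nothing for the second, which is only reduced by practising and automating the restore.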
Intra-region distribution
You can reduce your RPO by replicating data across multiple availability zones within a single region. There are various ways to achieve this, such as using a distributed file system, so that your data is replicated in real time between availability zones. This is a rather blunt solution and, for cost and performance reasons, is not recommended if you are running a database.
Distributed File System solutions to consider:
If your workload includes a database, then migrating to a managed database solution could provide a more resilient service with a lower operational burden, and will be far more performant than running a database on top of a distributed file system.
Your self-managed database may rely on configuration options and features that are not compatible with a managed database solution, such as stored procedures or low-level performance tweaks. You should be confident that the benefits of maintaining a self-managed solution outweigh the operational responsibilities you are retaining.
Managed database solutions to consider:
Similarly, it may be possible to migrate other components of your application to managed services, including message queues, file conversion, video processing, load balancing and so on.
Scheduling workloads
Many cloud vendors provide the capability to schedule your workloads to start up and shut down. You will still pay for storage and some other baseline costs, so do not expect this to reduce your cost to zero while things are turned off, but you should still see a noticeable benefit.
There may be scenarios where the schedule needs to be overridden. For example, during an incident the operations team may find it beneficial to test proposed fixes on a non-production instance before applying them to production, so you will need to add guidance to the operational runbook/playbook on how to override it.
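The decision logic behind such a schedule, including an incident override, can be sketched as follows. This is an assumption-laden illustration rather than any vendor's scheduler: the function name, the `override_until` parameter and the 07:00 to 19:00 weekday window are all hypothetical choices that mirror the example schedule used earlier.

```python
from datetime import datetime
from typing import Optional


def should_be_running(now: datetime, override_until: Optional[datetime] = None) -> bool:
    """Return True if a non-production environment should be up at `now`.

    `override_until` is a hypothetical escape hatch: while it lies in the future,
    the environment stays up regardless of the schedule (for example during an incident).
    """
    if override_until is not None and now < override_until:
        return True
    is_weekday = now.weekday() < 5          # Monday=0 .. Friday=4
    in_working_hours = 7 <= now.hour < 19   # 07:00 up to, but not including, 19:00
    return is_weekday and in_working_hours


# Wednesday evening, outside the window, but an incident override is active until 02:00.
print(should_be_running(datetime(2024, 1, 10, 22, 0),
                        override_until=datetime(2024, 1, 11, 2, 0)))  # True
```

Giving the override an expiry time means a forgotten override does not silently cancel the saving.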
Intra-Region Distribution: Workloads and data are spread across multiple availability zones within a single region to enhance availability and resilience.
How to determine if this good enough?
TODO
How do I do better?
TODO
Selective Multi-Region Utilization: An additional, legally compliant non-UK region is used for specific purposes, such as non-production workloads, certain data types, or as part of disaster recovery planning.
How to determine if this good enough?
TODO
How do I do better?
TODO
Capability and Sustainability-Driven Selection: Regions are chosen based solely on their technical capabilities, cost-effectiveness, and environmental sustainability credentials, without any specific technical constraints.
How to determine if this good enough?
TODO
How do I do better?
TODO
Dynamic and Cost-Sustainable Distribution: Workloads are dynamically allocated across various regions and availability zones, with scheduling optimized for cost-efficiency and sustainability, adapting in real-time to changing conditions.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
Data
How does your organization identify, classify, and manage its data storage and usage?
Decentralized and Ad Hoc Management: Data management is largely uncoordinated and informal, with limited organizational oversight of data storage locations and types.
How to determine if this good enough?
TODO
How do I do better?
TODO
Team-Based Documentation and Manual Policy Adherence: Each team documents the data they handle, including its schema and sensitivity. Compliance with organizational data policies is managed manually by individual teams.
How to determine if this good enough?
TODO
How do I do better?
TODO
Inventoried and Classified Data: An inventory of data, created manually or via scanning tools, exists. Data is classified by type (e.g., PII, card data), sensitivity, and regulatory requirements (e.g., retention, location).
How to determine if this good enough?
TODO
How do I do better?
TODO
Reviewed and Partially Documented Data Understanding: There’s a comprehensive understanding of data location, classification, and sensitivity, with regular compliance reviews. Data lineage is generally understood but not consistently documented.
How to determine if this good enough?
TODO
How do I do better?
TODO
Advanced Data Catalog and Lineage Tracking: A detailed data catalog exists, encompassing data types and metadata. It includes a user-friendly glossary, quality metrics, use cases, and thorough tracking of data lineage.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
What is your approach to managing data retention within your organization?
Organization-Level Policy Awareness: Data retention policies are defined at the organization level, and all projects/programs are aware of their specific responsibilities.
How to determine if this good enough?
TODO
How do I do better?
TODO
Compliance Attestation by Projects: Projects and programs are not only aware but also required to formally attest their compliance with the data retention policies.
How to determine if this good enough?
TODO
How do I do better?
TODO
Regular Audits and Reviews: Data retention practices are periodically audited and reviewed for compliance, with findings addressed through action plans.
How to determine if this good enough?
TODO
How do I do better?
TODO
Inclusion in Risk Management: Edge cases and exceptions in data retention are specifically identified and managed within the organization’s risk register.
How to determine if this good enough?
TODO
How do I do better?
TODO
Automated Enforcement with Cloud Tools: Data retention is actively monitored and enforced using native cloud services and tools, ensuring adherence to policies through automation.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
Governance
How does the shared responsibility model influence your organization's approach to cloud consumption?
Minimal Consideration of Shared Responsibility: The shared responsibility model is not a primary factor in cloud consumption decisions, often leading to misunderstandings or gaps in responsibility.
How to determine if this good enough?
TODO
How do I do better?
TODO
Basic Awareness of Shared Responsibilities: There is a basic understanding of the model, but it’s not systematically applied or deeply understood across the organization.
How to determine if this good enough?
TODO
How do I do better?
TODO
Informed Decision-Making Based on Shared Responsibilities: Decisions regarding cloud consumption are informed by the shared responsibility model, ensuring a clearer understanding of the division of responsibilities.
How to determine if this good enough?
TODO
How do I do better?
TODO
Strategic Integration of Shared Responsibility in Cloud Planning: The shared responsibility model is strategically integrated into cloud consumption planning, with regular assessments to ensure responsibilities are well-managed. Decisions to retain responsibilities in house are documented and shared with the cloud vendor.
How to determine if this good enough?
TODO
How do I do better?
TODO
Critical Factor in Cloud Consumption and Value Assessment: The shared responsibility model is central to all decision-making regarding cloud consumption. It’s regularly revisited to assess value for money and optimize the division of responsibilities with the cloud vendor.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization handle the creation and storage of build artifacts?
Ad-Hoc or Non-Existent Artifact Management: Build artifacts are not systematically managed; code and configurations are often edited live on servers.
How to determine if this good enough?
TODO
How do I do better?
TODO
Environment-Specific Rebuilds: Artifacts are rebuilt in each environment, leading to potential inconsistencies and inefficiencies.
How to determine if this good enough?
TODO
How do I do better?
TODO
Basic Artifact Storage with Version Control: Build artifacts are stored, possibly with version control, but without strong emphasis on immutability or security measures.
How to determine if this good enough?
TODO
How do I do better?
TODO
Pinned Dependencies with Cryptographic Verification: All dependencies in build artifacts are tightly pinned to specific versions, with cryptographic signing or hashes to ensure integrity.
How to determine if this good enough?
TODO
How do I do better?
TODO
Immutable, Signed Artifacts with Audit-Ready Storage: Immutable build artifacts are created and cryptographically signed, especially for production. All artifacts are stored in immutable storage for a defined period for audit purposes, with a clear process to recreate environments for thorough audits or criminal investigations.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization manage and update access policies and controls, and how are these changes communicated?
Ad-Hoc Policy Management and Inconsistent Application: Policies are not formally defined; decisions are based on individual opinion or past experience. Policies are not published, access controls are inconsistently applied, and exemptions are often granted without follow-up.
How to determine if this good enough?
TODO
How do I do better?
TODO
Basic Policy Documentation with Some Communication: Access policies are documented, but updates and their communication are irregular. There is a lack of a systematic approach to applying and communicating policy changes.
How to determine if this good enough?
TODO
How do I do better?
TODO
Regular Policy Reviews with Formal Communication Processes: Policies are regularly reviewed and updated, with formal processes for communicating changes to relevant stakeholders, though the process may not be fully transparent or collaborative.
How to determine if this good enough?
TODO
How do I do better?
TODO
Integrated Policy Management with Stakeholder Engagement: Policy updates are managed through an integrated process involving key stakeholders. Changes are communicated effectively, ensuring clear understanding across the organization.
How to determine if this good enough?
TODO
How do I do better?
TODO
Policy as Code with Transparent, Collaborative Updates: Policy intent and implementation are maintained in version control, accessible to all. The process for proposing updates is clear and well-understood, allowing for regular, transparent updates akin to software releases. Policies have testable side effects, ensuring clarity and comprehension.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization manage its cloud environment?
Manual Click-Ops as Required: Cloud management is performed manually as and when needed, without any systematic approach or automation.
How to determine if this is good enough?
Cloud vendors have invested heavily in graphical user interfaces that make their services rapidly consumable with minimal upfront investment in upskilling, allowing for fast experimentation and exploration.
While this is an enormous benefit and accelerator on a micro level for individual workloads, the aggregate effect over time will inevitably lead to issues with misconfigurations which may manifest themselves in:
- Security breaches, exposing sensitive data
- Unexpected cost overruns
- Over-provisioned resources
- Lack of accountability and traceability
- Apprehension to make changes due to complexity and unknown side effects
- Inconsistencies between multiple ‘environments’ such as dev, test, staging and production
- Over-dependence on key individuals
- Employee dissatisfaction and reduced productivity due to a disproportionate amount of time spent addressing incidents arising from misconfigurations
How do I do better?
Runbooks and Playbooks
You should start developing Runbooks and Playbooks to help you manage your cloud environment. They are an essential first step towards identifying processes that can later be fully automated, so they will have long-term benefits on your cloud journey.
- AWS Runbooks and AWS Playbooks
- TODO oci links
- TODO azure links
- TODO gcp links
- TODO aws links
These documents must be readily accessible while not including any sensitive information such as passwords.
It is imperative to maintain the documentation ruthlessly with every change made. If deviations are even suspected, the documentation will be dismissed as irrelevant and unreliable, and divergences will only grow.
Change Logs and Audit Logs
Most cloud service providers offer a means of recording changes that have been applied, giving some auditability by capturing the who, when, what, and how of each change. By their nature, however, these logs do not capture the why of the change.
You should ensure that this audit logging is enabled and familiarize yourself with its storage, how to query it, and its fundamental limitations.
You should then develop a practice, similar to your runbooks, of recording changes to your cloud environment together with the rationale behind them.
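One possible shape for such a change record is sketched below. It pairs the who, when, what, and how that an audit log typically captures with a mandatory why. The class name, field names and example values are all hypothetical; a shared document or ticketing system could serve the same purpose.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """A manually recorded change, mirroring audit log fields plus the rationale."""
    who: str
    when: datetime
    what: str
    how: str
    why: str

    def __post_init__(self) -> None:
        # The rationale is the one thing the provider's audit log cannot supply.
        if not self.why.strip():
            raise ValueError("a change record must include a rationale (why)")

    def to_json(self) -> str:
        data = asdict(self)
        data["when"] = self.when.isoformat()
        return json.dumps(data)


# Hypothetical example entry.
record = ChangeRecord(
    who="jane.doe",
    when=datetime(2024, 1, 10, 9, 0, tzinfo=timezone.utc),
    what="Increased disk size on the reporting database server",
    how="Cloud console, following the 'resize volume' runbook",
    why="Disk usage alerts fired twice during month-end reporting",
)
print(record.to_json())
```

Rejecting records with an empty rationale keeps the log useful for the one question the audit log cannot answer.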
Documented Manual Click-Ops: Manual click-ops are used, but steps are documented. Operations may be tested in a similarly maintained non-production environment, though discrepancies likely exist between environments.
How to determine if this good enough?
TODO
How do I do better?
TODO
Semi-Automated with Some Scripting: Some aspects of cloud management are automated, possibly through scripting, but manual interventions are still common for complex tasks or configurations.
How to determine if this good enough?
TODO
How do I do better?
TODO
Highly Automated with Standardized Processes: Cloud management is largely automated with standardized processes across environments. Regular reviews and updates are made to ensure alignment with best practices.
How to determine if this good enough?
TODO
How do I do better?
TODO
Fully Managed by Declarative Code with Drift Detection: Cloud management is fully automated and managed by declarative code. Continual automated drift detection is in place, with alerts for any deviations treated as significant incidents.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How is policy application and enforcement managed in your organization?
No Policy Application: Policies are not actively applied within the organization.
How to determine if this good enough?
TODO
How do I do better?
TODO
Policy Existence Without Enforcement: Policies exist but are not actively enforced or monitored.
How to determine if this good enough?
TODO
How do I do better?
TODO
Process-Driven Application: Policies are applied primarily through organizational processes without significant technical support.
How to determine if this good enough?
TODO
How do I do better?
TODO
Process-Driven with Limited Technical Control: Policies are comprehensively applied through processes, supported by limited technical control mechanisms.
How to determine if this good enough?
TODO
How do I do better?
TODO
Fully Integrated Application and Enforcement: Policies are applied and enforced comprehensively through well-established processes, with robust technical controls executed at all stages.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How is version control and branch strategy implemented in your organization?
Limited Version Control Usage: Version control is used minimally, indicating a lack of robust processes for managing code changes and history.
How to determine if this is good enough?
- Small Scale Projects: For very small or short-term projects with a minimal development team (possibly a single developer), the complexity introduced by extensive version control practices might not justify the benefits.
- Non-Critical Applications: Projects that do not directly impact the core operations or integrity of the organization, and where the risk of downtime or bugs is low and manageable without requiring immediate fixes.
- Early-Stage Development: In the very early stages of development, where exploration and rapid prototyping are prioritized over maintaining a detailed history of changes.
- Low Collaboration Requirements: Environments where developers work in silos on separate components without much need for collaboration or where there is no concurrent development occurring.
However, it’s crucial to recognize that these conditions are relatively rare in modern software development, particularly within the context of cloud in the public sector. Moving towards a more comprehensive version control system and branch strategy is generally advisable as it:
- Enhances collaboration among developers.
- Improves the traceability of changes, aiding in compliance and audit processes.
- Facilitates continuous integration/continuous deployment (CI/CD) pipelines, which are integral to efficient cloud-native development.
- Mitigates risks associated with codebase regressions and facilitates faster rollback to stable versions when necessary.
- Provides inherent mitigation against individual failures when a decentralized system is used, since every participant typically has a full copy of the codebase locally and can push work in progress to share it with others or simply to protect against workstation failure or theft.
How do I do better?
Select a Version Control System (VCS)
Git is generally considered the most widely used version control system today and will serve the majority of needs. There are some exceptions, such as handling many large files (for example video), where the industry has developed specific solutions. It is important to select the right tool for the job, which may mean using multiple tools.
Git is an excellent fit where the majority of the content is code stored as plain text. It is not a good fit for keeping compiled copies, though; these are typically called artifacts and have different storage requirements.
Select a Version Control Hosting Platform
On the whole, most public sector departments and workloads should by default look to a managed specialist service such as GitHub, GitLab or Bitbucket to host their version control.
In addition, many cloud vendors also provide their own, such as:
Onboard teams and projects to the hosting platform
Moving to version control is a significant investment that should not be understated, but the dividends are also significant. The path may not be easy when teams and projects have not done this before, as it requires a level of discipline and a change to ways of working and thinking.
You should look to migrate teams incrementally and promote the benefits they will enjoy, such as not overwriting each other’s changes and identifying the cause of regression defects more quickly.
Related reading material:
- GOV.UK Service Manual - Maintaining version control in coding
- How GDS uses Git and GitHub
- Internal guidance for GDS teams about using Git
- GOV.UK Developer docs: Github
- Government analysts introduction to Git
- Codecademy: Learn Git & GitHub
- NCSC: Manage change effectively
- NCSC: Protect your code repository
Custom, Unconventional Branch Strategy: An invented branch strategy is in use, not aligning with standard methodologies and potentially leading to confusion or inefficiencies.
How to determine if this good enough?
TODO
How do I do better?
TODO
Adapted Recognized Branch Strategy: The organization adapts a recognized branch strategy (like GitFlow or GitHubFlow), tailoring it to specific needs while maintaining some standard practices.
How to determine if this good enough?
TODO
How do I do better?
TODO
Textbook Implementation of a recognized branch strategy: The organization adheres strictly to a model such as GitFlow, a recognized branch strategy suitable for managing complex development processes.
How to determine if this good enough?
TODO
How do I do better?
TODO
Textbook Implementation of a streamlined branch strategy: The organization precisely follows a streamlined branch strategy, such as GitHubFlow, that is ideal for continuous delivery and simplified collaboration.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
What is your primary method for provisioning cloud services?
Manual or Imperative Provisioning: Cloud services are primarily provisioned manually through consoles, portals, CLI, or other tools, without significant automation.
How to determine if this good enough?
TODO
How do I do better?
TODO
Limited Scripting with No Standards: Provisioning involves some scripting, but there are no formal standards or consistency across project teams.
How to determine if this good enough?
TODO
How do I do better?
TODO
Partial Declarative Automation: Declarative automation is used for provisioning some cloud services across their lifecycle, but this practice is not uniform across all teams.
How to determine if this good enough?
TODO
How do I do better?
TODO
Widespread Use of Declarative Automation: Most project teams employ declarative automation for cloud service provisioning, indicating a higher level of maturity in automation practices.
How to determine if this good enough?
TODO
How do I do better?
TODO
Mandatory Declarative Automation via CI/CD: Declarative automation is mandated for provisioning all production services, and it is exclusively executed through Continuous Integration/Continuous Deployment (CI/CD) pipelines.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
Operations
How comprehensive is the use of CI/CD tooling in your organization?
No CI/CD Tooling: Traditional build, test, and deploy practices are in use, with no implementation of CI/CD tooling.
How to determine if this good enough?
TODO
How do I do better?
TODO
Limited CI/CD Tooling on Some Projects: CI/CD tooling is used by some projects, but there are no formal standards or widespread adoption across the organization.
How to determine if this good enough?
TODO
How do I do better?
TODO
Varied CI/CD Tooling Across Teams: Many project teams use CI/CD tooling, though the choice of tools and practices is based on individual team preferences.
How to determine if this good enough?
TODO
How do I do better?
TODO
Widespread, Team-Preferred CI/CD Tooling: Most project teams employ CI/CD tooling, largely based on team preferences, with traditional practices being very limited.
How to determine if this good enough?
TODO
How do I do better?
TODO
Standardized and Consistent CI/CD Practices: A standardized CI/CD pipeline is consistently used across project teams organization-wide, indicating a high level of maturity in deployment practices.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization ensure that applications are built and deployed in a timely manner?
No Routine Measurements, Slow Processes: There are no routine measurements for build and deployment times. Builds and deployments often take days to plan and hours to execute, with little monitoring for SLA compliance.
How to determine if this good enough?
TODO
How do I do better?
TODO
Basic Tracking with Some Delays: Some basic tracking of build and deployment times is in place, but processes are still relatively slow, often resulting in delays.
How to determine if this good enough?
TODO
How do I do better?
TODO
Moderate Efficiency with Occasional Monitoring: The organization has moderately efficient build and deployment processes, with occasional monitoring and efforts to adhere to timelines.
How to determine if this good enough?
TODO
How do I do better?
TODO
Streamlined Processes with Regular Monitoring: Builds and deployments are streamlined and regularly monitored, ensuring that they are completed within reasonable timeframes.
How to determine if this good enough?
TODO
How do I do better?
TODO
Continual Improvement with Rapid Execution: The organization has a strong focus on continual improvement and efficiency. 99% of builds and deployments are completed in single-digit minutes, with consistent monitoring and optimization efforts.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
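As an illustration of the "99% in single-digit minutes" target above, the following is a minimal sketch of how it could be checked from recorded pipeline durations. It assumes durations are available in seconds and uses the nearest-rank percentile method; the function names and the 600-second limit (ten minutes) are illustrative choices, not part of this guidance.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_target(durations_seconds, pct=99, limit_seconds=600):
    """True when the pct-th percentile duration is under the limit (under ten minutes by default)."""
    return percentile(durations_seconds, pct) < limit_seconds
```

In practice the durations would come from whatever CI/CD tooling is in use, and the check could run on a schedule so that regressions are noticed early.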
How does your organization monitor and observe its cloud infrastructure and application data?
Reactive and Development-Focused Observation: Observations are primarily made during the development phase or in response to issues, with no continuous monitoring in place.
How to determine if this good enough?
TODO
How do I do better?
TODO
Basic Monitoring Tools and Manual Checks: Basic monitoring tools are used. Checks are often manual and are not fully integrated across different cloud services.
How to determine if this good enough?
TODO
How do I do better?
TODO
Systematic Monitoring with Alerts: Systematic monitoring is in place with alert systems for potential issues. However, the integration of infrastructure and application data is still developing.
How to determine if this good enough?
TODO
How do I do better?
TODO
Advanced Monitoring with Partial Integration: Advanced monitoring tools are used, providing more comprehensive data. There’s a degree of integration between infrastructure and application monitoring, but it’s not fully seamless.
How to determine if this good enough?
TODO
How do I do better?
TODO
Integrated ‘Single Pane of Glass’ Monitoring: A sophisticated, integrated monitoring system is in place, offering a ‘single pane of glass’ view. This system provides actionable insights from both infrastructure and application data.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization obtain real-time insights and answer business-related questions?
SME Analysis with Limited Data Literacy Understanding: Insights largely depend on subject matter experts who analyze available data and provide answers. These experts, while knowledgeable in their field, may not always have a high level of data literacy, which makes the process more costly and produces point-in-time rather than real-time answers.
How to determine if this good enough?
TODO
How do I do better?
TODO
Basic Reporting Tools with Delayed Insights: The organization uses basic reporting tools that provide insights, but there is typically a delay in data processing and limited real-time capabilities.
How to determine if this good enough?
TODO
How do I do better?
TODO
Intermediate Analytics with Some Real-Time Data: A combination of analytics tools is used, offering some real-time data insights, though comprehensive, immediate access is limited.
How to determine if this good enough?
TODO
How do I do better?
TODO
Advanced Analytics Tools with Broad Real-Time Access: The organization employs advanced analytics tools that provide broader access to real-time data, enabling quicker insights and decision-making.
How to determine if this good enough?
TODO
How do I do better?
TODO
Comprehensive Self-Service Dashboarding: A self-service dashboarding capability is in place, offering wide access to various data points and enabling users across the organization to derive real-time insights independently.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization release updates to its applications and services?
Downtime for Updates: Updates are applied by shutting down production, updating applications in place, and restarting. Rollbacks rely on backups if needed.
How to determine if this good enough?
TODO
How do I do better?
TODO
Rolling Updates During Maintenance Windows: Updates are performed using rolling updates, impacting production capacity to some extent, usually scheduled during maintenance windows.
How to determine if this good enough?
TODO
How do I do better?
TODO
Manual Cut-Over with New Versions: New versions of applications are deployed without impacting existing production, with a manual transition to the new version during a maintenance window. Manual rollback to the previous version is possible if needed.
How to determine if this good enough?
TODO
How do I do better?
TODO
Canary or Blue/Green Strategy with Manual Transition: Updates are released using a canary or blue/green strategy, allowing manual transition between current and new versions. Formal maintenance windows are not routinely necessary.
How to determine if this good enough?
TODO
How do I do better?
TODO
Dynamic Canary/Blue/Green Strategy without Maintenance Windows: Updates are managed via a canary or blue/green strategy with dynamic transitioning of users between versions. This approach eliminates the need for formal maintenance windows.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
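One common way to transition users dynamically between versions is to hash a stable user identifier into a fixed number of buckets and send the lowest buckets to the new version. The sketch below assumes that approach; `bucket_for`, `route`, and the 100-bucket split are hypothetical names and values chosen for illustration, and real deployments usually delegate this to a load balancer, service mesh, or feature-flag service.

```python
import hashlib


def bucket_for(user_id: str, buckets: int = 100) -> int:
    """Map a user ID to a stable bucket number in the range [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent% of users to the canary, the rest to stable."""
    return "canary" if bucket_for(user_id) < canary_percent else "stable"
```

Because the bucket depends only on the user ID, a user who reaches the canary at a low percentage stays on it as the percentage is raised, and setting the percentage to 0 returns everyone to the stable version.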
How is your deployment and QA pipeline structured?
Manual Scheduled QA Process: Deployment and QA are handled through a manually scheduled process, lacking automation and continuous integration.
How to determine if this good enough?
TODO
How do I do better?
TODO
Basic Automation with Infrequent Deployments: Some level of automation exists in the QA process, but deployments are infrequent and partially manual.
How to determine if this good enough?
TODO
How do I do better?
TODO
Integrated Deployment and Regular QA Checks: Deployment is integrated with regular QA checks, featuring a moderate level of automation and consistency in the pipeline.
How to determine if this good enough?
TODO
How do I do better?
TODO
CI/CD with Automated Testing: A Continuous Integration/Continuous Deployment (CI/CD) pipeline is in place, including automated testing and frequent, reliable deployments.
How to determine if this good enough?
TODO
How do I do better?
TODO
On-Demand Ephemeral Environments: Deployment and QA utilize short-lived, ephemeral environments provisioned on demand, indicating a highly sophisticated, efficient, and agile pipeline.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How is your organization structured to develop and implement its cloud vision and strategy?
No Dedicated Cloud Team: There is no specific team focusing on cloud strategy; teams operate in silos based on traditional, on-premises role definitions.
How to determine if this good enough?
TODO
How do I do better?
TODO
Informal Cloud Expertise: Informal groups or individuals with cloud expertise exist, facilitating some degree of cross-organizational collaboration.
How to determine if this good enough?
TODO
How do I do better?
TODO
Formal Cross-Functional Cloud Team/COE: A formal Cloud Center of Excellence or equivalent cross-functional team exists, providing foundational support and guidance for cloud operations.
How to determine if this good enough?
TODO
How do I do better?
TODO
Integrated Cloud Teams Following COE Standards: Cloud teams across the organization follow standards and patterns established by the Cloud COE. Cross-functional roles are increasingly common within development teams.
How to determine if this good enough?
TODO
How do I do better?
TODO
Advanced Cloud COE Operating Model: The Cloud COE has matured into a comprehensive operating model with fully autonomous, cross-functional teams that include experts in all necessary technology and process domains.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
What is the structure of your organization in terms of managing cloud operations?
Developer-Managed Cloud Operations: There is no dedicated cloud team; application developers are responsible for managing all aspects of cloud operations.
How to determine if this good enough?
TODO
How do I do better?
TODO
Fully Outsourced Cloud Operations and Strategy: All cloud operations, including the definition of strategic direction, are outsourced to a third-party supplier.
How to determine if this good enough?
TODO
How do I do better?
TODO
Outsourced Operations with Internal Strategic Ownership: Cloud operations are outsourced, but the strategic direction for cloud usage is developed and owned internally by the department.
How to determine if this good enough?
TODO
How do I do better?
TODO
Hybrid Approach with Outsourced Augmentation: A mix of in-house and outsourced resources is used. Third-party suppliers provide additional capabilities (e.g., on-call support), while strategic cloud direction is led by departmental leaders.
How to determine if this good enough?
TODO
How do I do better?
TODO
Dedicated In-House Cloud Team: A robust, dedicated cloud team exists within the organization, comprising at least 5 civil/public servant employees per cloud platform. This team has a shared roadmap for cloud capabilities, adoption, and migration.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
What is your organization's approach to planning and preparing for incident response?
Ad-Hoc and Basic Efforts: Incident response is primarily ad-hoc, with some basic efforts in place but no formalized plan or structured approach.
How to determine if this good enough?
TODO
How do I do better?
TODO
Initial Documentation at Service Launch: A documented incident response plan is required and established at the point of introducing a new service to the live environment.
How to determine if this good enough?
TODO
How do I do better?
TODO
Regularly Updated Incident Plan: The incident response plan is not only documented but also periodically reviewed and updated to ensure its relevance and effectiveness.
How to determine if this good enough?
TODO
How do I do better?
TODO
Integrated and Tested Plans: Incident response planning is integrated into the broader IT and business continuity planning. Regular testing of the plan is conducted to validate procedures and roles.
How to determine if this good enough?
TODO
How do I do better?
TODO
Rehearsed and Proven Response Capability: Incident response plans are not only documented and regularly updated but also rigorously rehearsed. The organization is capable of successfully recovering critical systems within a working day.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
People
How does your organization engage with cloud providers to develop capabilities and services?
Minimal Interaction with Cloud Providers: The relationship with cloud providers is transactional, limited to accessing their services without any significant contact or support from their account or technical teams.
How to determine if this good enough?
TODO
How do I do better?
TODO
Basic Support Utilization: Some basic support services from cloud providers are utilized, such as occasional technical assistance or access to standard documentation and resources.
How to determine if this good enough?
TODO
How do I do better?
TODO
Regular Interaction and Support: There is regular interaction with cloud provider account managers, including access to standard training and support services to assist in leveraging cloud capabilities.
How to determine if this good enough?
TODO
How do I do better?
TODO
Proactive Engagement and Tailored Support: The organization engages proactively with cloud providers, receiving tailored support, training, and workshops that align with specific needs and goals.
How to determine if this good enough?
TODO
How do I do better?
TODO
Strategic Partnership with Comprehensive Support: Cloud providers are engaged as strategic partners, offering comprehensive support, including regular training, workshops, and active collaboration. This partnership is instrumental in realizing strategic goals and includes opportunities for the organization to showcase its work through the provider’s platforms.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization manage and incentivize the completion of cloud-related training and certification goals?
No Formal Training Support: There is no formal support for certification or training, nor are any specific goals or targets defined for employee development in cloud skills.
How to determine if this good enough?
TODO
How do I do better?
TODO
Managerial Discretion on Training: Training and certifications are supported at the discretion of individual managers. Team-level training goals are set but not consistently monitored or reported.
How to determine if this good enough?
TODO
How do I do better?
TODO
Corporate-Level Training Support and Tracking: Training and certifications are strongly supported with allocated budgets and managerial encouragement. Team-level training goals are consistently defined, tracked, and reported at the corporate level.
How to determine if this good enough?
TODO
How do I do better?
TODO
Role-Based Training Recommendations and Self-Assessment: Relevant certifications are recommended based on specific roles and incorporated into personal development plans. Employees are encouraged to self-assess their progress against role-specific and team-level goals.
How to determine if this good enough?
TODO
How do I do better?
TODO
Incentivized and Assessed Training Programs: Employees completing certifications are rewarded with merit incentives and receive structured guidance and development plans. Periodic formal role-specific assessments are conducted, with achievements recognized through systems like GovUKCloudBadges.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization prioritize cloud experience in its hiring practices of senior/executive/leadership roles, suppliers and contingent labour?
No Specific Cloud Experience Requirement: Cloud experience is not a requirement in job postings; candidates are not specifically sought out for their cloud skills.
How to determine if this good enough?
TODO
How do I do better?
TODO
Selective Requirement for Cloud Experience: Some job postings, particularly those in relevant areas, require candidates to have prior cloud experience.
How to determine if this good enough?
TODO
How do I do better?
TODO
Mandatory Cloud Experience for Relevant Roles: All relevant job postings mandate cloud experience, aligning with the Digital, Data, and Technology (DDaT) role definitions.
How to determine if this good enough?
TODO
How do I do better?
TODO
Updated Role Requirements and Cloud-Focused Hiring: In addition to requiring cloud experience for new hires, existing roles have been reviewed and updated as necessary to reflect a cloud-first IT organization.
How to determine if this good enough?
TODO
How do I do better?
TODO
Comprehensive Cloud Experience Requirement and Role Adaptation: All job postings require cloud experience, and every existing role within the organization has been evaluated and updated where necessary to align with the needs of a cloud-first IT organization.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization qualify suppliers and partners for cloud initiatives?
Basic Qualification Based on Marketing and Framework Presence: Selection is based primarily on the supplier’s sales literature and their presence on commercial buying frameworks.
How to determine if this good enough?
TODO
How do I do better?
TODO
Initial Due Diligence and Basic Compliance Checks: Suppliers are chosen through basic due diligence, focusing on compliance with minimum standards and requirements.
How to determine if this good enough?
TODO
How do I do better?
TODO
Moderate Screening for Experience and Compliance: Partners are qualified based on their industry experience, compliance with relevant standards, and basic alignment with organizational needs.
How to determine if this good enough?
TODO
How do I do better?
TODO
Comprehensive Evaluation Including Technical and Ethical Alignment: Suppliers are thoroughly vetted for technical competence, ethical alignment with organizational values, and their ability to support specific cloud objectives.
How to determine if this good enough?
TODO
How do I do better?
TODO
Strategic Selection with Emphasis on Long-Term Value and Leadership Vision: Suppliers are selected based on a track record of excellence, recommendations from other departments, relevant certifications, demonstrable technical leadership, alignment with the civil service code, support for programs like apprenticeships, strong engagement with the leadership vision, clear articulation of risks, measurable KPIs, and long-term value for money.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization support and develop individuals with limited or no cloud experience for roles in cloud initiatives?
No Specific Development Path: There is no special accommodation or development path for individuals with limited or no cloud experience.
How to determine if this good enough?
TODO
How do I do better?
TODO
Basic On-the-Job Training: Individuals with limited cloud experience are provided basic on-the-job training to help them adapt to cloud-related tasks.
How to determine if this good enough?
TODO
How do I do better?
TODO
Structured Training and Mentorship Programs: The organization offers structured training programs, including mentorship and peer learning, to develop cloud skills among employees with limited cloud experience.
How to determine if this good enough?
TODO
How do I do better?
TODO
Integrated Learning and Development Initiatives: Comprehensive learning initiatives, such as in-house training courses or collaborations with external training providers, are in place to up-skill employees in cloud technologies.
How to determine if this good enough?
TODO
How do I do better?
TODO
Mature Apprenticeship/Bootcamp Program with Aftercare: A robust apprenticeship, bootcamp, or career-change program exists for rapid skill development in cloud technologies. This program includes significant aftercare support to ensure the long-term development and retention of these individuals, protecting the investment made in them.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
To what extent are third parties involved in the development and support of your organization's cloud initiatives?
Complete Reliance on Third Parties: Third parties are fully responsible for all cloud work, with unrestricted access to the entire cloud infrastructure.
How to determine if this good enough?
TODO
How do I do better?
TODO
Significant Third-Party Involvement: Third parties play a major role in delivering certain aspects of cloud work and have full access to cloud accounts.
How to determine if this good enough?
TODO
How do I do better?
TODO
Specialized Third-Party Support with Limited Access: Third-party providers contribute specialized knowledge and maintain ‘break glass’ (emergency) admin access only.
How to determine if this good enough?
TODO
How do I do better?
TODO
Specialized Knowledge without Privileged Access: Third parties provide specialized expertise but do not have any form of privileged access to cloud infrastructure.
How to determine if this good enough?
TODO
How do I do better?
TODO
Minimal or Augmentative Third-Party Role: Third parties are either not used at all or serve purely as staff augmentation, without any privileged access or holding exclusive knowledge.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
What are the success criteria for your cloud team?
No Defined Success Criteria: The cloud team operates without specific, defined criteria for measuring success.
How to determine if this good enough?
TODO
How do I do better?
TODO
Initial Achievements with Proofs of Concept: Success is measured by completing initial proofs of concept or developing a ‘minimum viable cloud/platform’.
How to determine if this good enough?
TODO
How do I do better?
TODO
Launching Workloads in Production: Success includes transitioning one or more workloads into a live production environment on the cloud.
How to determine if this good enough?
TODO
How do I do better?
TODO
Scaling Prototypes to Core Services: Success involves scaling initial prototypes to operate core technical services in the cloud, supporting business-critical applications.
How to determine if this good enough?
TODO
How do I do better?
TODO
Innovation and Value Creation Alignment: The organization has established success criteria that not only focus on cloud-based innovation and experimentation but also on creating tangible value through transformation initiatives, all aligned with the organization’s broader goals and strategy.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
What level of executive sponsorship supports your organization's 100% cloud adoption initiative?
No Executive Sponsorship: There is no executive support for cloud adoption, indicating a lack of strategic prioritization at the leadership level.
How to determine if this good enough?
TODO
How do I do better?
TODO
Senior Management Sponsorship: The initiative is sponsored by senior management, indicating some level of support but potentially lacking full executive influence.
How to determine if this good enough?
TODO
How do I do better?
TODO
C-Level Executive Sponsorship: One or more C-level executives sponsor the cloud adoption, demonstrating significant commitment at the highest levels of leadership.
How to determine if this good enough?
TODO
How do I do better?
TODO
Comprehensive C-Level Sponsorship with Roadmap: Full sponsorship from C-level executives, accompanied by a shared, strategic roadmap for cloud adoption and migration.
How to determine if this good enough?
TODO
How do I do better?
TODO
C-Level Sponsorship Driving Cloud-First Culture: Comprehensive C-level sponsorship not only provides strategic direction but also actively fosters a culture of cloud-first adoption, experimentation, and innovation throughout the organization.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
Security
How does your organization authenticate and manage non-human service accounts?
Basic User/Pass Credentials: Non-human service accounts are managed using basic ID/secret pair credentials, with a user/password approach.
How to determine if this good enough?
TODO
How do I do better?
TODO
API Key Usage: Non-human service accounts are authenticated using API keys, which are less dynamic and might have longer lifespans.
How to determine if this good enough?
TODO
How do I do better?
TODO
Centralized Secret Store with Some Credential Rotation: A central secret store is in place, possibly supporting automated rotation of credentials for some systems, enhancing security and management efficiency.
How to determine if this good enough?
TODO
How do I do better?
TODO
Mutual TLS for Authentication: Mutual Transport Layer Security (mTLS) is used for non-human service accounts, providing a more secure, certificate-based authentication method.
How to determine if this good enough?
TODO
How do I do better?
TODO
Short-Lived, Federated Identities with Strong Verification: Non-human service accounts use short-lived, federated identities that are strongly verifiable and validated with each request, ensuring a high level of security and minimizing the risk of credential misuse.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization authenticate and manage user identities?
Basic or No Identity Policies: There are limited or no organization-wide identity policies, such as password policies, with minimal audit or enforcement mechanisms to ensure compliance.
How to determine if this good enough?
TODO
How do I do better?
TODO
Manual Identity Policy Enforcement: While a common set of identity policies may exist, their enforcement and audit rely on manual efforts, such as retrospective analysis of logs or reports.
How to determine if this good enough?
TODO
How do I do better?
TODO
Partially Automated Identity Management: Organization-wide identity policies, including 2FA/MFA for privileged accounts, are in place. Audit and enforcement processes are partially automated.
How to determine if this good enough?
TODO
How do I do better?
TODO
Advanced and Mostly Automated Identity Management: Centralized identity policies and audit procedures, possibly including 2FA/MFA for all users and leveraging Single Sign-On (SSO). Most audit and enforcement activities are automated.
How to determine if this good enough?
TODO
How do I do better?
TODO
Fully Centralized and Automated Identity Management: Comprehensive, fully centralized identity policies and audit procedures with complete automation in enforcement. Policies encompass enterprise-standard MFA and SSO. Automated certification processes for human users and system accounts are in place, especially for accessing sensitive data, along with on-demand reporting capabilities.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization ensure that users have appropriate permissions aligned with their roles?
Ad-Hoc and Informal Review Process: User entitlements and profiles are reviewed in an ad-hoc, informal manner with administrators manually managing these as they see fit.
How to determine if this good enough?
TODO
How do I do better?
TODO
Periodic Manual Reviews with Limited Action: Periodic manual reviews of access rights are conducted for some systems, but access is rarely revoked or reduced due to concerns about unintended consequences.
How to determine if this good enough?
TODO
How do I do better?
TODO
Regular Manual Reviews, Primarily Additive: Regular, manual reviews of access rights are conducted across most systems. However, changes to access are generally additive rather than reductive.
How to determine if this good enough?
TODO
How do I do better?
TODO
Regular Reviews with Defined Expiry Dates: Access is regularly reviewed, certified, and remediated. Role allocations include defined expiry dates, necessitating review and re-certification.
How to determine if this good enough?
TODO
How do I do better?
TODO
Automated, Risk-Based Access Reviews: Fully integrated, automated reviews ensure users have permissions appropriate to their roles. Access rights are dynamically adjusted based on role changes or review outcomes. Both access roles and their allocations have expiry dates for mandatory review and re-certification.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
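The idea of role allocations carrying expiry dates can be sketched as follows. The `RoleAllocation` record and `split_by_expiry` function are hypothetical and assume allocations can be exported from the identity system as simple records; an allocation is treated as expired on its expiry date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RoleAllocation:
    user: str
    role: str
    expires_on: date


def split_by_expiry(allocations, today):
    """Separate allocations that are still valid from those due for re-certification."""
    active = [a for a in allocations if a.expires_on > today]
    expired = [a for a in allocations if a.expires_on <= today]
    return active, expired
```

An automated review could run this daily, revoking or flagging the expired allocations until the role owner re-certifies them.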
How does your organization handle user provisioning for cloud systems, focusing on authentication for human users?
Shared Accounts and Manual Account Management: Accounts are shared or reused between multiple people, with limited ability to discern from any logs collected who carried out an action. Where individual accounts exist for each user, they are manually created, deleted, updated, and assigned, involving significant manual effort and potential for inconsistency.
How to determine if this good enough?
TODO
How do I do better?
TODO
Identity Repository with Manual Processes: An organizational identity repository (like Active Directory or LDAP) is used as the user source of truth, but processes for cloud system integration are manual or inconsistent.
How to determine if this good enough?
TODO
How do I do better?
TODO
Common Standards for Identity Management: Standardized protocols and practices are in place for managing and mapping user identities between identity providers and cloud systems. Non-compliant services are less preferred.
How to determine if this good enough?
TODO
How do I do better?
TODO
Automated Federated Identity Management: Federated identity management is fully automated, ensuring consistent user provisioning across all systems. Non-compliant services are isolated with appropriate mitigations.
How to determine if this good enough?
TODO
How do I do better?
TODO
Unified Cloud-Based Identity Provider: A fully cloud-based user directory or identity provider acts as the single source of truth. Centralized management is aligned with user onboarding, movements, and terminations. Services not supporting federated identity have been phased out.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization manage authentication for non-human service accounts in cloud systems?
Human-like Accounts for Services: Non-human service accounts are set up similarly to human accounts, with long-lived credentials that are often shared.
How to determine if this good enough?
TODO
How do I do better?
TODO
Locally Managed Long-Lived API Keys: Long-lived API keys are used for service accounts, with management handled locally at the project or program level.
How to determine if this good enough?
TODO
How do I do better?
TODO
Centralized Secret Store for Service Accounts: A centralized repository or secret store is in place for all non-human service accounts, and its use is mandatory across the organization.
How to determine if this good enough?
TODO
How do I do better?
TODO
Ephemeral Identities with Attestation: Service accounts do not use long-lived secrets; instead, identity is established dynamically based on attestation mechanisms.
How to determine if this good enough?
TODO
How do I do better?
TODO
Code-Managed Identities with Federated Trust: Identities for non-human services are managed as part of the infrastructure-as-code paradigm, allowing seamless federation across the organization without needing point-to-point trust relationships.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization manage risks?
Basic and Informal Risk Management: Risk management is carried out in a basic and informal manner, often relying on individual judgement without structured processes.
How to determine if this good enough?
TODO
How do I do better?
TODO
Ad-Hoc Spreadsheets for Risk Tracking: Risks are tracked using ad-hoc spreadsheets at the project or program level, without a standardized or centralized system.
How to determine if this good enough?
TODO
How do I do better?
TODO
Formalized Risk Registers with Periodic Reviews: Formal risk registers are maintained for projects or programs, with risks reviewed and updated on a periodic basis.
How to determine if this good enough?
TODO
How do I do better?
TODO
Integrated Risk Management with Central Oversight: A centralized risk management system is used, integrating risks from various projects or programs, with regular updates and reviews.
How to determine if this good enough?
TODO
How do I do better?
TODO
Advanced Risk Management Tool with Proactive Escalation: A shared, advanced risk management tool is in place, allowing for tracking and managing risks across multiple projects or programs. This system supports informed prioritization and proactively escalates unacceptable risks.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
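Prioritization and proactive escalation can be illustrated with a small sketch. It assumes a conventional likelihood-times-impact score on 1 to 5 scales and a hypothetical appetite threshold of 15; the scoring model, the threshold, and the names used are assumptions rather than anything prescribed by this guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    title: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def prioritise(risks):
    """Order risks from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


def to_escalate(risks, appetite=15):
    """Return the risks whose score exceeds the agreed appetite, highest first."""
    return [r for r in prioritise(risks) if r.score > appetite]
```

A shared tool would typically apply the same rule across every project or program, so that an unacceptable risk raised in one team is visible to whoever owns the overall risk appetite.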
How does your organization manage staff identities?
Independent Identity Management: Each service manages identities independently, without integration or synchronization across systems.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Basic Centralized Identity System: There is a centralized system for identity management, but it’s not fully integrated across all services.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Integrated Identity Management with Some Exceptions: Identities are mostly managed through an integrated system, with a few services still operating independently.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Advanced Integrated Identity Management: A comprehensive system manages identities, integrating most services and applications, with efforts to ensure synchronization and uniformity.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Mandatory Single Source of Identity: A single source of identity is mandated for all services, with a strict one-to-one mapping of human to identity, ensuring consistency and security across the organization.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization mitigate risks associated with privileged internal threat actors?
Vetting of Privileged Users: All users with privileged access undergo thorough internal vetting (Internal/UKSV) or are vetted according to supplier/contractual requirements.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Audit Logs as a Non-Functional Requirement: Systems are required to maintain audit logs, although these logs lack technical controls for centralization or comprehensive monitoring.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Local Audit Log Checks During Assessments: Local audit log presence is verified as part of IT Health Checks (ITHC) or other pre-launch processes, but routine monitoring may be absent.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Centralized, Immutable Audit Logs with Automated Monitoring: Immutable system audit logs are centrally stored. Their integrity is continuously assured, and the auditing process is automated. Log retention is defined and enforced automatically.
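One common way to make log integrity checkable is a hash chain, where each entry records the hash of the entry before it. The sketch below is an illustration only and is not tied to any particular logging product: the entry layout, `GENESIS_HASH`, and the function names are assumptions. Real deployments usually also rely on write-once storage and on a chain head that is stored or signed outside the log itself.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor hash for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys) so the same event always hashes identically.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], event: dict) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({
        "event": event,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, event),
    })


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash in order.

    An edited, reordered, or mid-chain deleted entry breaks the chain.
    Truncating the tail is NOT detected unless the latest hash is also
    kept somewhere outside the log.
    """
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: list[dict] = []
append_entry(audit_log, {"actor": "alice", "action": "login"})
append_entry(audit_log, {"actor": "alice", "action": "delete-bucket"})
print(verify_chain(audit_log))

# Simulate tampering with an already-written entry.
audit_log[0]["event"]["action"] = "logout"
print(verify_chain(audit_log))
```

Running `verify_chain` on a schedule is one simple form of the automated, continuous integrity assurance that this level describes.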
How to determine if this is good enough?
TODO
How do I do better?
TODO
Regular Audits and Legal Compliance Checks: Regular rehearsal exercises are conducted with the assistance of auditors and legal experts. These checks ensure the integrity, completeness, and legal admissibility of logs as key evidence in potential criminal prosecutions.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization monitor and manage security within its software supply chain?
Unmanaged Dependencies: Dependencies are not formally managed. They are installed ad hoc as needed and updated periodically without tracking versions or full dependency trees, for example by using apt or yum to install packages without a manifest file that could serve as an SBOM.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Basic Dependency Management with Ad-Hoc Monitoring: All dependencies are set at project initiation and updated during major releases or in response to significant advisories. Some teams use tools to scan dependency manifests for supply chain issues on an ad-hoc basis, with updates aligned to project releases.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Proactive Remediation Across Repositories: All repositories are actively monitored, with automated remediation steps. Updates are systematically applied, aligning with project release schedules.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Centralized Monitoring with Context-Aware Triage: A centralized Security Operations Center (SOC) maintains an overview of all repositories, coordinating high-severity issue remediation. The system also triages issues based on dependency usage context, focusing remediation efforts on critical issues.
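Triage based on dependency usage context could look something like the sketch below. It is a hypothetical illustration, not the scoring model of any scanner: the `Finding` fields and every weight in `triage_score` are assumptions chosen to show the idea that the same advisory can matter more or less depending on how the dependency is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A vulnerability finding for one dependency in one repository."""
    repo: str
    package: str
    severity: float           # base severity on a 0-10 scale
    reachable: bool           # is the vulnerable code actually called?
    runtime_dependency: bool  # shipped to production, or dev/test only?
    internet_facing: bool     # is the affected service exposed externally?


def triage_score(f: Finding) -> float:
    """Scale base severity by usage context. All weights are assumptions."""
    score = f.severity
    if not f.reachable:
        score *= 0.3   # vulnerable code path is never invoked
    if not f.runtime_dependency:
        score *= 0.5   # only present in build or test tooling
    if f.internet_facing:
        score *= 1.2   # exposed services get extra weight
    return round(min(score, 10.0), 1)


findings = [
    Finding("billing-api", "libfoo", 9.8, reachable=False,
            runtime_dependency=True, internet_facing=False),
    Finding("public-site", "libbar", 7.5, reachable=True,
            runtime_dependency=True, internet_facing=True),
    Finding("internal-tools", "libbaz", 5.0, reachable=True,
            runtime_dependency=False, internet_facing=False),
]

# Highest contextual score first, so remediation effort goes where it matters.
for f in sorted(findings, key=triage_score, reverse=True):
    print(f"{triage_score(f):>4}  {f.repo}: {f.package}")
```

In this made-up data the advisory with the highest base severity ends up below a lower-severity one, because the vulnerable code is not reachable.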
How to determine if this is good enough?
TODO
How do I do better?
TODO
Advanced, Integrated Security Management: This approach combines centralized monitoring, risk management, and context-aware triage, with a focus on minimizing false positives and ensuring focused, effective remediation across the organization’s software supply chain.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
How does your organization monitor and manage threats, vulnerabilities, and misconfigurations?
No Vulnerability Management: It is not clear to a member of the public how they can report vulnerabilities in your systems.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Open Policy or Participation in Responsible Disclosure Platforms: Clear instructions for responsible vulnerability disclosure are published, with a commitment to respond promptly to reports. You may also participate actively in well-known responsible disclosure platforms to facilitate external reporting of vulnerabilities.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Automated Scanning and Regular Assessments: Automated tools scan for vulnerabilities and misconfigurations, combined with regular security assessments.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Proactive Threat Hunting and Incident Response: Proactive threat hunting practices are in place. Incident response teams rapidly address identified threats and vulnerabilities, with some degree of automation in responses.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Comprehensive Security Operations with Red/Purple Teams: Red teams (offensive security) and purple teams (combined offensive and defensive) are used for full-spectrum security assessment. An empowered Security Operations Center (SOC) conducts IT Health Checks (ITHC) at least annually and after major changes. Analysts prioritize and coordinate remediation of high-severity issues, with many mitigation actions automated and event-triggered.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
What approach does your organization take towards network architecture for security?
Traditional Network Perimeter Security: Security relies primarily on network-level controls like IP-based allow-lists and firewall rules to create a secure perimeter around hosted data and applications.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Network Security with Basic Identity Verification: The traditional network-based security perimeter is supplemented with mechanisms to verify user identity within the context of access requests.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Enhanced Identity Verification: Security includes verification of both user and service identities in the context of requests, augmenting the network-based security perimeter.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Partial Shift to Identity-Centric Security: In some areas, the network-based security perimeter is replaced by robust identity verification mechanisms for users and services, reducing the reliance on VPNs for secure access.
How to determine if this is good enough?
TODO
How do I do better?
TODO
No Reliance on Network Perimeter or VPN: The organization has moved away from a network-based security perimeter. Access control is centered around individual devices and users, requiring strong attestations for trust establishment.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
What is your organization's approach to implementing 2FA/MFA for securing access?
Encouraged but Not Enforced: 2FA/MFA is broadly recommended in organizational guidelines, but it is not mandatory or consistently enforced across services and users.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Mandated but Inconsistently Enforced: 2FA/MFA is a requirement for all services and users, but enforcement is inconsistent and may have gaps.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Uniform Enforcement with Some Exceptions: 2FA/MFA is uniformly enforced across all services and users, with only a few exceptions based on specific use cases or risk assessments.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Prohibition of Vulnerable 2FA/MFA Methods: Stronger 2FA/MFA methods are enforced, explicitly excluding forms vulnerable to attacks like SIM swapping (e.g., SMS/phone-based methods).
How to determine if this is good enough?
TODO
How do I do better?
TODO
Stringent 2FA/MFA with Hardware Key Management: Only services supporting robust 2FA/MFA are used. Hardware-based MFA keys are centrally managed and distributed, ensuring high-security standards for authentication.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
What is your organization's approach to managing privileged access?
Ad-Hoc Management by Administrators: Privileged credentials are managed on an ad-hoc basis by individual system administrators, without standardized processes.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Centralized Controls with Basic Vaulting: Technology controls are in place for centralized management, including initial password and key vaulting, integrated logs, and policy-based activities.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Structured Identity Administration with OTPs: Identity administration controls and processes are established for managing privileged access, including the use of one-time passwords (OTPs).
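For readers unfamiliar with how one-time passwords are derived, the sketch below implements the widely used HOTP (RFC 4226) and TOTP (RFC 6238) algorithms with the Python standard library. It assumes time-based codes with HMAC-SHA1 and a 30-second step; the level above does not say which OTP scheme is in use, so treat this as one possible instance. The `verify_totp` helper and its `window` parameter are illustrative names, and a production system would use a maintained library and protect the shared secret.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the time step."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_totp(secret: bytes, candidate: str, at: Optional[float] = None,
                step: int = 30, window: int = 1) -> bool:
    """Accept the current step plus `window` steps either side (clock drift)."""
    now = time.time() if at is None else at
    return any(
        # compare_digest avoids leaking match position through timing.
        hmac.compare_digest(totp(secret, now + drift * step, step), candidate)
        for drift in range(-window, window + 1)
    )


# Shared secret taken from the RFC test vectors; never hard-code real secrets.
secret = b"12345678901234567890"
code = totp(secret)
print(code, verify_totp(secret, code))
```

Because each code is only valid for a short window, a captured code is of little use to an attacker later, which is the property that makes OTPs attractive for privileged access.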
How to determine if this is good enough?
TODO
How do I do better?
TODO
Automated Risk-Based Access Control: Privileged access is managed through automated, risk-based workflows and controls. This includes consistent monitoring across cloud platforms.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Context-Aware Just-in-Time Privileges: Access is granted on a just-in-time basis, using contextual factors to determine necessity (e.g., time-based access for critical tasks). Real-time alerting is in place for all activity. Mandatory wash-ups are held with senior leadership present, and priority is given to automating the task and preventing any further need for privileged access.
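Time-based, just-in-time access can be pictured as a grant that carries its own expiry and justification. The sketch below is a hypothetical model, not a vendor API: `Grant`, `request_access`, and the `MAX_DURATION` policy ceiling are names and values assumed for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    """A time-boxed privilege grant (hypothetical shape)."""
    user: str
    role: str
    reason: str
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        # Access lapses automatically; nobody has to remember to revoke it.
        return now < self.expires_at


MAX_DURATION = timedelta(hours=4)  # assumed policy ceiling


def request_access(user: str, role: str, reason: str,
                   duration: timedelta, now: datetime) -> Grant:
    """Issue a grant only when it is justified and within policy limits."""
    if not reason.strip():
        raise ValueError("a justification is required")
    if duration > MAX_DURATION:
        raise ValueError("requested duration exceeds the policy maximum")
    return Grant(user, role, reason, now + duration)


now = datetime.now(timezone.utc)
grant = request_access("alice", "db-admin", "restore failed backup",
                       timedelta(hours=1), now)
print(grant.is_active(now + timedelta(minutes=30)))
print(grant.is_active(now + timedelta(hours=2)))
```

Recording the reason alongside the grant gives the later wash-up something concrete to review.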
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
What measures are in place in your organization to mitigate the risk of data breaches, including exfiltration, corruption, deletion, and non-availability?
Manual Data Access Classification: Data access is primarily managed through manual classification, with minimal automation or centralized control.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Centralized Policies and Controls: A centralized set of policies and controls is in place to prevent unauthorized data access, forming the core of the data security strategy.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Policies with Limited Monitoring: In addition to centralized policies and controls, limited monitoring for data exfiltration is conducted to identify potential breaches.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Comprehensive Controls with Automated Detection: Preventative, detective, and corrective controls are implemented. Anomaly detection and correction are automated using a range of platforms and tools, providing a more robust defense.
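As a very small example of automated anomaly detection, the sketch below flags a day's outbound data volume when it sits far outside the recent baseline, using a z-score. The metric (daily egress in GB), the sample figures, and the threshold of 3 standard deviations are assumptions; real platforms typically use richer signals and models.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], latest: float,
                 threshold: float = 3.0) -> bool:
    """Flag `latest` if it is more than `threshold` std devs from the mean."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat baseline: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Made-up daily egress volumes in GB for one data store.
baseline = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.05]

print(is_anomalous(baseline, 1.1))  # an ordinary day
print(is_anomalous(baseline, 9.5))  # a possible exfiltration spike
```

A detective control like this only covers part of the level described above; the corrective side would be an automated response, such as suspending the credential involved, wired to the same signal.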
How to determine if this is good enough?
TODO
How do I do better?
TODO
Fully Automated Security and Proactive Monitoring: Advanced, fully automated controls and anomaly detection systems are in place. This includes proactive monitoring, regular access reviews, and continuous auditing to ensure data security and compliance.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
Technology
How are technology selections made for new projects within your organization?
Ad-Hoc and Independent Selections: Each project independently selects technologies, leading to a diverse and often incompatible technology estate.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Uniform Technology Mandate: Technology choices are highly regulated, with a uniform, organization-wide technology stack that all projects must adhere to.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Guided by Outdated Resources: A technology radar and some documented patterns exist, but they are outdated and not widely regarded as useful or relevant.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Current and Maintained Guidance: A regularly updated technology radar, along with current documentation and patterns, covers a wide range of use cases and is actively used for guidance.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Collaborative and Evolving Ecosystem: Regular show-and-tell sessions and collaboration with existing teams are encouraged. There’s a strong emphasis on reusing and extending existing solutions, alongside rewarding innovation and experimentation.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.
What characterizes the majority of your current technology stack?
Monolithic Applications with Wide Technology Stack: The predominant architecture is monolithic, with applications deployed as single, indivisible units encompassing a wide range of technologies.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Modular but Not Independently Deployable: Applications are broken down into modules, offering greater development flexibility, yet these modules are not deployable as independent components.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Modularized and Individually Deployable Components: Applications are structured into self-contained, individually deployable components. However, significant interdependencies add complexity to testing.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Mostly Independent Deployment with Some Monoliths: While most application components are independently deployable and testable, a few core system components still rely on a monolithic architecture.
How to determine if this is good enough?
TODO
How do I do better?
TODO
Fully Component-Based Modular Architecture: The technology stack consistently utilizes a component-based modular approach. All components are independently testable and deployable, free from monolithic stack dependencies.
Keep doing what you’re doing, write some blog posts and make pull requests to this guidance to help others.