The SRE engagement model may evolve over time, as SREs can become involved earlier in the development lifecycle.
The process of initiating SRE support for a system is called onboarding. Systems might be:
- Greenfield developments, with no deployments.
- Already in production:
  - with no monitoring and an unspoken 100% availability goal; or
  - with an SLO without teeth: an availability goal below 100%, but no understanding of its importance or of how to leverage it for continuous improvement.
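As an illustration of what it can mean to leverage a sub-100% availability goal, the sketch below turns an SLO into an error budget, which is one common way of giving an SLO teeth. The notes above do not prescribe this calculation; the function names, the request-based definition of availability, and the example numbers are all assumptions made for illustration.

```python
def error_budget(slo: float, total_requests: int) -> int:
    """Return how many requests may fail in a period without breaching the SLO.

    Assumes availability is measured as successful requests / total requests
    (a hypothetical choice; time-based definitions are equally possible).
    """
    if not 0.0 < slo < 1.0:
        # A 100% goal leaves no budget at all, which is why it cannot be
        # used to guide trade-offs between reliability and change.
        raise ValueError("SLO must be strictly between 0 and 1")
    return round((1.0 - slo) * total_requests)


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(slo, total_requests)
    if budget == 0:
        # Too little traffic for any failure to fit inside the budget.
        return 1.0 if failed_requests == 0 else 0.0
    return 1.0 - failed_requests / budget


if __name__ == "__main__":
    # Hypothetical service: 99.9% SLO, one million requests, 250 failures.
    print(error_budget(0.999, 1_000_000))
    print(budget_remaining(0.999, 1_000_000, 250))
```

A team could consult the remaining fraction when deciding whether to keep shipping changes or to prioritise reliability work, which is the kind of feedback loop an SLO without teeth lacks.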
There are three high-level approaches to onboarding new services:
- Engage from the design stage, ensuring readiness out of the gate.
- Provide a pre-approved platform as a base for building new services.
- Onboard existing services.
Axes of concern
These axes represent the areas of a service an SRE team will seek to improve:
- Architecture and inter-service dependencies
- Instrumentation, metrics and monitoring
- Emergency response
- Capacity planning
- Change management
- Performance: availability, latency and efficiency
SRE engagement can be seen as a progression as the process becomes embedded and reaches maturity.
SRE resources are constrained, and not every service requires SRE engagement, for example because its availability and reliability requirements are lower. Such services can still be supported through other means:
- Documentation for internal systems and best practices, maybe including worked examples or a production guide.
- Consultation on specific services or problem domains.