Automation removes toil and adds value in several ways:

  • Consistency, with a reduced margin for error.
  • A platform that can be reused to bring the same benefits to other systems.
  • Faster repairs, which lower the mean time to repair (MTTR).
  • Faster action, e.g. failing over to an alternative region in response to a regional connectivity problem.
  • Time savings, freeing humans to do more valuable work.

Use cases

  • Account creation;
  • Cluster turnup and turndown;
  • Software or hardware preparation and decommissioning;
  • Rollouts of new versions;
  • Runtime configuration changes; and
  • Changes to dependencies.

Automation classes

  1. No automation, e.g. manual regional failover of a database.
  2. Externally maintained system-specific automation, e.g. an SRE has a shell script in their home directory.
  3. Externally maintained generic automation, e.g. the SRE adds database support to a generic failover script used elsewhere.
  4. Internally maintained system-specific automation, e.g. the database ships with its own failover script.
  5. Systems that don't need any automation, e.g. the database engine is self-healing and addresses the problem without human intervention.
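
The step from class 2 to class 3 can be illustrated with a minimal Python sketch: a generic failover entry point that dispatches to per-system handlers, so that adding database support means registering one function instead of keeping a private script. Every name here (Region, HANDLERS, register, fail_over, fail_over_database) is hypothetical and invented for illustration, and the handler only returns a description of what a real one would do.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Region:
    name: str
    healthy: bool = True


# Hypothetical registry mapping a system name to its failover routine.
FailoverFn = Callable[[Region, Region], str]
HANDLERS: Dict[str, FailoverFn] = {}


def register(system: str) -> Callable[[FailoverFn], FailoverFn]:
    """Decorator that plugs a system-specific handler into the generic tool."""
    def wrap(fn: FailoverFn) -> FailoverFn:
        HANDLERS[system] = fn
        return fn
    return wrap


@register("database")
def fail_over_database(src: Region, dst: Region) -> str:
    # Placeholder: a real handler would promote a replica and repoint clients.
    return f"database: promoted replica in {dst.name}, demoted {src.name}"


def fail_over(system: str, regions: List[Region], active: str) -> str:
    """Generic part: choose a healthy target region, then dispatch."""
    by_name = {r.name: r for r in regions}
    src = by_name[active]
    if src.healthy:
        return f"{system}: {active} is healthy, nothing to do"
    candidates = [r for r in regions if r.healthy and r.name != active]
    if not candidates:
        raise RuntimeError("no healthy region to fail over to")
    # Assumption: the first healthy region in list order is an acceptable target.
    return HANDLERS[system](src, candidates[0])


if __name__ == "__main__":
    regions = [Region("us-east", healthy=False), Region("eu-west")]
    print(fail_over("database", regions, active="us-east"))
```

In this sketch the region selection and health check live in the shared fail_over function, while only the promotion step is system-specific. Class 4 would move that handler into the database's own distribution, and class 5 would make the call unnecessary.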