PagerDuty

PagerDuty is an incident management application which ingests data about service health and on-call rotas and routes notifications to responders.

Concepts

  • On-call Schedules represent individual responders' posts. They're composed of multiple levels, allowing overrides to work around responders' schedules.
  • Escalation Policies combine on-call schedules (preferred) or specific users to form a rota.
  • Services represent the technical services, e.g. microservices or applications, against which incidents may be raised.
  • Teams allow grouping on-call schedules, escalation policies and services and delegating their management to the staff that own them.
  • Integrations can be considered triggers for incidents against services.
  • Business Services can be used to group services into user-facing services or by key workflows to help business stakeholders contextualise incidents.
  • Response Plays allow automation of common activities within an incident response. These can be executed automatically upon incident creation or manually with a couple of taps in the mobile or web apps.
  • Alerts are created to represent events raised in integrations.
  • Incidents consolidate one or more alerts and organise the response.
  • Rulesets allow routing of incoming events, received via a webhook, to specific services.

Searching for services by integration key

To find the service with the integration key 6f5902ac237024bdd0c176cb93063dc4, search the Service Directory for key:6f5902ac237024bdd0c176cb93063dc4.

Triggering incidents

Incidents can be triggered by integrations, via the API or email, or manually via the PagerDuty web and mobile apps.

API

PagerDuty has two APIs for submitting events:
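As an illustration, the sketch below builds and sends a "trigger" event, assuming the target is the Events API v2 endpoint and that the service's integration key is used as the routing key. The function names, the example summary and the example source are hypothetical; only the standard library is used.

```python
import json
import urllib.request

# Assumed endpoint: the Events API v2 enqueue URL.
EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger_event(routing_key, summary, source, severity="critical", dedup_key=None):
    """Build an Events API v2 'trigger' event as a plain dict."""
    event = {
        "routing_key": routing_key,  # the service's integration key
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }
    if dedup_key is not None:
        # Events sharing a dedup_key are grouped rather than opening new incidents.
        event["dedup_key"] = dedup_key
    return event


def send_event(event, url=EVENTS_API_URL):
    """POST the event as JSON and return the decoded response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send_event(build_trigger_event("6f5902ac237024bdd0c176cb93063dc4", "Disk full on web-1", "web-1")) would submit the event; nothing is sent until send_event is called.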

Webhooks

There are three webhook APIs:

Integrations

Email

The email integration gives an email address (NAME@TENANT.pagerduty.com) to which emails can be sent. The sender and body of the email are available as properties for event routing.

Regular expressions can be used to accept only messages matching a specific pattern, and deduplication is possible by extracting an identifier from the message body using a capture group.
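The filtering and deduplication described above can be tried out locally before configuring the integration. The sketch below is only an approximation in Python: the "Alert ID:" message format and the function name are hypothetical, and PagerDuty's own regular expression dialect may differ from Python's in places.

```python
import re

# Hypothetical message format: the body contains a line such as "Alert ID: disk-web-1".
# The single capture group extracts the identifier used for deduplication.
ALERT_ID_PATTERN = re.compile(r"Alert ID: ([A-Za-z0-9-]+)")


def dedup_key_for(body):
    """Return the captured identifier, or None if the message should be ignored."""
    match = ALERT_ID_PATTERN.search(body)
    return match.group(1) if match else None
```

Two emails whose bodies carry the same identifier yield the same key, so they would be grouped together; a body that does not match the pattern yields None and would be rejected.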

Opened incidents can be automatically closed using custom rules which match against the message subject and body.

AWS CloudWatch

The Message property of each event must contain a JSON-encoded object in order for this integration to trigger.
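A quick way to check this requirement against a sample event is sketched below. The function name is hypothetical, and the event is assumed to be a dict with a string Message property, as in an SNS notification.

```python
import json


def message_is_json_object(sns_event):
    """Return True only if the event's Message is a string holding a JSON object."""
    message = sns_event.get("Message")
    if not isinstance(message, str):
        return False
    try:
        decoded = json.loads(message)
    except json.JSONDecodeError:
        return False
    # A JSON array, number or string is valid JSON but is not an object.
    return isinstance(decoded, dict)
```

A plain-text Message, or one containing a JSON array, fails the check, which would explain an event that arrives without triggering anything.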

  1. Create an SNS topic.
  2. Create a subscription to the SNS topic using HTTPS as the Protocol and the PagerDuty enqueue URL as the Endpoint. The "Enable raw message delivery" option should be disabled.
  3. To enable ALARM notifications, create a CloudWatch Alarm and configure its notifications to be sent to the SNS topic.
  4. To enable OK notifications, create an additional notification action for the OK state.
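The steps above could also be scripted. The sketch below only builds the keyword arguments for the boto3 calls, so it runs without AWS credentials; the helper names, the metric, the threshold and the alarm name are illustrative assumptions rather than recommended values.

```python
def subscription_params(topic_arn, enqueue_url):
    """Arguments for an SNS HTTPS subscription with raw message delivery disabled."""
    return {
        "TopicArn": topic_arn,
        "Protocol": "https",
        "Endpoint": enqueue_url,  # the PagerDuty enqueue URL for the integration
        "Attributes": {"RawMessageDelivery": "false"},
    }


def alarm_params(topic_arn, alarm_name):
    """Arguments for a CloudWatch alarm notifying the topic on ALARM and on OK."""
    return {
        "AlarmName": alarm_name,
        # Hypothetical metric and threshold, chosen only for illustration.
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 90.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # step 3: ALARM notifications
        "OKActions": [topic_arn],     # step 4: OK notifications
    }
```

With boto3 installed and credentials configured, these would be passed as boto3.client("sns").subscribe(**subscription_params(topic_arn, enqueue_url)) and boto3.client("cloudwatch").put_metric_alarm(**alarm_params(topic_arn, "cpu-high")), where topic_arn comes from the topic created in step 1.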

Event rules

Event rules can be configured either locally (per-service) or globally:

  • Global Rulesets (formerly Global Event Rules) allow routing events to multiple destination services based upon events matching conditions, the time of day
  • Service Event Rules
