PagerDuty
PagerDuty is an incident management application which ingests data about service health and on-call rotas and routes notifications to responders.
Concepts
- On-call Schedules define which responder is on call at any given time. They're composed of multiple layers, allowing overrides to work around responders' availability.
- Escalation Policies combine on-call schedules (preferred) or specific users to form a rota.
- Services represent the technical services, e.g. microservices or applications, against which incidents may be raised.
- Teams allow grouping on-call schedules, escalation policies and services and delegating their management to the staff that own them.
- Integrations can be considered triggers for incidents against services.
- Business Services can be used to group services into user-facing services or by key workflows to help business stakeholders contextualise incidents.
- Response Plays allow automation of common activities within an incident response. These can be executed automatically upon incident creation or manually with a couple of taps in the mobile or web apps.
- Alerts are created to represent events raised in integrations.
- Incidents consolidate one or more alerts and organise the response.
- Rulesets allow routing of incoming events, received via a webhook, to specific services.
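The relationships between services, escalation policies and schedules can be sketched as a small data model. This is a hypothetical illustration, not PagerDuty's API: the class names, the `on_call` field and the `responders_at_level` helper are assumptions made to show how an escalation policy resolves schedules (preferred) or named users into the people notified at each level.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Schedule:
    name: str
    on_call: str  # whoever is on call right now, with any overrides already applied


@dataclass
class EscalationLevel:
    targets: list[Schedule | str]  # schedules (preferred) or individual users


@dataclass
class EscalationPolicy:
    name: str
    levels: list[EscalationLevel]


@dataclass
class Service:
    name: str
    escalation_policy: EscalationPolicy


def responders_at_level(service: Service, level: int) -> list[str]:
    """Return the users notified when an incident on `service` reaches `level` (0-based)."""
    targets = service.escalation_policy.levels[level].targets
    # Schedules resolve to their current on-call responder; plain strings are users.
    return [t.on_call if isinstance(t, Schedule) else t for t in targets]


primary = Schedule("Payments primary", on_call="alice")
secondary = Schedule("Payments secondary", on_call="bob")
policy = EscalationPolicy(
    "Payments", [EscalationLevel([primary]), EscalationLevel([secondary, "carol"])]
)
payments = Service("payments-api", policy)

print(responders_at_level(payments, 0))  # ['alice']
print(responders_at_level(payments, 1))  # ['bob', 'carol']
```

Pointing levels at schedules rather than users means a rota change only has to be made in one place.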
Searching for services by integration key
To find the service with the integration key 6f5902ac237024bdd0c176cb93063dc4, search the Service Directory for key:6f5902ac237024bdd0c176cb93063dc4.
Triggering incidents
Incidents can be triggered by integrations, via the API or email, or manually through the PagerDuty web and mobile apps.
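As a sketch of triggering through the API, the snippet below builds (but does not send) a request assuming the Events API v2 enqueue endpoint and its `routing_key`, `event_action`, `dedup_key` and `payload` fields. The routing key is a placeholder, and `build_trigger_request` is a hypothetical helper using only the standard library.

```python
import json
import urllib.request

EVENTS_V2_URL = "https://events.pagerduty.com/v2/enqueue"
SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_request(routing_key, summary, source, severity="error", dedup_key=None):
    """Build a POST request that would trigger an alert via the Events API v2."""
    if severity not in SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    body = {
        "routing_key": routing_key,  # integration key of the target service
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }
    if dedup_key is not None:
        body["dedup_key"] = dedup_key  # repeat events with this key are deduplicated
    return urllib.request.Request(
        EVENTS_V2_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_trigger_request(
    "YOUR_ROUTING_KEY", "Disk full on db-1", "db-1", severity="critical", dedup_key="disk-db-1"
)
# urllib.request.urlopen(req) would actually send the event; it is not called here.
print(req.get_method(), req.full_url)  # POST https://events.pagerduty.com/v2/enqueue
```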
API
PagerDuty has two APIs for submitting events:
Webhooks
There are three versions of the webhooks API:
- Webhooks v1
- Webhooks v2 is the current stable version.
- Webhooks v3 is currently in early access.
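A receiver for v3 webhooks would typically verify each request before trusting it. The sketch below assumes the v3 scheme as we understand it: an `X-PagerDuty-Signature` header holding one or more comma-separated `v1=<hex>` values, each an HMAC-SHA256 of the raw body keyed with the subscription's signing secret. The secret and body are hypothetical.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Check an X-PagerDuty-Signature header against the raw request body.

    Assumes the header holds comma-separated "v1=<hex digest>" values, where
    each digest is HMAC-SHA256(secret, raw_body).
    """
    expected = "v1=" + hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    candidates = [s.strip() for s in signature_header.split(",")]
    # compare_digest avoids leaking how much of the signature matched via timing
    return any(hmac.compare_digest(expected, c) for c in candidates)


secret = "example-signing-secret"  # hypothetical
body = b'{"example": true}'
good = "v1=" + hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
print(verify_signature(body, good, secret))         # True
print(verify_signature(body + b" ", good, secret))  # False
```

Accepting any of several signatures lets a receiver keep working while a secret is being rotated.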
Integrations
The email integration provides an email address (NAME@TENANT.pagerduty.com) to which emails can be sent. The sender and body of the email are available as properties for event routing.
Regular expressions can be used to accept only messages matching a specific pattern, and deduplication is possible by extracting an identifier from the message body using a capture group.
Open incidents can be resolved automatically using custom rules that match against the message subject and body.
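The filtering, deduplication and auto-resolve behaviour described above can be sketched locally with Python's `re` module. The patterns, the `[ALERT]`/`[RECOVERED]` subject prefixes and the `Check ID:` line are hypothetical conventions of an imagined monitoring system; the real rules are configured in the integration's settings, not in code.

```python
import re

# Hypothetical rules for an imagined monitoring system's alert emails.
ACCEPT = re.compile(r"^\[(?:ALERT|RECOVERED)\]")  # only accept matching subjects
DEDUP = re.compile(r"Check ID: (\w+)")            # capture group supplies the dedup key
RESOLVE = re.compile(r"^\[RECOVERED\]")           # subjects that should resolve the incident


def classify(subject: str, body: str):
    """Return (action, dedup_key) for an accepted email, or None if it is discarded."""
    if not ACCEPT.search(subject):
        return None
    match = DEDUP.search(body)
    dedup_key = match.group(1) if match else None
    action = "resolve" if RESOLVE.search(subject) else "trigger"
    return action, dedup_key


print(classify("[ALERT] disk full", "Host: db-1\nCheck ID: disk42"))  # ('trigger', 'disk42')
print(classify("[RECOVERED] disk full", "Check ID: disk42"))          # ('resolve', 'disk42')
print(classify("Weekly newsletter", "Check ID: x"))                   # None
```

Because both emails yield the same key, the recovery message resolves the incident opened by the alert rather than creating a new one.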
AWS CloudWatch
The Message property of each event must contain a JSON-encoded object in order for this integration to trigger.
- Create an SNS topic.
- Create a subscription to the SNS topic, using HTTPS as the Protocol and the PagerDuty enqueue URL as the Endpoint. Leave Enable raw message delivery disabled.
- To enable ALARM notifications, create a CloudWatch Alarm and configure it to send notifications to the SNS topic.
- To enable OK notifications, create an additional notification action for the OK state.
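The requirement that Message holds a JSON-encoded object is easy to get wrong when publishing to the topic by hand. The sketch below shapes a minimal SNS notification around a sample CloudWatch alarm and checks the Message field; the alarm values are hypothetical, and `message_is_json_object` is an illustrative helper rather than anything PagerDuty or AWS provides.

```python
import json


def sns_notification(alarm: dict) -> dict:
    """Shape a minimal SNS notification whose Message is a JSON-encoded object."""
    return {"Type": "Notification", "Message": json.dumps(alarm)}


def message_is_json_object(notification: dict) -> bool:
    """True if Message decodes to a JSON object (a dict), as the integration expects."""
    try:
        decoded = json.loads(notification.get("Message", ""))
    except (TypeError, json.JSONDecodeError):
        return False
    return isinstance(decoded, dict)


alarm = {"AlarmName": "HighCPU", "NewStateValue": "ALARM", "NewStateReason": "Threshold crossed"}
print(message_is_json_object(sns_notification(alarm)))  # True
print(message_is_json_object({"Message": "CPU high"}))   # False: plain text, not JSON
```

A plain-text Message, such as one typed into the SNS console, would fail this check and therefore not trigger an incident.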
Event rules
Event rules can be configured either locally (per-service) or globally:
- Global Rulesets (formerly Global Event Rules) allow routing events to multiple destination services based upon events matching conditions, the time of day
- Service Event Rules
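Ruleset routing can be pictured as an ordered list of rules, each pairing a condition (and optionally a time window) with a destination service, evaluated first match wins. This is a hypothetical model for illustration only: `Rule`, `route` and the service names are assumptions, not PagerDuty's rule engine or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import time
from typing import Callable, Optional


@dataclass
class Rule:
    condition: Callable[[dict], bool]  # predicate over the incoming event
    service: str                       # destination service if the rule matches
    active_from: time = time(0, 0)
    active_until: time = time(23, 59, 59)


def route(event: dict, rules: list[Rule], at: time, fallback: Optional[str] = None):
    """Return the service of the first active rule matching `event`, else `fallback`."""
    for rule in rules:
        if rule.active_from <= at <= rule.active_until and rule.condition(event):
            return rule.service
    return fallback


rules = [
    Rule(lambda e: e.get("source", "").startswith("db-"), "database"),
    Rule(lambda e: "payment" in e.get("summary", "").lower(), "payments", time(9), time(17)),
]
print(route({"source": "db-1", "summary": "disk full"}, rules, time(3)))  # database
print(route({"source": "web-1", "summary": "Payment errors"}, rules, time(12)))  # payments
print(route({"source": "web-1", "summary": "Payment errors"}, rules, time(20), "catch-all"))  # catch-all
```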