Build richer alert workflows with full resolution context.
The Unofficial KubeCon EU '26 SRE Track
As these four incidents show, a major failure can sometimes result from a mere typo or configuration oversight.
JJ Tang
Let's face it: on-call work isn't fun. But it can be better. Even if you have to be on call, it would be nice to have at least some of the work done for you before you drag yourself out of bed at 3 a.m. to respond to an incident.
How can creating chaos lead to better reliability? Chaos and reliability might seem mutually exclusive, but through Chaos Engineering, SREs can make meaningful improvements to system resiliency.
The Suez Canal has been big news over the last couple of weeks. We wondered how a Site Reliability Engineer (SRE) might conduct a postmortem on the Ever Given incident, and what it would mean if a comparable incident occurred at a modern tech company.