An Introduction to Incident Response Roles
Learn about the key roles within an incident response team, as well as optional incident roles you may not have thought about.
What Managed Kubernetes Service is Best for SREs?
A comparison of EKS, AKS, GKE, Rancher and OpenShift from an SRE’s perspective.
What SREs Can Learn from Facebook’s Largest Outage
An SRE’s analysis of the October 2021 Facebook outage.
Google’s State of DevOps 2021 Report: What SREs Need to Know
The four key takeaways for SREs from Google’s State of DevOps 2021 report.
SRE vs. DevOps: What are the Differences?
SRE and DevOps are closely related concepts, and many businesses can benefit from embracing both of them. Nonetheless, there are important distinctions between SRE and DevOps.
What is an SRE?
A comprehensive definition of SREs and Site Reliability Engineering, including what SREs do and what makes SREs different from other roles.
The Role of SREs in Observability
Although conversations about observability often overlook SREs, SREs have a central role to play in observability success.
You Do the Math: Reliability Issues Triggered by Math Errors
Even seemingly minor math bugs in software code can have outsize consequences.
Making Your On-call and Incident Management Program Stick
Maintaining your incident management practice is as important as creating it. Find out what you can do to keep your engineering organization strong and consistent year over year.