Incident management best practices, guides, and product updates from Rootly
What Does AIOps Mean for SREs? It’s Complicated.
AIOps can bring some value to SREs, but it’s important to maintain a healthy perspective on its limitations.
What SREs Can Learn from Capt. Sully: When to Follow Playbooks
Does it always make sense to stick to your playbooks? There’s no clear answer, but it’s still something you should think about.
Why and How SREs Can Benefit from Feature Flags
An overview of how SREs can use feature flags to improve reliability.
Top 9 Skills for SREs from ex-Instacart SRE
A list of the top nine SRE skills, from incident management, to cloud computing, to networking and beyond.
Importance of Good Incident Communication
From the initial alert, through the incident itself, to post-incident follow-up, great communication is the key to effective incident response.
Analyzing SRE Job Postings - From Amazon to Microsoft
An analysis of SRE job descriptions from four companies highlights what businesses actually expect SREs to do.
A Primer on the History and Evolution of Incident Management to Today
Many of the concepts SREs take for granted about incident management originated with efforts to fight fires in California in the 1970s.
What Log4j Vulnerability Means for SREs?
A summary of the Log4j vulnerability, and key takeaways for SREs.