Blog

Incident management insights, guides, and product updates from Rootly

Round Robin escalation policies: do's and don'ts

Minimize alert fatigue by distributing incoming alerts evenly across responders with a Round Robin schedule. This strategy comes in two variations and can benefit some teams more than others.

Jorge Lainfiesta

July 9, 2024
7 mins
Measuring developer productivity IRL: practical tips for platform engineers

What should you measure, and how? Industry experts weigh in, sharing insights from their experience leading engineering organizations at scale.

Jorge Lainfiesta

July 5, 2024
5 mins
How Meta and Google use AI to improve incident response

Discover how Google is optimizing for accuracy in its AI strategy, while Meta strives to expand its response capabilities through machine learning.

JJ Tang

July 2, 2024
6 mins
The Top Resources for Site Reliability Engineers in 2024

We recently spoke to Google's Reliability Advocate, Steve McGhee, in our Humans of Reliability interview series. In addition to his interesting anecdotes on the early days of SRE at Google, and his journey to becoming a Reliability Advocate, he also shared a handful of his favorite SRE resources, which we compiled here into a list.

Jorge Lainfiesta

June 21, 2024
5 min
How Wealthsimple uses Rootly to create a culture of wellness and psychological safety

"Our goal is to make it easy for employees to come in and run an incident without needing deep technical knowledge about the system. Rootly has made this easier by allowing us to automate a lot of the “hand-holding” someone needs when they’re first navigating an incident."

Rootly & Wealthsimple

June 11, 2024
5 min
What is ‘Incident Overhead’ and why does it matter?

Not all incidents are created equal. Thus, trying to fit all the possible inputs an incident declaration may need in a single form can slow down responders and impact your data quality.

Jorge Lainfiesta

June 5, 2024
4 mins
What we can learn from Google’s UniSuper incident comms

Earlier this month, an inadvertent misconfiguration in an internal tool used by Google Cloud resulted in the deletion of a user’s GCVE Private Cloud. The user in question? UniSuper Australia — a $125 billion Australian pension fund with over 600,000 users. In this post, Ashley reflects on the communications shared and what we can learn from them.

Ashley Sawatsky

May 30, 2024
11 mins
Remote Team Rotations: On-Call Across Timezones

Use the different timezones and varied needs of your team to schedule on-call rotations that make everyone happy.

Jorge Lainfiesta

May 3, 2024
5 min read
Just hired an SRE? Five onboarding tips

No matter how good a new teammate is, a lot of their success is in your hands.

Jorge Lainfiesta

April 24, 2024
4 min read