Alerting as Code: How Mistral AI Uses Terraform as the Source of Truth
A Terraform-first model for deterministic alerting in AI systems


Learning expert Sorrel digs into how stress inhibits our ability to learn, and what we can do about it.


Discover the essential SRE tools for monitoring, incident management, automation, and more!


Measure what matters, not what is easier. Learn tips to untangle the different common metrics used by SREs.


Your on-call management software can make or break your reliability story. Find out which boxes your on-call solution should be checking for you.


Discover the best on-call scheduling strategies for SREs in 2024


Minimize alert fatigue by distributing incoming alerts evenly across responders with a Round Robin schedule. This strategy comes in two variations and can benefit some teams more than others.


What should you measure, and how? Industry experts weigh in, sharing insights from their experience leading engineering organizations at scale.


Discover how Google is optimizing for accuracy in its AI strategy, while Meta strives to expand its response capabilities through machine learning.


We recently spoke to Google's Reliability Advocate, Steve McGhee, in our Humans of Reliability interview series. In addition to his interesting anecdotes on the early days of SRE at Google, and his journey to becoming a Reliability Advocate, he also shared a handful of his favorite SRE resources, which we compiled here into a list.