Alerting as Code: How Mistral AI Uses Terraform as the Source of Truth
A Terraform-first model for deterministic alerting in AI systems


Before I stumbled into the tech industry (a story for another day), I spent several years in the customer service world as a server and front of house manager in restaurants. It was in these jobs that I first honed some critical skills that would later lead me on the path to incident response. In this article, I draw comparisons between life in the service industry and IT incident response.



When incidents reach a heightened level of complexity and scale, Strong argues that companies ought to consider having multiple lead roles present, rather than a single Commander overseeing the entire response. In this post, he breaks down when and how he recommends you consider bringing additional command roles in.



Status pages are a simple yet underutilized element of incident communication. Done well, they’re a low-lift way to keep your customers and stakeholders informed when incidents impact them. But without a solid approach, updating status pages can easily become a tedious and often neglected task during incidents. In this post, we’ll cover some tips to get your status page right.



For many people, the first and only time they interact with Executives is during an incident. It can be an intimidating first introduction! While Execs are first and foremost just people too, they tend to require some specific care when it comes to communication, especially when it involves issues that critically impact your business and customers. In this post, we’ll cover the best practices for communicating effectively with Executives during incidents.



In this guest post, Rohit Ghumare explores the most crucial trends for resiliency in 2023 – from automated incident management and real-time analysis to cloud-native services and human factors driving secure, collaborative workflows. By incorporating these cutting-edge approaches into your software development processes, you'll position your organization for long-term success.



We’re proud to share that we've been recognized as a High Performer and Enterprise Leader in Incident Management for the sixth consecutive quarter in the G2 Summer 2023 Report! In total, Rootly received nine G2 awards in the Summer Report.



Hans Chung refers to the tendency for SREs to independently zoom in on one task or problem at a time, and the consequences that come with it, as the “solo hero pattern”. In this post, he explores some of the reasons it happens, and what SRE leaders can do about it.



Between cloud service providers, payment processors, content delivery networks, and more, chances are you rely on external systems to keep your product working. So what do you do when someone else's incident becomes your problem? It’s probably not realistic to completely eliminate third-party dependencies, but there are things you can do to enhance your resilience against third-party failures and maintain trust with your customers when outages out of your control impact them.



Rootly has already helped companies manage 60,000+ incidents and we are just getting started! We are on a mission to make reliability every company’s superpower.
