Pedro Espindula, SRE
Cora is a Brazilian fintech built for small and medium businesses, giving entrepreneurs one place to manage cash flow, issue and manage receipts and boletos, and send payment links, so they can focus on their company instead of their finances.
Founded: 2019 in São Paulo, Brazil
Size: ~350 employees
Rootly’s Impact
Hours → minutes time-to-acknowledge
390% increase in acknowledgement rate
<1 month implementation
Before Rootly, it could take most of a day for Cora's engineers to find out that something was broken. Time to acknowledge a critical alert ran to hours, alerts piled up in a Slack channel nobody trusted, and only about one in five was ever acknowledged at all. Today, acknowledgement takes less than ten minutes. This is how a new SRE team at a central-bank-regulated fintech cut an hours-long gap down to minutes, after evaluating nearly every alerting tool on the market, choosing Rootly, and standing the whole thing up in code in less than a month.
"Our time to acknowledge a critical alert went from hours to minutes. We evaluated every alerting tool out there, and nothing else matched Rootly on ease of use, scale, or price." -Pedro Espindula, SRE
Cora's SRE team is new, and it inherited an on-call setup that predated it. The team ran Grafana OnCall, the open-source version, on a legacy configuration built at Cora before the team even existed. On paper, Grafana OnCall had the right pieces: alerts flowed in from Datadog, Grafana, Mimir, and Superset, reached a team with a schedule, and landed in a Slack channel where they could be acknowledged and resolved. In practice, it had stopped working as a system.
The alerts were noisy. As Fabiana Carvalho put it, many of them weren't flagging anything specific or useful; they were just noise, dumped into Slack where, in Pedro Espindula's words, nobody acted on them. There was no standard for how someone got notified if an alert went unacknowledged, and no standard for who to escalate to. Teams could change their own escalation policies, and they often did, so the SRE team had no certainty that an alert would actually reach the person on call. The numbers told the story. Acknowledgement rates sat around twenty percent, and time to acknowledge a critical alert could stretch to tens of hours. It could take most of a day just to learn that something was wrong.
Two structural problems made it worse. The configuration was hard to manage as code, because Grafana's Terraform provider wasn't strong, and the team hit a wall trying to Terraform schedules and escalation policies. And then Grafana announced it was discontinuing the open-source OnCall product. The tool they were already fighting was going away.
With Grafana OnCall being retired, the team ran a genuinely thorough evaluation. They looked at PagerDuty, Opsgenie, Zenduty, incident.io, Grafana's own IRM, other SaaS alerting tools, and open-source replacements. Opsgenie, Pedro noted, was itself being folded into Jira ITSM, which he didn't rate as highly. They talked to Grafana IRM directly and walked away mainly over support. incident.io needed workarounds almost out of the gate. In the end, none of the options did what Rootly did at the price Rootly did, and for Cora's workflow Rootly came out ahead.
Then Terraform did the heavy lifting. Because Rootly's configuration is fully manageable as code, the team stood up every integration, schedule, and escalation policy in about a month. Now when a new team comes on, it takes a couple of hours for Terraform to sync with their data sources and build the structure automatically. The genuinely slow part of the rollout, Pedro is clear, isn't the tooling at all; it's the people process: getting teams to install the app and register their schedules. Cora ran it as a canary rollout, with engineering teams first, then data teams, then everyone else.
"Terraform is the core of how we operate, and it's why we moved so fast. With Rootly, we Terraformed every integration, schedule, and escalation policy in less than two weeks. Now when a new team comes on, it takes minutes to sync and stand up their whole structure automatically." -Pedro Espindula, SRE
This is the headline, and it's rare to be able to state it so plainly because Cora is a team that tracked the numbers thoroughly. Time to acknowledge a critical alert dropped to minutes. The acknowledgement rate climbed dramatically. The noise that used to bury real signals has been standardized into routing the team can trust, which is what makes those numbers move. When the right person is reliably paged, alerts actually get acknowledged.
Terraform isn't a side feature for this team; it's the core of how they operate, and it's the direct answer to one of their biggest frustrations with Grafana, whose Terraform provider couldn't cleanly handle their schedules and escalation policies. With Rootly, the entire on-call configuration lives in code: every integration, schedule, and escalation policy, stood up in less than a week and reproducible in minutes for each new team. For an SRE org that wants its reliability infrastructure version-controlled and consistent, that's the difference between standardizing on purpose and hoping each team configures itself correctly.
Cora runs in a regulated payments environment. Pix availability is mandated by Brazil's central bank at 99.5 percent, and the team holds itself to availability targets across its products. In that context, the old reality, where any team leader could quietly change an escalation policy and break the routing, was a real risk. Rootly replaced ad hoc routing with a standard the team controls, so the certainty that an alert reaches the right responder is built in rather than left to chance.
For Pedro and the team, this is the part worth underlining. Core features like escalation policies and schedules exist in other tools, he acknowledges. What he hasn't found elsewhere is a set of features this polished, and the support behind it. Cora walked away from Grafana IRM primarily over support, and with Rootly they've had the opposite experience. He calls it “the best vendor support they've ever had”, and points to opening a feature request and seeing it shipped by the end of the same week. That responsiveness, more than any single capability, is what he names as the biggest game changer.
"Core features we have in Rootly could technically be found somewhere else. The polish and support is what can't be." -Fabiana Carvalho, SRE
When a team can tell you exactly what changed, in numbers, it usually means the change was real. Cora went from noisy alerts nobody trusted and a twenty percent acknowledgement rate to standardized, code-managed on-call with critical-alert acknowledgement measured in minutes instead of hours. They got there by evaluating the entire market, choosing Rootly on ease of use, scale, and price, standing it up fast with Terraform, and leaning on support that ships. For a regulated financial institution where a missed page is a regulatory and customer problem, that's not a tooling upgrade. It's the difference between hoping the right person sees the alert and knowing they will.