Inside the mind of a future-thinking reliability expert

Designing reliability (and AI) for scale.

When you talk to Hannah Hammonds about reliability at Prolific, it becomes clear very quickly that she isn’t just “running incidents.” She’s designing how the entire organisation thinks about failure, response, and learning. And increasingly, how AI fits into that picture.

Hannah joined Prolific just over two and a half years ago as a service delivery lead and built the service delivery function from scratch. As Prolific grew, that function pivoted to service reliability, and more recently the focus has been on embedding SRE principles into service management rather than treating them as separate worlds. With more than a decade in service management and over 13 years in tech, she now owns a broad set of outcomes: SLAs and uptime, MTTR and incident response quality, on-call health and escalation design, and ongoing readiness for Cyber Essentials, SOC 2, and ISO 27001 audits. If it impacts customers or operational resilience, it’s in Hannah’s remit.

When “good” still felt like survival.

Before Rootly, a “great month” was defined by the basics: incidents were coordinated smoothly, people felt safe enough to raise incidents when something broke, communications didn’t slip, retros actually happened, and the team didn’t burn out from the same problems repeating. Even hitting that bar felt like pushing uphill. The process was heavily reliant on individuals doing the right thing under pressure, and there wasn’t enough time or structure left over to move from survival mode to a genuine gold-standard incident culture.

At the time, Prolific was using incident.io. It had helped them get started, but as Hannah tried to mature their approach, it began to feel like a ceiling. Different teams experienced different pain points: workflows felt limited, and day-to-day usage wasn’t intuitive for everyone. But the common thread was a lack of flexibility: it was difficult to customise the system around how Prolific actually worked.

The tool just didn’t give her the level of flexibility or depth needed to support the future she had in mind: a world where incident response was not only well-orchestrated, but also AI-assisted and reliability-driven by design.

The incident tooling pivot.

That future vision, including what would later become Rootly’s AI SRE, was one of the key reasons she decided they had to rethink their tooling altogether. Hannah wanted a platform that could go beyond logging and coordination to truly help teams detect, triage, and learn faster. She needed a foundation that could support automated guidance, richer use of observability data, and smart surfacing of runbooks and infrastructure context during the moments when engineers are under the most pressure.

The pain became hard to ignore: recurring incidents without a clear picture of why, missing context about which services were impacted and how often this was happening, and heavy manual effort just to keep the process running. Hannah stepped back and reframed the problem. She wasn’t just choosing another pager or incident dashboard; she was designing the backbone for Prolific’s reliability and AI-assisted incident strategy.

Choosing Rootly and betting on AI for root cause analysis.

She evaluated a range of options, including well-known tools like PagerDuty, and spent time thinking about what incident management should look like at Prolific in three to five years. That meant not just “does this work today?” but “will this still work when we’re routing more of the process through automation and AI?” She also knew any tool would fail culturally if it was seen as “just for engineering” or “just for service management.” It had to be something the entire company could see value in: engineers, managers, and stakeholders alike.

Soon after discovering Rootly, Hannah realised the core product gave her what she’d been missing: highly configurable workflows, deep Slack integration, strong timelines and audit trails for compliance, and robust communications that elevated the way Prolific communicated both internally and externally. Just as importantly, Rootly had built, and was continuing to build, the kind of AI-powered capabilities she had been imagining.

Ultimately, she led Prolific to replace its existing incident response tool with Rootly. It wasn’t a lateral move between similar tools; it was a strategic decision to partner with a platform, and a team, that matched her ambition for reliability and AI-assisted incident response, something the previous tool had not offered.

Building a repeatable system with Rootly.

The implementation with Rootly was intentionally full-stack. Hannah essentially rebuilt Prolific’s incident lifecycle: from the moment a system or a person raises an incident via a form, Rootly spins up the right channel, alerts the right people, starts building timelines, automates the root cause analysis, and scaffolds PIRs and follow-ups automatically. Training wasn’t kept inside service delivery; Hannah brought in engineering managers, business owners, and stakeholders early to ensure Rootly worked across the entire organisation.

That upfront investment paid off in consistency. Incidents no longer depended on who happened to be on call; they followed a clear, repeatable pattern. Alerts reached the right responders faster, time to acknowledge and resolve dropped, and automated communications reduced the risk of steps being missed when stress was high. Standardised workflows and templates made incidents feel predictable instead of ad hoc, which in turn made audits easier and gave leadership a clearer view into how the platform was being operated.

On-call life improved as well. Hannah helped design a primary and secondary escalation structure so no one is truly alone when something breaks. Rootly’s flexible escalation capabilities allow Prolific to bring in extra support with clear context—whether it’s pairing on a rollback or involving another team—and to communicate why the escalation is happening. Out-of-hours incidents have become more streamlined: alerts automatically create incidents, notify the right people, and ensure the critical steps are followed every time. Paired with a strong observability culture that treats false positives as opportunities to improve alerts, this has helped shift the mindset from “who’s on the hook” to “how do we make the system better.”

Going deeper than “human error”.

A big part of Hannah’s philosophy is that root cause analysis should be real, not a box-ticking exercise. At Prolific, they don’t stop at broad labels like “deployment” or “human error.” Once they know the category, they run a 5 Whys analysis and use post-incident reviews as collaborative sessions with engineers. The conversation is always anchored on how they could have detected the issue earlier or reduced its impact, and on which improvements deliver the best value for the effort.

To make this sustainable, she helped design a workflow around Google Meet transcripts: when a PIR starts, the transcript is captured and automatically structured around their key focus questions. Engineers then refine this, which improves the accuracy of technical details and reduces the manual burden on the PIR lead. Lessons learned are distributed faster and feel more grounded in what actually happened during the incident.

Automated probable root cause.

The partnership deepened even further over the last six months through the development of Rootly’s AI SRE. For Hannah, this capability wasn’t a nice-to-have add-on; it was part of the original vision for why she and Prolific moved away from the competing incident tool. She needed incident response that didn’t just run smoother, but that actively helped teams think and act faster during high-pressure moments.

Rootly’s AI SRE connects to Prolific’s monitoring stack and analyses relevant logs, alerts, and PRs, bringing probable root cause directly into the incident channel. In practice, that has shaved valuable minutes off the early triage window, cut down the time it takes to move from “something’s wrong” to “we know where to look,” and reduced the number of dead-end investigations. By bridging the gap between diagnosis and action with concrete next steps, and even drafting rollback pull requests when deployment issues are identified, Rootly’s AI SRE helps Prolific’s incidents progress to meaningful mitigation faster and with fewer handoffs.

For Hannah’s teams, that has translated into quicker acknowledgements, more accurate first responses, and less variation between incidents. Outages that previously dragged on while people hunted for context now move more predictably through detection, triage, and rollback. Even when incidents are complex, Rootly’s AI SRE keeps everyone anchored on the same facts, which has helped reduce confusion, tighten alignment on the incident channel, and minimise unnecessary downtime.

This is especially powerful for out-of-hours support. When fewer people are online and context is thinnest, Rootly’s AI SRE helps close the gap, giving on-call engineers a structured set of steps so they’re not starting from a blank slate at 3 a.m. The result is faster stabilisation and fewer “wake the entire team” moments. Stakeholders benefit too: they can ask @Rootly what’s happening to get a clear, up-to-date answer without disrupting responders, which has boosted confidence in Prolific’s overall reliability process.

What’s important for Hannah is that Rootly’s AI SRE doesn’t replace their process; it reinforces it. The goal is for the incident lifecycle they’ve designed—from detection to response to learning—to run the same way every time, regardless of who is on call, how tired they are, or whether they happen to be the person who wrote the original runbook. By reducing cognitive load and automating the operational overhead, Rootly’s AI SRE frees engineers to focus on actually fixing the issue. That’s where Prolific has seen the biggest gains in speed, accuracy, and consistency.

Lessons in change and culture.

Looking back across the journey, Hannah is candid about the biggest “gotcha” she sees other teams fall into: assuming a tool alone will fix a messy process. Without a clear definition of workflows, ownership, and expectations, even the best platform will still produce chaotic incidents.

Her approach at Prolific has been to lead with understanding—listening to teams’ priorities and pain points, tying everything back to delivering a reliable service to customers, and positioning herself as an enabler rather than a gatekeeper. Clear communication, early involvement, and quick, visible wins are how she builds trust. Over time, those wins accumulate into lasting cultural change.

Today, Prolific has an incident-first culture and a deeply integrated Rootly deployment, with AI-assisted incident capability in Rootly’s AI SRE that Hannah and Prolific helped co-design. “Rootly has been more than a tooling upgrade. It’s a new operating model for how Prolific navigates failure: faster, with less friction, and with reliability treated as a shared responsibility rather than an afterthought,” said Hannah.
