What a good on-call migration looks like

Andre Yang

February 12, 2026

Getting the data across is the easy part. The teams that come through a migration in good shape also plan for the edge cases an importer skips and the people who have to trust the new pager. Here is how we handle all three, with you.

Every migration comes down to a single moment you cannot rehearse: the first real page after cutover. Either it reaches the right engineer on the new system, or it does not. Everything that happens before that moment exists to make sure it does.

We have run that moment thousands of times, and we learned years ago that the migrations that go well split from the ones that go badly on a few decisions made right at the start. The teams that come out in good shape do three things, with a partner who has done it before. They get the data right, they handle the edge cases importers quietly skip, and they get their people ready. Most of the attention goes to the first, but the trouble almost always lives in the other two.

Here is how we work through all three.

We start by showing you what we found.

Before anything moves, we run a full read-only assessment of your PagerDuty or Opsgenie tenant and hand you a findings report, not a pitch. A clear map of what you have, so the decisions that follow are made by both of us with the same picture in view.

Here is what one looked like on a recent enterprise migration: tens of thousands of contact methods and notification rules, and more than ten paging touchpoints per responder. This is a heavily configured org by any measure.

Then the two numbers that told us how hard the job would be: zero duplicate policy names, and zero custom event-routing rules. Duplicate names mean rename work before anything can import. Custom event rules are the single largest source of migration risk at this scale. This tenant had neither, and more than 95% of the footprint was importable on day one.
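The duplicate-name check is simple enough to sketch. What follows is a minimal illustration, not our assessment tooling. It assumes the escalation policies have already been exported into a list of dictionaries with a `name` field. That shape and the `find_duplicate_names` helper are hypothetical, and so is the choice to treat names that differ only by case or surrounding whitespace as duplicates.

```python
from collections import Counter


def find_duplicate_names(policies):
    """Return normalized policy names that occur more than once.

    `policies` is assumed to be a list of dicts with a "name" key
    (a hypothetical export shape, not a real API response).
    """
    # Normalize so "Payments Primary" and "payments primary " collide.
    counts = Counter(p["name"].strip().lower() for p in policies)
    return sorted(name for name, n in counts.items() if n > 1)


policies = [
    {"name": "Payments Primary"},
    {"name": "payments primary "},
    {"name": "Search On-Call"},
]
print(find_duplicate_names(policies))
```

An empty result is the "zero duplicate policy names" case described above. Anything else is a rename list to settle before import.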

A clean tenant gets a short report; a messy one gets a longer one. Either way, you see the true shape of the project before you commit to a date, which is what lets you set expectations with your own leadership without guessing.

The work that needs judgment is the long tail.

The bulk import is rarely where migrations get hard. The long tail is.

On that same tenant, 49 schedules out of more than 4,000 used sub-24-hour rotation frequencies that required special handling. That is roughly 1% of the schedules, and it is the kind of detail that slips through and resurfaces as a dropped page three weeks later. Our handling logic comes from hundreds of enterprise migrations, and we prove it on your real data during your trial before it runs anywhere near production.
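To make the sub-24-hour case concrete, here is a small sketch of how such schedules could be flagged in an exported inventory. It is an illustration under stated assumptions, not our production logic: the `layers` and `rotation_seconds` fields and the `sub_daily_schedules` helper are hypothetical names for whatever your export actually contains.

```python
DAY_SECONDS = 24 * 60 * 60


def sub_daily_schedules(schedules):
    """Return names of schedules with at least one layer that rotates
    more often than once every 24 hours.

    Each schedule is assumed to be a dict with "name" and "layers",
    and each layer a dict with "rotation_seconds" (hypothetical shape).
    """
    flagged = []
    for schedule in schedules:
        if any(layer["rotation_seconds"] < DAY_SECONDS for layer in schedule["layers"]):
            flagged.append(schedule["name"])
    return flagged


schedules = [
    {"name": "Follow-the-sun", "layers": [{"rotation_seconds": 8 * 3600}]},
    {"name": "Weekly primary", "layers": [{"rotation_seconds": 7 * DAY_SECONDS}]},
]
flagged = sub_daily_schedules(schedules)
print(flagged, f"{len(flagged) / len(schedules):.1%} of schedules")
```

The point of a check like this is to turn the long tail into a named list with a count, so each flagged schedule can be reviewed by hand instead of being discovered later.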

A few more points from similar-sized audits, because the pattern is the point:

318 services ran business-hours or dynamic urgency logic tied to a timezone. We map those to dynamic escalation paths and check the behavior service by service with the team that owns them. We do not put timezone logic through a standard import and hope.
632 integrations used formats the standard path does not cover. Each one gets mapped to the new schema and confirmed with you after import.
27 legacy rulesets had names like "test". Almost all of them were dead code. The right move is to keep what actually matters and let the rest go, not lovingly recreate someone's abandoned experiment.

None of this surprises us. The edge cases have names, counts, and a plan before the migration begins. That is what experience buys you. Not the absence of hard parts, but the absence of nasty ones.

Getting your people ready matters as much as getting the data right.

Here is the part that has nothing to do with data fidelity, and it is the one we watch most closely: whether your responders installed the new app and confirmed how they will be reached before cutover.

It is the most common issue across every migration we run. The schedules can be perfect and the escalation paths flawless, and a page still goes nowhere because the on-call engineer never opened the app, so it waits in a notification tray they will not check until morning. That is not a data problem. It is a human one, and it does not solve itself.

So we treat responder readiness as a real workstream, with the same care as the data. That means two communication waves before cutover, in-product nudges, a 30-minute responder training that respects people's time, and an admin dashboard that tracks install and verification per person. You know exactly who is covered before you switch over, and no engineer finds out the hard way.
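The per-person tracking boils down to a simple rule: a responder counts as ready only when both conditions hold. The sketch below illustrates that rule and nothing more. The `Responder` fields and the `not_ready` helper are hypothetical and do not describe how the admin dashboard is implemented.

```python
from dataclasses import dataclass


@dataclass
class Responder:
    name: str
    app_installed: bool      # assumed flag: the new app is installed
    contact_verified: bool   # assumed flag: a contact method was confirmed


def not_ready(responders):
    """Return names of responders missing the app install, the verification, or both."""
    return [r.name for r in responders if not (r.app_installed and r.contact_verified)]


team = [
    Responder("alice", True, True),
    Responder("bob", True, False),
    Responder("carol", False, False),
]
blocked = not_ready(team)
print(f"{len(team) - len(blocked)}/{len(team)} ready; follow up with: {', '.join(blocked)}")
```

Requiring both flags is deliberate: an installed app with no verified contact method can still leave a page sitting unread.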

Your data we can validate in a script. Your people we get ready alongside you. Both have to be true the first time a real page fires.

Something unexpected will surface, and we plan for that too.

No inventory catches everything. On a tenant of any real size, something eventually turns up that nobody documented: a phone-routing setup PagerDuty's API will not export, so it has to be rebuilt by hand; a maintenance window someone scheduled two years ago and forgot; or an escalation step that quietly leans on a webhook firing into an internal endpoint nobody owns anymore.

It happens on every migration, and pretending otherwise would do more to break your trust than the surprise itself. What we can control is when these things surface, and who is standing next to you when they do. We run the trial on two or three teams first, on real traffic, so the unknowns show up there, on a small surface, while PagerDuty or Opsgenie is still carrying the load. When one appears, it is ours to solve with you, not a ticket you file and wait on.

You set the pace, and nothing is one-way.

The rollout is phased and reversible, which matters as much to the person signing off as to the engineer carrying the pager. PagerDuty or Opsgenie stays live in parallel the whole time. There is no single moment where everything rides on Rootly being perfect, and each team unplugs the old setup when it is ready, not when a date forces it.

The high-level shape of it:

We scan and audit your environment: You provide a read-only key to start. We scan and automatically audit everything, from schedules, escalation policies, orchestration, automated actions, and integrations to the unknown unknowns that were forgotten long ago. You get a full assessment and findings report, so you can make informed decisions with the inventory in front of you.
You get a tailored plan that starts with a trial: Based on the audit, we create a tailored migration guide, complete with complexity, timelines, and phases that run in parallel: account setup, alerting sources, routing, integrations, onboarding, and final go-live.
We migrate you: We work side by side with two or three teams of mixed complexity first. We test everything, trial it on live traffic, then go live. We work with you to cut over the remaining teams in scheduled batches, each under an hour, with every team fully trained end to end. By now the hard parts are solved, so this phase stays boring on purpose.

PagerDuty gets turned off when your teams trust the page.

We're there for you through the end. A dedicated implementation engineer and a CSM stay with you through the cutover and well beyond it, with a shared Slack channel and an urgent escalation path. We do not call the migration done when the data lands. We call it done when your team trusts the pager. And even then we're still here for you day and night.

Check out the PagerDuty and Opsgenie migration docs for more detail.

We have done this enough to mean it.

This confidence is not optimism. It comes from having met the things that break migrations at this scale, named them, and built a plan for each one before the work starts. We have run this across thousands of schedules and escalation policies, in tenants as large as 10,000+ responders, with the messiest configuration you can imagine.

The teams who come through it do not spend the next quarter cleaning up. They get the value they moved for. As Geoff Powell, Senior Technical Manager for SRE at Tripadvisor, put it: "Rootly has been our best investment in terms of ROI."

A migration off your on-call system is never small, and we will not pretend it is. What we will promise is that you will not run it alone, you will see every hard part before you reach it, and the old system stays live until Rootly has earned your team's trust. That is the whole job. It is the part we have done over and over, and it is the part we will do with you.

Book a free migration assessment and we will show you what is in your tenant, then walk through it together.