

How AI Changed the Way We Evaluate Product Engineers
A look at how AI coding tools reshaped our take-home challenge at Rootly, and why a candidate's AI transcript now tells us as much as their code.
As CTO at Rootly, I've had to fundamentally rethink how we evaluate product engineers. The skills that separate great candidates from good ones have shifted, not because the bar is lower, but because AI has changed what "doing the work" actually looks like.
Our take-home evaluation used to be code-first. The code told us a story about how someone thinks, and for a long time, that was enough.
AI didn't lower the bar. It moved it.
When candidates have access to tools like Claude Code, Cursor, or Copilot, raw code output stops being the main differentiator. Someone with solid prompting skills can produce clean, well-structured code that passes tests, regardless of their seniority level.
How someone works with AI is now as revealing as the code they produce.
We started caring less about whether someone wrote every line themselves and more about how they arrived at the result.
Our take-home asks candidates to build a simplified version of Rootly's core workflow: a Ruby on Rails app with a Slack bot for managing incidents and a Web UI to view them. It's intentionally open-ended: basic requirements, room for candidates to make their own product and design decisions.
Here's a real before-and-after from our submission instructions.
AI transcripts went from not existing to being the second deliverable, right between the code and the demo. Not buried in a footnote, not an optional nice-to-have. A first-class submission artifact, because it tells us more about how someone engineers than the code itself does.
We also explicitly tell candidates in the challenge notes: "Using AI is expected and strongly encouraged." We're not testing whether they can code without AI; we're testing whether they can build great software with it.
Reading someone's AI transcript is like watching them think out loud.
We recently reviewed two submissions with nearly identical code quality: clean Rails, solid test coverage, working Slack integration. If we'd evaluated them the old way, just reading the code, it would have been a coin flip.
The transcripts told a completely different story. One candidate broke the problem into clear phases, asked the AI targeted questions about Slack's API edge cases, and course-corrected when the generated code didn't handle webhook retries properly. The other pasted the entire spec into a single prompt, accepted the output mostly as-is, and when something broke, re-prompted with "fix it" until it worked.
Same output. Completely different engineer.
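The webhook-retry detail is a good example of the judgment we're looking for. Slack's Events API redelivers an event when it doesn't receive a timely 2xx response, so a handler has to be idempotent or it will create duplicate incidents. Here's a minimal sketch of the idea (hypothetical class and method names, with an in-memory set standing in for the Redis or database store a real Rails app would use):

```ruby
require "set"

# Slack retries event delivery if it doesn't get a 2xx response quickly,
# so the same event can arrive more than once. Deduplicating on the
# event_id that Slack includes in every Events API payload keeps the
# handler idempotent. (Illustrative sketch, not Rootly's actual code.)
class SlackEventHandler
  def initialize
    @seen_event_ids = Set.new
    @incidents = []
  end

  # Returns true if the event was processed, false if it was a
  # redelivery we already handled. Set#add? returns nil when the
  # id is already present, so the side effect runs at most once.
  def handle(event)
    return false unless @seen_event_ids.add?(event[:event_id])

    @incidents << event[:text]
    true
  end

  attr_reader :incidents
end
```

A retried delivery with the same `event_id` is then a no-op: `handle` returns false and no duplicate incident is created, which is exactly the behavior the stronger candidate verified.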
In a recent Y Combinator Light Cone episode, Boris Cherny, the creator of Claude Code, was asked if he'd ever hire someone based on their AI coding transcript:
"You can figure out how someone thinks — whether they're looking at the logs or not, can they correct the agent if it goes off the rails, do they use plan mode, when they use plan mode do they make sure that there are tests… do they think about systems? Do they even understand systems? There's just so much embedded in that."
He also shared an example that stuck with me: an engineer on his team, instead of just implementing a new feature, first built a tool that lets the AI test arbitrary tools, then had the AI write its own tool using it. Leveraging AI as a force multiplier rather than a code generator. That's exactly the signal we're looking for.
Boris made a broader point about hiring that goes well beyond AI tooling:
"The biggest skill is people that can think scientifically and think from first principles… I sometimes ask 'what's an example of when you were wrong?' You can see if people can recognize their mistake in hindsight, if they can claim credit for the mistake, and if they learned something from it."
How someone approaches problems, handles failure, and iterates has always been a core dimension of engineering ability. AI just made it directly observable.
Here's roughly how our evaluation emphasis has shifted:
The weightings haven't just shifted: we've added entirely new dimensions that didn't exist before.
Some things hold regardless of tooling.
The industry hasn't converged on how to evaluate engineers who work with AI. Most interview processes are either pretending AI doesn't exist or banning it entirely. Neither makes sense.
AI is how engineers work now. The question isn't whether candidates use it; it's whether they use it well.
At Rootly, we'd rather see a candidate who uses AI thoughtfully and ships something great than one who writes every line by hand and ships something mediocre. The craft isn't gone. It's evolved. And our evaluation has to evolve with it.