
February 24, 2026

7 mins

Re-thinking Candidate Take-homes in the AI Era: Transcripts Over Code

When everyone can ship clean code with AI, the transcript of how they got there becomes a key signal.

Written by Quentin Rousseau

As CTO at Rootly, I've had to fundamentally rethink how we evaluate product engineers. The skills that separate great candidates from good ones have shifted, not because the bar is lower, but because AI has changed what "doing the work" actually looks like.

What we used to look for

Our take-home evaluation used to be code-first. We'd look for:

  • Common pitfalls: Did the candidate handle edge cases? Did they avoid obvious anti-patterns? Were there SQL N+1 queries, missing error handling, or security oversights?
  • Going beyond the requirements: Did they just meet the spec, or did they push further? Input validation we didn't ask for, tests, accessibility considerations.
  • Code quality as signal: Naming conventions, file structure, separation of concerns. The code was the primary artifact, and we read it line by line.
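To make the first pitfall concrete, here is a minimal sketch of the N+1 query shape we used to look for. The models and a real database are stubbed out with a simple query counter; in actual Rails code the fix is eager loading (e.g. `Incident.includes(:comments)`), and the `Incident`/`Comment` names here are illustrative, not from our spec.

```ruby
# Sketch: count "queries" to show why N+1 access patterns hurt.
$query_count = 0

def run_query(sql)
  $query_count += 1 # pretend each call hits the database
end

incidents = Array.new(10) { |i| { id: i } }

# N+1: one query for the parent list, then one more per row.
run_query("SELECT * FROM incidents")
incidents.each { |inc| run_query("SELECT * FROM comments WHERE incident_id = #{inc[:id]}") }
puts $query_count # => 11

# Eager loading collapses it to two queries regardless of row count.
$query_count = 0
run_query("SELECT * FROM incidents")
run_query("SELECT * FROM comments WHERE incident_id IN (...)")
puts $query_count # => 2
```

Spotting this in a take-home was a reliable signal of production experience; AI assistants now catch it most of the time, which is part of why the code alone tells us less.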

The code told us a story about how someone thinks. For a long time, that was enough.

What changed

AI didn't lower the bar. It moved it.

When candidates have access to tools like Claude Code, Cursor, or Copilot, raw code output stops being the main differentiator. Someone with solid prompting skills can produce clean, well-structured code that passes tests, regardless of their seniority level.

How someone works with AI is now as revealing as the code they produce.

We started caring less about whether someone wrote every line themselves and more about:

  • Decision-making and tradeoffs: Why did they choose this architecture? What did they consider and reject? Can they articulate why they went with approach A over approach B?
  • Editing and taste: Did they blindly accept AI output, or did they shape it? Did they refactor what the AI gave them? Did they spot when the AI was wrong?
  • Systems thinking: Do they understand how the pieces fit together, or are they just stitching together generated snippets?
  • Behavioral signals: When they hit a wall, did they change approach or just re-prompt the same thing 15 times?

A concrete example: how our submission instructions evolved

Our take-home asks candidates to build a simplified version of Rootly's core workflow: a Ruby on Rails app with a Slack bot for managing incidents and a Web UI to view them. It's intentionally open-ended: basic requirements, room for candidates to make their own product and design decisions.
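For a sense of the kind of problem this involves, here is a hedged sketch of Slack slash-command handling, the heart of such a bot. Everything here is hypothetical (the command grammar, the `handle_slash_command` helper, the incident fields); it is written as a plain method rather than a Rails controller so the shape is easy to see.

```ruby
require "json"

# Hypothetical handler for a "/incident declare <title>" slash command.
# Slack posts form-encoded params; the subcommand lives in params["text"].
def handle_slash_command(params)
  command, *rest = params["text"].to_s.split(" ")
  case command
  when "declare"
    title = rest.join(" ")
    return { response_type: "ephemeral", text: "Usage: /incident declare <title>" } if title.empty?
    { response_type: "in_channel", text: "Incident declared: #{title}" }
  when "resolve"
    { response_type: "in_channel", text: "Incident resolved." }
  else
    { response_type: "ephemeral", text: "Unknown subcommand: #{command}" }
  end
end

puts handle_slash_command({ "text" => "declare API latency spike" })[:text]
```

Even a skeleton like this leaves plenty of open product decisions: what state an incident carries, who can resolve one, what the Web UI shows. Those decisions are what the take-home is really probing.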

Here's a real before-and-after from our submission instructions.

Before, two deliverables: code and a Loom.

  1. Share a private GitHub repo with a clear README and instructions on how to run your app
  2. Record a Loom demo of what you've built:
    • Show the Slack bot and Web UI, and talk through your design and decision making
    • Show the code paths and how it works

After, three deliverables: code, AI transcripts, and a Loom.

  1. GitHub Repo: Share a private repo with a clear README and instructions on how to run your app.
  2. AI Transcripts: Include your AI session transcripts in the repo. This helps us understand how you work with AI.
  3. Loom Demo (under 5 minutes): Walk us through what you built:
    • Demo the Slack bot and Web UI end-to-end.
    • Show key code paths and talk through your design decisions and tradeoffs.

AI transcripts went from not existing to being the second deliverable, right between the code and the demo. Not buried in a footnote, not an optional nice-to-have. A first-class submission artifact, because it tells us more about how someone engineers than the code itself does.

We also explicitly tell candidates in the challenge notes: "Using AI is expected and strongly encouraged." We're not testing whether they can code without AI: we're testing whether they can build great software with it.

Why AI transcripts matter

Reading someone's AI transcript is like watching them think out loud:

  • Problem decomposition: Did they break the problem into clear steps, or dump the entire spec into a prompt and hope for the best?
  • Course correction: When the AI went in the wrong direction, how quickly did they catch it? Did they course-correct with precision, or start over?
  • Domain understanding: Do their prompts show they understand what they're building, or are they just relaying instructions they don't fully grasp?
  • Iteration quality: Are they refining and improving, or stuck in a loop?

We recently reviewed two submissions with nearly identical code quality: clean Rails, solid test coverage, working Slack integration. If we'd evaluated them the old way, just reading the code, it would have been a coin flip.

The transcripts told a completely different story. One candidate broke the problem into clear phases, asked the AI targeted questions about Slack's API edge cases, and course-corrected when the generated code didn't handle webhook retries properly. The other pasted the entire spec into a single prompt, accepted the output mostly as-is, and when something broke, re-prompted with "fix it" until it worked.

Same output. Completely different engineer.
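The webhook-retry detail above is exactly the kind of edge case we want candidates to catch: Slack re-delivers events it thinks failed (marking retries with an X-Slack-Retry-Num header), so a naive handler processes the same event twice. A minimal sketch of the usual fix, deduplicating on the event ID (the `handle_event` helper and in-memory `Set` are illustrative; production code would use a shared store):

```ruby
require "set"

SEEN_EVENT_IDS = Set.new

# Idempotent event handling: Set#add? returns nil when the ID was
# already present, so duplicate deliveries are skipped.
def handle_event(event)
  return :duplicate unless SEEN_EVENT_IDS.add?(event[:event_id])
  # ... create the incident, post to the channel, etc. ...
  :processed
end

first  = handle_event(event_id: "Ev123", type: "app_mention")
replay = handle_event(event_id: "Ev123", type: "app_mention")
puts first  # => processed
puts replay # => duplicate
```

Whether a candidate's transcript shows them noticing this class of problem, or the AI silently ignoring it, is far more telling than whether the final diff happens to contain the fix.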

We're not the only ones thinking this way

In a recent Y Combinator Light Cone episode, Boris Cherny, the creator of Claude Code, was asked if he'd ever hire someone based on their AI coding transcript:

"You can figure out how someone thinks — whether they're looking at the logs or not, can they correct the agent if it goes off the rails, do they use plan mode, when they use plan mode do they make sure that there are tests… do they think about systems? Do they even understand systems? There's just so much embedded in that."

He also shared an example that stuck with me: an engineer on his team, instead of just implementing a new feature, first built a tool that lets the AI test arbitrary tools, then had the AI write its own tool using it. Leveraging AI as a force multiplier rather than a code generator. That's exactly the signal we're looking for.

Boris made a broader point about hiring that goes well beyond AI tooling:

"The biggest skill is people that can think scientifically and think from first principles… I sometimes ask 'what's an example of when you were wrong?' You can see if people can recognize their mistake in hindsight, if they can claim credit for the mistake, and if they learned something from it."

How someone approaches problems, handles failure, and iterates has always been a core dimension of engineering ability. AI just made it directly observable.

The new scorecard

Here's roughly how our evaluation emphasis has shifted:

Dimension | Before | Now
Code correctness | High weight | Still matters, but table stakes
Code style / structure | High weight | Medium; AI normalizes this
Going beyond requirements | High signal | Still high signal
Architecture decisions | Evaluated in code review | Evaluated in Loom + transcript
Problem-solving approach | Inferred from code | Directly observed in transcript
AI fluency | N/A | New dimension, high weight
Debugging & resilience | Inferred from code | Directly observed in transcript
Speed / velocity | Hard to measure in take-home | Observable in transcript timestamps
Scope management | Did they over/under-build? | Can see exactly when they chose to stop or expand
Testing strategy | Did tests exist? | Did they drive testing or just let AI generate them?
Error recovery | Only saw the final result | Can see how they reacted when things broke
Product instinct | Inferred from UI/UX polish | Visible in prompts: did they think about the user?

The weightings haven't just shifted: we've added entirely new dimensions that didn't exist before.

What hasn't changed

Some things hold regardless of tooling:

  • Product thinking: Does the candidate care about the user experience, or did they just build what the spec said? Do they think about the "why" behind the feature?
  • Communication: Can they explain their decisions clearly in the Loom? Do they anticipate questions?
  • Taste: Hard to quantify, but you know it when you see it. Does the solution feel considered and polished, or like the first thing that worked?

Looking ahead

The industry hasn't converged on how to evaluate engineers who work with AI. Most interview processes are either pretending AI doesn't exist or banning it entirely. Neither makes sense.

AI is how engineers work now. The question isn't whether candidates use it: it's whether they use it well.

At Rootly, we'd rather see a candidate who uses AI thoughtfully and ships something great than one who writes every line by hand and ships something mediocre. The craft isn't gone. It's evolved. And our evaluation has to evolve with it.