Merlin: The Tinder® Harness Around AI Coding Agents

Mayank Pahadia

Staff Software Engineer, Backend, Tinder


AI coding agents can write code. Merlin turns that capability into a reliable workflow: planning the work, verifying changes, asking for help when needed, and producing results people can review.

The hardest part of building an AI engineering agent is not getting it to write code. It is getting it to stop, ask for help, and prove that the work actually does what the ticket asked.

At Tinder®, Merlin started from a practical observation: AI agents were getting better at individual coding tasks, but our real bottleneck was the workflow around them. A useful agent needs more than code access. It needs the ticket context, the product intent, the repository conventions, the right checks for the platform, a clear approval point, and evidence it can bring back to the user. Merlin is the system we built to connect those pieces into one request-to-review workflow.

Merlin is the Tinder-specific harness around AI coding agents: the interface, workflow state, approval gates, tool orchestration, pull request workflow, and guardrails that help us get useful, reviewable work out of the best available underlying AI tools.

What Makes Merlin Magical

We named it Merlin because the role felt obvious: an engineering wizard that helps teams turn intent into working software. But the "magic" is not that Merlin silently writes code and disappears. The real trick is that it knows how work moves through Tinder.

Merlin knows that a ticket needs a plan before implementation. It knows that a design-heavy change needs different evidence than a backend cleanup. It knows that a draft pull request should arrive with the trail a reviewer needs: what Merlin understood, what it changed, how it verified the result, and where user judgment is still required. It knows when to ask for approval, and it knows when uncertainty is high enough that continuing would create the wrong kind of confidence.

Merlin's job is to make the engineering workflow explicit enough for an AI agent to participate in it. It turns a broad request into a scoped plan, turns failures into actionable feedback, and turns a code change into a reviewable handoff.

The Work Around The Work

Every user of agentic coding tools has felt the same tension. The tool can be fast, but the user still manages the shape of the work:

Ask for a plan.
Check that the plan understood the product requirement.
Add the repository-specific constraints.
Tell it which checks matter.
Ask it to try again when the build fails.
Inspect the final change set carefully, because a confident answer is not always a correct one.

That workflow is useful, but it does not scale cleanly. Every ticket starts from scratch, and the user becomes the scheduler, reviewer, test runner, and memory layer for the agent. At some point, the work around the work starts to erase the benefit of the automation.

We started Merlin from a different premise: what if the agent could drive the ticket-to-draft-PR process, instead of the user manually driving the agent?

Early runs were promising in the way prototypes usually are. For narrow tickets, Merlin could often identify the right files, make a reasonable change, and open a draft pull request. But harder tickets exposed the real problem. The agent would compress a multi-surface change into a one-file edit. It would mark work complete after a successful build, before the behavior had actually been exercised. It would miss acceptance criteria hidden in linked context.

That was not only a prompting problem. It was a workflow problem.

Useful engineering work depends on everything around the diff: the context gathered, the plan followed, the checks run, the environment used, and the evidence returned. So we shifted our focus from "how do we prompt the agent?" to "how do we build a workflow the agent can safely operate inside?"

The Harness Around The Agent

The most important architectural choice in Merlin is the separation between the harness and the intelligence.

The AI coding agent is responsible for reasoning-heavy work: understanding requirements, reading code, proposing implementation paths, writing patches, explaining failures, and reviewing tradeoffs. It also uses tool integrations and skills to gather context, run checks, and verify behavior.

Merlin is responsible for the workflow that should be repeatable: accepting work, tracking state, deciding which phase runs next, making the right tools available at the right time, enforcing approval gates, and producing a reviewable handoff.

That separation gives us portability. AI tools will keep changing. Models will improve. Developer tooling will evolve. Merlin is designed so Tinder can benefit from that progress without rebuilding the workflow every time the underlying tool changes.

This is why Merlin builds on top of AI coding agents instead of trying to become the underlying tool itself. The agent brings general coding intelligence and uses the available tools. Merlin brings Tinder-specific judgment about how work should move from idea to review.
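One way to picture the harness/agent split is as an interface boundary. The sketch below is illustrative only and is not Merlin's actual code: `CodingAgent`, `Harness`, `StubAgent`, and every method name are hypothetical. It assumes the harness depends only on a small agent interface, so the underlying tool can be swapped without touching the workflow.

```python
from typing import Callable, Optional, Protocol


class CodingAgent(Protocol):
    """Hypothetical interface: the only thing the harness knows about the agent."""

    def propose_plan(self, request: str) -> str: ...

    def implement(self, plan: str) -> str: ...


class StubAgent:
    """Stand-in for a real coding agent, used here only for illustration."""

    def propose_plan(self, request: str) -> str:
        return f"plan for: {request}"

    def implement(self, plan: str) -> str:
        return f"patch for: {plan}"


class Harness:
    """Owns the repeatable workflow: phase order, state, and the approval gate."""

    def __init__(self, agent: CodingAgent) -> None:
        self.agent = agent
        self.phases: list[str] = []  # running record of which phases ran

    def run(self, request: str, approve: Callable[[str], bool]) -> Optional[str]:
        self.phases.append("plan")
        plan = self.agent.propose_plan(request)
        if not approve(plan):
            # Without approval the workflow stops before any code is written.
            self.phases.append("stopped")
            return None
        self.phases.append("implement")
        return self.agent.implement(plan)
```

Any object with the same two methods satisfies `CodingAgent`, so replacing `StubAgent` with a different agent leaves `Harness` unchanged.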

The Plan Is The First Checkpoint

The simplest product decision in Merlin is also the most important: before it edits code, it asks for approval.

A user gives Merlin a work item. Merlin reads the request, gathers relevant code and project context, and replies with a plan. The plan describes what Merlin believes the work requires, what it expects to change, what assumptions it is making, and how it will verify the result.

Then it stops.

If the plan is wrong, the user asks Merlin to revise it. If the scope is too broad, the user narrows it. If the ticket is not ready, the user rejects it. Only after explicit approval does Merlin start implementation.

That gate does more than reduce risk. It gives the rest of the system a stable reference point. The agent is no longer working from a broad ticket description with hidden assumptions. It is working from a plan a user has read and accepted.

The plan is also where the highest-leverage steering happens. It is much cheaper to correct a misunderstanding before code exists than after a large diff has been generated. A good plan makes hidden assumptions visible, turns broad product language into explicit acceptance criteria, and gives the user a chance to redirect the work while the cost of doing so is still low.
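The approval gate can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not Merlin's implementation: the `Plan` class, the `PlanState` values, and `start_implementation` are hypothetical names. The property it shows is that revising a plan clears any earlier approval, and implementation refuses to start without an explicit approval.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class PlanState(Enum):
    DRAFTED = auto()
    APPROVED = auto()
    REJECTED = auto()


@dataclass
class Plan:
    summary: str
    assumptions: list[str] = field(default_factory=list)
    state: PlanState = PlanState.DRAFTED
    revision: int = 1

    def revise(self, summary: str, assumptions: list[str]) -> None:
        # New content always returns to DRAFTED, so an earlier approval
        # never carries over to a plan the user has not read.
        self.summary = summary
        self.assumptions = list(assumptions)
        self.revision += 1
        self.state = PlanState.DRAFTED

    def approve(self) -> None:
        self.state = PlanState.APPROVED

    def reject(self) -> None:
        self.state = PlanState.REJECTED


def start_implementation(plan: Plan) -> str:
    """Refuse to proceed unless the current revision was explicitly approved."""
    if plan.state is not PlanState.APPROVED:
        raise PermissionError("plan must be approved before any code is edited")
    return f"implementing revision {plan.revision}: {plan.summary}"
```

Tying approval to a specific revision is the design choice that matters here: the accepted plan becomes the stable reference point that later phases compare against.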

Once approved, Merlin makes changes in an isolated workspace, runs checks, captures verification evidence, and opens one or more draft pull requests for review. The same Slack thread becomes the running record of the work: plan, progress, questions, evidence, and final handoff.

Slack Is The Workspace

We also made an interface choice that turned out to matter: users talk to Merlin in Slack.

Slack is where work already starts. Tickets are discussed there. Questions are clarified there. Screenshots are shared there. Review context often lives there before it ever becomes code.

Putting Merlin on the other end of a Slack thread means we can use Slack as the interface for everything around the diff: the initial request, the plan to approve, the questions Merlin needs answered, the verification summary, and the eventual link to a draft pull request.

It also makes Merlin accessible to people whose role is not day-to-day implementation. Product managers, designers, data scientists, and other partners can participate in the early shape of the work through the same thread where they already discuss product intent.

From Request To Review

Here is what a representative Merlin run looks like.

A user sends Merlin a focused ticket. Merlin reads the ticket, inspects the relevant code, and posts a plan back to Slack. The plan identifies the primary implementation area, the test coverage it expects to update, and a downstream behavior the user has not called out explicitly.

The user corrects one assumption in the plan and asks Merlin to revise it. Merlin updates the plan, the user approves, and then Merlin starts implementation.

Merlin comes back with a draft pull request. The diff covers both the primary change and the downstream behavior. Targeted checks pass. The handoff message explains what was verified, what evidence was captured, and what still needs user review.

Feedback Loops, Not One Big Pass

The first version of Merlin worked best on narrow, well-scoped changes. If a bug fix touches one area and has clear acceptance criteria, a single plan-build-review loop is often enough.

Larger tickets behave differently. They span multiple surfaces. They hide requirements in linked context. They need product, platform, and testing judgment. In those cases, a single-pass agent tends to compress the work into something smaller than the ticket actually asks for.

So we split the work into distinct phases:

Plan defines scope and acceptance criteria.
Implement writes the code against that plan.
Build and test catch mechanical failures.
Verify checks the behavior with platform-specific evidence.
Evaluate compares the result back to the accepted plan.
Review prepares the draft pull request and handoff.

The true value lies in the feedback loop between phases. Build failures, verification misses, evaluator gaps, and review concerns become specific signals for the next implementation pass. Instead of asking the agent to "try again," Merlin can send back a failing check, an unmet acceptance criterion, or a missing evidence requirement.

That makes iteration more concrete. The agent is not starting over from a vague state; it is improving the change against a specific signal from the harness. Over time, that feedback loop is what turns Merlin from a code generator into a system that can converge on better results.
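The loop between phases can be sketched in a few lines. The code below is a simplified illustration, not Merlin's implementation: `Signal`, `Check`, `run_loop`, and the `max_passes` limit are all assumptions made for the example. Each failed check returns a concrete signal, and the next implementation pass receives those signals instead of a generic "try again."

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Signal:
    phase: str   # which phase raised it, e.g. "build" or "evaluate"
    detail: str  # the concrete failure, e.g. the name of a failing check


# A check inspects a change and returns None on success or a Signal on failure.
Check = Callable[[str], Optional[Signal]]


def run_loop(
    implement: Callable[[list[Signal]], str],
    checks: list[Check],
    max_passes: int = 3,
) -> tuple[Optional[str], list[Signal]]:
    """Iterate until every check passes or the pass budget is exhausted."""
    signals: list[Signal] = []
    for _ in range(max_passes):
        # The implementer sees the specific signals from the previous pass.
        change = implement(signals)
        signals = []
        for check in checks:
            result = check(change)
            if result is not None:
                signals.append(result)
        if not signals:
            return change, []
    # Not converged: return the outstanding signals so a person can step in.
    return None, signals
```

Returning the outstanding signals when the budget runs out is one possible way to hand the work back with something specific to look at, rather than looping indefinitely.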

That same idea runs through the rest of Merlin: evidence, not assertion. A pass should come with something observable, whether that is a targeted test result, a build result, a screenshot, a simulator or service response, or a clear statement that user verification is still needed.
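A small data structure makes "evidence, not assertion" concrete. This is a sketch under assumptions: `EvidenceKind`, `Evidence`, `CriterionResult`, and the three status strings are hypothetical names chosen for the example. The evidence kinds mirror the ones listed above.

```python
from dataclasses import dataclass, field
from enum import Enum


class EvidenceKind(Enum):
    TEST_RESULT = "test_result"
    BUILD_RESULT = "build_result"
    SCREENSHOT = "screenshot"
    SERVICE_RESPONSE = "service_response"
    NEEDS_USER_VERIFICATION = "needs_user_verification"


@dataclass(frozen=True)
class Evidence:
    kind: EvidenceKind
    reference: str  # hypothetical: a log excerpt, artifact name, or short note


@dataclass
class CriterionResult:
    criterion: str
    evidence: list[Evidence] = field(default_factory=list)

    @property
    def status(self) -> str:
        # A criterion with nothing observable attached is never reported as done.
        if not self.evidence:
            return "unverified"
        if any(e.kind is EvidenceKind.NEEDS_USER_VERIFICATION for e in self.evidence):
            return "needs_user"
        return "evidenced"
```

Treating "user verification still needed" as a first-class kind of evidence keeps an honest gap visible in the handoff instead of hiding it behind a pass.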

Different Platforms Need Different Evidence

"Verify the change" means something different across Tinder's engineering surfaces.

Some backend changes need contract-aware tests and service-level checks. Some mobile changes need build validation plus UI behavior checks that unit tests cannot fully capture. Some web changes need a running browser to catch issues that static checks miss. Design-heavy changes may require comparing an implementation to a design reference.

Merlin's harness is platform-aware: it routes work through the checks that produce useful evidence for the surface being changed, and it treats failures differently depending on whether the agent has enough signal to fix them.

That platform-specific layer is a large part of what makes Merlin a Tinder system, not just a wrapper around an AI coding agent.
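Platform-aware routing can be illustrated as a lookup from the surface being changed to the checks that produce useful evidence for it. The sketch below is an assumption-laden example, not Merlin's configuration: the `Surface` enum, the check names, and `checks_for` are hypothetical, and the groupings simply restate the examples given above.

```python
from enum import Enum


class Surface(Enum):
    BACKEND = "backend"
    DESIGN = "design"
    MOBILE = "mobile"
    WEB = "web"


# Hypothetical check names, grouped the way the text describes each surface.
CHECKS_BY_SURFACE: dict[Surface, list[str]] = {
    Surface.BACKEND: ["contract_tests", "service_level_checks"],
    Surface.DESIGN: ["design_reference_comparison"],
    Surface.MOBILE: ["build_validation", "ui_behavior_checks"],
    Surface.WEB: ["static_checks", "browser_run"],
}


def checks_for(surfaces: set[Surface]) -> list[str]:
    """Return the de-duplicated checks for every surface a change touches."""
    ordered: list[str] = []
    # Sort by name so the result is deterministic regardless of set ordering.
    for surface in sorted(surfaces, key=lambda s: s.value):
        for check in CHECKS_BY_SURFACE[surface]:
            if check not in ordered:
                ordered.append(check)
    return ordered
```

For example, `checks_for({Surface.WEB, Surface.BACKEND})` yields the backend checks followed by the web checks, so a change that spans both surfaces is verified on both.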

Guardrails Make The System Useful

Merlin's guardrails are deliberately conservative:

It waits for plan approval before editing code.
It creates draft pull requests only.
It does not merge code.
It uses controlled environments for runtime verification.
It asks for help when the answer would change the implementation.
It hands work back with evidence, not just a summary.
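Several of these guardrails can be expressed as a simple policy check that runs before any action is taken. This is a minimal sketch, assuming hypothetical action names (`"edit_code"`, `"open_pull_request"`, `"merge"`) and a hypothetical `guardrail_violation` function. It covers only the first three guardrails in the list.

```python
from typing import Optional


def guardrail_violation(
    action: str,
    *,
    plan_approved: bool = False,
    draft: bool = False,
) -> Optional[str]:
    """Return the reason an action is blocked, or None if it is allowed."""
    if action == "merge":
        # Merging is blocked unconditionally, whatever the other flags say.
        return "merging is never performed by the agent"
    if action == "edit_code" and not plan_approved:
        return "plan approval is required before editing code"
    if action == "open_pull_request" and not draft:
        return "only draft pull requests may be opened"
    return None
```

Defaulting both flags to `False` means an action is blocked unless the caller positively states that the condition holds, which matches the conservative posture of the list above.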

What Surprised Us

Three lessons stand out.

First, the plan is often more valuable than the diff. We started Merlin assuming the main win was automated pull request generation. In practice, the highest-leverage moment is the plan. That is where misunderstandings are cheap to fix, scope is most negotiable, and users can steer the work before it becomes code.

Second, asking for help is a product design problem. An agent that asks too many questions becomes noise. An agent that asks none becomes risky. The useful middle ground is a clear plan up front, followed by targeted questions only when the answer would change the implementation.

Third, verification is where the compounding value shows up. The model can write code, but knowing whether the code does what the ticket asked is the harder problem. Verification turns the outcome into evidence, and evidence turns failures into a feedback loop the agent can use. That loop lets Merlin move from "I made a change" toward "I observed what happened, learned what is still wrong, and made the next pass better."

What's Next

Merlin is still evolving. The next phase is less about making the agent sound smarter and more about making the workflow more reliable: better evidence capture, stronger platform-specific verification, cleaner handoffs, improved handling of review comments, and smoother paths for prototypes.

We are also exploring how Merlin can help more people at Tinder evaluate ideas inside the product itself. A product manager or designer could describe a prototype in Slack, attach the relevant product context, answer Merlin's clarifying questions, and get a working version inside the Tinder app in a safe environment. That matters because the team can judge the flow, state transitions, text, and overall feel in the same product context where members would eventually experience it. That future still goes through engineering standards, review, testing, and release guidelines.

The broader lesson is one we did not expect when we started: AI can carry far more of the engineering workflow when the workflow itself is explicit.

That is what Merlin gives us: not a replacement for engineering judgment, and not a replacement for AI tools, but a Tinder-shaped harness that helps both work better together.

