Testing: where MCP genuinely changed how we work
Blog 2 of 3 by Larry, who builds products at Kintro.
This is part two of the series on building Kintro, a UK shared digital wallet that lets friends and families pool, spend and save together. Larry writes about what actually shifted: not the unit tests (those were the easy win), but the harder end-to-end flows, where Detox is great when it works and brutal when it doesn't.
Unit tests were the easy win. kycService.test.ts has six suites covering the applicant, address, and verification-create flows, as well as the polling logic. The auto-approve hack ("John Snow" verifies, "John Doe" does not) is the kind of thing you would never ship in production logic, but it is gold for tests. With a Jest runner exposed as a tool, the model can write a new test, run the suite, watch it fail, and stop only when it passes. I have stopped writing the test scaffolding for these by hand. It is too mechanical.
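To make the auto-approve rule concrete, here is a minimal sketch of what such a test-only check could look like. The names sandboxVerify, Applicant, and the two status strings are hypothetical; the real kycService API is not shown in this post, and what happens to a non-matching name beyond "does not verify" is an assumption.

```typescript
// Hypothetical stand-in for a sandbox KYC check. Not the real kycService API.
type VerificationStatus = "verified" | "unverified";

interface Applicant {
  firstName: string;
  lastName: string;
}

// Test-only rule: one magic name verifies, every other name does not.
function sandboxVerify(applicant: Applicant): VerificationStatus {
  const fullName = `${applicant.firstName} ${applicant.lastName}`;
  return fullName === "John Snow" ? "verified" : "unverified";
}
```

A rule like this gives every test a deterministic happy path and a deterministic unhappy path, which is what makes the write, run, fix cycle mechanical enough to hand to the model.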
The harder, more interesting case is the end-to-end one. We use Detox. Detox is great when it works and a real time sink when it does not. The flow in e2e/onboarding.test.ts navigates the welcome screen, taps sign up, enters a name and phone number, fills in the OTP fields, and lands on the address screen. Each step uses a by.id(...) query against a testID that someone had to remember to add to the component.
This is where a simulator-aware MCP server really pays off. I want the model to be able to say, “The test is failing because name-input is not reaching the rendered component. Here is the line in onboarding-name.tsx that is missing the prop.” Not a guess. A look at what is actually rendered. That turns Detox debugging from a vibes-based guessing game into a read-the-tree, find-the-missing-id, add-it loop. That is a weekend’s worth of frustration off the calendar.
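The "read the tree, find the missing id" step can be sketched roughly as follows. The ViewNode shape is an assumption made for illustration; a real simulator tool would return its own hierarchy format. The continue-button id is also made up, while name-input comes from the flow described above.

```typescript
// Hypothetical shape for a rendered view hierarchy dump.
interface ViewNode {
  type: string;
  testID?: string;
  children?: ViewNode[];
}

// Collect every testID that actually made it into the rendered tree.
function collectTestIDs(node: ViewNode, found: Set<string> = new Set()): Set<string> {
  if (node.testID) found.add(node.testID);
  for (const child of node.children ?? []) collectTestIDs(child, found);
  return found;
}

// Return the ids the test queries for that the tree does not contain.
function missingTestIDs(root: ViewNode, expected: string[]): string[] {
  const rendered = collectTestIDs(root);
  return expected.filter((id) => !rendered.has(id));
}

// Example: the TextInput was rendered without its testID prop.
const nameScreen: ViewNode = {
  type: "View",
  children: [
    { type: "TextInput" },
    { type: "Pressable", testID: "continue-button" },
  ],
};

console.log(missingTestIDs(nameScreen, ["name-input", "continue-button"]));
// prints [ 'name-input' ]
```

Once the missing id is a fact rather than a hunch, pointing at the component file that forgot the prop is a much smaller step.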
We are not all the way there yet. Our setup today is more “Claude can read the Detox logs and reason about them” than “Claude is driving the simulator”. Even the read-only version is worth it. The full loop is what I am building toward.
What I would tell another team
Three things, in order of how much they actually moved the needle.
Give the model access to the test runner before anything else. It is the single highest leverage tool. Generate, run, fix, repeat. That loop is genuinely faster than typing.
Treat your design source of truth as a tool, not an image. A Figma MCP that can read frames and tokens is worth ten “match this screenshot, please” prompts. The hex codes will be right the first time, and your design system will stop bleeding tokens.
Be honest about where the model still struggles. It does not have taste. It will happily produce a screen that is technically correct and visually sad. The rounded corners will be 12 instead of 32. The illustration will be the wrong shade of purple. You still need a designer in the loop; you just need them less often for the mechanical parts.
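The generate, run, fix, repeat loop from the first point can be sketched in a few lines. This is an illustration under assumptions, not our actual setup: runTests and revise are hypothetical callbacks standing in for a Jest runner tool and a model call, and they are synchronous here only to keep the sketch short.

```typescript
// Result of one test run. In practice this would come from a test runner tool.
interface TestResult {
  passed: boolean;
  output: string;
}

interface LoopOutcome {
  passed: boolean;
  attempts: number;
  code: string;
}

// Run the tests; while they fail, request a revision based on the failure output.
// Stops at the first passing run or after maxAttempts runs.
function generateRunFix(
  initialCode: string,
  runTests: (code: string) => TestResult,
  revise: (code: string, failureOutput: string) => string,
  maxAttempts: number,
): LoopOutcome {
  let code = initialCode;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = runTests(code);
    if (result.passed) return { passed: true, attempts: attempt, code };
    // Only revise when another run will follow, so the returned code was always tested.
    if (attempt < maxAttempts) code = revise(code, result.output);
  }
  return { passed: false, attempts: maxAttempts, code };
}
```

The attempt cap matters: without it, a test that can never pass turns the loop into an expensive way to burn tokens.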
The thing I keep coming back to is that tool use is not really about the model getting smarter. It is about the model getting the same context as a human dev has when they sit down to work. The terminal, the test runner, the design file, and the simulator. Once it has those, the gap between “writes plausible code” and “ships working features” closes fast. And once you have watched that gap close on something real, like a KYC form that finally accepts O’Brien, you do not really go back.
Part 2 of 3. Follow for more on shipping UK fintech from Nairobi.