We search for it — and let logic, not an AI, decide. To show how, we rebuilt one of our biggest jobs to scale — trade by trade. Scroll to build it.
Our first build was the obvious one: a fleet of AI agents reading the drawings and making the call. It made a great demo. But an AI left to make the final call can be confidently wrong, and a confident guess has no place in a real bid. That gap is where the real project began.
So we kept experimenting — several different architectures, each dividing the work between AI and code a little differently. They all drifted toward the same answer: the AI couldn't be the one deciding. That route is what became the engine we build today.
So we inverted it. The AI got one narrow, swappable job — find the candidate words on the page. Nothing more.
The verdict moved to logic: tens of thousands of auditable rules that decide what those words actually prove. Same inputs, same answer — every time.
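As a rough illustration of that split, here is a minimal sketch, not the engine's actual code. The names `keyword_locator`, `states_r_value`, and `decide` are hypothetical, the sample page text is made up apart from the quoted insulation note, and a plain keyword search stands in for the AI step. The locator only proposes candidate words; a deterministic rule decides what they prove.

```python
import re
from typing import Callable, List

# Hypothetical interface: a locator takes page text and returns candidate phrases.
# In this sketch a keyword search stands in for the swappable AI step.
Locator = Callable[[str], List[str]]

def keyword_locator(page_text: str) -> List[str]:
    """Return the lines that mention insulation (candidates only, no verdict)."""
    return [line.strip() for line in page_text.splitlines()
            if "INSULATION" in line.upper()]

R_VALUE = re.compile(r"\bR-(\d+)\b")

def states_r_value(candidates: List[str], claimed_r: int) -> bool:
    """Deterministic rule: true only if a candidate states exactly the claimed R-value."""
    for text in candidates:
        for match in R_VALUE.finditer(text.upper()):
            if int(match.group(1)) == claimed_r:
                return True
    return False

def decide(page_text: str, claimed_r: int,
           locate: Locator = keyword_locator) -> bool:
    """The locator proposes; the rule decides. Same inputs give the same answer."""
    return states_r_value(locate(page_text), claimed_r)

# Illustrative page text (assumed for the example).
page = "WALL TYPE 3\nR-21 BATT INSULATION (UNFACED)\n5/8 IN. GYP. BOARD"

print(decide(page, 21))  # True
print(decide(page, 13))  # False
```

Because `decide` is a pure function of its inputs, rerunning it on the same page and claim cannot produce a different verdict.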
Every number in an estimate has to come from the drawings. A bid is a promise to build exactly what's drawn, for a price: miss what the drawings say and you either lose the job or lose the margin. So we check every number against them, line by line.
Insulation is the first trade we've taken all the way — the same loop earns every feature that follows:
Every line is a claim — a location, a material, a thickness, an R-value. The engine checks each one against the drawings' own words. All confirmed → "Plan basis found." Only some → "Likely." Nothing → an honest "No basis found." Click a red layer to see its cited evidence.
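The three-way verdict can be sketched in a few lines. This is an assumed simplification: the function name `verdict` and the dictionary of per-claim results are hypothetical, and the real engine presumably weighs far more than a yes/no per field.

```python
from typing import Dict

def verdict(confirmed: Dict[str, bool]) -> str:
    """Map per-claim check results for one takeoff line to a verdict label."""
    hits = sum(1 for ok in confirmed.values() if ok)
    if confirmed and hits == len(confirmed):
        return "Plan basis found."   # every claim confirmed by the drawings
    if hits > 0:
        return "Likely."             # only some claims confirmed
    return "No basis found."         # nothing confirmed

# Hypothetical line: four claims, two of them confirmed.
line = {"location": True, "material": True, "thickness": False, "r_value": False}
print(verdict(line))  # Likely.
```

Note the empty case: a line with no checkable claims falls through to "No basis found." rather than being treated as confirmed.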
When the engine confirms a line, it doesn't just say "supported" — it shows the exact words on the exact sheet that prove it: "R-21 BATT INSULATION (UNFACED)", quoted from sheet A-502. No citation, no claim.
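One way to make "no citation, no claim" hard to violate is to build it into the data type. The sketch below is an assumption about how such a guard could look, not the engine's implementation; `Citation`, `Finding`, and `quote_is_verbatim` are hypothetical names, and the sheet text is invented around the quoted note.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class Citation:
    sheet: str   # e.g. "A-502"
    quote: str   # exact words copied from that sheet

@dataclass(frozen=True)
class Finding:
    claim: str
    supported: bool
    citation: Optional[Citation] = None

    def __post_init__(self) -> None:
        # A supported finding cannot exist without a non-empty quote.
        if self.supported and (self.citation is None
                               or not self.citation.quote.strip()):
            raise ValueError("a supported finding must carry a citation")

def quote_is_verbatim(citation: Citation, sheets: Dict[str, str]) -> bool:
    """True only if the quoted words appear verbatim on the cited sheet."""
    return citation.quote in sheets.get(citation.sheet, "")

# Illustrative sheet text (assumed for the example).
sheets = {"A-502": "TYPICAL WALL SECTION\nR-21 BATT INSULATION (UNFACED)"}
good = Finding("R-21 unfaced batt", True,
               Citation("A-502", "R-21 BATT INSULATION (UNFACED)"))
print(quote_is_verbatim(good.citation, sheets))  # True
```

Trying to construct `Finding("R-21 unfaced batt", True)` with no citation raises `ValueError`, so an uncited "supported" result cannot reach the report in this sketch.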
We treat every line like a statistical test, and a test can err two ways: claim support that isn't there (a false positive), or miss support that is (a false negative). We deliberately tuned for precision first: drive false positives to zero, even at the cost of a few honest "needs review" flags. A tool that lies quietly is dangerous; one that admits doubt, you can trust.
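The trade-off is the standard one between precision and recall. The sketch below uses made-up counts purely to show the arithmetic; they are not results from any audit, and the function names are ours.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of 'supported' verdicts that really are supported."""
    return true_pos / (true_pos + false_pos)

def recall(true_pos: int, false_neg: int) -> float:
    """Share of truly supported lines that the tool confirmed."""
    return true_pos / (true_pos + false_neg)

# Illustrative counts only: 8 correct confirmations, 0 false confirmations,
# 2 supported lines left as "needs review".
tp, fp, fn = 8, 0, 2
print(precision(tp, fp))  # 1.0
print(recall(tp, fn))     # 0.8
```

Tuning for precision first means accepting a recall below 1.0 (some true support goes to review) in exchange for keeping false positives at zero.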
Every guard closes a specific way the tool could be fooled — traced to a real project, version-controlled, and tested. A few:
Zero fabricated evidence across 36 independently audited projects, end to end. Of 1,631 takeoff lines, 1,331 are plan-grounded (~82%): each is re-grounded against the drawings' own words under strict coherence guards, and none is dropped after reconciliation. And it doesn't just stay quiet when unsure: on one job it flagged 11 real conflicts, e.g. the plans call for R-13 where the bid used R-11.
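For readers who want to check the headline share, the arithmetic from the two counts above is short:

```python
total_lines = 1631   # takeoff lines across the audited projects
grounded = 1331      # lines with a plan basis

share = grounded / total_lines
print(f"{share:.1%}")          # 81.6%
print(total_lines - grounded)  # 300 lines without a confirmed plan basis
```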
The logic carries the system. The AI never makes the call: it only helps locate words on the page, and every verdict is decided by deterministic code we wrote, tested, and own. So we tested it: swap the model, cheap to premium, and the answers stay the same every time. The intelligence is in the pipeline, not the model.
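A toy version of that model-swap test might look like the following. Everything here is a stand-in: `cheap_locator` and `premium_locator` are plain functions that mimic two models returning differently shaped candidate lists, `decide` is a hypothetical rule, and the page text is invented around the quoted note. The sketch only shows the shape of the check, which holds as long as each locator surfaces the relevant words.

```python
from typing import Callable, Dict, List

Locator = Callable[[str], List[str]]

def cheap_locator(page: str) -> List[str]:
    # Stand-in for a cheaper model: returns every non-empty line, noise included.
    return [ln.strip() for ln in page.splitlines() if ln.strip()]

def premium_locator(page: str) -> List[str]:
    # Stand-in for a premium model: returns only insulation lines, in reverse order.
    return [ln.strip() for ln in reversed(page.splitlines())
            if "INSULATION" in ln.upper()]

def decide(candidates: List[str], required_phrase: str) -> str:
    """Deterministic verdict: depends only on whether the phrase was located."""
    found = any(required_phrase.upper() in c.upper() for c in candidates)
    return "supported" if found else "no evidence"

# Illustrative page text (assumed for the example).
page = "WALL TYPE 3\nR-21 BATT INSULATION (UNFACED)\nRIGID INSULATION AT SLAB EDGE"

locators: Dict[str, Locator] = {"cheap": cheap_locator, "premium": premium_locator}
results = {name: decide(locate(page), "R-21 BATT")
           for name, locate in locators.items()}
print(results)  # {'cheap': 'supported', 'premium': 'supported'}
```

The two locators disagree on how many candidates to return and in what order, yet the verdict is identical because the rule, not the locator, decides.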
Trust means owning your limits. A few honest boundaries — and each one builds trust rather than spends it:
• Honest gaps, not guesses. On scanned image-only drawings, or estimator codes that never appear on the plans, it reports "no evidence" — never a made-up number.
• It catches itself. An independent semantic audit of 357 lines surfaced one borderline over-reach class (~0.6%) — all on lines already flagged "likely / verify," never confirmed support — now closed in code. The system policing itself is the proof.
• The fix is retrieval, not a bigger AI. We measured that the model barely matters — so the roadmap is better reading of the plans (OCR + schedules), all gate-safe.
Every role a software company hires a whole team for — covered by two people in about three weeks. Over 100,000 lines of code: designed, built, tested, and deployed, end to end.
We searched the market: Togal, Kreo and Beam all race to measure drawings faster, then leave a person to QA the result. Not one publicly offers what we built — a system held to zero false evidence, that proves every line against the plans and refuses to fabricate. We went looking for another tool built that way. We didn't find one.
The engine does the line-by-line cross-checking, so a reviewer isn't combing the plans by hand — and bids move out the door faster. Every part of it is laying the foundation for the north star: a guided takeoff builder that assists estimating end to end.
A dependable estimation engine — proven across real projects, audited many ways, and ready to carry every product the platform builds next.