Our first build was the obvious one: a fleet of AI agents reading the drawings and making the call. It made a great demo. But an AI left to make the final call can be confidently wrong, and a confident guess has no place in a real bid. That gap is where the real project began.
We kept experimenting with several architectures, each dividing the work between AI and code a little differently. They all drifted toward the same answer: the AI couldn't be the one deciding. That answer became the engine we build on today.
So we inverted it. The AI got one narrow, swappable job — find the candidate words on the page. Nothing more.
The verdict moved to logic: tens of thousands of auditable rules that decide what those words actually prove. Same inputs, same answer — every time.
Every line in a bid is a claim about the page. Now each one carries the burden of proof: no claim stands unless the drawings or domain knowledge back it up.
Insulation is the first trade we've taken all the way — the same loop earns every feature that follows:
Every line is a claim — a location, a material, a thickness, an R-value. The engine checks each one against the drawings' own words. All confirmed → "Plan basis found." Only some → "Likely." Nothing → an honest "No basis found." Click a red layer to see its cited evidence.
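As a rough illustration of that three-way outcome, here is a minimal sketch, not the engine's actual rules. The `Claim` shape, the `verdict` function, and the naive substring match are all assumptions made for the example; a real rule set would be far stricter about what counts as a match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One takeoff line, split into the attributes it asserts (hypothetical shape)."""
    location: str
    material: str
    thickness: str
    r_value: str


def verdict(claim: Claim, sheet_text: str) -> str:
    """Map how many attributes the sheet text confirms to one of three labels."""
    page = sheet_text.upper()
    attributes = [claim.location, claim.material, claim.thickness, claim.r_value]
    # Assumption: an attribute is "confirmed" if it appears verbatim on the page.
    confirmed = sum(1 for value in attributes if value.upper() in page)
    if confirmed == len(attributes):
        return "Plan basis found"
    if confirmed > 0:
        return "Likely"
    return "No basis found"
```

Because the function is pure (no randomness, no model call), the same claim and the same sheet text always produce the same label.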
When the engine confirms a line, it doesn't just say "supported" — it shows the exact words on the exact sheet that prove it: "R-21 BATT INSULATION (UNFACED)", quoted from sheet A-502. No citation, no claim.
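"No citation, no claim" can be sketched in a few lines. This is an illustrative assumption about how such a check might look, not the engine's code: `Citation`, `cite`, and `confirm` are hypothetical names, and the sheets are passed in as a plain dictionary of sheet ID to extracted text.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    sheet: str  # sheet identifier, e.g. "A-502"
    quote: str  # the exact words as they appear on that sheet


def cite(phrase: str, sheets: dict[str, str]) -> Optional[Citation]:
    """Return a verbatim citation for `phrase`, or None if no sheet contains it."""
    if not phrase.strip():
        return None  # an empty phrase proves nothing
    for sheet_id, text in sheets.items():
        match = re.search(re.escape(phrase), text, flags=re.IGNORECASE)
        if match:
            # Quote the sheet's own characters, not the phrase we searched for.
            return Citation(sheet=sheet_id, quote=match.group(0))
    return None


def confirm(phrase: str, sheets: dict[str, str]) -> str:
    """A line is only reported as supported when a citation exists."""
    citation = cite(phrase, sheets)
    if citation is None:
        return "No basis found"
    return f'Supported: "{citation.quote}" (sheet {citation.sheet})'
```

The design choice worth noting is that the quote is taken from the sheet text itself, so a "supported" result can never display words that are not on the page.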
We treat every line like a statistical test, and a test can err two ways: it can claim support that isn't there (a false positive), or miss support that is (a false negative). We deliberately tuned for precision first: drive false positives to zero, even at the cost of a few honest "needs review" flags. A tool that lies quietly is dangerous; one that admits doubt, you can trust.
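The trade-off can be made concrete with the standard definitions of precision and recall. The sketch below uses the textbook formulas; the counts in the usage note are invented purely for illustration and are not measurements from the engine.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Of the lines reported as supported, the share that really were."""
    total = true_pos + false_pos
    # Convention (an assumption here): with no positive verdicts, nothing was wrong.
    return true_pos / total if total else 1.0


def recall(true_pos: int, false_neg: int) -> float:
    """Of the lines that really had support, the share the tool found."""
    total = true_pos + false_neg
    return true_pos / total if total else 1.0
```

With made-up counts, a strict setting that confirms 40 lines correctly, confirms none wrongly, and misses 10 has precision 1.0 and recall 0.8. A looser setting with 48 correct, 6 wrong, and 2 missed raises recall to 0.96 but drops precision to about 0.89. Tuning for precision first means choosing the first profile.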
Every guard closes a specific way the tool could be fooled — traced to a real project, version-controlled, and tested. A few:
Zero fabricated evidence across 35 independently audited projects, end-to-end. Of 1,610 takeoff lines, 751 are plan-grounded (~47%): each is re-checked against the drawings' own words under strict coherence guards, and the post-reconcile pass drops none. The rest are honest gaps, never guesses (see the limits). And it doesn't just stay quiet when unsure: on one job it flagged 11 real conflicts, e.g. the plans call for R-13 where the bid used R-11.
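A conflict of the R-13 versus R-11 kind is simple to express as a check. The sketch below is an assumption about one way to do it, using a hypothetical `r_value_conflict` helper and a regular expression for R-values; it is not the engine's conflict logic.

```python
import re
from typing import Optional

# Matches tokens such as "R-13" or "R-13.5" and captures the number.
R_VALUE = re.compile(r"\bR-(\d+(?:\.\d+)?)\b")


def r_value_conflict(bid_line: str, plan_quote: str) -> Optional[str]:
    """Return a conflict message when the bid and the plans state different R-values."""
    bid = R_VALUE.search(bid_line)
    plan = R_VALUE.search(plan_quote)
    if bid is None or plan is None:
        return None  # nothing to compare, so no conflict is asserted
    if bid.group(1) == plan.group(1):
        return None
    return f"Conflict: plans call for R-{plan.group(1)}, bid used R-{bid.group(1)}"
```

When either side lacks an R-value the helper returns nothing rather than guessing, which mirrors the "never a guess" rule in the text.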
Trust means owning your limits — and every gap is a row the engine refused to guess, reported as "basis not confirmed," never a made-up number.
• Schedule-cell images. Corridor, demising, and interior partition walls whose insulation spec lives only in a schedule-cell image the text layer can't read. This one case accounts for 62% of everything not yet grounded.
• Image-only drawings. Scanned or flattened sheets with no text layer — honest "no evidence." (OCR on the roadmap.)
• Estimator-only codes. Workbook codes that never appear on the plans — flagged, not invented.
• The fix is retrieval, not a bigger AI. We measured it — the model barely matters. Every lever is better reading of the plans.
The logic carries the system. The AI never makes the call: it only helps locate words on the page, and every verdict is decided by deterministic code we write, test, and own. So we tested it: swap the model, cheap to premium, and the answers stay the same every time. The intelligence is in the pipeline, not the model.
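A model-swap test of this kind might look like the sketch below. Everything in it is a stand-in: `locator_a` and `locator_b` are toy functions playing the role of two different models, and `decide` is a deliberately simple deterministic verdict, assumed here only to show why swapping the locator leaves the answer unchanged.

```python
from typing import Callable, List

Locator = Callable[[str], List[str]]  # the swappable "find candidate words" step


def locator_a(page: str) -> List[str]:
    """Stand-in for one model: returns each semicolon-separated note."""
    return [note.strip() for note in page.split(";") if note.strip()]


def locator_b(page: str) -> List[str]:
    """Stand-in for another model: different order, plus a phrase not on the page."""
    return list(reversed(locator_a(page))) + ["R-30 BATT INSULATION"]


def decide(page: str, required: List[str], locate: Locator) -> str:
    """Deterministic verdict: only candidates that literally appear on the page count."""
    text = page.upper()
    grounded = [c.upper() for c in locate(page) if c.upper() in text]
    confirmed = sum(1 for term in required if any(term.upper() in g for g in grounded))
    if confirmed == len(required):
        return "Plan basis found"
    return "Likely" if confirmed else "No basis found"


def verdicts_agree(page: str, required: List[str], locators: List[Locator]) -> bool:
    """True when every locator leads to the same verdict."""
    return len({decide(page, required, locate) for locate in locators}) == 1
```

The phrase that `locator_b` invents never reaches the verdict, because the deterministic step discards any candidate that is not on the page.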
Every role a software company hires a whole team for — covered by two people in about three weeks. Over 100,000 lines of code: designed, built, tested, and deployed, end to end.
We searched the market: Togal, Kreo, and Beam all race to measure drawings faster, then leave a person to QA the result. Not one publicly offers what we built: a system held to zero false evidence, one that proves every line against the plans and refuses to fabricate. We went looking for another tool built that way. We didn't find one.
The engine does the line-by-line cross-checking, so a reviewer isn't combing the plans by hand — and bids move out the door faster. Every part of it is laying the foundation for the north star: a guided takeoff builder that assists estimating end to end.
The foundation for a dependable estimation engine — proven across real projects, audited many ways, and set up to carry every product the platform builds next.