AI-Assisted Estimating · Vermella @ Metropark, rebuilt to scale

The drawings already
hold the answer.

We search for it — and let logic, not an AI, decide. To show how, we rebuilt one of our biggest jobs to scale — trade by trade. Scroll to build it.

Trade 1 · Framing

We began with a
swarm of AI agents.

Our first build was the obvious one: a fleet of AI agents reading the drawings and making the call. It made a great demo. But an AI left to make the final call can be confidently wrong, and a confident guess has no place in a real bid. That gap is where the real project began.

So we kept experimenting with several different architectures, each dividing the work between AI and code a little differently. They all converged on the same answer: the AI couldn't be the one deciding. That answer became the engine we build on today.

Trade 2 · Insulation · the turning point

AI proposes.
The logic decides.

So we inverted it. The AI got one narrow, swappable job — find the candidate words on the page. Nothing more.

The verdict moved to logic: tens of thousands of auditable rules that decide what those words actually prove. Same inputs, same answer — every time.
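
A minimal sketch of that split, not the production engine. The names `Candidate`, `decide`, and `review` are hypothetical, sheet "A-101" is invented for the example, and one toy substring rule stands in for the real rule set. The proposer is any function that surfaces candidate words; the verdict comes from a pure function, so the same candidates always give the same answer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    sheet: str  # sheet the words were found on, e.g. "A-502"
    text: str   # the words as they appear on the drawing


# Swappable layer: anything that turns a claim into candidate words.
Proposer = Callable[[str], List[Candidate]]


def _norm(s: str) -> str:
    """Uppercase and collapse whitespace so comparisons are stable."""
    return " ".join(s.upper().split())


def decide(claim: str, candidates: List[Candidate]) -> List[Candidate]:
    """Toy rule (assumption): keep candidates that literally contain the claim."""
    want = _norm(claim)
    return [c for c in candidates if want in _norm(c.text)]


def review(claim: str, propose: Proposer) -> List[Candidate]:
    """The proposer only finds words; decide() alone grants support."""
    return decide(claim, propose(claim))


# Two stand-in proposers that surface the same words in a different order.
def proposer_a(claim: str) -> List[Candidate]:
    return [
        Candidate("A-502", "R-21 BATT INSULATION (UNFACED)"),
        Candidate("A-101", "GYPSUM BOARD"),  # hypothetical sheet and note
    ]


def proposer_b(claim: str) -> List[Candidate]:
    return list(reversed(proposer_a(claim)))


print(review("R-21 batt insulation", proposer_a))
print(review("R-21 batt insulation", proposer_b))
```

Both calls print the same single candidate from sheet A-502, because nothing the proposer does can change what `decide` accepts.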

Trade 3 · Drywall · why it matters

The plans are
the contract.

Every number in an estimate has to come from them. A bid is a promise to build exactly what's drawn, for a price — miss what the drawings say and you either lose the job or lose the margin. So we check every number against them, line by line.

Trade 4 · Doors & trim · the method behind it

Build it. Prove it.
Repeat.

Insulation is the first trade we've taken all the way — the same loop earns every feature that follows:

Build it → Prove on 1 job → Tune → Harden on many → Tune ↻

The proof · one wall, line by line

A wall is a stack
of decisions.

Every line is a claim — a location, a material, a thickness, an R-value. The engine checks each one against the drawings' own words. All confirmed → "Plan basis found." Only some → "Likely." Nothing → an honest "No basis found." Click a red layer to see its cited evidence.
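
The three labels above can be sketched as a small function. This is an illustration under assumptions: `verdict` and the claim names are hypothetical, and treating a line with no claims as "No basis found" is a choice made for this example.

```python
from typing import Dict


def verdict(checks: Dict[str, bool]) -> str:
    """Map per-claim checks to one of the three labels.

    `checks` maps a claim name (location, material, thickness, R-value)
    to whether the drawings' own words confirmed it.
    """
    confirmed = sum(1 for ok in checks.values() if ok)
    if checks and confirmed == len(checks):
        return "Plan basis found"
    if confirmed > 0:
        return "Likely"
    # Nothing confirmed; an empty line also lands here (assumption).
    return "No basis found"


print(verdict({"location": True, "material": True, "r_value": True}))
print(verdict({"location": True, "material": False, "r_value": True}))
print(verdict({"location": False, "material": False, "r_value": False}))
```

The three calls print "Plan basis found", "Likely", and "No basis found" in that order.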

What "grounded" means

Every "yes" comes
with a receipt.

When the engine confirms a line, it doesn't just say "supported" — it shows the exact words on the exact sheet that prove it: "R-21 BATT INSULATION (UNFACED)", quoted from sheet A-502. No citation, no claim.
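
One way to picture "no citation, no claim" is a record that cannot be built without a sheet and a quote. This is a sketch, assuming a hypothetical `Receipt` type and `confirm` helper; it is not the engine's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    sheet: str  # sheet number the words came from
    quote: str  # exact words copied from that sheet

    def __post_init__(self) -> None:
        # Refuse to exist without both halves of the citation.
        if not self.sheet.strip() or not self.quote.strip():
            raise ValueError("no citation, no claim")


def confirm(line: str, receipt: Receipt) -> str:
    """A confirmation can only be produced from a valid Receipt."""
    return f'{line}: supported by "{receipt.quote}" (sheet {receipt.sheet})'


print(confirm("Exterior wall batt", Receipt("A-502", "R-21 BATT INSULATION (UNFACED)")))
```

Because `confirm` takes a `Receipt`, there is no code path in this sketch that says "supported" without quoted words behind it.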

A statistical approach to trust

When unsure, it
fails loudly.

We treat every line like a statistical test, and a test can err two ways: claim support that isn't there (a false positive), or miss support that is (a false negative). We deliberately tuned for precision first: drive false positives to zero, even at the cost of a few honest "needs review" flags. A tool that lies quietly is dangerous; one that admits doubt, you can trust.
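
The two error types map onto the standard precision and recall formulas. The sketch below uses made-up counts purely to show the trade-off; they are not audit figures, and the function names are illustrative.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of "supported" verdicts that really were supported."""
    claimed = true_pos + false_pos
    # With nothing claimed there is nothing wrong, so report 1.0 (assumption).
    return true_pos / claimed if claimed else 1.0


def recall(true_pos: int, false_neg: int) -> float:
    """Share of truly supported lines that the tool confirmed."""
    actual = true_pos + false_neg
    return true_pos / actual if actual else 1.0


# Hypothetical counts: 8 correct confirmations, 0 false positives,
# 2 supported lines left as "needs review".
print(precision(8, 0))  # 1.0
print(recall(8, 2))     # 0.8
```

Tuning for precision first means accepting the lower second number to keep the first one at 1.0.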

Trustworthy by design

Roughly 60 guards,
each from a real case.

Every guard closes a specific way the tool could be fooled — traced to a real project, version-controlled, and tested. A few:

Interior ≠ exterior

An interior partition can't borrow the exterior wall's R-value.

Framing must fit

An 8-inch wall can't take a 6-inch-stud spec.

Not every "R-19"

A light-fixture part number like "SUMO-R-19" isn't an R-value.

Right material

A fire-stopping note isn't wall-cavity insulation.
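
As an illustration of the part-number guard, a pattern can accept "R-" only when it starts a token. This regex is a sketch written for this page, not the engine's rule; `R_VALUE` and `r_values` are hypothetical names, and it assumes one- or two-digit whole-number R-values.

```python
import re
from typing import List

# "R-" followed by 1-2 digits, but only when the "R" is not glued to a
# preceding letter, digit, or hyphen (which would make it a part number).
R_VALUE = re.compile(r"(?<![A-Za-z0-9-])R-(\d{1,2})(?![\w-])")


def r_values(text: str) -> List[int]:
    """Return the R-values that stand alone as tokens in `text`."""
    return [int(m) for m in R_VALUE.findall(text)]


print(r_values("R-21 BATT INSULATION (UNFACED)"))  # [21]
print(r_values("SUMO-R-19"))                       # []
```

The lookbehind is what rejects "SUMO-R-19": the "R" follows a hyphen, so it is read as part of a longer identifier rather than an R-value.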

Proven, not promised.

Zero fabricated evidence across 36 independently audited projects, end to end. Of 1,631 takeoff lines, 1,331 are plan-grounded (~82%): each is re-grounded against the drawings' own words under strict coherence guards, and post-reconcile drops none. And it doesn't just stay quiet when unsure: on one job it flagged 11 real conflicts, e.g. the plans call for R-13 where the bid used R-11.

Engineered leverage, not a chatbot

Grounded in logic,
not AI.

The logic carries the system. The AI never makes the call: it only helps locate words on the page, and every verdict is decided by deterministic code we write, test, and own. So we put that to the test: swap the model, cheap to premium, and the answers stay the same every time. The intelligence is in the pipeline, not the AI, which is proof that the logic, not the model, does the real work.

100%

Support decided by logic

the AI can flag doubt, never grant it

Swappable

The AI layer

$0.25 model matched the $0.44 one

What it can't do yet — and the plan

We know exactly
where the edges are.

Trust means owning your limits. A few honest boundaries — and each one builds trust rather than spends it:

• Honest gaps, not guesses. On scanned image-only drawings, or estimator codes that never appear on the plans, it reports "no evidence" — never a made-up number.
• It catches itself. An independent semantic audit of 357 lines surfaced one borderline over-reach class (~0.6%) — all on lines already flagged "likely / verify," never confirmed support — now closed in code. The system policing itself is the proof.
• The fix is retrieval, not a bigger AI. We measured that the model barely matters — so the roadmap is better reading of the plans (OCR + schedules), all gate-safe.

More than "AI software"

We built the
whole machine.

Every role a software company hires a whole team for — covered by two people in about three weeks. Over 100,000 lines of code: designed, built, tested, and deployed, end to end.

Product & UX

Designed

Review UI

React · 7k lines

Backend + API

Built

AI + logic engine

56k lines

Test suite

2,186 tests

Deployment

Live · Nginx / VPS

Where this sits in the market

They measure faster.
We made trust automatic.

We searched the market: Togal, Kreo and Beam all race to measure drawings faster, then leave a person to QA the result. Not one publicly offers what we built — a system held to zero false evidence, that proves every line against the plans and refuses to fabricate. We went looking for another tool built that way. We didn't find one.

The immediate impact

Right now? Bids
out the door faster.

The engine does the line-by-line cross-checking, so a reviewer isn't combing the plans by hand — and bids move out the door faster. Every part of it is laying the foundation for the north star: a guided takeoff builder that assists estimating end to end.

The receipts

What the audits found.

0

Fabricated evidence

across 36 audited projects

36

Projects audited

end-to-end · 1,631 lines

~60

Coherence guards

each tested & versioned

~82%

Plan-grounded

1,331 of 1,631 lines

The foundation

One engine,
every product.

Insulation review

Live now

Other trades

Same engine

Proposal review

Next up

Guided takeoffs

On the horizon

Search the drawings.
The logic decides.
Everything's built on it.

A dependable estimation engine — proven across real projects, audited many ways, and ready to carry every product the platform builds next.
