AI-Assisted Estimating · Vermella @ Metropark

We're rebuilding how we bid —
one trade at a time.

Where it started

We began with a
swarm of AI agents.

Our first build was the obvious one: a fleet of AI agents reading the drawings and making the call. It made a great demo. But an AI left to make the final call can be confidently wrong, and a confident guess has no place in a real bid. That gap is where the real project began.

So we kept experimenting — several different architectures, each dividing the work between AI and code a little differently. They all drifted toward the same answer: the AI couldn't be the one deciding. That route is what became the engine we build today.

The turning point

AI proposes.
The logic decides.

So we inverted it. The AI got one narrow, swappable job — find the candidate words on the page. Nothing more.

The verdict moved to logic: tens of thousands of auditable rules that decide what those words actually prove. Same inputs, same answer — every time.

Why it matters

The plans are
the contract.

Every line in a bid is a claim about the page. Now each one carries the burden of proof: no claim stands unless the drawings or domain knowledge back it up.

The method behind it

Build it. Prove it.
Repeat.

Insulation is the first trade we've taken all the way — the same loop earns every feature that follows:

Build it → Prove on 1 job → Tune → Harden on many → Tune ↻

The proof · one wall, line by line

A wall is a stack
of decisions.

Every line is a claim — a location, a material, a thickness, an R-value. The engine checks each one against the drawings' own words. All confirmed → "Plan basis found." Only some → "Likely." Nothing → an honest "No basis found." Click a red layer to see its cited evidence.
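As a rough illustration of that roll-up, here is a minimal sketch. It assumes each claim on a line has already been marked confirmed or not; the names `ClaimCheck` and `verdict` are hypothetical and are not the engine's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimCheck:
    name: str        # e.g. "location", "material", "thickness", "r_value"
    confirmed: bool  # True if the drawings' own words back this claim


def verdict(checks: list[ClaimCheck]) -> str:
    """Roll per-claim results up into one of the three labels."""
    confirmed = sum(c.confirmed for c in checks)
    if checks and confirmed == len(checks):
        return "Plan basis found"
    if confirmed > 0:
        return "Likely"
    return "No basis found"


# Illustrative line: three of four claims confirmed.
wall_line = [
    ClaimCheck("location", True),
    ClaimCheck("material", True),
    ClaimCheck("thickness", False),
    ClaimCheck("r_value", True),
]
print(verdict(wall_line))  # Likely
```

The same inputs always produce the same label, which is the point of keeping the verdict in plain logic.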

What "grounded" means

Every "yes" comes
with a receipt.

When the engine confirms a line, it doesn't just say "supported" — it shows the exact words on the exact sheet that prove it: "R-21 BATT INSULATION (UNFACED)", quoted from sheet A-502. No citation, no claim.
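The "no citation, no claim" rule can be sketched in a few lines. This is an assumption-laden illustration, not the engine's implementation: `Citation` and `is_supported` are hypothetical names, and the sheet text below is made up except for the quoted phrase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    sheet: str  # sheet identifier, e.g. "A-502"
    quote: str  # exact words claimed to appear on that sheet


def is_supported(citation: Optional[Citation], sheet_text: dict[str, str]) -> bool:
    """Accept a claim only if its quote appears verbatim on the cited sheet."""
    if citation is None or not citation.quote:
        return False  # no citation, no claim
    return citation.quote in sheet_text.get(citation.sheet, "")


# Illustrative sheet text; only the quoted phrase comes from the example above.
sheets = {"A-502": "EXTERIOR WALL: R-21 BATT INSULATION (UNFACED)"}

receipt = Citation(sheet="A-502", quote="R-21 BATT INSULATION (UNFACED)")
print(is_supported(receipt, sheets))  # True
print(is_supported(None, sheets))     # False
```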

A statistical approach to trust

When unsure, it
fails loudly.

We treat every line like a statistical test — and a test can err two ways: claim support that isn't there (a false positive), or miss support that is (a false negative). We deliberately tuned for precision first: drive false positives to zero, even at the cost of a few honest "needs review"s. A tool that lies quietly is dangerous; one that admits doubt, you can trust.
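To make the trade-off concrete, here is a small sketch of the two measures involved. The counts are illustrative only, not measured results; they show what "precision first" looks like when false positives are zero and a few real supports are missed.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of 'supported' verdicts that really were supported."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of truly supported lines that the tool managed to confirm."""
    return true_pos / (true_pos + false_neg)


# Illustrative counts (assumed, not from the audits):
# 40 correct confirmations, 0 false confirmations, 10 missed supports.
tp, fp, fn = 40, 0, 10
print(precision(tp, fp))  # 1.0
print(recall(tp, fn))     # 0.8
```

Zero false positives keeps precision at 1.0; the missed supports lower recall and surface as "needs review" rather than as wrong answers.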

Trustworthy by design

Roughly 60 guards,
each from a real case.

Every guard closes a specific way the tool could be fooled — traced to a real project, version-controlled, and tested. A few:

Interior ≠ exterior

An interior partition can't borrow the exterior wall's R-value.

Framing must fit

An 8-inch wall can't take a 6-inch-stud spec.

Not every "R-19"

A light-fixture part number like "SUMO-R-19" isn't an R-value.

Right material

A fire-stopping note isn't wall-cavity insulation.
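One of these guards, the part-number case, could look something like the sketch below. The pattern and the name `find_r_values` are assumptions for illustration; the real guard may work differently.

```python
import re

# Hypothetical pattern: "R-" plus a number, but only when it is not glued
# to a longer token such as a part number ("SUMO-R-19").
R_VALUE = re.compile(r"(?<![A-Za-z0-9-])R-(\d+(?:\.\d+)?)(?![A-Za-z0-9-])")


def find_r_values(text: str) -> list[float]:
    """Return the R-values that stand on their own in a line of plan text."""
    return [float(m.group(1)) for m in R_VALUE.finditer(text)]


print(find_r_values("R-21 BATT INSULATION (UNFACED)"))  # [21.0]
print(find_r_values("LIGHT FIXTURE SUMO-R-19"))          # []
```

The lookbehind rejects any "R-" preceded by a letter, digit, or hyphen, so the fixture code is skipped while a standalone "R-21" is kept.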

Preliminary Findings

Proven, not promised.

Zero fabricated evidence across 35 independently audited projects, end to end. Of 1,610 takeoff lines, 751 (~47%) are plan-grounded: each one re-grounds against the drawings' own words under strict coherence guards, and none is dropped post-reconcile. The rest are honest gaps, never guesses (see the limits). And it doesn't just stay quiet when unsure: on one job it flagged 11 real conflicts, for example plans calling for R-13 where the bid used R-11.
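The headline percentage follows directly from the two counts quoted above; a quick check, using only those figures:

```python
total_lines = 1610  # takeoff lines audited
grounded = 751      # lines with plan-grounded evidence

grounded_share = grounded / total_lines
gaps = total_lines - grounded  # lines reported as gaps, not guesses

print(f"{grounded_share:.1%}")  # 46.6%
print(gaps)                     # 859
```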

What it can't do yet — and the plan

We know exactly
where the edges are.

Trust means owning your limits — and every gap is a row the engine refused to guess, reported as "basis not confirmed," never a made-up number.

The biggest gap right now

~1 in 3 takeoff lines

Corridor, demising and interior partition walls whose insulation spec lives only in a schedule-cell image the text layer can't read — 62% of everything not yet grounded is this one case.

The plan

A schedule-cell reader: crop + vision on the wall-type table. Gate-safe, already specced (ADR-0006); closes the gap without touching precision.
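The "~1 in 3" figure can be cross-checked against the audit counts reported elsewhere on this page (1,610 lines, 751 grounded). This is a back-of-the-envelope sketch, assuming the 62% applies to all not-yet-grounded lines:

```python
total_lines = 1610
grounded = 751
ungrounded = total_lines - grounded  # 859 lines not yet grounded

schedule_cell_share = 0.62  # share of ungrounded lines in this one case
schedule_cell_lines = schedule_cell_share * ungrounded

share_of_all = schedule_cell_lines / total_lines
print(round(share_of_all, 2))  # 0.33
```

About a third of all takeoff lines, which matches the headline.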

Other edges

• Image-only drawings. Scanned or flattened sheets with no text layer — honest "no evidence." (OCR on the roadmap.)

• Estimator-only codes. Workbook codes that never appear on the plans — flagged, not invented.

• The fix is retrieval, not a bigger AI. We measured it — the model barely matters. Every lever is better reading of the plans.

Engineered leverage, not a chatbot

Grounded in logic,
not AI.

The logic carries the system. The AI never makes the call: it only helps locate words on the page, and every verdict is decided by deterministic code we wrote, tested, and own. So we tested it: change the model, cheap to premium, and the answers stay the same every time. The intelligence is in the pipeline, not the model.

100%

Support decided by logic

the AI can flag doubt, never grant support

Swappable

The AI layer

$0.25 model matched the $0.44 one

More than "AI software"

We built the
whole machine.

Every role a software company hires a whole team for — covered by two people in about three weeks. Over 100,000 lines of code: designed, built, tested, and deployed, end to end.

Product & UX

Designed

Review UI

React · 7k lines

Backend + API

Built

AI + logic engine

56k lines

Test suite

2,186 tests

Deployment

Live · Nginx / VPS

Where this sits in the market

They measure faster.
We made trust automatic.

We searched the market: Togal, Kreo and Beam all race to measure drawings faster, then leave a person to QA the result. Not one publicly offers what we built — a system held to zero false evidence, that proves every line against the plans and refuses to fabricate. We went looking for another tool built that way. We didn't find one.

The immediate impact

Right now? Bids
out the door faster.

The engine does the line-by-line cross-checking, so a reviewer isn't combing the plans by hand — and bids move out the door faster. Every part of it is laying the foundation for the north star: a guided takeoff builder that assists estimating end to end.

The receipts

What the audits found.

0

Fabricated evidence

across 35 audited projects

35

Projects audited

end-to-end · 1,610 lines

~60

Coherence guards

each tested & versioned

~47%

Plan-grounded

751 of 1,610 lines

The foundation

One engine,
every product.

Insulation review

Live now

Other trades

Same engine

Proposal review

Next up

Guided takeoffs

On the horizon

Search the drawings.
The logic decides.
Everything's built on it.

The foundation for a dependable estimation engine.
