Estimation
AI-Assisted Estimating · Vermella @ Metropark

We're rebuilding how we bid —
one trade at a time.

Where it started

We began with a
swarm of AI agents.

Our first build was the obvious one: a fleet of AI agents reading the drawings and making the call. It made a great demo. But an AI left to make the final call can be confidently wrong, and a confident guess has no place in a real bid. That gap is where the real project began.

So we kept experimenting with several architectures, each dividing the work between AI and code a little differently. They all converged on the same answer: the AI couldn't be the one deciding. That conclusion became the engine we build today.

The turning point

AI proposes.
The logic decides.

So we inverted it. The AI got one narrow, swappable job — find the candidate words on the page. Nothing more.

The verdict moved to logic: tens of thousands of auditable rules that decide what those words actually prove. Same inputs, same answer — every time.
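
To make the split concrete, here is a minimal sketch of "AI proposes, logic decides." It is not the engine's code: `decide`, `cheap_model` and `premium_model` are hypothetical names, the single rule stands in for the real rule set, and the page text is invented for illustration.

```python
from typing import List


def decide(page_text: str, candidates: List[str], required: str) -> bool:
    """Deterministic rule: a claim stands only if the required phrase was
    proposed AND literally appears on the page."""
    return required in candidates and required in page_text


def cheap_model(page_text: str) -> List[str]:
    # Stand-in for a small model: proposes one candidate phrase.
    return ["R-21 BATT INSULATION"]


def premium_model(page_text: str) -> List[str]:
    # Stand-in for a larger model: proposes more, including one not on the page.
    return ["R-21 BATT INSULATION", "R-30 BATT INSULATION"]


page = "EXTERIOR WALL: R-21 BATT INSULATION (UNFACED)"  # invented page text
for model in (cheap_model, premium_model):
    candidates = model(page)
    print(
        model.__name__,
        decide(page, candidates, "R-21 BATT INSULATION"),
        decide(page, candidates, "R-30 BATT INSULATION"),
    )
```

Both stand-in models yield the same verdicts, because a candidate that is not on the page can never pass the rule.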

Why it matters

The plans are
the contract.

Every line in a bid is a claim about the page. Now each one carries the burden of proof: no claim stands unless the drawings or domain knowledge back it up.

The method behind it

Build it. Prove it.
Repeat.

Insulation is the first trade we've taken all the way — the same loop earns every feature that follows:

Build it → Prove on 1 job → Tune → Harden on many → Tune
The proof · one wall, line by line

A wall is a stack
of decisions.

Every line is a claim — a location, a material, a thickness, an R-value. The engine checks each one against the drawings' own words. All confirmed → "Plan basis found." Only some → "Likely." Nothing → an honest "No basis found." Click a red layer to see its cited evidence.
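
The roll-up from per-attribute checks to a single verdict can be sketched as below. This is an illustration under assumptions, not the engine itself: the function name `line_verdict`, the attribute keys, and the idea that every line carries all four attributes are ours. Only the three verdict labels come from the text above.

```python
from typing import Mapping

# The four attributes named above; representing them as dict keys is an assumption.
ATTRIBUTES = ("location", "material", "thickness", "r_value")


def line_verdict(confirmed: Mapping[str, bool]) -> str:
    """Roll per-attribute checks up into one of the three verdicts."""
    hits = sum(1 for name in ATTRIBUTES if confirmed.get(name, False))
    if hits == len(ATTRIBUTES):
        return "Plan basis found"   # every attribute confirmed
    if hits > 0:
        return "Likely"             # only some confirmed
    return "No basis found"         # nothing confirmed


print(line_verdict({"location": True, "material": True, "thickness": True, "r_value": True}))
print(line_verdict({"material": True, "r_value": True}))
print(line_verdict({}))
```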

What "grounded" means

Every "yes" comes
with a receipt.

When the engine confirms a line, it doesn't just say "supported" — it shows the exact words on the exact sheet that prove it: "R-21 BATT INSULATION (UNFACED)", quoted from sheet A-502. No citation, no claim.
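
One way to picture "no citation, no claim" is a confirmation function that can only succeed by returning a receipt. The sketch below is hypothetical: `Receipt`, `confirm` and the sample sheet text are our own stand-ins. Only the quoted phrase and the sheet number A-502 come from the example above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Receipt:
    sheet: str  # sheet the quote was found on, e.g. "A-502"
    quote: str  # the exact words on that sheet


def confirm(quote: str, sheets: Dict[str, str]) -> Optional[Receipt]:
    """Return a receipt only if the quote appears verbatim on some sheet."""
    if not quote.strip():
        return None  # an empty quote proves nothing
    for sheet, text in sheets.items():
        if quote in text:
            return Receipt(sheet=sheet, quote=quote)
    return None  # no citation, no claim


# Invented sheet text for illustration.
sheets = {"A-502": "EXTERIOR WALL TYPE W1: R-21 BATT INSULATION (UNFACED) IN STUD CAVITY"}
print(confirm("R-21 BATT INSULATION (UNFACED)", sheets))
print(confirm("R-30 BATT INSULATION", sheets))
```

Because the only success value is a `Receipt`, a caller cannot report support without also holding the sheet and the quoted words.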

A statistical approach to trust

When unsure, it
fails loudly.

We treat every line like a statistical test — and a test can err two ways: claim support that isn't there (a false positive), or miss support that is (a false negative). We deliberately tuned for precision first: drive false positives to zero, even at the cost of a few honest "needs review"s. A tool that lies quietly is dangerous; one that admits doubt, you can trust.
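
For readers who want the two error types as numbers, here is a small sketch using the standard definitions of precision and recall. The counts are made up for illustration and are not audit results.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of 'supported' verdicts that really were supported."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of truly supported lines that the tool managed to confirm."""
    return true_pos / (true_pos + false_neg)


# Made-up counts: 80 correct confirmations, 0 false positives,
# 20 supported lines left as "needs review" (false negatives).
tp, fp, fn = 80, 0, 20
print(precision(tp, fp))  # 1.0
print(recall(tp, fn))     # 0.8
```

Tuning for precision first means accepting a recall below 1.0 (some supported lines go to review) in exchange for keeping false positives at zero.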

Trustworthy by design

Roughly 60 guards,
each from a real case.

Every guard closes a specific way the tool could be fooled — traced to a real project, version-controlled, and tested. A few:

Interior ≠ exterior
An interior partition can't borrow the exterior wall's R-value.
Framing must fit
An 8-inch wall can't take a 6-inch-stud spec.
Not every "R-19"
A light-fixture part number like "SUMO-R-19" isn't an R-value.
Right material
A fire-stopping note isn't wall-cavity insulation.
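
As an illustration of what one such guard might look like, the sketch below only accepts an R-value when the token stands alone, so a part number like "SUMO-R-19" is rejected. The regular expression and the name `find_r_values` are assumptions for this example, not the project's actual guard.

```python
import re
from typing import List

# An R-value token must not be glued to other letters, digits or hyphens on
# either side; that is what rejects part numbers such as "SUMO-R-19".
R_VALUE = re.compile(r"(?<![A-Za-z0-9-])R-?(\d{1,2}(?:\.\d+)?)(?![A-Za-z0-9-])")


def find_r_values(text: str) -> List[float]:
    """Return the standalone R-values found in a line of drawing text."""
    return [float(value) for value in R_VALUE.findall(text)]


print(find_r_values("R-21 BATT INSULATION (UNFACED)"))   # [21.0]
print(find_r_values("LIGHT FIXTURE TYPE SUMO-R-19"))     # []
```

A production guard would likely need more context than one regular expression, for example the surrounding words or the sheet type.
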
Preliminary Findings

Proven, not promised.

Zero fabricated evidence across 35 independently audited projects, end to end. Of 1,610 takeoff lines, 751 (~47%) are plan-grounded: each one is re-grounded against the drawings' own words under strict coherence guards, and none are dropped post-reconcile. The rest are honest gaps, never guesses (see the limits). And the engine doesn't just stay quiet when unsure: on one job it flagged 11 real conflicts, for example where the plans call for R-13 but the bid used R-11.

What it can't do yet — and the plan

We know exactly
where the edges are.

Trust means owning your limits — and every gap is a row the engine refused to guess, reported as "basis not confirmed," never a made-up number.

The biggest gap right now
~1 in 3 takeoff lines

Corridor, demising and interior partition walls whose insulation spec lives only in a schedule-cell image the text layer can't read — 62% of everything not yet grounded is this one case.
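
The "~1 in 3" headline and the 62% figure can be reconciled with a quick calculation. The sketch assumes the 62% applies to the same 1,610-line audit set reported in the findings, which the page implies but does not state outright.

```python
total_lines = 1610          # takeoff lines audited
grounded = 751              # plan-grounded lines
ungrounded = total_lines - grounded              # lines not yet grounded

grounded_share = grounded / total_lines          # about 0.47
schedule_cell_lines = 0.62 * ungrounded          # 62% of the ungrounded lines
schedule_cell_share = schedule_cell_lines / total_lines  # share of ALL lines

print(ungrounded)
print(round(100 * grounded_share, 1))
print(round(100 * schedule_cell_share, 1))
```

Under that assumption, 62% of the ungrounded lines is roughly a third of all takeoff lines, which matches the headline.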

The plan. A schedule-cell reader: crop + vision on the wall-type table. Gate-safe, already specced (ADR-0006); closes the gap without touching precision.
Other edges

Image-only drawings. Scanned or flattened sheets with no text layer — honest "no evidence." (OCR on the roadmap.)

Estimator-only codes. Workbook codes that never appear on the plans — flagged, not invented.

The fix is retrieval, not a bigger AI. We measured it — the model barely matters. Every lever is better reading of the plans.

Engineered leverage, not a chatbot

Grounded in logic,
not AI.

The logic carries the system. The AI never makes the call: it only helps locate words on the page, and every verdict is decided by deterministic code we wrote, test, and own. We tested this by swapping the model, cheap to premium, and got the same answers every time. The intelligence is in the pipeline, not the model.

100%
Support decided by logic
the AI can flag doubt, never grant support
Swappable
The AI layer
$0.25 model matched the $0.44 one
More than "AI software"

We built the
whole machine.

Every role a software company hires a whole team for — covered by two people in about three weeks. Over 100,000 lines of code: designed, built, tested, and deployed, end to end.

Product & UX
Designed
Review UI
React · 7k lines
Backend + API
Built
AI + logic engine
56k lines
Test suite
2,186 tests
Deployment
Live · Nginx / VPS
Where this sits in the market

They measure faster.
We made trust automatic.

We searched the market: Togal, Kreo and Beam all race to measure drawings faster, then leave a person to QA the result. Not one publicly offers what we built — a system held to zero false evidence, that proves every line against the plans and refuses to fabricate. We went looking for another tool built that way. We didn't find one.

The immediate impact

Right now? Bids
out the door faster.

The engine does the line-by-line cross-checking, so a reviewer isn't combing the plans by hand — and bids move out the door faster. Every part of it is laying the foundation for the north star: a guided takeoff builder that assists estimating end to end.

The receipts

What the audits found.

0
Fabricated evidence
across 35 audited projects
35
Projects audited
end-to-end · 1,610 lines
~60
Coherence guards
each tested & versioned
~47%
Plan-grounded
751 of 1,610 lines
The foundation

One engine,
every product.

Insulation review · Live now
Other trades · Same engine
Proposal review · Next up
Guided takeoffs · On the horizon

Search the drawings.
The logic decides.
Everything's built on it.

The foundation for a dependable estimation engine.

What a cited match looks like
R-21 · 2×6 cavity batt → A-502 ✓
⅝" · Gypsum board → A-502 ✓