Architecture · 2026

Decoder Architecture

How MimicLabs Decoder reads a contract. Five steps: pull the text, sort it, pick the right model, check the answers, show them. Plus the SOUL, which keeps it honest.

Status: Alpha Stack: Safari extension, Anthropic API Read: ~12 min
Developed within MIMICLABS

Click a stage to jump. The active stage highlights as you scroll.

01 · Extract

Get the document but not the elements around it

Decoder works on two kinds of input. PDFs the user has open, and web pages the user is reading. Each one has its own problem to solve.

For PDFs, Decoder uses PDFKit. The text comes out clean. The one thing it loses is tables. Spatial layout gets flattened into a single line of text. For most contracts that's fine, because the words carry the legal meaning, not the layout. Worth knowing about, though.

Web pages are harder. If you grab everything on the page, you also grab the menu, the header, the footer, the cookie banner, the ads, the related-content boxes. On a typical terms-of-service page, half of what you grab is page furniture, not the document.

The extractor does three things. First, it removes the obvious chrome: navigation bars, headers, footers, banners, anything tagged that way. Second, it looks for the main content area, preferring tags meant for it like main and article. If those aren't there, it scores blocks of text by how dense they are and how they fit into the page structure. Third, it keeps enough shape, headings and lists, that the model can see the document the way the reader sees it.

Every bit of leftover chrome costs money to send to the model and pulls its attention away from the document. Cleaning the input cuts the size by roughly a third to a half on most pages, and keeps the model focused on what's being read.

02 · Classify

Pick the right model with two simple checks

This step decides which model handles the document. It uses two signals. How long the document is, and how varied its legal vocabulary is. Then it combines them with one rule.

Signal one: length. The document's size decides which models are even allowed to handle it. Short documents can go to any model. Longer ones rule out the smaller models. Very long ones can only go to the biggest one, because that's the only one that can fit them in.

Every document is different as should be treated as such. By doing so, we are able to cut costs on unnecessary processing power or underperformance

Document size
Light
Standard
Heavy
Premium
Short
Medium
Long
Very long

Signal two: vocabulary. Decoder keeps a list of legal terms, each with a weight. It scans the document and adds up the weight of every distinct term it finds. A term only counts once, no matter how many times it appears.

Counting terms once was the right call. An earlier version multiplied weight by how often a term appeared. Documents that just repeated the same common terms got pushed up the ladder for no good reason. A short rental template kept getting routed to bigger models because it used a few common lease terms a lot. Counting unique terms measures what concepts the document actually touches, not how repetitive it is.

0.5
Baseline
Words you'd find in any agreement. Helpful for the in-product glossary, but doesn't tell us much about the document.
1
General legal
Tells us the document is doing real legal work, not just describing a transaction.
2
Moderately specialised
Real legal substance. Risk, remedies, structural protections.
3
Terms of art
Heavy specialised vocabulary. Usually negotiated documents, not templates.

The 0.5 tier exists because basic agreement vocabulary was inflating scores on simple templates. Demoting those words to half-weight fixed the calibration without dropping them from the list. They still earn their keep in the glossary.

The combining rule. The final tier is whichever is higher: the lowest tier the document's length allows, or the tier its vocabulary suggests. Length sets the floor. Vocabulary can push it up. The bias goes one way: if in doubt, go higher. Over-classifying costs a bit more money. Under-classifying gives the user a worse answer, and they can't tell.

03 · Route

Four model tiers, one provider

Same SDK, same login, same caching. Different models and different reasoning settings. Each step up the ladder buys a real capability the one below can't do reliably.

Tier
Model
Reasoning
Used for
LLight
Small
Off
Short, simple documents.
SStandard
Mid
Off
Most everyday consumer documents.
HHeavy
Mid
On
Documents that need careful synthesis.
PPremium
Frontier
On
Complex commercial documents.

Light to Standard is a step up in raw capability. Standard to Heavy adds reasoning, so the model thinks before answering. Heavy to Premium swaps in the strongest model. Each step exists because the one below it falls short on a real class of documents.

Why one provider. Mixing providers means more SDKs, more validation calibration, more behaviour to chase. The savings at the bottom end aren't worth the work for an alpha. One provider keeps the system simple. Although, a mixed providers was once initially thought about accordance to their each prominent features, but is deferred.

Caching. The system prompt is cached. After the first call in a session, the next calls reuse the cached version and pay much less for it. Free savings without losing quality.

04 · Validate

Drop anything that fails the rules

The model returns its answer as structured JSON. Before any of it gets shown, each finding has to pass a set of checks. If a check fails, the finding is dropped. Better to show the user fewer things than to show them something that's made up.

Every drop is logged with a reason. If validation drops everything, the result says so explicitly. That's a different state from the model deciding the document is out of scope, and a different state from the model finding nothing worth flagging. The user sees that the analysis ran but didn't produce anything usable, instead of getting a silent empty screen.

05 · Render

Show the result and what it had worked through

The validated result gets written to shared storage with the decode's job ID. The results page polls that storage and shows the result when it appears.

Every decode gets a unique job ID at the start. Each step it goes through, from created to processing to completed to rendered, or to failed or abandoned, gets logged against that ID. When the result is ready, the results page checks that the ID in storage matches the one it's expecting. Two decodes can never get crossed in the user's view, because the system enforces it. Not because we hope the user is only running one at a time.

In the rendered findings, terms from the lexicon are shown with a dotted underline. Hover or tap to see the plain-language explanation. The same dotted underline you've been hovering on in this article.

GUARDRAILS

SOUL: the rules behind the pipeline

Decoder has a SOUL document. It says what the product is, what it must never do, and how it behaves over time. The validator enforces the rules in code. The prompt asks the model to follow them. The categories themselves were designed around them.

S1
Translate, don't advise
Decoder explains what a clause does. It doesn't tell the user what to do about it.
S2
The reader decides
The user is the one making the call. Decoder helps them read better. It doesn't read for them.
S3
Less, with confidence
Better to miss a borderline thing than to flag too much. The cap and the validator both push the same way.
S4
Honest small over dishonest big
A smaller honest answer beats a bigger one that's been padded. Made-up findings are dropped, not patched.
S5
No advice voice
No prescriptions, no severity rankings, no nudging the user to act. The voice stays in description.

The rules live in governance files. A SOUL document, and scaffolding agents works side to side throughout sessions. When the codebase changes, the change is checked against the SOUL. When agents work on the product, they read these files first so they inherit the rules.

DEVELOPED

One lexicon, two jobs

The same weighted list of legal terms powers the classifier in step 2 and the in-product glossary in step 5. One list. Two places it shows up.

For the model
Classifier scoring
Weighted unique-term count produces the vocabulary signal that picks the model.
For the reader
In-product glossary
Terms get a dotted underline in the rendered findings. Hover or tap to see the plain explanation.

This wasn't planned from the start. It came out of noticing that finding legal terms and explaining them is the same job, just for two audiences. Building one list for both is less work than maintaining two, and it keeps the two surfaces in sync. A term worth scoring is, by construction, a term worth explaining.

Decoder is one example of a class

Five steps in the pipeline. Two disciplines that run across all of them. None of the parts is clever on its own. The classifier is two signals and a rule. The router is a lookup. The validator is a few simple checks that fail closed. The lexicon is a list with weights.

What it adds up to is a system that puts effort where it pays off and refuses to waste it where it doesn't. Every finding the user sees can be traced back to a span they can read for themselves. The product's stance toward the user, that it translates and doesn't advocate, is enforced in code. Not just in good intentions.

Decoder is the first thing built this way. The shape of it generalises. A domain-specific lexicon. Length-and-vocabulary classification. Tiered routing inside one provider. Fail-closed validation. Governance kept in files agents read first.