The enrichment trap
The Annual Procurement Plan Parser fills in missing identifiers for a thousand-plus records by searching an external portal and writing back what it finds. Enrichment jobs share a dangerous shape: they read and write the same data, in bulk, automatically. One bad transform or one off-by-one loop and you've quietly overwritten the very source you were trying to enrich — with no clean copy to roll back to.
Two databases, one direction
So the source is read-only and the destination is write-only. The job physically cannot mutate its input; enriched rows only ever flow one way, into a separate store. It costs a second connection, some network latency, and duplicated storage — and it buys back something I wasn't willing to trade: the source stays pristine, every enriched record has clear lineage, and the two sides back up and scale independently.
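A minimal sketch of that one-way flow, assuming SQLite on both sides purely for illustration (the post does not name the actual databases). The table names `plans` and `enriched`, the column names, and the `lookup` callable are all hypothetical. The point it shows is that the read-only guarantee lives in the connection itself, not in the discipline of the code that uses it.

```python
import sqlite3
from pathlib import Path
from typing import Callable, Optional


def open_source(path: str) -> sqlite3.Connection:
    """Open the source read-only using SQLite's URI syntax (mode=ro)."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def open_destination(path: str) -> sqlite3.Connection:
    """Open the separate destination store and make sure its table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS enriched ("
        "source_id INTEGER PRIMARY KEY, title TEXT, external_id TEXT)"
    )
    return conn


def enrich_all(
    source: sqlite3.Connection,
    dest: sqlite3.Connection,
    lookup: Callable[[str], Optional[str]],  # hypothetical portal lookup
) -> int:
    """Read from source, write only to dest; return the number of rows written."""
    written = 0
    for source_id, title in source.execute("SELECT id, title FROM plans"):
        external_id = lookup(title)
        if external_id is None:
            continue  # unresolved rows are skipped, never guessed
        dest.execute(
            "INSERT OR REPLACE INTO enriched VALUES (?, ?, ?)",
            (source_id, title, external_id),
        )
        written += 1
    dest.commit()
    return written
```

With this setup, any accidental `UPDATE` or `DELETE` issued on the source connection fails with `sqlite3.OperationalError` instead of silently succeeding. The destination here is write-only by convention, since the job only ever issues inserts against it; a production database would enforce that with role permissions.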
A thousand-plus records processed in production, zero corruption incidents. That number is the architecture working.
Two tiers, because coverage and cost pull apart
Searching the portal is where the real tension lives. Precise filters are fast and cheap but resolve only about 70% of records — real-world source data is too messy for one strict query. Broad searches cover everything but are expensive and noisy. Running broad-by-default would have hammered the portal for no reason.
The answer is a two-tier strategy: try the precise filters first (they resolve roughly 70% of records quickly), and fall back to a comprehensive search only when the first tier comes up empty. Coverage climbed from ~70% to 99%+ while most requests stayed on the cheap path.
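The fallback logic is small enough to sketch. This is an illustration under assumptions, not the parser's actual code: `precise` and `broad` stand in for whatever functions query the portal, and the record and candidate shapes are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: a search takes one record and returns candidate matches.
SearchFn = Callable[[dict], Sequence[dict]]


def tiered_search(record: dict, precise: SearchFn, broad: SearchFn) -> Tuple[str, List[dict]]:
    """Try the cheap precise search first; run the broad one only if it finds nothing."""
    hits = precise(record)
    if hits:
        return "precise", list(hits)

    # Tier 2 runs only for records that tier 1 could not resolve.
    hits = broad(record)
    if hits:
        return "broad", list(hits)

    return "none", []
```

Returning the tier name alongside the candidates keeps the lineage visible: downstream code can tell whether a match came from the strict path or the noisier fallback and treat it accordingly.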
Earning the insert
Coverage is worthless if it lets bad matches through, so every candidate passes a progressive gate: exact match → criteria validation → confidence scoring. Only high-confidence matches are written; ambiguous ones are rejected into an audit trail rather than guessed into the destination. The insert rate drops slightly — on purpose. A missing row is a TODO; a wrong row is a lie the rest of the system will believe.
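One possible reading of that gate, sketched under loud assumptions: the fields `title`, `agency`, and `year`, the 0.85 threshold, and the use of `difflib` string similarity as the confidence score are all invented for illustration. The post does not say how the real stages are implemented, or whether an exact match skips the later stages as it does here.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional

CONFIDENCE_THRESHOLD = 0.85  # assumed cut-off, chosen only for illustration


@dataclass
class Decision:
    accepted: bool
    stage: str    # the stage that made the call
    score: float


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so cosmetic differences don't matter."""
    return " ".join(text.lower().split())


def gate(record: dict, candidate: dict) -> Decision:
    a, b = _norm(record["title"]), _norm(candidate["title"])
    # Stage 1: an exact (normalized) title match is accepted outright.
    if a == b:
        return Decision(True, "exact", 1.0)
    # Stage 2: hard criteria must agree before any fuzzy scoring happens.
    if (record["agency"], record["year"]) != (candidate["agency"], candidate["year"]):
        return Decision(False, "criteria", 0.0)
    # Stage 3: similarity score must clear the threshold.
    score = SequenceMatcher(None, a, b).ratio()
    return Decision(score >= CONFIDENCE_THRESHOLD, "confidence", score)


def select_match(record: dict, candidates: List[dict], audit: List[dict]) -> Optional[dict]:
    """Return the single candidate that passes the gate, or None; log every rejection."""
    passed = []
    for candidate in candidates:
        decision = gate(record, candidate)
        if decision.accepted:
            passed.append(candidate)
        else:
            audit.append({"record": record["title"], "candidate": candidate["title"],
                          "stage": decision.stage, "score": decision.score})
    if len(passed) == 1:
        return passed[0]
    if passed:
        # More than one plausible match: reject rather than guess.
        audit.append({"record": record["title"], "candidate": None,
                      "stage": "ambiguous", "score": 0.0})
    return None
```

The audit list plays the role of the audit trail: a record that returns `None` is not lost, it is parked with the reason it was turned away, so a human can resolve it later.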
The lesson
Every decision here traded simplicity or speed for integrity, and that was the right trade every time. When a system writes to a database automatically and at scale, the question isn't "how fast can it write?" — it's "what stops it from writing something wrong?" Read-only sources, tiered search, and confidence gates are three different answers to that one question.