How Sourzi verifies what it publishes

Last updated: 2026-05-17

Sourzi is built for procurement managers who are about to type a number into a customs entry, a wire instruction, or a landed-cost model. The number has to be right. The biggest single risk in a regulatory dataset like this one is a failure pattern typical of LLM drafting: a confidently stated fact that no primary source supports. The discipline below is how we work to keep that pattern out of the published content.

The eight fragile-fact classes

On 9 May 2026 we ran a full audit of the regulatory dataset (500 substance-regime entries) against primary sources and surfaced thirteen confirmed factual errors. Every one clustered around one of eight specific patterns. We now call these the fragile-fact classes and verify or hedge each one explicitly before any new regulatory entry ships.

1. US AD/CVD case numbers

US case numbers follow the formats A-570-XXX (anti-dumping orders against China) and C-570-XXX (countervailing orders), plus the related 731-TA-XXXX USITC investigation number. These are the identifiers that can block a customs entry: a case number that is off by a single digit either fails the entry or applies the wrong cash-deposit rate. Every case-number assertion on Sourzi carries an inline link to access.trade.gov or usitc.gov, or the rate is hedged with "verify against access.trade.gov before invoicing". The cash-deposit rate at customs is the operative number, not the original investigation rate.
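A shape check can catch a malformed identifier before it reaches the primary-source lookup. The sketch below is illustrative only: the pattern names, the function name, and the exact digit counts are assumptions drawn from the formats quoted above, not a description of Sourzi's internal tooling. A shape check cannot detect a well-formed but wrong number, so it complements the access.trade.gov or usitc.gov lookup and does not replace it.

```python
import re
from typing import Optional

# Shapes taken from the formats quoted in the text. The digit counts are an
# assumption; a matching shape does NOT mean the case number is correct.
CASE_PATTERNS = {
    "ad_order": re.compile(r"A-570-\d{3}"),
    "cvd_order": re.compile(r"C-570-\d{3}"),
    "usitc_investigation": re.compile(r"731-TA-\d{1,4}"),
}


def classify_case_number(text: str) -> Optional[str]:
    """Return the name of the pattern the whole string matches, or None."""
    candidate = text.strip().upper()
    for name, pattern in CASE_PATTERNS.items():
        if pattern.fullmatch(candidate):
            return name
    return None
```

A string that returns None should be treated as a drafting error; a string that returns a pattern name still needs its inline primary-source link or a hedge.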

2. EU regulation references

Every Reg YYYY/XXXX or Directive YYYY/XX/XX reference carries an inline EUR-Lex URL or a hedge. The audit found a regulation number incorrectly attributed to acrylonitrile that turned out to cover PET (Reg 2023/2659 is the EU provisional anti-dumping regulation for polyethylene terephthalate). A single wrong premise like this one propagates quickly across multiple sub-fields of an entry. We verify the citation against eur-lex.europa.eu before publishing.

3. REACH Annex XIV authorisation-list claims

The Annex XIV authorisation list (roughly 60 substances) is NOT the SVHC candidate list (roughly 250 substances). Conflating the two was the most damaging error we found: acrylonitrile has been on the SVHC candidate list since 13 January 2010 but is not on Annex XIV. The two regimes have different operational consequences (an Article 33 documentation duty versus a sunset-date market-access prohibition). Every Annex XIV claim now points at the ECHA authorisation list; every SVHC claim points at the ECHA candidate list.

4. California Proposition 65 listing dates

The OEHHA Proposition 65 list is the canonical source. Every listing date on Sourzi carries an inline link to oehha.ca.gov or is hedged. The audit found mismatched listing dates across half a dozen carcinogens where the year had been guessed or carried over from older drafting.

5. OSHA permissible exposure limits (PELs)

Every PEL on Sourzi carries the 29 CFR section number plus a link to the OSHA annotated PEL table. A wrong PEL means a wrong workplace health and safety program on the operator side: the substance gets handled at concentrations the SDS does not authorise, the JHA misclassifies the exposure, and the audit finds the gap later.

6. IARC Group classifications

Every IARC Group claim names the Monograph volume and links to monographs.iarc.who.int. The audit found acrylonitrile incorrectly cited to Volume 121 (which covers styrene). The correct citation is Volume 136 (2025). Volume number matters because the assessment in each volume is the published basis for the classification; quoting the wrong volume invalidates the reference even if the Group designation is right.

7. Percentage rates of all kinds

This class covers AD margins, countervailing margins, EU AD ad valorem rates, capacity shares, market shares, VAT export rebates, and FTA preference rates. Each gets a primary-source citation or a hedge of the form "verify per ICIS / S&P / access.trade.gov / fta.go.kr" or similar. Rates change with administrative reviews and with the calendar; the rate quoted in a static dataset is the rate at the moment of drafting, not the rate at the moment the operator types it into a customs entry.

8. Producer plus city pairs

When Sourzi names a producer at a specific city (AdvanSix Hopewell Virginia, Wanhua Yantai, Sinopec Baling Hunan), the row carries the company-website URL or is hedged. A divested site, a renamed facility, or a wrong city sounds small but breaks trust fast with this audience. The audit spot-checked five pairs against company sites and corrected the misattributions.

The pattern flag

The audit also surfaced a structural signature of LLM-drafting failure. Any single specific fact (a case number, a regulation number, a percentage, a sunset date) repeated across more than three sub-fields of an entry is the propagation signature of a wrong premise. If the fact was never locked to a primary source and turns out to be wrong, every repetition is wrong. We verify or hedge ruthlessly across every reference, not just the first.
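The pattern flag lends itself to a mechanical pre-check. The following is a minimal sketch, assuming an entry is represented as a mapping from sub-field name to text; the token shapes, the function name, and the entry layout are hypothetical and chosen only to illustrate the "more than three sub-fields" rule stated above.

```python
import re
from collections import defaultdict

# Hypothetical token shapes for "specific facts"; extend as needed.
FACT_TOKEN = re.compile(
    r"[AC]-\d{3}-\d{3}"        # AD/CVD case-number shape
    r"|Reg \d{4}/\d{1,4}"      # EU regulation reference shape
    r"|\d+(?:\.\d+)?\s?%"      # percentage
)


def propagation_flags(entry: dict, threshold: int = 3) -> dict:
    """Map each fact token to the sub-fields it appears in, keeping only
    tokens that appear in MORE than `threshold` distinct sub-fields."""
    seen = defaultdict(set)
    for field, text in entry.items():
        for token in FACT_TOKEN.findall(text):
            seen[token].add(field)
    return {
        token: sorted(fields)
        for token, fields in seen.items()
        if len(fields) > threshold
    }
```

A flagged token is not necessarily wrong. The flag only marks the fact as one whose source must be locked before any of its repetitions ship.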

Per-substance time budget

The cadence that produced the audited errors was approximately five minutes per substance. The new floor is twelve to fifteen minutes per substance for primary-source verification across the eight fragile-fact classes above. Some hazard-light substances (no active anti-dumping case, no SVHC listing, no harmonised CLP classification) clear faster; high-density substances (Carc 1A or 1B with multiple precursor scheduling regimes and multiple producer rank claims) take longer.

The hedge language template

When a specific fact cannot be verified within reasonable lookup time, we replace the specific number, date, or case identifier with prose of the form "(verify against current [primary source name] before invoicing)". The hedge is the deliberate move: we will not assert a number we cannot source. Operators get a smaller dataset that they can trust, not a larger one that they cannot.

The Pro tools take the same approach. The US HTS lookup at /tools/pricing-and-quoting/us-hts-section-301-lookup asserts the MFN tariff lines verbatim from the USITC API and hedges Section 301 list assignments to ustr.gov. The China VAT export rebate calculator hedges every per-HS rate to chinatax.gov.cn rather than asserting a rate from an out-of-date circular. The Korea KCS lookup hedges MFN, KCFTA, and RCEP rates to UNI-PASS and fta.go.kr. The AU and US AD/CVD lookups hedge cash-deposit rates to the ADC and ACCESS portals respectively, where the binding number lives.
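The hedge template described in this section can be sketched as a small rendering rule: assert the value only when a source URL is attached, otherwise emit the hedge. The function name and the "(source: ...)" rendering are assumptions made for illustration; only the hedge wording comes from the text above.

```python
from typing import Optional

# Hedge wording quoted from the template above.
HEDGE_TEMPLATE = "(verify against current {source} before invoicing)"


def render_fact(value: str, source_url: Optional[str], source_name: str) -> str:
    """Return the sourced value, or the hedge when no source URL is attached."""
    if source_url:
        # Hypothetical rendering of a sourced fact.
        return f"{value} (source: {source_url})"
    # No verified source: replace the specific value with the hedge.
    return HEDGE_TEMPLATE.format(source=source_name)
```

The design choice is that the unverified value never reaches the output at all, so a reader cannot mistake a hedged row for an asserted one.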

What this means for you

We hedge rather than guess so you do not invoice off a wrong number. On a single container of HS 7616.99 aluminium goods from a non-cooperative Chinese exporter under a current ADC measure, the wrong cash-deposit rate can wipe out months of margin in one customs entry. On a TSCA Section 5 PMN filing, a misclassified substance can hold a container for weeks at port while the importer scrambles to produce the paperwork. The discipline above is what keeps that risk on the right side of the wire.

The full audit history sits in /corrections-policy. The voice and sourcing policies sit in /editorial-policy.

How we update when regulations move

Regulations move on cadences that the dataset cannot match in real time, but we work to keep the gap to days rather than weeks. When a substance regime changes (a new sunset date on Annex XIV, a fresh MOF circular cutting an export rebate to 0 per cent, an ITC continuation determination, a new OEHHA Prop 65 listing), the regulatory entry is updated within seven days. The cron-refreshed Pro tool datasets (US AD/CVD, EU REACH SVHC and Annex XIV, US HTS chapters 28 to 39, AU anti-dumping measures, AICIS inventory, EU CBAM scope, Korea KCS tariff) carry a last-verified timestamp on every result row so the operator can see at a glance how fresh the dataset is.
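As a rough illustration of the two commitments above (the seven-day update window and the last-verified timestamp on each result row), the sketch below uses hypothetical function names and a hypothetical label format. It is not a description of how the Pro tools are implemented.

```python
from datetime import date
from typing import Optional

UPDATE_WINDOW_DAYS = 7  # from the text: entries are updated within seven days


def update_is_overdue(changed_on: date, updated_on: Optional[date], today: date) -> bool:
    """True if a regime change stayed unreflected for longer than the window.
    If the entry has not been updated yet, measure up to today."""
    reference = updated_on if updated_on is not None else today
    return (reference - changed_on).days > UPDATE_WINDOW_DAYS


def freshness_label(last_verified: date, today: date) -> str:
    """Hypothetical per-row label showing how fresh a result is."""
    age = (today - last_verified).days
    return f"last verified {last_verified.isoformat()} ({age} days ago)"
```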

When the upstream primary source goes through a structural change, we surface the gap rather than back-fill stale data. The DFAT consolidated list and the ECHA portal have both returned 403 responses to our refresh script at different times; both are disclosed in the seed-header drafting notes for the relevant Pro tool. We keep the snapshot date visible and direct operators to the primary source for the binding state. No silent staleness.
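The "no silent staleness" rule can be expressed as a refresh step that never overwrites good data with nothing and never hides a failure. This is a minimal sketch under stated assumptions: the Snapshot type, its field names, and the gap message are hypothetical, and the HTTP call itself is left out so that only the decision logic is shown.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    rows: tuple
    snapshot_date: date
    upstream_gap: Optional[str] = None  # note surfaced to operators, if any


def apply_refresh(current: Snapshot, status_code: int,
                  rows: Optional[tuple], today: date) -> Snapshot:
    """Accept fresh rows on success; on failure keep the old snapshot,
    keep its original date, and record the gap instead of hiding it."""
    if status_code == 200 and rows is not None:
        return Snapshot(rows=rows, snapshot_date=today)
    note = (f"refresh failed with HTTP {status_code} on {today.isoformat()}; "
            "check the primary source for the binding state")
    return replace(current, upstream_gap=note)
```

Keeping snapshot_date unchanged on failure is the point of the sketch: the visible date always reflects the last successful verification, not the last attempt.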

What we will not publish

We will not publish a specific anti-dumping cash-deposit rate without a primary-source link. We will not publish a REACH Annex XIV claim without an ECHA URL. We will not publish a Prop 65 listing date without an OEHHA reference. We will not publish a producer plus city pair without a corporate website URL. Where a fact does not clear that bar, the entry hedges or the entry does not ship. The published number is one the operator can defend in front of a compliance audit, a customs broker, or a finance review.
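The bar described above amounts to a gate applied per fact class. The sketch below is one hypothetical way to encode it: the class keys, the function name, and the host lists are assumptions for illustration (the hosts are the portals named on this page plus echa.europa.eu for ECHA), and a real gate would also need a human to confirm that the linked page actually supports the claim.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical mapping from fact class to acceptable source hosts.
# An empty tuple means "any well-formed URL with a host" (corporate website).
REQUIRED_HOSTS = {
    "ad_cash_deposit_rate": ("access.trade.gov", "usitc.gov"),
    "reach_annex_xiv": ("echa.europa.eu",),
    "prop65_listing_date": ("oehha.ca.gov",),
    "producer_city_pair": (),
}


def clears_bar(fact_class: str, source_url: Optional[str]) -> bool:
    """True if the fact carries a source URL on an acceptable host."""
    if not source_url:
        return False
    host = urlparse(source_url).hostname or ""
    allowed = REQUIRED_HOSTS[fact_class]
    if not allowed:
        return bool(host)
    return any(host == d or host.endswith("." + d) for d in allowed)
```

A fact for which clears_bar returns False is either hedged or held back; it is never published as a bare number.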