| System | Total | Authority cited | Substantive precision | Risk flagged |
|---|---|---|---|---|
| Baseline frontier LLM (no tools) | 200 / 300 | 95/100 | 61/100 | 44/100 |
| Oulution | 229 / 300 | 53/100 | 81/100 | 95/100 |
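On 100 cross-border partner questions, Oulution scored 229/300 to the baseline model's 200/300 (76% vs 67%). Oulution's lead is widest on risk flagged (95 vs 44, +51), followed by substantive precision (81 vs 61, 20 more committed findings). The baseline leads on authority cited (95 vs 53): Oulution named a specific real citation on 42 fewer questions.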
- Baseline frontier LLM (no tools) — single claude-sonnet-4-5 call with a careful-legal-assistant system prompt. No tool use, no retrieval, no live data.
- Oulution — full multi-agent orchestration: archetype-matched plan → specialist agents (M&A, tariff, sanctions, regulatory, lawyer match) → live data feeds (OFAC SDN, Practical Law UK, EUR-Lex, OpenSanctions) → cited synthesis.
Three dimensions, 0 or 1 each:
1. authority_cited — names a specific real statute/regulation/ruling/notice that a partner could pull
2. substantive_precision — commits to a definite finding (yes/no, dollar figure, HTS code, deadline) vs. hedging
3. risk_flagged — names the realistic failure mode a partner needs to plan around
Max per question per side: 3. Max per side over 100 questions: 300.
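The rubric arithmetic can be sketched in a few lines. This is a minimal illustration, not the harness itself: the class `RubricScore` and the helpers `side_total` and `as_percent` are hypothetical names, and only the three dimension names come from the rubric above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricScore:
    """One judged answer; each dimension is scored 0 or 1 (hypothetical type)."""
    authority_cited: int
    substantive_precision: int
    risk_flagged: int

    def __post_init__(self):
        # Reject anything other than a binary score.
        for name in ("authority_cited", "substantive_precision", "risk_flagged"):
            if getattr(self, name) not in (0, 1):
                raise ValueError(f"{name} must be 0 or 1")

    @property
    def total(self) -> int:
        # Maximum is 3 per question per side.
        return self.authority_cited + self.substantive_precision + self.risk_flagged


def side_total(scores):
    """Return (points, max_points) for one side over all judged questions."""
    scores = list(scores)
    return sum(s.total for s in scores), 3 * len(scores)


def as_percent(points, max_points):
    """Whole-number percentage, rounded to the nearest integer."""
    return round(100 * points / max_points)
```

Over 100 questions, `side_total` returns a maximum of 300 per side, which is the denominator used in the summary table.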
| Domain | n | Claude | Oulution | Gap |
|---|---|---|---|---|
| EU CBAM | 10 | 17/30 | 27/30 | +10 |
| CFIUS | 10 | 23/30 | 25/30 | +2 |
| Doc review (NDA / LOI / SPA) | 10 | 24/30 | 22/30 | -2 |
| Duty drawback | 10 | 19/30 | 29/30 | +10 |
| Saudi / GCC FDI | 10 | 19/30 | 19/30 | +0 |
| Sanctions screening | 10 | 26/30 | 22/30 | -4 |
| Tariff / Section 301 | 10 | 15/30 | 23/30 | +8 |
| UFLPA / forced labor | 10 | 20/30 | 18/30 | -2 |
| UK OFSI sanctions | 10 | 20/30 | 25/30 | +5 |
| USMCA | 10 | 17/30 | 19/30 | +2 |
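The Gap column is the Oulution score minus the baseline score for each domain. As a cross-check, the sketch below recomputes the gaps and column sums from the table values. The names `DOMAIN_SCORES` and `gap_rows` are illustrative only, and the `+d` format spec prints an explicit sign for every value.

```python
# (baseline, Oulution) points out of 30, copied from the domain table.
DOMAIN_SCORES = {
    "EU CBAM": (17, 27),
    "CFIUS": (23, 25),
    "Doc review (NDA / LOI / SPA)": (24, 22),
    "Duty drawback": (19, 29),
    "Saudi / GCC FDI": (19, 19),
    "Sanctions screening": (26, 22),
    "Tariff / Section 301": (15, 23),
    "UFLPA / forced labor": (20, 18),
    "UK OFSI sanctions": (20, 25),
    "USMCA": (17, 19),
}


def gap_rows(scores):
    """Return (domain, signed gap string) pairs; gap = Oulution - baseline."""
    return [
        (domain, f"{oulution - baseline:+d}")
        for domain, (baseline, oulution) in scores.items()
    ]


# Column sums should match the headline totals out of 300.
baseline_total = sum(b for b, _ in DOMAIN_SCORES.values())
oulution_total = sum(o for _, o in DOMAIN_SCORES.values())
```

The per-domain columns sum to the same 200 and 229 reported in the summary table.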
Five questions where Oulution opens a clear gap. Each pair shows the verbatim output, truncated where marked with an ellipsis. The judge's per-answer comment is included so you can see the rubric in action.
Q. Pacific Steel Korea wants to submit its own emissions data for CBAM-covered rebar exports to the EU starting Q1 2026. What verification is required and by whom?
Reference authority: Regulation (EU) 2023/956 Articles 8, 18; Commission Implementing Reg (EU) 2023/1773 Article 8 + Annex VI
Reference finding: Emissions data must be verified by an accredited verifier under Implementing Reg (EU) 2018/2067, accredited by an EU National Accreditation Body or by a third-country body recognised under bilateral agreement. Korea has no MRA — verifier must be EU-accredited.
Risk to flag: Verifier shortage is acute — EU-accredited verifiers operating in Asia are booked 4-6 months out; engaging a verifier in November for a January declaration is too late.
Claude (no tools):
# CBAM Verification Requirements for Embedded Emissions Data ## Current Transitional Period (through 2025) During the transitional phase ending December 31, 2025, no verification is required. Importers simply report embedded emissions data in quarterly CBAM reports, and third-country installations like Pacific Steel Korea can provide data without independent verification. ## Definitive Period (January 1, 2026 onward) Starting Q1 2026, verification becomes mandatory under the EU CBAM Regulation (EU) 2023/956. ### Verification Requirements Who verifies: An accredited verifier must verify the embedded emissions data. This verifier must be: - Accredited under Regulation (EU) 2018/2067 (EU ETS verification regulation), OR - Accredited under an equivalent third-country accre…
*Judge:* Cites Regulation (EU) 2023/956 and 2018/2067 specifically, but hedges on Korean recognition and says 'I don't have information' rather than committing to definite finding. No verifier capacity risk flagged.
Oulution:
Pacific Steel Korea can submit its own emissions data for CBAM-covered rebar exports from Q1 2026, but only if those emissions are verified by an EU-accredited verifier before the data is used in any CBAM declaration. There is no Korean-domestic route — verification must run through the EU accreditation system. From the regulatory mapping: CBAM moves from its transitional reporting phase into its definitive phase on 1 January 2026. From that date, embedded emissions data cannot rely on default values or unverified estimates. Verified data becomes mandatory, and the verification rules are set out in Commission Implementing Regulation (EU) 2024/1866, sitting under the main CBAM Regulation (EU) 2023/956. The methodology for calculating embedded emissions in steel sits in Commission Delegated…
*Judge:* Cites Regulations 2023/956, 2024/1866, 2024/1610 specifically. Commits definitively: no Korean route, EU-accredited verifier required, names specific NABs and verifiers. Flags verifier capacity shortage as concrete commercial risk.
*Oulution used 3 steps across these agents: ask, match-lawyer, regulatory.*
Q. Northstar Industries imports precision computing assemblies. Chinese-origin components, final assembly + testing in Vietnam, FOB Haiphong to LAX, $4.9M CIF, HTSUS 8471.30.0100. Does Section 301 apply?
Reference authority: USTR Section 301 List 3 — 83 FR 47974; CBP HQ H316321
Reference finding: Yes — Section 301 applies. Final assembly in Vietnam does not constitute substantial transformation when the imported Chinese components retain their essential character; HTS chapter unchanged. Duty: 25% on Chinese-origin value plus standard MFN.
Risk to flag: First-sale-for-export may reduce dutiable value 12-18% if the Vietnam middleman is bona fide arms-length; the Chinese-origin portion still pays Section 301.
Claude (no tools):
Short Answer: Likely yes, but it depends on whether the Vietnamese assembly constitutes a "substantial transformation" under U.S. customs law. ## Section 301 Tariff Application Section 301 tariffs (imposed under USTR actions pursuant to Trade Act of 1974, §301) apply to goods of Chinese origin. The critical question is whether your Vietnam assembly changes the country of origin from China to Vietnam. ## Substantial Transformation Test CBP applies the "substantial transformation" test to determine origin. A new article of commerce must result, with a name, character, and use different from the constituent materials. Key factors for your situation: 1. Nature of Vietnamese operations: If Vietnam performs only final assembly and testing of Chinese components without f…
*Judge:* Cites specific ruling HQ H301619 and 19 CFR §134.1(b). Hedges with 'likely yes' and 'depends.' No specific risk beyond general duty exposure.
Oulution:
Section 301 almost certainly applies to this shipment. The Vietnamese assembly step will not shield Chinese-origin components from the 25% Section 301 tariff, and on a $4.9M CIF entry you are looking at roughly $2.6M in total US duty exposure at current rates — a 55% effective rate — once the IEEPA reciprocal layer is included. From the duty exposure analysis: CBP applies a "substantial transformation" test to decide country of origin, and a line of recent rulings (NY N315583, NY N316915 and related) has consistently held that assembling Chinese motherboards into a laptop chassis in a third country does not transform the goods — the essential character travels with the motherboard. Because your Chinese components are the core compute assemblies and Vietnam is only doing final assembly and…
*Judge:* Cites NY N315583, NY N316915, List 3. Commits to 'almost certainly applies' with dollar figures. Flags UFLPA/forced labor and first-sale mitigation.
*Oulution used 4 steps across these agents: ask, match-lawyer, regulatory, sanctions.*
Q. Mayfair Art Advisory is brokering the sale of a Kandinsky painting (provenance: Russia, 1923) currently held by a Geneva freeport. Buyer is a UK collector. Seller is a Liechtenstein trust whose settlor is a Russian-resident non-designated oligarch. UK regime engaged?
Reference authority: Russia Regs 2019 Reg. 46K (luxury goods); UK Sanctions Regs cultural property amendment Apr 2023; ML Regs 2017 (Art Market Participants)
Reference finding: Luxury goods restrictions under Reg. 46K apply to NEW exports to/for use in Russia, not historic artwork moving INTO UK. The transaction is not per se prohibited. However Mayfair as Art Market Participant must conduct enhanced due diligence on settlor/UBO under MLR 2017 (post-Jan 2020 art market regulation), and OFSI now expects 'sanctions risk assessment' for high-value art with Russian nexus.
Risk to flag: Even if non-designated, the settlor may become designated mid-transaction (frequent updates); structure with sanctions MAC clause permitting either party to walk. Also note the painting may be Nazi-looted (1923 provenance Russia — check Art Loss Register and Soviet-era confiscation history) — separate restitution risk independent of sanctions.
Claude (no tools):
Yes, UK sanctions compliance is engaged and requires careful navigation across multiple regimes. ## Primary UK Sanctions Considerations Russia (Sanctions) (EU Exit) Regulations 2019 (as amended) applies because: - The beneficial owner is Russian-resident - The artwork has Russian provenance (1923 origin) - Cultural property restrictions exist under Regulation 18C-18E (added post-invasion) Even though the oligarch is "non-designated," you must verify they're not on the UK Consolidated List and conduct enhanced due diligence given Russian nexus. ## Key Compliance Issues Ownership & Control Test: The Liechtenstein trust structure doesn't shield the transaction. Under OFSI guidance, you must identify the ultimate beneficial owner. If the Russian settlor retains effective contr…
*Judge:* Cites Russia Regulations 2019, Regulation 18C-18E, £3,000 threshold. Hedges extensively with 'may not apply,' 'depending on,' 'consider seeking.' No specific gotcha identified beyond generic compliance steps.
Oulution:
Mayfair Art Advisory should treat this transaction as high-risk and pause before any further movement of the Kandinsky. The UK regime is engaged through multiple routes — not because the settlor is currently sanctioned, but because the structure, the asset, and the freeport storage each carry independent UK exposure that compounds when combined. From the ownership and control analysis: the Liechtenstein trust almost certainly trips the control test in Regulation 7 of the UK Russia (Sanctions) Regulations 2019. Liechtenstein trust and foundation structures routinely leave the settlor with revocation rights, investment direction, protector powers, or discretionary beneficiary status — any one of which OFSI treats as functional control. A "non-designated" settlor is a snapshot, not a clearan…
*Judge:* Cites Regulation 7, Regulations 12/13/15, section 146 Policing and Crime Act 2017, OFSI 2024 guidance. Commits to 'pause transaction' and specific next steps. Flags mid-transaction designation risk and freeport evasion pattern.
*Oulution used 4 steps across these agents: ask, match-lawyer, regulatory, sanctions.*
Q. Helios Pharma (Swiss) is licensing manufacturing rights to a Chinese CMO (Shanghai) for an oncology biosimilar, $80M upfront + tiered royalties. Choice of forum: Beijing courts, CIETAC, SIAC, or HKIAC? What dictates the choice for IP-heavy deal?
Reference authority: PRC Civil Procedure Law (2024 amendments on foreign-related cases); CIETAC Rules 2024; HKIAC + Mainland Interim Measures Arrangement (2019, supplemented 2024); SIAC Rules 2025
Reference finding: HKIAC seated in Hong Kong, English language, three arbitrators. Rationale: (1) Only HKIAC awards qualify for the 2019 Mainland-HK Interim Measures Arrangement allowing direct application to PRC courts for asset preservation pre-award — critical for IP injunctive relief, (2) PRC enforcement of HK arbitral awards under 2000 Arrangement is well-tested, (3) HKIAC permits non-PRC arbitrators (vs. CIETAC's PRC-heavy panel pool), (4) Swiss substantive law for the license itself; HK arbitration for procedure.
Risk to flag: PRC has aggressively asserted jurisdiction over IP licensing post-2023 (anti-suit injunctions, declaratory judgments on SEP/FRAND); ensure license has explicit prohibition on Chinese CMO seeking declaratory judgment in PRC courts, with liquidated damages for breach. Also: PRC export control on cell lines/biologics (Biosecurity Law 2021) may restrict CMO's ability to ship back to Swiss; add force majeure language specific to PRC export restrictions and right to terminate + recover materials.
Claude (no tools):
For an IP-heavy biosimilar manufacturing license between a Swiss licensor and Chinese CMO, your forum choice involves critical strategic considerations: CIETAC (China International Economic and Trade Arbitration Commission) - Advantages: Experienced with pharma/biotech licensing; awards enforceable in China under domestic law without New York Convention complications; lower costs than international arbitration; Beijing-based panel familiar with Chinese regulatory landscape (NMPA approval pathways, data exclusivity). - Concerns: Perceived home-court advantage for Chinese party; less international arbitrator diversity; awards enforced abroad require New York Convention (China is signatory, but some jurisdictions scrutinize PRC arbitral awards more carefully). **HKIAC (Hong Kong…
*Judge:* Cites PRC Arbitration Law 2017, HKIAC Rules 2018, 2000 Arrangement. Hedges with 'best balance' without definitive choice. Generic concerns, no specific gotcha.
Oulution:
For an $80M IP-heavy biosimilar license into China, the forum choice is dictated by one question above all: where can Helios actually enforce an injunction and a damages award against a Shanghai CMO that holds the manufacturing know-how? On that test, HKIAC is the strongest default, with SIAC a close second; Beijing courts and CIETAC carry structural disadvantages for a Swiss licensor. From the regulatory scan: the cross-border environment around this deal is tightening, not loosening. EU CBAM, UFLPA supply-chain tracing, and the EU CSDDD value-chain due diligence regime all reach into how a Chinese-manufactured biosimilar moves through global markets. None of these directly pick your forum, but they do mean the license will need to survive enforcement actions, supply-chain detentions, an…
*Judge:* Cites 2019 Supplemental Arrangement specifically. Commits to HKIAC as strongest choice. Flags sanctions screening and Chinese court anti-suit jurisdiction risks.
*Oulution used 4 steps across these agents: match-lawyer, regulatory, review, sanctions.*
Q. Coastal Electronics received $2.1M of defective Korean PCBs in 2025 (HTSUS 8534.00.0040, $0 MFN but $315K Section 301 + IEEPA implications). Returning to Korea Q1 2026. Drawback path?
Reference authority: 19 U.S.C. § 1313(c); 19 CFR § 190.42
Reference finding: § 1313(c) rejected merchandise drawback — 99% recovery of duty on merchandise that does not conform to sample/specification or was shipped without consent. Must be exported or destroyed under CBP supervision within 5 years of import.
Risk to flag: § 1313(c) requires written notice to CBP within 30 days of decision to return (or destruction with CBP supervision per Form 5793) — late notice forfeits the claim entirely; the 5-year statute does not save you from the 30-day notice rule.
Claude (no tools):
# Drawback Analysis for Defective Korean PCBs ## Available Drawback Mechanisms Manufacturing Drawback (19 U.S.C. § 1313(a) & (b)) is not applicable here—you're returning the *same* defective goods, not substituting or manufacturing with them. Unused Merchandise Drawback (19 U.S.C. § 1313(j)) is your primary path. This allows 99% recovery of duties, fees, and taxes paid on imported merchandise that is: - Exported or destroyed under CBP supervision - In the same condition as imported (defective = unchanged condition) - Within 5 years of importation ## Section 301 & IEEPA Complications Critical issue: Section 301 duties under USTR's China tariff actions (List 1-4) are generally drawback-eligible per 19 C.F.R. § 190.2. However, you mention "IEEPA implications"—if these…
*Judge:* Cites 19 U.S.C. § 1313(j), (a), (b), (x) and 19 C.F.R. § 190.2, Part 190. But misidentifies mechanism as unused merchandise drawback instead of rejected merchandise. Hedges on duty characterization without committing. Misses the critical 30-day notice requirement under §1313(c).
Oulution:
Coastal Electronics has a credible path to recover the full ~$315K (and potentially more if IEEPA duties were also assessed) on the defective Korean PCBs — but the cleanest route is rejected-merchandise drawback under 19 USC §1313(c), not the IEEPA constitutional fight, and the clock is already running on parallel protest deadlines. From the tariff stacking analysis: your duty stack has four layers, and recoverability drops sharply as you go deeper. The MFN layer is zero here. The Section 301 layer (~$315K, assuming the goods are properly within a List that reaches Korean-origin PCBs — this needs verification, since 301 historically targets China) is recoverable through drawback. Any IEEPA layer on top is the hardest layer to recover and currently depends on litigation outcomes. From the…
*Judge:* Cites 19 USC §1313(c), §1501, §1514, §1581(i), and CBP Form 7553. Commits to rejected merchandise drawback as primary path with 99% recovery and specific next steps. Flags unjust enrichment risk with supplier, liquidation deadline risks, and Section 301 attribution verification need. However, critically misses the 30-day notice requirement.
*Oulution used 8 steps across these agents: ask, document, match-lawyer, regulatory.*
All 100 questions, both answers, judge comments, and per-question scores are in benchmark/results.jsonl (raw) and benchmark/results.csv (flat). The harness is benchmark/run.py. Both sides hit Railway endpoints (/api/benchmark/claude-direct, /api/benchmark/orchestrate, /api/benchmark/judge). Bearer token gated; available to partner reviewers on request.
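For reviewers who want to recompute the headline totals from the raw file, something like the following could work. The record shape is an assumption: the source does not document the schema of benchmark/results.jsonl, so the `scores` key and the side names `baseline` and `oulution` are hypothetical and would need adjusting to the real field names.

```python
import json
from collections import defaultdict

DIMENSIONS = ("authority_cited", "substantive_precision", "risk_flagged")


def tally(lines, sides=("baseline", "oulution")):
    """Sum 0/1 rubric scores per side and dimension from JSONL lines.

    ASSUMED record shape (hypothetical, not confirmed by the source):
      {"scores": {"baseline": {"authority_cited": 1, ...},
                  "oulution": {"authority_cited": 0, ...}}}
    """
    totals = {side: defaultdict(int) for side in sides}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        for side in sides:
            for dim in DIMENSIONS:
                totals[side][dim] += int(record["scores"][side][dim])
    return {side: dict(per_dim) for side, per_dim in totals.items()}
```

Because `tally` accepts any iterable of lines, it can be fed an open file handle, for example `with open("benchmark/results.jsonl") as fh: totals = tally(fh)`, provided the records match the assumed shape.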