ATRSA Production-Readiness Audit

Full multi-reviewer convergence audit · mvp_v1 @ 1dbe5e9 · 2026-05-20 · v2

Reviewer Panel

# | Reviewer | Method | Verdict
1 | Codex GPT-5.5 | R1: 110 findings, 6 categories | NO-GO
2 | Grok 4.3 | R1: 5 blockers + 5 systemic risks → R3 binding | GO-W/C (R1); NO-GO (R3)
3 | Claude main session | R1: manual security red-team | GO-WITH-CONDITIONS
4 | code-quality agent | arch + type safety + schema | defects
5 | test-coverage agent | 80-file suite vs 598-file codebase | defects
6 | deploy-infra agent | Dockerfile + fly + Sentry + DB | defects
7 | Gemini 2.5-pro | BLOCKED (monthly cap) | -
8 | feature-readiness agent | FEATURES_STATUS.md vs reality (retry) | defects
9 | /red-team Agent 1 | Input validation & injection | 5C / 7H
10 | /red-team Agent 2 | Process & resource safety | 5C / 7H
11 | /red-team Agent 3 | Trust boundary & identity (6 verified + 19 new) | defects
12 | /red-team Agent 4 | Dependency & supply chain | 3C / 8H
13 | deepseek-r1:32b (local) | Replaced Gemini quota | NO-GO
R3 | Codex co-CTO binding | All 21 blockers bound | NO-GO
R3 | Grok co-CTO binding | All 21 blockers bound | NO-GO

The 21 Bound Blockers

Each blocker was bound by both co-CTOs independently. Where they disagreed on severity, the higher severity was taken (most-conservative rule for a regulated financial product). All file:line citations are source-verified.

CRITICAL (12)

B1 · API v2 transfer IDOR — no owner filter
Codex: CRITICAL · Grok: CRITICAL · BIND
app/api/v2/transfers/[id]/route.ts:62-63, 186-188: findById(id, environment) is called with no owner check. Any API key with transfers:read can read any transfer in the same environment; transfers:write can accept/reject/retry any transfer.
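A minimal sketch of the missing owner check, assuming an ownerId field that does not exist in the schema today (see B10). Transfer, ApiKey and findOwnedTransfer are hypothetical names for illustration, not the route's actual code.

```typescript
// Hypothetical shapes: `ownerId` is an assumed field (B10 notes none exists today).
interface Transfer { id: string; environment: string; ownerId: string }
interface ApiKey { ownerId: string; environment: string; scopes: string[] }

// Returns the transfer only if scope, environment AND owner all match.
// Returning null for a foreign record lets the route answer 404 rather than 403,
// so the existence of other owners' IDs is not leaked.
function findOwnedTransfer(
  transfers: Transfer[],
  id: string,
  key: ApiKey,
  requiredScope: string,
): Transfer | null {
  if (!key.scopes.includes(requiredScope)) return null;
  const hit = transfers.find(
    (t) => t.id === id && t.environment === key.environment && t.ownerId === key.ownerId,
  );
  return hit ?? null;
}
```

In a real repository the same three conditions would go into the database query itself, so an unscoped lookup cannot be written by accident.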
B2 · Customer erasure uses transfers:write scope, ignores LegalHold
Codex: CRITICAL · Grok: CRITICAL · BIND
app/api/v2/customers/[id]/erase/route.ts:83-93, 123 — irreversible KYC destruction authorized by transfers:write; LegalHold model never consulted. grep confirms zero LegalHold references in the entire file.
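One possible shape for the erase authorization, as a sketch only. The customers:erase scope name and the LegalHold fields (customerId, releasedAt) are assumptions; the real LegalHold model may differ.

```typescript
// Hypothetical names: the `customers:erase` scope and this LegalHold shape are assumptions.
interface LegalHold { customerId: string; releasedAt: Date | null }

type EraseDecision =
  | { allowed: true }
  | { allowed: false; reason: "missing_scope" | "legal_hold" };

function canEraseCustomer(
  customerId: string,
  scopes: string[],
  holds: LegalHold[],
): EraseDecision {
  // A dedicated destructive scope, rather than transfers:write.
  if (!scopes.includes("customers:erase")) return { allowed: false, reason: "missing_scope" };
  // Any unreleased hold on this customer blocks erasure.
  const active = holds.some((h) => h.customerId === customerId && h.releasedAt === null);
  if (active) return { allowed: false, reason: "legal_hold" };
  return { allowed: true };
}
```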
B3 · Veriscope inbound — unsigned, state-mutating, no body cap, zip-bomb DoS
Codex: CRITICAL · Grok: CRITICAL · BIND
app/api/system/trp/veriscope/incoming/route.ts:391-421 mutates KYC templates without sender verification; middleware.ts:96-97 + csrf.ts:151-153 bypass both auth and CSRF; attestation-decoder.ts uses unbounded inflateSync on attacker payload (zip-bomb DoS with worker concurrency-1 → entire compliance pipeline stalls).
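The zip-bomb part of this finding can be bounded with zlib's maxOutputLength option, sketched below. The 1 MiB cap and the inflateBounded name are assumptions; sender verification and a request body cap are separate fixes not shown here.

```typescript
import { deflateSync, inflateSync } from "node:zlib";

// Assumed cap; the right value depends on the largest legitimate attestation.
const MAX_INFLATED_BYTES = 1024 * 1024;

// Inflate with a hard output cap. zlib throws once the output would exceed
// maxOutputLength, so a small compressed payload cannot expand without bound.
function inflateBounded(payload: Buffer, maxBytes: number = MAX_INFLATED_BYTES): Buffer | null {
  try {
    return inflateSync(payload, { maxOutputLength: maxBytes });
  } catch {
    return null; // Treat oversize or corrupt input as a rejected request.
  }
}

// Usage: 10 MiB of zeros compresses to a few KB but is refused on inflate.
const bomb = deflateSync(Buffer.alloc(10 * 1024 * 1024));
const rejected = inflateBounded(bomb) === null;
```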
B5 · Compliance checks run AFTER IVMS exchange (regulatory sequencing failure)
Codex: CRITICAL · Grok: HIGH → bound CRITICAL · BIND
app/lib/jobs/transfer-queue.ts:211-213 — the engineer's own comment: "Compliance checks do NOT run here. They run after both IVMS are exchanged." Originator PII transmits to counterparty BEFORE sanctions screening. If a transfer would be sanctions-blocked, the disclosure already happened.
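The required ordering, reduced to a sketch: screening gates the exchange, so nothing is disclosed for a blocked transfer. The screen and exchange callbacks are hypothetical stand-ins for the project's sanctions check and IVMS send.

```typescript
type ScreeningResult = "clear" | "blocked";

// Hypothetical hooks: `screen` stands in for sanctions screening and
// `exchange` for the IVMS send; neither is the project's real function.
function releaseIvmsAfterScreening<T>(
  originator: T,
  screen: (o: T) => ScreeningResult,
  exchange: (o: T) => void,
): ScreeningResult {
  const result = screen(originator);
  // Originator PII is only disclosed once screening has cleared.
  if (result === "clear") exchange(originator);
  return result;
}
```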
B7 · Middleware admin gate dead code + withRBAC fails open + admin pages no RBAC
Codex: CRITICAL · Grok: HIGH → bound CRITICAL · BIND
Triple failure: (a) middleware.ts:184-193 gates non-existent /api/v1/*, (b) rbac.ts:91-95 returns allowed:true for unmapped paths (17 routes in ROUTE_PERMISSIONS vs 22 admin sections + 17 API routes), (c) admin server components call only requireAuth(). Net effect: a viewer-role user can read every admin page.
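Part (b) comes down to the default branch. A sketch of a deny-by-default lookup follows; the map contents and role names are assumptions, not the real ROUTE_PERMISSIONS.

```typescript
type Role = "viewer" | "operator" | "admin";

// Hypothetical permission map; the real ROUTE_PERMISSIONS keys and roles may differ.
const ROUTE_PERMISSIONS_SKETCH = new Map<string, Role[]>([
  ["/admin/transfers", ["operator", "admin"]],
  ["/admin/settings", ["admin"]],
]);

function checkAccess(path: string, role: Role): { allowed: boolean; reason: string } {
  const roles = ROUTE_PERMISSIONS_SKETCH.get(path);
  // Unmapped path: deny instead of returning allowed:true.
  if (roles === undefined) return { allowed: false, reason: "unmapped_route" };
  return roles.includes(role)
    ? { allowed: true, reason: "role_permitted" }
    : { allowed: false, reason: "role_denied" };
}
```

With deny-by-default, a route missing from the map fails loudly in testing instead of silently opening in production.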
B10 · NO tenant/ownership model — STRUCTURAL ★ UNANIMOUS #1
Codex: CRITICAL · Grok: CRITICAL · BIND · both co-CTOs' top priority
prisma/schema.prisma: TravelRuleTransfer and Customer have NO userId ownership field. ATRSA's authorization model assumes ONE trust domain per environment. This is the structural gap behind every IDOR finding. B1, B14, B2 cannot be fixed correctly without B10 first.
B11 · AuditLog has no hash chain + v2 routes skip AuditLog writes
Codex: CRITICAL · Grok: HIGH → bound CRITICAL · BIND
prisma/schema.prisma:194-211 — no prevHash / recordHash / signature columns. AUDIT_SIGNING_KEY + AUDIT_PUBLIC_KEY declared in env but never used (grep: 0 hits outside env.ts). /api/v2/** — 10 of 10 v2 routes write only Activity, not AuditLog. FEATURES_STATUS.md falsely claims "Immutable Audit Trail — Hash chain, content hashing, auto-lock" + "Digital Signatures — RSA-SHA256 signed audit exports."
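For reference, a hash chain over audit records can be as small as the sketch below. The prevHash and recordHash names follow the columns the finding says are missing; the record shape and function names are assumptions, and signing with AUDIT_SIGNING_KEY is not shown.

```typescript
import { createHash } from "node:crypto";

// Simplified record; a real AuditLog row would carry more fields.
interface AuditRecord { action: string; actor: string; prevHash: string; recordHash: string }

const GENESIS_HASH = "0".repeat(64);

function hashRecord(action: string, actor: string, prevHash: string): string {
  // JSON array encoding keeps field boundaries unambiguous.
  return createHash("sha256").update(JSON.stringify([action, actor, prevHash])).digest("hex");
}

function appendRecord(chain: AuditRecord[], action: string, actor: string): AuditRecord[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].recordHash : GENESIS_HASH;
  return [...chain, { action, actor, prevHash, recordHash: hashRecord(action, actor, prevHash) }];
}

// Recomputes every hash; any edited, removed or reordered record breaks the chain.
function verifyChain(chain: AuditRecord[]): boolean {
  let prev = GENESIS_HASH;
  for (const r of chain) {
    if (r.prevHash !== prev) return false;
    if (r.recordHash !== hashRecord(r.action, r.actor, r.prevHash)) return false;
    prev = r.recordHash;
  }
  return true;
}
```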
B12 · Peer-controlled SSRF via TrustAnchor API_URL (cloud-metadata exfil)
Codex: CRITICAL · Grok: CRITICAL · BIND
app/lib/adapters/veriscope/webhook-dispatcher.ts:94 — outbound peer dispatcher SKIPS the SSRF guard that webhook-service.ts:76-135 applies. A peer who publishes a TrustAnchor API_URL pointing at 169.254.169.254 (AWS metadata service) extracts encrypted IVMS via the outbound webhook. The encryption is irrelevant — the attacker controls the destination.
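A sketch of the address check the peer dispatcher skips, built on Node's net.BlockList. The range list is an assumption and is not a copy of the guard in webhook-service.ts; isForbiddenAddress is a hypothetical name.

```typescript
import { BlockList, isIP } from "node:net";

// Assumed deny list: loopback, RFC 1918 private ranges, link-local
// (which contains the 169.254.169.254 metadata address), and IPv6 equivalents.
const internalRanges = new BlockList();
internalRanges.addSubnet("0.0.0.0", 8, "ipv4");
internalRanges.addSubnet("127.0.0.0", 8, "ipv4");
internalRanges.addSubnet("10.0.0.0", 8, "ipv4");
internalRanges.addSubnet("172.16.0.0", 12, "ipv4");
internalRanges.addSubnet("192.168.0.0", 16, "ipv4");
internalRanges.addSubnet("169.254.0.0", 16, "ipv4");
internalRanges.addAddress("::1", "ipv6");
internalRanges.addSubnet("fc00::", 7, "ipv6");
internalRanges.addSubnet("fe80::", 10, "ipv6");

// Expects an already-resolved IP literal; hostnames are refused so the
// caller cannot skip the DNS resolution step.
function isForbiddenAddress(address: string): boolean {
  const family = isIP(address); // 0 when the input is not an IP literal
  if (family === 0) return true;
  return internalRanges.check(address, family === 4 ? "ipv4" : "ipv6");
}
```

The check needs to run against the address the socket actually connects to; validating the hostname once and resolving it again at request time leaves a DNS-rebinding gap.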
B13 · Crypto-proof validator fails OPEN when Python verifier missing
Codex: CRITICAL · Grok: HIGH → bound CRITICAL · BIND
app/lib/adapters/veriscope/transition-validation.ts:299-301 — when the external Python verifier process is unavailable (likely on fly.io alpine container), the code silently APPROVES every BE_CRYPTO_PROOF_VERIFIED transition. Combined with B16 (env-leak via execFile), this is a compound failure: validator present = secrets leak; absent = validation fail-open.
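Fail-closed behavior, sketched with hypothetical names: an unavailable verifier becomes its own outcome and only an explicit "verified" permits the transition. The verify callback stands in for the Python subprocess call.

```typescript
type VerifierOutcome =
  | { status: "verified" }
  | { status: "rejected" }
  | { status: "unavailable"; error: string };

// `verify` is a stand-in for the external verifier invocation.
function verifyProofFailClosed(verify: () => boolean): VerifierOutcome {
  try {
    return verify() ? { status: "verified" } : { status: "rejected" };
  } catch (err) {
    // Missing binary, spawn failure, timeout: all map to "unavailable", never to approval.
    return { status: "unavailable", error: err instanceof Error ? err.message : String(err) };
  }
}

// Only an explicit "verified" outcome permits the state transition.
function mayTransition(outcome: VerifierOutcome): boolean {
  return outcome.status === "verified";
}
```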
B14 · Customer deactivate/reactivate — same IDOR + wrong-scope as B2
Codex: CRITICAL · Grok: CRITICAL · BIND
app/api/v2/customers/[id]/deactivate/route.ts + reactivate/route.ts — parallel pattern to B2. transfers:write scope, no owner filter, environment-only lookup.
B15 · Key rotation misses 3 of 7 encrypted entity classes — PERMANENT DATA LOSS
Codex: CRITICAL · Grok: CRITICAL · BIND
app/lib/utils/key-rotation.ts:519-537: performKeyRotation covers TravelRuleProvider config, Integration config, User 2FA, TravelRuleTransfer IVMS. NOT covered: Customer.keys (per-customer secp256k1), webhook signing secrets, CTR regulatory fields (customerName, customerDob, customerIdNumber, conductingPersonName). After rotation + old-key removal, all customer veriscope keypairs become permanently undecryptable. Script reports success.
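One way to make a missed entity class impossible to overlook is a typed registry, sketched below. The class identifiers are hypothetical labels for the seven classes named in this finding, and each rotator is a stand-in that returns a count of re-encrypted rows.

```typescript
// Hypothetical labels for the seven encrypted entity classes named in B15.
const ENCRYPTED_CLASSES = [
  "providerConfig",
  "integrationConfig",
  "user2fa",
  "transferIvms",
  "customerKeys",
  "webhookSecrets",
  "ctrFields",
] as const;

type EncryptedClass = (typeof ENCRYPTED_CLASSES)[number];
type Rotator = () => number; // returns the number of rows re-encrypted

// Record<EncryptedClass, Rotator> turns a missing class into a compile error;
// the runtime check covers callers that bypass the type system.
function rotateAll(rotators: Record<EncryptedClass, Rotator>): Record<EncryptedClass, number> {
  const counts = {} as Record<EncryptedClass, number>;
  for (const cls of ENCRYPTED_CLASSES) {
    const rotate = rotators[cls];
    if (typeof rotate !== "function") throw new Error(`no rotator registered for ${cls}`);
    counts[cls] = rotate();
  }
  return counts;
}
```

A recovery test would then decrypt a sample of every class with the new key before the old key is removed.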
B4 · 2FA secret asymmetry — broken 2FA in production
Codex: HIGH · Grok: HIGH (bound HIGH but production-breaking) · BIND
app/lib/actions/auth.ts:127,134 signs with NEXTAUTH_SECRET; :184 verifies with TWO_FA_JWT_SECRET || NEXTAUTH_SECRET. In production where TWO_FA_JWT_SECRET is required AND different from NEXTAUTH_SECRET, sign uses key A and verify expects key B → tokens fail. Source-verified via direct read. NOTE: auth/config.ts:48-92 has a parallel 2FA path with consistent secret usage — two code paths exist.
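The asymmetry goes away if both sides resolve the secret through one function. In the sketch below a plain HMAC stands in for the JWT library, and every function name is hypothetical; only the two env variable names come from the finding.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

interface TwoFaEnv { TWO_FA_JWT_SECRET?: string; NEXTAUTH_SECRET?: string }

// One resolver used by both sign and verify, so the two sides cannot diverge.
function resolveTwoFaSecret(env: TwoFaEnv): string {
  const secret = env.TWO_FA_JWT_SECRET || env.NEXTAUTH_SECRET;
  if (!secret) throw new Error("no 2FA signing secret configured");
  return secret;
}

function signTwoFaToken(payload: string, env: TwoFaEnv): string {
  const mac = createHmac("sha256", resolveTwoFaSecret(env)).update(payload).digest("hex");
  return `${payload}.${mac}`;
}

function verifyTwoFaToken(token: string, env: TwoFaEnv): boolean {
  const split = token.lastIndexOf(".");
  if (split < 0) return false;
  // Re-sign the payload and compare in constant time.
  const expected = Buffer.from(signTwoFaToken(token.slice(0, split), env));
  const actual = Buffer.from(token);
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```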

HIGH (8)

B6 · CSRF v4 cookie names; v5 uses authjs.* (runtime verify required)
BIND
csrf.ts:197-202
B8 · Migration advisory lock illusory (separate psql sessions)
BIND
start.sh:27-40
B9 · Login rate-limit fail-open on Redis (DB fallback in limiter mitigates)
BIND
auth/config.ts:297-306 — limiter has DB fallback per Agent 2 correction; fix the auth-route .catch(()=>null), not the limiter.
B16 · Secrets leak to Python subprocess via execFile (full env passed)
BIND
NEXTAUTH_SECRET, CONFIG_ENCRYPTION_KEY, DATABASE_URL all leak on every cross-VASP transition.
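A sketch of an allowlisted child environment. The allowlist contents are an assumption; what the Python verifier really needs should be confirmed before adopting it.

```typescript
// Assumed allowlist; the verifier's real requirements should be confirmed.
const VERIFIER_ENV_ALLOWLIST = ["PATH", "LANG", "PYTHONPATH"];

// Copies only allowlisted variables, so application secrets never reach the child.
function buildChildEnv(
  parent: Record<string, string | undefined>,
  allow: string[] = VERIFIER_ENV_ALLOWLIST,
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const name of allow) {
    const value = parent[name];
    if (value !== undefined) env[name] = value;
  }
  return env;
}
```

The result would be passed as the env option of execFile from node:child_process, which replaces the inherited environment instead of extending it.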
B18 · Sentry replays 1.0 + no PII scrubber + raw process.env read
BIND
instrumentation-client.ts:11 — captures IVMS PII forms on every error event. No DPIA.
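A recursive redactor of the kind a scrubber could use, as a sketch. The key list is an assumption: the three customer* names are the CTR fields cited in B15 and "ivms" is a guess at a payload key.

```typescript
// Assumed key list; real IVMS and CTR field names should come from the schema.
const PII_KEYS = new Set(["customerName", "customerDob", "customerIdNumber", "ivms"]);

// Walks arrays and plain objects, replacing the value of any PII-named key.
function scrubPii(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(scrubPii);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = PII_KEYS.has(key) ? "[redacted]" : scrubPii(inner);
    }
    return out;
  }
  return value;
}
```

Sentry's beforeSend hook is one place such a function could run, with replaysSessionSampleRate and replaysOnErrorSampleRate set to 0 until a DPIA is in place.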
B19 · V1 doc-vs-code drift — FEATURES_STATUS describes 13 non-existent routes
BIND
Doc inverts V1/V2 maturity claim. V1 endpoints in doc all 404.
B20 · LOG_SHIPPING_S3_* prod-required but unused
BIND
4 prod-required env vars; 0 code references. "FATF 7-year retention" claim fictional.
B21 · ws@8.17.1 CVE GHSA-58qx-3vcg-4xpx silenced by --audit-level=high
BIND
Known CVE bypassed in CI.

MEDIUM (1)

B17 · @auth/prisma-adapter@2.11.1 peer-dep does not declare Prisma 7 support
BIND
Adapter behavior silently undefined with @prisma/client@7.8.0.

Co-CTO Top Priorities

Priority | Codex GPT-5.5 | Grok 4.3 | Convergence
#1 must-fix | B10 (tenant model) | B10 (tenant model) | B10 ✓ unanimous
#2 must-fix | B3 (Veriscope) | B1 (IDOR) | split; both bound CRITICAL
#3 must-fix | B5 (compliance order) | B12 (SSRF) | split; both bound CRITICAL

Sprint Roadmap (revised v2)

Sprint 0 — Hold deploy

0-2 days

Sprint 1 — Stop the bleeding (12 CRITICAL)

1-2 weeks · no external traffic until done

Sprint 2 — Cover the cliff

2 weeks · test coverage on 15 untested API routes

Sprint 3 — Compliance + Production hygiene (8 HIGH)

1-2 weeks

Sprint 4 — Architectural cleanup

ongoing · debt

Sprint 5 — Regulator readiness

2+ weeks

Optimization Opportunities (for your CTO, beyond blockers)

  1. De-duplicate v2 POST transfers — re-implements 540 LOC that exists in transfer-workflow.ts. PATCH route already migrated.
  2. CI integration job spins up Postgres+Redis to run 2 test files — invest more or simplify.
  3. Coverage gate is currently meaningless: coverage.all: false excludes untested files. 1-line fix.
  4. Server-actions-first is a strength — but the doc describes a REST product. 1-2 day reconciliation.
  5. Single Fly process group — split web and worker for independent scaling.
  6. Sentry session replay 1.0 captures customer PII — disable until DPIA approves.
  7. Rate-limiter has DB fallback (Agent 2 correction) — limiter itself is sound; fix the auth-route catch.
  8. Test/Live env isolation has real teeth — API-key environment binding closes the env-switch attack. Keep intact through any refactor.

Co-CTO Conditions (Verbatim)

No external traffic until tenancy/ownership is modeled and enforced, Veriscope inbound authentication/body limits are fixed, compliance checks gate exchange before IVMS release, destructive customer actions honor legal hold and correct scopes, RBAC is enforced on real routes/admin pages, outbound webhook SSRF protection is applied, crypto-proof validation fails closed, audit logging is immutable and complete, and key rotation covers every encrypted entity class with recovery tests.
— Codex GPT-5.5 (R3 binding)
B10 tenant model + B1/B12/B15 fixes + B3 inbound hardening required before any prod traffic or on-chain data.
— Grok 4.3 (R3 binding)

Closing — re-evaluated post-red-team

ATRSA is technically impressive for one engineer. The defects are not laziness; they are the architectural drift that happens when one person writes 125K LOC over months without a second pair of eyes. However, the multi-reviewer convergence surfaced systemic gaps that the engineer's own FEATURES_STATUS.md does NOT acknowledge.

These are not polish items. They are structural decisions that need a 1-2-week sprint to fix correctly before any external user touches the system. The good news: most fixes are well-scoped. The bad news: B10 requires a schema change that ripples through every repository.