ATRSA Production-Readiness Audit

Multi-reviewer convergence audit · mvp_v1 @ 1dbe5e9 · 2026-05-20
Verdict: NO-GO for external VASP traffic until the 9 production blockers below are closed. GO-WITH-CONDITIONS internally.

Reviewer panel

Reviewer · Method · Findings · Verdict
Codex (GPT-5.5) · Direct repo read, sandboxed · 110 categorized findings across 6 categories (8 CRIT, 26 HIGH) · NO-GO
Grok 4.3 · Curated brief, CTO production-readiness review · 5 named blockers + 5 systemic risks · GO-WITH-CONDITIONS
Claude (main session) · Manual line-by-line read of auth, middleware, RBAC, CSRF, and encryption · 7 CRIT, 10 HIGH · GO-WITH-CONDITIONS
Code-quality agent · Architecture, type safety, schema, React hooks · 2 CRIT, 9 HIGH, 12 MED · Defects-only
Test-coverage agent · Mapped __tests__/ against production surfaces (598 files) · 6 CRIT, 9 HIGH · Defects-only
Deploy-infra agent · Dockerfile, fly.toml, Sentry, BullMQ, Prisma migrations, Redis · 3 CRIT, 9 HIGH · Defects-only
Gemini 2.5-pro · Not run (monthly spending cap exceeded) · None · BLOCKED
Feature-readiness agent · 11.5-minute run, 127 tools, crashed mid-stream · Coverage folded into the Codex partial · CRASHED

The 9 Production Blockers

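BLOCKER-6 through BLOCKER-9 were flagged independently by two or more reviewers. BLOCKER-1 through BLOCKER-5 carry only Codex finding IDs in the tags below. All nine must close before external VASP traffic. Counts of findings per category appear in the heatmap below.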

BLOCKER-1 · API v2 transfer object-level authorization missing (IDOR)
CRITICAL · Codex SEC-C10 + SEC-C11 · single highest production risk
The GET and PATCH handlers in app/api/v2/transfers/[id]/route.ts (lines 61-63 and 186-190) look up a transfer by id and environment only. There is no owner, API-key, or tenant check. As a result, any transfers:read key in an environment can read ANY transfer in that environment by ID, and any transfers:write key can accept, reject, or retry someone else's transfer.
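Below is a minimal sketch of the missing object-level check. It assumes each transfer records the tenant that owns it. The `Transfer` and `ApiKeyContext` shapes, the `tenantId` field, and `findTransferForCaller` are all hypothetical. In the real route, the same effect would come from adding the owning tenant to the lookup's filter alongside `id` and `environment`.

```typescript
// Hypothetical shapes: the real Prisma model and API-key context will differ.
interface Transfer {
  id: string;
  environment: string;
  tenantId: string; // assumed owner field; not confirmed in the audited schema
}

interface ApiKeyContext {
  tenantId: string;
  environment: string;
}

// Scope the lookup to the caller's tenant as well as the environment.
// A missing transfer and another tenant's transfer both come back as null,
// so the route can answer 404 in both cases without revealing which IDs exist.
function findTransferForCaller(
  transfers: readonly Transfer[],
  id: string,
  caller: ApiKeyContext,
): Transfer | null {
  const match = transfers.find(
    (t) =>
      t.id === id &&
      t.environment === caller.environment &&
      t.tenantId === caller.tenantId,
  );
  return match ?? null;
}
```

The GET handler and the PATCH handler (accept, reject, retry) would both need to go through the same scoped lookup. Fixing only the read path leaves the write path open.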
BLOCKER-2 · Customer erasure exposed through transfers:write, ignores legal holds
CRITICAL · Codex SEC-C13 + FEAT-C3 · irreversible, regulator-facing
app/api/v2/customers/[id]/erase/route.ts (lines 83-93 and 123) authorizes irreversible destruction of customer and KYC data with nothing more than the transfers:write scope. Its only precondition is the absence of active transfers. It never consults the LegalHold, RetentionPolicy, or RetentionExecutionLock models that already exist in the schema.
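Below is a sketch of the precondition check the erase route could run before deleting anything. It assumes a dedicated `customers:erase` scope, which is hypothetical: the audit names no such scope. It also uses simplified stand-ins for the LegalHold, RetentionPolicy, and RetentionExecutionLock data, and `erasureBlockers` is an illustrative name. The function collects every reason to refuse rather than stopping at the first, so the response can explain itself.

```typescript
// Simplified, hypothetical view of the data the erase route should consult.
// Field names stand in for the LegalHold, RetentionPolicy and
// RetentionExecutionLock models; their real shapes are not shown in the audit.
interface ErasureContext {
  scopes: readonly string[];
  activeTransferCount: number;
  activeLegalHoldCount: number;
  retainUntil: Date | null; // earliest date the retention policy allows deletion
  retentionLockHeld: boolean;
  now: Date;
}

// Returns every reason erasure must be refused; an empty array means allowed.
function erasureBlockers(ctx: ErasureContext): string[] {
  const reasons: string[] = [];
  // Irreversible deletion gets its own (assumed) scope instead of transfers:write.
  if (!ctx.scopes.includes("customers:erase")) reasons.push("missing customers:erase scope");
  if (ctx.activeTransferCount > 0) reasons.push("active transfers");
  if (ctx.activeLegalHoldCount > 0) reasons.push("active legal hold");
  if (ctx.retainUntil !== null && ctx.now.getTime() < ctx.retainUntil.getTime()) {
    reasons.push("retention period not yet expired");
  }
  if (ctx.retentionLockHeld) reasons.push("retention execution lock held");
  return reasons;
}
```

With the scope split out, a key issued for transfer traffic can no longer erase customers at all.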
BLOCKER-3 · Veriscope inbound webhook is unauthenticated AND CSRF-bypassed AND mutates persistent state
CRITICAL · Codex SEC-C1 + SEC-C2 + SEC-C3 · largest external attack surface
The 1334-line app/api/system/trp/veriscope/incoming/route.ts persists peer-supplied TA-account, callback, and identity fields into KYC templates (lines 391-421) without verifying the sender. The stock Veriscope protocol is unsigned by design, but this route goes further and mutates persistent state on the strength of an unsigned message. Both middleware.ts:94-107 and csrf.ts:151-153 skip the route, and it has zero integration tests. Its only ingress controls are a 120 requests/min per-IP rate limit and Zod schema validation.
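Stock Veriscope messages carry no signature, so verifying the sender needs a control layered on top of the protocol. One option is to check a MAC before any write to a KYC template. This sketch assumes two things that are not part of Veriscope: a per-peer shared secret agreed out of band, and peers sending a hex HMAC-SHA256 of the raw body in a header. `verifyPeerSignature` is a hypothetical name. The route would need the raw body before Zod parsing, for example from `await request.text()`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verifies a hex-encoded HMAC-SHA256 of the raw request body.
// Assumes a per-peer secret agreed out of band; this is a compensating
// control layered on Veriscope, not part of the protocol itself.
function verifyPeerSignature(rawBody: string, signatureHex: string, peerSecret: string): boolean {
  // Reject anything that is not plain hex before decoding it.
  if (!/^[0-9a-f]+$/i.test(signatureHex)) return false;
  const expected = createHmac("sha256", peerSecret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

If peers cannot be asked to sign, a weaker fallback is to stage peer-supplied fields for operator review rather than writing them straight into KYC templates.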
BLOCKER-4 · 2FA token secret mismatch — second-factor BROKEN in production
CRITICAL · Codex SEC-C5 · confirm with an integration test
app/lib/actions/auth.ts:127-135 signs the 2FA JWT with NEXTAUTH_SECRET, while app/lib/actions/auth.ts:181-185 verifies it with TWO_FA_JWT_SECRET || NEXTAUTH_SECRET. In production, TWO_FA_JWT_SECRET is required and differs from NEXTAUTH_SECRET, so tokens are signed with one key and verified against another, and every verification fails. The outcome is one of two things. Either the second factor is broken (no user can complete 2FA), or, if a verification failure is mishandled anywhere on that path, it is silently bypassed. Note: this contradicts what I read in auth/config.ts:48-92, which uses a consistent getTwoFaJwtSecret(). The two files differ; run the integration test.
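Whichever file turns out to be right, the fix is structural: sign and verify should both read the key from one function. The sketch below also refuses to fall back to NEXTAUTH_SECRET in production. `resolveTwoFaSecret` is a hypothetical name; auth/config.ts reportedly already has getTwoFaJwtSecret(), which may fill this role.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical single source of truth for the 2FA JWT key. Intended to be
// called from both the signing path (auth.ts:127-135) and the verifying path
// (auth.ts:181-185), so the two cannot silently diverge.
function resolveTwoFaSecret(env: Env): string {
  const dedicated = env.TWO_FA_JWT_SECRET;
  if (dedicated) return dedicated;
  if (env.NODE_ENV === "production") {
    // Fail closed: production must use the dedicated key, never the fallback.
    throw new Error("TWO_FA_JWT_SECRET must be set in production");
  }
  const fallback = env.NEXTAUTH_SECRET;
  if (!fallback) throw new Error("no 2FA JWT secret configured");
  return fallback; // development convenience only
}
```

A sign-then-verify round trip through the real actions, run with production-like env vars, is still what settles the reviewer disagreement.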
BLOCKER-5 · Compliance checks run AFTER Travel Rule data exchange
CRITICAL · Codex FEAT-C1 · regulatory sequencing failure
A worker comment at app/lib/jobs/transfer-queue.ts:211-213 states explicitly that compliance checks (sanctions and blockchain screening) do NOT run before the provider cascade. They run only AFTER both IVMS payloads have been exchanged. Sensitive originator IVMS101 data (names, addresses) is therefore transmitted to counterparty VASPs before sanctions screening. If a transfer would have been blocked on sanctions grounds, the disclosure has already happened.
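The ordering the worker should enforce can be shown as a synchronous sketch: every screen must pass before any IVMS101 payload leaves the system. `TransferSteps`, `ScreeningResult`, and `processTransfer` are hypothetical stand-ins for the worker's real functions. In the BullMQ worker each step would be awaited, but the ordering rule is the same.

```typescript
interface ScreeningResult {
  passed: boolean;
  reason?: string;
}

// Hypothetical stand-ins for the worker's real screening and exchange steps.
interface TransferSteps {
  screenSanctions(transferId: string): ScreeningResult;
  screenBlockchain(transferId: string): ScreeningResult;
  exchangeIvms(transferId: string): void; // discloses originator IVMS101 data
}

// Screen first, disclose second: exchangeIvms is unreachable unless both screens pass.
function processTransfer(transferId: string, steps: TransferSteps): string {
  const sanctions = steps.screenSanctions(transferId);
  if (!sanctions.passed) return `blocked: ${sanctions.reason ?? "sanctions screening failed"}`;
  const chain = steps.screenBlockchain(transferId);
  if (!chain.passed) return `blocked: ${chain.reason ?? "blockchain screening failed"}`;
  steps.exchangeIvms(transferId);
  return "exchanged";
}
```

A regression test for the real worker could assert the same property: no IVMS call is recorded when any screen fails.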
BLOCKER-6 · CSRF session-cookie detection uses v4 names — bypassable on Auth.js v5
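CRITICAL · Codex SEC-C6 + Claude SEC-C3 + Grok blocker #2 · 5-minute runtime check
csrf.ts:197-202 looks only for the NextAuth v4 cookie name next-auth.session-token, but Auth.js v5 defaults to authjs.session-token. If the app runs with the v5 defaults, hasSessionCookie returns false even for a genuine browser session. The skip logic at csrf.ts:147-149 then fires for any request that pairs a session cookie with an arbitrary x-api-key header, and CSRF protection is bypassed entirely. A cross-origin attacker targeting a logged-in admin could then run a state-changing action as the victim. How reachable this is cross-origin also depends on two other settings, so check them at the same time:
- the CORS policy, since a custom x-api-key header forces a preflight;
- the session cookie's SameSite attribute.
To verify, log in on a dev build and read the cookie names in the browser devtools cookie storage panel. The session cookie is HttpOnly, so it does not appear in document.cookie.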
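Below is a sketch of a session-cookie check that recognizes both naming schemes. The four base names are the NextAuth v4 and Auth.js v5 defaults, each with and without the __Secure- prefix used over HTTPS. The suffix match also catches chunked cookies such as authjs.session-token.0. The signature is assumed: the real hasSessionCookie in csrf.ts may take a request object. If the app configures custom cookie names, the list should come from that configuration instead.

```typescript
// Default session-cookie names: Auth.js v5 (authjs.*) and NextAuth v4
// (next-auth.*), each with and without the __Secure- prefix used over HTTPS.
const SESSION_COOKIE_NAMES = [
  "authjs.session-token",
  "__Secure-authjs.session-token",
  "next-auth.session-token",
  "__Secure-next-auth.session-token",
] as const;

// Assumed signature. True if any cookie is a session cookie, including
// chunked variants ("authjs.session-token.0", ".1", ...) used for large tokens.
function hasSessionCookie(cookieNames: readonly string[]): boolean {
  return cookieNames.some((name) =>
    SESSION_COOKIE_NAMES.some((base) => name === base || name.startsWith(`${base}.`)),
  );
}
```

A more conservative variant skips CSRF only when the request carries no Cookie header at all. That removes the dependency on cookie names entirely.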
BLOCKER-7 · Middleware admin gate is dead code; withRBAC fails OPEN on unmapped routes
CRITICAL · 4-reviewer convergence · highest-leverage fix per Grok
Three independent failures stack:
(a) middleware.ts:184-193 gates /api/v1/admin/* and /api/v1/users/*. These paths do not exist, so the gate never fires.
(b) rbac.ts:91-95 returns {allowed: true} for any path missing from ROUTE_PERMISSIONS. The map covers only 17 routes, against 79 admin sections and 17 API routes.
(c) Server components such as app/admin/users/page.tsx:18-35 call only requireAuth(), with no permission check.
Together, these let a user with the viewer role open every admin page and read every user, transfer, and customer record.
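The core of fix (b) is a default-deny lookup. The sketch below uses a hypothetical role set, route map, and `checkAccess` function; the real ROUTE_PERMISSIONS shape is not shown in the audit. A `Map` is used so that an unmapped path can never pick up an inherited object property.

```typescript
type Role = "viewer" | "operator" | "admin";

// Hypothetical route-to-roles map; the real ROUTE_PERMISSIONS will differ.
const ROUTE_ROLES = new Map<string, readonly Role[]>([
  ["/admin/users", ["admin"]],
  ["/admin/transfers", ["operator", "admin"]],
]);

// Default deny: a path with no entry is refused instead of allowed.
function checkAccess(path: string, role: Role): { allowed: boolean; reason: string } {
  const roles = ROUTE_ROLES.get(path);
  if (roles === undefined) return { allowed: false, reason: "unmapped route" };
  if (!roles.includes(role)) return { allowed: false, reason: "role not permitted" };
  return { allowed: true, reason: "role permitted" };
}
```

Failures (a) and (c) still need their own fixes. In particular, the server components should run the same check rather than requireAuth() alone. A test that enumerates every admin page and API route and asserts each has a map entry would stop the map from drifting again.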
BLOCKER-8 · Migration advisory lock is illusory — concurrent migrations can corrupt _prisma_migrations
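HIGH · Codex INFRA-C1 + deploy-infra DI-C2
In start.sh:27-40, each psql -c "SELECT pg_advisory_lock(42)" call opens its own database session. A session-level advisory lock is released the instant that psql process exits, so npx prisma migrate deploy runs with no lock held. The comment on line 26 of start.sh claims the opposite. On a Fly scale-up, two machines can race the migration. Prisma Migrate's documentation also describes an advisory lock taken by migrate deploy itself. Confirm the Prisma version in use, because that built-in lock, where present, would mitigate the race even though the script's own lock does nothing.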
BLOCKER-9 · Login + general rate-limiting fails OPEN on Redis outage
HIGH · Claude SEC-C5 + deploy-infra DI-H5 + Grok blocker #4
auth/config.ts:297-306 and redis/client.ts:14-30 attach .catch(() => null) to every rate-limit call. When Redis is down, or slow enough that the call rejects, the result is null, and the caller treats null as "allowed". Brute-force throttling therefore vanishes for the duration of any Redis outage. That is exactly the window a credential-stuffing campaign can exploit.
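Below is a sketch of a fail-closed wrapper. If the Redis-backed check throws, the decision falls back to a strict in-process fixed-window limit rather than to "allowed". `LocalFallbackLimiter` and `allowLogin` are hypothetical names, and the limits are placeholders. A per-process limiter is weaker than the shared Redis one, because each Fly machine counts separately, so it is a degraded mode rather than a replacement. A slow Redis would also need a timeout around the call, which is not shown.

```typescript
// In-process fixed-window limiter used only while Redis is unavailable.
// Entries are never evicted in this sketch.
class LocalFallbackLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, now: number): boolean {
    const w = this.windows.get(key);
    if (w === undefined || now - w.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 }); // start a new window
      return true;
    }
    w.count += 1;
    return w.count <= this.limit;
  }
}

// Fail closed: a Redis error degrades to the stricter local limit, never to "allowed".
async function allowLogin(
  key: string,
  redisCheck: (key: string) => Promise<boolean>,
  fallback: LocalFallbackLimiter,
  now: number = Date.now(),
): Promise<boolean> {
  try {
    return await redisCheck(key);
  } catch {
    return fallback.allow(key, now);
  }
}
```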

Risk Heatmap

Category · CRIT · HIGH · Notable
Security & Auth · 9 · 14 · BLOCKERS 1-4, 6, 7; AES-GCM AAD gap; PII key sharing; 2FA bypass; race conditions on backup codes / reset tokens
Compliance / Regulatory · 2 · 5 · Sequencing failure (BLOCKER-5); FATF R16 beneficiary; IVMS101 optional identifiers; jurisdiction fail-open
Infrastructure / Deploy · 4 · 9 · Migration race (BLOCKER-8); single Fly process; no backups; Sentry PII; env-var drift; PII-in-Redis
Code Quality / Architecture · 2 · 9 · God files; STR fire-and-forget; outbox audit gaps; v2 duplicated logic; type-any creep
Tests · 6 · 6 · Coverage gate broken; 15/17 API routes no integration test; ECIES untested; middleware untested
Features vs claims · 1 · 5 · STR not auto-generated; API-key expiry false claim; retention models present but not enforced
Total unique · 24 · 48 · Multi-reviewer convergence counted once

Sprint Roadmap

Sprint 1 — Stop the bleeding

1-2 weeks · No external traffic until done

Sprint 2 — Cover the cliff

2 weeks · Test coverage on the 15 untested API routes

Sprint 3 — Production hygiene

1-2 weeks · Operations and observability

Sprint 4 — Compliance correctness

2 weeks · Regulator-facing

Sprint 5 — Architecture cleanup

Ongoing · debt repayment

What I would tell the engineer

The middleware admin gate and RBAC map are the highest-leverage fixes; everything else is secondary until those are closed.

Rate-limit and CSRF paths must not fail open on transient infra problems.

Treat the 1862-line transfer consolidator and the ECIES implementation as security-critical and require tests before any production keys.

Stop updating FEATURES_STATUS.md until the claims can be proven by running tests or schema inspection.

The solo-build scope is impressive; the remaining work is now focused hardening rather than new features.
— Grok 4.3, CTO review

Closing observation

ATRSA is, on net, an unusually clean hand-built MVP for a regulated product. Its strengths include:
- zero TODO: / FIXME: markers in app/;
- structured pino logging throughout and strong env validation;
- AES-GCM encryption at rest, double-submit CSRF, and single-use 2FA JWTs;
- audit logging, header-spoofing prevention, and PII scrubbers.
The defects are not laziness. They are the kind of architectural drift that happens when one person writes 125K LOC over months without a second pair of eyes. Most of the blockers above take about one engineer-day each to fix individually; the test coverage gap is the only item that takes real calendar time.

Reviewer disagreements (verify before acting)

  1. CI workflows existence: the test-coverage agent reports that .github/workflows/test.yml exists with a 60% coverage gate, while the deploy-infra agent reports "no CI workflows in repo." Run ls -la .github/workflows/ to resolve.
  2. 2FA secret consistency: Codex says sign and verify use different secrets in app/lib/actions/auth.ts, while a direct read of auth/config.ts showed consistent secret usage. The two files differ, so run an integration test against the live 2FA flow.
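  3. CSRF cookie name (BLOCKER-6): three reviewers flagged this on a static read. To confirm, open a logged-in browser session and check the session cookie's name in the devtools cookie storage panel. The cookie is HttpOnly, so document.cookie will not show it.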