# Gate 2 — AI Match Reliability Refactor + Dashboard Metrics Wiring

## Scope

Stabilize and harden AI Match for Provider/Investor flows while preserving current UX surfaces. Wire reliability and quality metrics into admin observability.

## Objectives

- Eliminate brittle JSON parsing failures from model output.
- Split deterministic scoring from LLM narrative explanation.
- Add fallback hierarchy so matching never returns empty on transient model failure.
- Instrument end-to-end reliability and quality KPIs.
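The first objective can be sketched as a tolerant extractor that accepts fenced or prose-wrapped model output instead of assuming a strict JSON shape. The function name and behavior here are illustrative assumptions, not the codebase's actual parser:

```javascript
// Tolerant parse of LLM output that should contain a JSON object.
// Strips markdown code fences, then falls back to extracting the first
// balanced-looking {...} span before giving up with null (not a throw).
function parseModelJson(raw) {
  if (typeof raw !== "string") return null;
  // Remove ```json ... ``` fences the model may add despite instructions.
  const unfenced = raw.replace(/```(?:json)?/gi, "").trim();
  try {
    return JSON.parse(unfenced);
  } catch (_) {
    // Fall back: take the substring from the first "{" to the last "}".
    const start = unfenced.indexOf("{");
    const end = unfenced.lastIndexOf("}");
    if (start === -1 || end <= start) return null;
    try {
      return JSON.parse(unfenced.slice(start, end + 1));
    } catch (_) {
      return null; // caller treats null as a parse failure, never a crash
    }
  }
}
```

Callers check for `null` and route to the fallback chain, so a malformed model response becomes a counted telemetry event rather than a UX crash.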

## Currently Broken vs Improved

### Broken / Risky Today

- AI Match depends on strict JSON model response shape.
- Single-call path; failures degrade UX abruptly.
- Hardcoded prompts and weak schema validation.
- Limited observability on parse failures and degraded responses.

### Improvement in Gate 2

- Deterministic rules + weighted score engine as source of truth.
- LLM narrative attached as non-critical enhancer.
- Structured schema validation with tolerant parser.
- Resilient fallback chain:
  1) primary model
  2) cheaper model
  3) cached prior match set
  4) deterministic-only match response
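The four-tier chain above can be sketched as a sequential try-with-degradation loop. The tier objects and the `source` tag are hypothetical names for illustration, not existing endpoints:

```javascript
// Runs match tiers in priority order; the first tier that yields a
// non-empty result wins, and the response records which tier produced
// it so the dashboard can count fallback usage.
async function matchWithFallback(request, tiers) {
  for (const tier of tiers) {
    try {
      const matches = await tier.run(request);
      if (matches && matches.length > 0) {
        return { matches, source: tier.name };
      }
    } catch (_) {
      // Transient failure in this tier: fall through to the next one.
    }
  }
  // All tiers failed; the deterministic tier should make this
  // unreachable, but return an explicit empty shape rather than throw.
  return { matches: [], source: "none" };
}
```

Because the deterministic-only tier cannot time out, the chain terminates with a real result in practice, which is what backs the "never returns empty on transient failure" guarantee.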

## Estimated Effort

- Full Gate 2: 14-22 hours + 4-6 hours QA
- Tuesday-safe slice (2A): 4-6 hours
- Post-Tuesday completion (2B): 10-16 hours

## Files Touched (Expected)

- `provider.html` (AI match modal/service calls, parsing, fallback UX)
- `investor.html` and/or `icp.html` (input parity and reliability hooks)
- `api/ai-*` (new or updated AI match endpoints)
- shared telemetry utilities (`czTrack`, worker fetch wrappers)
- admin metrics route/view for match reliability and quality stats
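A worker fetch wrapper of the kind listed above might record per-request reliability events. `czTrack`'s real signature is not shown in this document, so the `track(event, data)` callback below is an injected assumption:

```javascript
// Wraps a fetch-style call so every AI request emits success/failure
// telemetry with latency. `track(event, data)` stands in for the shared
// czTrack utility, whose actual signature may differ.
async function trackedFetch(doFetch, track) {
  const started = Date.now();
  try {
    const response = await doFetch();
    track("ai_match_request", { ok: true, ms: Date.now() - started });
    return response;
  } catch (err) {
    track("ai_match_request", { ok: false, ms: Date.now() - started });
    throw err; // re-throw so the fallback chain can react
  }
}
```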

## Tuesday Criticality

- Core Tuesday critical path: Gate 1.
- Gate 2 is partially critical only if AI Match is central to the demo.
- Recommended sequence:
  - Tuesday: Gate 2A reliability patch only.
  - Post-Tuesday: Gate 2B full model and analytics expansion.

## Breaking Change Risk

- Target is non-breaking behavior externally.
- Existing UX and endpoint contracts remain compatible where possible.
- Expected visible difference: more deterministic ranking and fewer blank/failed match refreshes.

## Verification / Proof It Works

1. AI Match refresh succeeds via the fallback chain under model timeout or failure.
2. No malformed JSON crash paths in provider/investor surfaces.
3. Match response includes `score_source` (`deterministic` vs `hybrid`).
4. Reliability dashboard exposes:
   - request success rate
   - parse failure rate
   - fallback usage rate
   - p95 latency
5. Acceptance feedback loop events captured for future rank tuning.
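The four dashboard KPIs in item 4 can all be derived from per-request telemetry events. The event field names below are illustrative placeholders, not the shipped schema:

```javascript
// Aggregates per-request events into the dashboard KPIs. Each event is
// assumed shaped like:
//   { ok: boolean, parseFailed: boolean, usedFallback: boolean, ms: number }
function matchReliabilityKpis(events) {
  const n = events.length;
  if (n === 0) return null;
  const rate = (pred) => events.filter(pred).length / n;
  // p95 latency: value at the 95th percentile of sorted latencies.
  const sorted = events.map((e) => e.ms).sort((a, b) => a - b);
  const p95 = sorted[Math.min(n - 1, Math.ceil(0.95 * n) - 1)];
  return {
    successRate: rate((e) => e.ok),
    parseFailureRate: rate((e) => e.parseFailed),
    fallbackUsageRate: rate((e) => e.usedFallback),
    p95LatencyMs: p95,
  };
}
```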

## Definition of Done

- AI Match never hard-fails to empty state on transient upstream errors.
- Deterministic baseline ranking available independently of LLM.
- Reliability telemetry visible in admin.
- Backward-compatible UX and API behavior for existing users.
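The deterministic baseline called for above can be sketched as a weighted-criteria scorer. The criteria shape and weights are placeholders, not the product's real scoring model:

```javascript
// Scores a candidate against weighted criteria and tags the result so
// downstream consumers can tell deterministic from hybrid scores.
function deterministicScore(candidate, criteria) {
  let total = 0;
  let weightSum = 0;
  for (const { weight, score } of criteria) {
    total += weight * score(candidate); // each score() returns 0..1
    weightSum += weight;
  }
  return {
    score: weightSum > 0 ? total / weightSum : 0,
    // Flipped to "hybrid" only after an LLM narrative is attached.
    score_source: "deterministic",
  };
}
```

Because this path involves no model call, it can always serve as the final fallback tier and as the ground truth that the LLM narrative annotates.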

