Case Study / Collective Health

A Shelved Concept Becomes the Foundation for Claims AI

I originated the concept and guided a contract designer through the first version. It was rejected and shelved. Several months later it became the design foundation for Collective Health's claims AI initiative, and I drove it into active development as sole product designer.

RoleSenior Product Designer

DurationFall 2025–Present

CompanyCollective Health

StatusPilot in progress

A Shelved Concept Becomes the Foundation for Claims AI

The scale of the problem

19.6%

of all claims pended for manual review in 2025

1.3M

pended claims processed manually in 2025

873

active pend rules, updated multiple times per week

2,050h

MCA hours per month on Phase 1 pend rules alone

At a glance

The problem

COB claims require MCAs to cross-reference multiple systems, follow complex decision trees, and adjudicate each claim line individually. The cognitive load is structural, not a training problem. The company target is 90% auto-adjudication. The actual rate is 77 to 80%.

The insight

Adding fields to the screen saves minutes. An AI agent that follows the SOP and surfaces structured recommendations changes the work entirely. COB was the entry point. All complex claims were always the goal.

The outcome

The concept outlasted a competing prototype, a rejection, and an engineering-built alternative. It is now in active development with a pilot underway and a roadmap covering 77,000 claims per month.

Context

A gap the claims engine could not close on its own

When Collective Health's claims engine cannot process a claim automatically, it pends the claim for manual review by a Member Claims Associate. The company target is 90% auto-adjudication. In 2025 the actual rate was 77 to 80%. That gap represents over 1.3 million claims handled manually every year, and it grows as the member population scales.

The traditional path to closing that gap is engineering new logic into the claims engine directly. Each pend rule requires its own product and engineering effort, competes against other priorities, and can take quarters to ship. With 873 active pend rules and the list updated multiple times per week, the backlog is not shrinking through that approach alone.

Coordination of Benefits claims are the most complex category. They require MCAs to determine which insurance is primary, cross-reference member eligibility across external portals, and follow multi-step decision trees in internal SOPs. MCAs handle 60 per day versus 120 for standard claims.

MCAs were the workaround for a system that could not handle COB automatically. The question was not whether to fix that eventually. It was what to do while the fix was years away.

How it unfolded

This project moved through three distinct organizational moments. The design did not change fundamentally. The organizational context around it changed entirely.

2025
First attempt

An AI agent for COB, rejected for the right reasons.

The initial response to slow COB adjudication was to ask what data could be added to the existing screen. I observed the workflow and reached a different diagnosis. MCAs were not slow because they lacked information. They were slow because the process required constantly switching between the claim, an open SOP reference page, and multiple external systems, following a complex decision tree step by step with no system-level support.

I guided a contract designer toward an AI agent that would work through the SOP decision tree and surface structured findings and recommendations. The vision was always broader than COB. Complex claims of all types were the goal. COB was the entry point because it was the most clearly structured and most painful workflow.

Adding fields to the screen saves minutes. An agent that follows the SOP on the MCA's behalf changes the work entirely.

The Director of Engineering raised a direct objection: the SOPs were a workaround for a system that could not handle COB automatically. Building an AI to navigate those SOPs faster optimized the workaround rather than fixing the root cause. The concept was shelved. The objection was legitimate. It was also a multi-year engineering commitment with no committed timeline.

January 2026
Resurfaced

The organization arrived at the same problem statement independently.

A separate initiative formed to address the auto-adjudication gap at the company level. A tiger team proposed a prompt interface: a blank text input where MCAs could ask the AI questions about a claim. It was designed exactly as most AI products look today.

My manager caught wind of the project while I was on leave and recognized what the prompt interface got wrong. She repitched the agent concept with a direct argument: why make users ask questions when the AI already has access to everything it needs? We had the claim data and the SOPs. The agent answered the question before the MCA had to ask it.

The first presentation asked whether AI was right for COB. The second walked into a room that had already decided AI was the direction. That is a different conversation.

The concept was accepted. The formal initiative launched with cross-functional ownership across product, engineering, claims ops, and content design.

2025–2026
Defense

A competing prototype revealed why domain knowledge matters in this space.

Engineering built their own version of the agent in parallel. When I returned from leave and reviewed their prototype, the central problem was immediately visible. The interface presented a prominent DENY recommendation at the claim level. The assumption was that a claim has one outcome: paid or denied.

That assumption does not hold across many claim types. A claim can contain multiple claim lines, each representing a separate service. Denying one line while paying the rest is a normal adjudication outcome, not an edge case. A claim-level DENY recommendation does not represent that accurately and leads MCAs toward the wrong action, with financial and compliance consequences.

Engineering first concept: chat prompt interface with large red DENY panel — A prompt interface where MCAs type questions about a claim. The DENY panel on the right shows a claim-level recommendation in a prominent red header.

Engineering second iteration: sequential logic trail still leading with claim-level DENY — A sequential logic trail was added, but the interface still led with a claim-level DENY tag. The fundamental assumption about claim structure remained unchanged.

I recognized the flaw immediately because I already understood how claims are structured. The engineering team was learning the problem space in real time.

There was a second issue. A previous AI tool at Collective Health had produced inaccurate results and eroded user trust significantly. A design that gave confident recommendations with no visible reasoning would repeat that pattern. Legal scrutiny of AI in health insurance decisions added a compliance dimension: any system recommending claim decisions needed full auditability of how it reached its conclusions.

I documented both issues and defended the agent design in working sessions. The goal was to reframe what the agent needed to do: operate at the claim line level, show its reasoning, and earn trust through transparency.

2026
Current

From chat interface to structured decision support.

The original lo-fi concept had read like a chat. The AI narrated each step in sequence. For a workflow where MCAs move through 60 claims per day, that format was too slow and too verbose. The redesign organized the interface around pend rules instead.

Each pend rule became a decision unit. The AI surfaces a summary of findings with a recommendation per rule. Full reasoning detail is available on demand but not forced. The logic was direct: people use AI for summaries, not narrated walkthroughs. The agent should compress the SOP decision tree into a finding, not recreate it step by step.

Current design showing pend rule expanded with findings summary and recommendation — Current design: findings and recommendation organized per pend rule, with expandable detail on demand

The feedback mechanism required its own design attention. The previous AI tool had made feedback optional. We had seen what that produced: no reliable signal for improvement, no accountability when the model was wrong. This time feedback was required. Working with content design, the challenge was making it feel like completing the task rather than answering a survey. The dropdown captures four specific failure categories, not a binary rating.

01Copilot did not follow the SOP correctly

02Copilot followed the SOP, but the SOP has incorrect or missing information

03Copilot used incorrect data sources

04Other (with open field)

Each category points to a different type of problem. A model error, a gap in the SOP, and a bad data source require different fixes. The feedback design was built to make that distinction visible rather than collapsing everything into a single thumbs up or down.

Feedback mechanism modal showing four specific failure category options — Feedback mechanism: required, not optional. Four specific failure categories that surface to different teams for different types of fixes.

The pilot is in progress with two initial pend rules. The Phase 1 roadmap covers approximately 77,000 claims and 2,050 MCA hours per month. I am currently designing the next phase where the agent acts on its recommendations directly, without requiring the MCA to execute each step manually.

Key Decision

Prompt interface versus AI agent

The difference between these two approaches is not a design preference. It is a claim about what claims adjudication actually requires. Domain knowledge made that difference visible before any user tested either version.

Prompt interface

Ask the AI a question. A blank input where MCAs type questions about a claim. Familiar pattern. Puts the burden of knowing what to ask on the user.

Problem: Pended claims adjudication is not an information retrieval problem. MCAs need guidance through a structured decision process, not answers to questions they may not know to ask.

AI agent

The agent does the reasoning. The agent works through the SOP, identifies relevant pend rules, and surfaces findings with recommendations organized by decision unit.

Result: MCAs review and verify rather than reason from scratch. The design shows its work, which builds trust rather than demanding it.

The Tradeoff

Optimizing the workaround was the right call

The engineering director's original objection was not wrong. The SOPs exist because the claims engine cannot handle COB automatically. An AI agent that navigates them faster is optimizing the workaround rather than fixing the system.

That tension was named directly in working sessions. A senior claims ops lead saw validity in both positions. The decision to move forward was a deliberate bet: MCAs are doing this work today, and fixing the adjudication logic is a multi-year effort with no committed timeline. Supporting the people doing the work now does not prevent fixing the system later.

Phase 1 covers roughly 77,000 claims and 2,050 MCA hours per month. If the agent handles even a portion of that accurately, the operational case is clear. The pilot will test the assumption.

100% of AI-adjudicated claims in the pilot will be reviewed by an MCA before any decision is committed. As confidence in model accuracy builds, that percentage decreases.

The pilot is measuring MCA acceptance rate of Copilot recommendations, handle time per pended claim, and the proportion of claims processed with AI assistance versus manually.

The goal is not to replace MCA judgment. It is to reserve it for the cases that actually require it.

Reflection

What domain knowledge actually buys you

My contribution to this project was not continuous. I guided the initial concept and mentored the contractor who built the first version. My manager resurfaced it while I was on leave. I returned, took over as sole product designer, and have driven it to its current state.

What stayed constant across all three moments was the same core insight. Pended claims adjudication is a structured decision problem, not an information retrieval problem. That insight is what the prompt interface got wrong. It is what the binary claim-level recommendation got wrong. It is what the pend rule structure gets right.

Domain knowledge does not replace design skill. But in a space this complex, it is what lets you recognize the wrong answer before anyone has tested it. That is what happened in the working sessions where the design had to be defended, and it is why the agent design is what engineering is now building.