Summary

Goals

The two main goals of this project are:

Identify as many stoploss provisions as possible — find and group clauses using price transparency and claims data.
Scope the expected universe — define a denominator so coverage is quantified and gaps are clear.

We use a three-phase approach, with each phase drawing on a different dataset:

Extract: mine stoploss provisions from MRF notes and reach high confidence in the extracted values.
Validate: use CC data to understand provision shapes, solidify our classification system, and confirm found provisions.
Infer: identify trends in claims remits and allowables data to fill coverage gaps. We can also explore extrapolating from extracted notes to the health-system level.

The work is organized around four key metrics:

Coverage: track provisions against the full universe of payer/provider/network combinations we expect to exist and to matter.
Classification: categorize the types of provisions and the components of their formulas.
Transparent Methodology: link each provision to the rule(s) that found it.
Confidence: group provisions by confidence and use these groups to drive quality and QA.

Implementation

Coverage: We measure coverage against the full universe of stoploss-eligible payer-provider combinations in the MRF data. Coverage is weighted by net patient revenue (NPR), so a 50% coverage rate in Texas matters more than 50% in Vermont because Texas represents far more NPR. Current NPR-weighted national coverage is roughly 15% ($19.9 billion of $133.2 billion in total NPR). See the Scoping page for the full state-by-state breakdown and interactive coverage map.
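A minimal Python sketch of the NPR-weighted calculation. It assumes coverage is tracked per state as two figures: eligible NPR and covered NPR. Treating "covered" as the NPR of combinations that have at least one extracted provision is itself an assumption. The Texas and Vermont figures are invented for illustration and do not come from the Scoping page. Comparing the weighted rate with a plain average of state rates shows why a small state cannot dominate the national number.

```python
from typing import NamedTuple


class StateCoverage(NamedTuple):
    """Per-state NPR totals (assumed shape; units are arbitrary, e.g. $ billions)."""
    state: str
    eligible_npr: float  # NPR of all stoploss-eligible payer-provider combinations
    covered_npr: float   # NPR with an extracted provision (assumed definition)


def npr_weighted_coverage(rows: list[StateCoverage]) -> float:
    """Total covered NPR divided by total eligible NPR across all states."""
    eligible = sum(r.eligible_npr for r in rows)
    covered = sum(r.covered_npr for r in rows)
    return covered / eligible if eligible else 0.0


def unweighted_coverage(rows: list[StateCoverage]) -> float:
    """Simple mean of per-state coverage rates; every state counts equally."""
    rates = [r.covered_npr / r.eligible_npr for r in rows if r.eligible_npr]
    return sum(rates) / len(rates) if rates else 0.0


# Hypothetical numbers: a large state at 10% coverage, a small state at 90%.
example = [
    StateCoverage("TX", eligible_npr=100.0, covered_npr=10.0),
    StateCoverage("VT", eligible_npr=2.0, covered_npr=1.8),
]
print(f"weighted:   {npr_weighted_coverage(example):.1%}")
print(f"unweighted: {unweighted_coverage(example):.1%}")

# National figure from the text: $19.9B covered of $133.2B total NPR.
print(f"national:   {19.9 / 133.2:.1%}")
```

The weighted rate stays near the large state's 10%. The unweighted mean is pulled up to 50% by a state that holds very little revenue.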

Classification: We classify the levers of a stoploss structure using four fields.

Stoploss Type — what portion of the claim the rate applies to
- First dollar — entire claim
- Second dollar — only the amount above the threshold
Threshold — what triggers the provision to kick in
- Dollar-based (e.g., claim exceeds $100,000)
- LOS-based (e.g., stay is longer than 5 days)
Reimbursement — the rate that applies once the threshold is crossed
- Percentage of charges
- Per diem dollar amount
Cap — an upper bound on reimbursement, when present
- Daily per diem
- None
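Below is one possible Python encoding of the four fields. All enum and class names are hypothetical. The payment helper covers only one case: a dollar threshold reimbursed at a percentage of charges. It also assumes a daily cap limits the stoploss amount to the cap times the length of stay. That is one reading of a per diem cap, not a confirmed contract rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StoplossType(Enum):
    FIRST_DOLLAR = "first_dollar"    # rate applies to the entire claim
    SECOND_DOLLAR = "second_dollar"  # rate applies only to the amount above the threshold


class ThresholdKind(Enum):
    DOLLAR = "dollar"  # e.g., claim exceeds $100,000
    LOS = "los"        # e.g., stay is longer than 5 days


class ReimbursementKind(Enum):
    PCT_OF_CHARGES = "pct_of_charges"
    PER_DIEM = "per_diem"


@dataclass(frozen=True)
class StoplossProvision:
    stoploss_type: StoplossType
    threshold_kind: ThresholdKind
    threshold: float                   # dollars or days, depending on threshold_kind
    reimbursement_kind: ReimbursementKind
    rate: float                        # fraction of charges (0.5 = 50%) or $ per day
    daily_cap: Optional[float] = None  # None means no cap


def pct_of_charges_payment(p: StoplossProvision, charges: float,
                           los_days: int) -> Optional[float]:
    """Stoploss amount for a dollar-threshold, %-of-charges provision; None if not triggered."""
    if (p.threshold_kind is not ThresholdKind.DOLLAR
            or p.reimbursement_kind is not ReimbursementKind.PCT_OF_CHARGES):
        raise ValueError("this sketch only handles dollar thresholds paid at % of charges")
    if charges <= p.threshold:
        return None  # threshold not exceeded; the provision does not apply
    # First dollar pays on the whole claim; second dollar only on the excess.
    if p.stoploss_type is StoplossType.FIRST_DOLLAR:
        base = charges
    else:
        base = charges - p.threshold
    amount = p.rate * base
    if p.daily_cap is not None:
        amount = min(amount, p.daily_cap * los_days)  # assumed cap semantics
    return amount


first = StoplossProvision(StoplossType.FIRST_DOLLAR, ThresholdKind.DOLLAR, 100_000,
                          ReimbursementKind.PCT_OF_CHARGES, rate=0.5)
second = StoplossProvision(StoplossType.SECOND_DOLLAR, ThresholdKind.DOLLAR, 100_000,
                           ReimbursementKind.PCT_OF_CHARGES, rate=0.5)
print(pct_of_charges_payment(first, 150_000, los_days=10))   # 75000.0
print(pct_of_charges_payment(second, 150_000, los_days=10))  # 25000.0
```

Per diem reimbursement and LOS thresholds would each need their own branches. They are left out here to keep the sketch short.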

Transparent Methodology: We use many rules to increase our stoploss capture rate, but they fall into three main buckets.

Keyword-anchored: patterns triggered by stoploss-specific language (e.g., "exceeds", "threshold", "outlier")
Dollar, percentage, or day value: numeric extraction of thresholds, reimbursement rates, and per diem amounts
Known patterns from auto-generated notes: recurring note formats that each need a purpose-built extraction pattern
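Here is a toy illustration of the three buckets, built on Python's standard `re` module. Every rule name and pattern is hypothetical. That includes the pipe-delimited `SL|THR=...|PCT=...` format, which is invented to stand in for an auto-generated note. The function returns each match under the name of the rule that produced it. That record is what allows a provision to be traced back to its rules.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    name: str            # hypothetical rule identifier
    bucket: str          # "keyword", "value", or "template"
    pattern: re.Pattern


RULES = [
    # Keyword-anchored: stoploss-specific language.
    Rule("kw_stoploss_terms", "keyword",
         re.compile(r"\b(stop[\s-]?loss|exceeds?|threshold|outlier)\b", re.IGNORECASE)),
    # Dollar, percentage, or day values.
    Rule("dollar_value", "value", re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?")),
    Rule("percent_value", "value", re.compile(r"(\d{1,3}(?:\.\d+)?)\s?%")),
    Rule("day_value", "value", re.compile(r"\b(\d+)\s+days?\b", re.IGNORECASE)),
    # Known format from auto-generated notes (invented example: "SL|THR=100000|PCT=60").
    Rule("template_sl_pipe", "template", re.compile(r"SL\|THR=(\d+)\|PCT=(\d+)")),
]


def fired_rules(note: str) -> dict[str, list]:
    """Map the name of each rule that matched the note to its captured values."""
    hits = {}
    for rule in RULES:
        matches = rule.pattern.findall(note)
        if matches:
            hits[rule.name] = matches
    return hits


note = "If total charges exceed $100,000, payment is 60% of billed charges."
print(fired_rules(note))
# {'kw_stoploss_terms': ['exceed'], 'dollar_value': ['100,000'], 'percent_value': ['60']}
```

When a pattern has several capture groups, `findall` returns tuples. That is why the template rule yields pairs such as `('100000', '60')`.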

Confidence: We rate records with a point system. This system will need refinement to account for manual review and for the introduction of other data sources and methodologies.

Each record receives a confidence score, which is the sum of the scores of all rules that fired for that record.
Higher-confidence patterns contribute more points. Values that fall within plausible ranges for a real stoploss provision add bonus points.
The higher the total score, the more trustworthy the extraction. Scores are used to prioritize manual review and drive QA.
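The scoring rule above can be sketched in a few lines of Python. All point values, plausible ranges, field names, and rule names are invented placeholders. They are not the pipeline's actual weights.

```python
# Hypothetical points per rule: higher-confidence patterns earn more.
RULE_POINTS = {
    "template_sl_pipe": 5,
    "kw_stoploss_terms": 2,
    "dollar_value": 1,
    "percent_value": 1,
    "day_value": 1,
}

# Hypothetical plausible ranges for a real stoploss provision.
PLAUSIBLE_RANGES = {
    "threshold_usd": (10_000, 1_000_000),
    "pct_of_charges": (10, 100),
    "threshold_days": (1, 60),
}
PLAUSIBILITY_BONUS = 1


def confidence_score(fired: list[str], values: dict[str, float]) -> int:
    """Sum the points of every rule that fired, plus a bonus per plausible value."""
    score = sum(RULE_POINTS.get(name, 0) for name in fired)
    for field, value in values.items():
        bounds = PLAUSIBLE_RANGES.get(field)
        if bounds is not None and bounds[0] <= value <= bounds[1]:
            score += PLAUSIBILITY_BONUS
    return score


record = {"fired": ["kw_stoploss_terms", "dollar_value", "percent_value"],
          "values": {"threshold_usd": 100_000, "pct_of_charges": 60}}
print(confidence_score(record["fired"], record["values"]))  # 6
```

The plausibility bonus is kept separate from the rule points. That way the range checks can be tuned without touching the rule weights.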

Status

Phase 1 (MRF Extraction): In progress. We are mining stoploss provisions from hospital price transparency MRF files using a regex-based extraction pipeline.

Current table: tq_dev.internal_dev_provisions.provisions_stoploss_aggregated_2026_01_v3

Next steps:

Phase 2 (CC Validation): use claims/cost data to validate extracted provisions and refine classification
Phase 3 (Inference): fill coverage gaps using statistical inference from validated provisions and claims remit patterns
Expand scoping to track coverage gains across phases
