Summary
Goals
The two main goals of this project are:
- Identify as many stoploss provisions as possible — find and group clauses using price transparency and claims data.
- Scope the expected universe — define a denominator so coverage is quantified and gaps are clear.
The work follows a three-phase approach, with each phase drawing on a different dataset:
- Extract — build high confidence in our MRF-mined notes.
- Validate — use CC data to understand provision shapes, solidify our classification system, and confirm found provisions.
- Infer — identify trends in claims remits and allowables data to fill in gaps. We can also explore generalizing from extracted notes to the health-system level.
The project is organized around four key metrics:
- Coverage — Track provisions against the full universe of payer/provider/network combos we expect to exist and be important.
- Classification — Categorize the types of provisions and pieces of the formula.
- Transparent Methodology — Link each provision to the rule(s) that found it.
- Confidence — Group provisions by confidence and use this to drive quality and QA.
Implementation
Coverage — We measure coverage against the full universe of stoploss-eligible payer-provider combinations from MRF data. Current weighted national coverage is approximately 15% by net patient revenue (NPR): $19.9 billion of $133.2 billion total NPR is covered. Coverage is NPR-weighted, so a 50% coverage rate in Texas matters more than 50% in Vermont because Texas represents far more net patient revenue. See the Scoping page for the full state-by-state breakdown and interactive coverage map.
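The NPR weighting described above can be sketched as follows. This is a minimal illustration, not the production calculation: the `StateCoverage` record, the `weighted_coverage` helper, and the state-level figures are all hypothetical. Only the national totals ($19.9 billion of $133.2 billion) come from this page.

```python
from dataclasses import dataclass


@dataclass
class StateCoverage:
    state: str
    covered_npr: float  # NPR ($ billions) of combos with a found provision
    total_npr: float    # NPR ($ billions) of all stoploss-eligible combos


def weighted_coverage(rows):
    """NPR-weighted coverage: total covered NPR divided by total NPR."""
    total = sum(r.total_npr for r in rows)
    if total == 0:
        return 0.0
    return sum(r.covered_npr for r in rows) / total


# Hypothetical state figures, chosen only to show the effect of weighting.
rows = [
    StateCoverage("TX", covered_npr=5.0, total_npr=10.0),  # 50% covered
    StateCoverage("VT", covered_npr=0.0, total_npr=0.2),   # 0% covered
]

weighted = weighted_coverage(rows)  # dominated by the larger state
unweighted = sum(r.covered_npr / r.total_npr for r in rows) / len(rows)

# National figure quoted on this page.
national = 19.9 / 133.2
```

With these made-up inputs the weighted rate stays close to the Texas rate, while a simple average of the two state rates would halve it. That is the reason for weighting by NPR.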
Classification — We use four fields to classify the different levers of a stoploss structure.
- Stoploss Type — what portion of the claim the rate applies to
  - First dollar — entire claim
  - Second dollar — only the amount above the threshold
- Threshold — what triggers the provision to kick in
  - Dollar-based (e.g., claim exceeds $100,000)
  - LOS-based (e.g., stay is longer than 5 days)
- Reimbursement — the rate that applies once the threshold is crossed
  - Percentage of charges
  - Per diem dollar amount
- Cap — an upper bound on reimbursement, when present
  - Daily per diem
  - None
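One way to picture the four fields is as a small typed record. The sketch below is an assumption about how the classification could be represented; the class names, field names, and the `amount_subject_to_rate` helper are hypothetical and are not the schema of the provisions table. The helper covers dollar-based thresholds only and shows the difference between first-dollar and second-dollar provisions.

```python
from dataclasses import dataclass
from enum import Enum


class StoplossType(Enum):
    FIRST_DOLLAR = "first_dollar"    # rate applies to the entire claim
    SECOND_DOLLAR = "second_dollar"  # rate applies only above the threshold


class ThresholdType(Enum):
    DOLLAR = "dollar"
    LOS = "los"


class ReimbursementType(Enum):
    PERCENT_OF_CHARGES = "percent_of_charges"
    PER_DIEM = "per_diem"


class CapType(Enum):
    DAILY_PER_DIEM = "daily_per_diem"
    NONE = "none"


@dataclass(frozen=True)
class StoplossProvision:
    stoploss_type: StoplossType
    threshold_type: ThresholdType
    threshold_value: float
    reimbursement_type: ReimbursementType
    reimbursement_value: float
    cap_type: CapType = CapType.NONE


def amount_subject_to_rate(p, claim_amount):
    """Portion of the claim the stoploss rate applies to (dollar thresholds only)."""
    if p.threshold_type is not ThresholdType.DOLLAR:
        raise ValueError("this sketch only handles dollar-based thresholds")
    if claim_amount <= p.threshold_value:
        return 0.0  # threshold not crossed, provision does not kick in
    if p.stoploss_type is StoplossType.FIRST_DOLLAR:
        return claim_amount
    return claim_amount - p.threshold_value


first = StoplossProvision(StoplossType.FIRST_DOLLAR, ThresholdType.DOLLAR,
                          100_000, ReimbursementType.PERCENT_OF_CHARGES, 65.0)
second = StoplossProvision(StoplossType.SECOND_DOLLAR, ThresholdType.DOLLAR,
                           100_000, ReimbursementType.PERCENT_OF_CHARGES, 65.0)
```

For a $150,000 claim, the first-dollar provision applies its rate to the full $150,000, while the second-dollar provision applies it only to the $50,000 above the threshold.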
Transparent Methodology — We use many different rules to increase our stoploss capture rate, but they fall into three main buckets.
- Keyword-anchored — patterns triggered by stoploss-specific language (e.g., "exceeds", "threshold", "outlier")
- Dollar, percentage, or day value — numeric extraction for thresholds, reimbursement rates, and per diem amounts
- Known patterns from auto-generated notes — common formats that require specific patterns for extraction
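As a rough illustration of the first two buckets, the sketch below gates on stoploss-specific keywords and then pulls dollar, percentage, and day values from the note. The regexes, the `extract_stoploss` function, the rule names, and the sample notes are all hypothetical; the real pipeline's patterns are not reproduced here. The third bucket (auto-generated note formats) is not covered.

```python
import re

# Keyword anchor: only notes with stoploss-specific language are considered.
KEYWORD_RE = re.compile(
    r"\b(?:stop[\s-]?loss|exceeds?|threshold|outlier)\b", re.IGNORECASE
)
# Numeric extraction: dollar amounts, percentages, and day counts.
DOLLAR_RE = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)")
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s?%")
DAYS_RE = re.compile(r"(\d+)\s+days?\b", re.IGNORECASE)


def extract_stoploss(note):
    """Return extracted values and the rules that fired, or None if no keyword."""
    if KEYWORD_RE.search(note) is None:
        return None
    found = {
        "dollars": [float(m.replace(",", "")) for m in DOLLAR_RE.findall(note)],
        "percents": [float(m) for m in PERCENT_RE.findall(note)],
        "days": [int(m) for m in DAYS_RE.findall(note)],
    }
    # Record which rules fired so each value can be traced back to its rule.
    fired = ["keyword_anchor"]
    fired += [f"{k}_value" for k in ("dollars", "percents", "days") if found[k]]
    found["rules_fired"] = fired
    return found


# Made-up notes for illustration only.
dollar_note = "Stoploss: claims where charges exceed $100,000 are paid at 65% of billed charges."
los_note = "Outlier: stays longer than 5 days pay $2,500 per diem."
plain_note = "Standard DRG rate applies."

dollar_result = extract_stoploss(dollar_note)
los_result = extract_stoploss(los_note)
plain_result = extract_stoploss(plain_note)
```

Keeping the list of fired rules alongside the extracted values is what lets a provision be linked back to the rule(s) that found it.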
Confidence - We currently use a point system to rate records. The scheme will need to be refined to account for manual review and for the introduction of other data sources and methodologies.
- Each record receives a confidence score, which is the sum of the scores of all rules that fired for that record.
- Higher-confidence patterns contribute more points. Values that fall within plausible ranges for a real stoploss provision add bonus points.
- The higher the total score, the more trustworthy the extraction. Scores are used to prioritize manual review and drive QA.
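The scoring described above can be sketched as a sum of per-rule points plus a bonus for each value inside a plausible range. Every number here is a placeholder: the point values, the plausible ranges, the bonus size, and the names `RULE_POINTS`, `PLAUSIBLE_RANGES`, and `confidence_score` are assumptions made for illustration, not the values used in the pipeline.

```python
# Hypothetical points per rule; higher-confidence patterns contribute more.
RULE_POINTS = {
    "keyword_anchor": 3,
    "numeric_value": 2,
    "known_note_pattern": 5,
}

# Hypothetical plausible ranges (inclusive) for extracted values.
PLAUSIBLE_RANGES = {
    "threshold_dollars": (10_000, 2_000_000),
    "percent_of_charges": (10, 100),
}

PLAUSIBILITY_BONUS = 1  # hypothetical bonus per in-range value


def confidence_score(fired_rules, values):
    """Sum the points of all fired rules, plus a bonus per plausible value."""
    score = sum(RULE_POINTS[rule] for rule in fired_rules)
    for field, value in values.items():
        bounds = PLAUSIBLE_RANGES.get(field)
        if bounds is not None and bounds[0] <= value <= bounds[1]:
            score += PLAUSIBILITY_BONUS
    return score


plausible = confidence_score(
    ["keyword_anchor", "numeric_value"],
    {"threshold_dollars": 100_000, "percent_of_charges": 65},
)
implausible = confidence_score(
    ["keyword_anchor", "numeric_value"],
    {"threshold_dollars": 100_000, "percent_of_charges": 650},
)
```

Under these placeholder values the first record scores 7 and the second scores 6, because a 650% rate falls outside the assumed plausible range and earns no bonus.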
Status
Phase 1 (MRF Extraction) — In progress. We are mining stoploss provisions from hospital price transparency MRF files using a regex-based extraction pipeline.
Current table: tq_dev.internal_dev_provisions.provisions_stoploss_aggregated_2026_01_v3
Next steps:
- Phase 2 (CC Validation) — Use claims/cost data to validate extracted provisions and refine classification
- Phase 3 (Inference) — Fill coverage gaps using statistical inference from validated provisions and claims remit patterns
- Expand scoping to track coverage gains across phases