Summary

Goals

The two main goals of this project are:

  1. Identify as many stoploss provisions as possible — find and group clauses using price transparency and claims data.
  2. Scope the expected universe — define a denominator so coverage is quantified and gaps are clear.

We use a three-phase approach, with each phase drawing on a different dataset:

  1. Extract: mine stoploss provisions from MRF notes and establish high confidence in what we extract.
  2. Validate: use CC data to understand provision shapes, solidify our classification system, and confirm the provisions we found.
  3. Infer: identify trends in claims remits and allowables data to fill in gaps. We can also explore inferring from extracted notes up to broader levels, such as the health system.

The work is organized around four key metrics:

  1. Coverage: Track provisions against the full universe of payer/provider/network combos we expect to exist and be important.
  2. Classification: Categorize the types of provisions and the pieces of the formula.
  3. Transparent Methodology: Link each provision to the rule(s) that found it.
  4. Confidence: Group provisions by confidence and use this to drive quality and QA.

Implementation

Coverage

We measure coverage against the full universe of stoploss-eligible payer-provider combinations from MRF data. Current weighted national coverage is roughly 15% by net patient revenue (NPR): $19.9 billion of $133.2 billion total NPR, or 14.9%. Coverage is NPR-weighted, so a 50% coverage rate in Texas matters more than 50% in Vermont because Texas represents far more net patient revenue. See the Scoping page for the full state-by-state breakdown and interactive coverage map.
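
As a rough illustration of what NPR weighting means, the sketch below computes coverage as covered NPR divided by total NPR and compares it with a simple average of state rates. The StateCoverage type, the field names, and the state figures are hypothetical and are not taken from the Scoping page.

```python
from dataclasses import dataclass


@dataclass
class StateCoverage:
    state: str
    total_npr: float    # NPR of the full expected universe (hypothetical units)
    covered_npr: float  # NPR of combinations with an identified provision


def weighted_coverage(rows):
    """Return covered NPR divided by total NPR across all rows."""
    total = sum(r.total_npr for r in rows)
    if total == 0:
        return 0.0
    return sum(r.covered_npr for r in rows) / total


# Made-up figures: a large state at 50% coverage and a small state at 100%.
rows = [
    StateCoverage("TX", total_npr=100.0, covered_npr=50.0),
    StateCoverage("VT", total_npr=2.0, covered_npr=2.0),
]

national = weighted_coverage(rows)  # dominated by the large state
simple_average = sum(r.covered_npr / r.total_npr for r in rows) / len(rows)

print(f"NPR-weighted: {national:.3f}, simple average: {simple_average:.3f}")
```

The weighted figure stays close to the large state's rate, while the simple average overstates coverage by treating both states equally.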

Classification: We use four field types to classify the different levers of a stoploss structure.

  • Stoploss Type — what portion of the claim the rate applies to
    • First dollar — entire claim
    • Second dollar — only the amount above the threshold
  • Threshold — what triggers the provision to kick in
    • Dollar-based (e.g., claim exceeds $100,000)
    • LOS-based (e.g., stay is longer than 5 days)
  • Reimbursement — the rate that applies once the threshold is crossed
    • Percentage of charges
    • Per diem dollar amount
  • Cap — an upper bound on reimbursement, when present
    • Daily per diem
    • None
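
To make the first dollar versus second dollar distinction concrete, here is a minimal sketch for a dollar-based threshold with a percentage-of-charges rate. The class and function names are hypothetical, and the function returns only the stoploss portion: it ignores any cap and whatever base payment a contract applies below the threshold.

```python
from dataclasses import dataclass
from enum import Enum


class StoplossType(Enum):
    FIRST_DOLLAR = "first_dollar"    # rate applies to the entire claim
    SECOND_DOLLAR = "second_dollar"  # rate applies only above the threshold


@dataclass
class DollarPctProvision:
    stoploss_type: StoplossType
    threshold_dollars: float
    pct_of_charges: float  # e.g. 0.60 for 60% of charges


def stoploss_payment(provision, billed_charges):
    """Return the stoploss portion, or 0.0 if the threshold is not exceeded."""
    if billed_charges <= provision.threshold_dollars:
        return 0.0
    if provision.stoploss_type is StoplossType.FIRST_DOLLAR:
        base = billed_charges
    else:
        base = billed_charges - provision.threshold_dollars
    return provision.pct_of_charges * base


first = DollarPctProvision(StoplossType.FIRST_DOLLAR, 100_000, 0.60)
second = DollarPctProvision(StoplossType.SECOND_DOLLAR, 100_000, 0.60)

# Same $150,000 claim under both types.
print(stoploss_payment(first, 150_000))   # 60% of the full 150,000
print(stoploss_payment(second, 150_000))  # 60% of the 50,000 above the threshold
```

With identical threshold and rate, the two types differ by the rate times the threshold, which is why the Stoploss Type field is tracked separately from the other three.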

Transparent Methodology: We use many different rules to increase our stoploss capture rate, but they fall into three main buckets.

  • Keyword-anchored — patterns triggered by stoploss-specific language (e.g., "exceeds", "threshold", "outlier")
  • Dollar, percentage, or day value — numeric extraction for thresholds, reimbursement rates, and per diem amounts
  • Known patterns from auto-generated notes — common formats that require specific patterns for extraction
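
The first two buckets can be sketched with Python's re module as follows. The patterns, rule names, and sample note are hypothetical and far simpler than a production rule set. In particular, the sketch assumes the first dollar amount in a note is the threshold. Each extraction records which rules fired, so a provision can be linked back to the rules that found it.

```python
import re

# Hypothetical patterns for illustration only.
KEYWORD = re.compile(r"\b(exceeds?|threshold|outlier|stop\s?-?loss)\b", re.IGNORECASE)
DOLLAR = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?")
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s?%")


def extract(note):
    """Return (values, fired_rules) for one free-text note."""
    values = {"threshold_dollars": None, "pct_of_charges": None}
    fired = []

    # Keyword-anchored: only look for numbers if stoploss language is present.
    if not KEYWORD.search(note):
        return values, fired
    fired.append("keyword_anchor")

    # Numeric extraction: first dollar amount is assumed to be the threshold.
    m = DOLLAR.search(note)
    if m:
        values["threshold_dollars"] = float(m.group(1).replace(",", ""))
        fired.append("dollar_value")

    # Numeric extraction: first percentage is assumed to be the rate.
    m = PERCENT.search(note)
    if m:
        values["pct_of_charges"] = float(m.group(1)) / 100
        fired.append("percent_value")

    return values, fired


note = "Stoploss: if billed charges exceeds $100,000, reimburse at 60% of charges."
values, fired = extract(note)
print(values, fired)
```

Returning the list of fired rules alongside the values is one simple way to keep the methodology transparent, since the same list can feed a scoring step later.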

Confidence: We currently rate records with a points system. It will need to be refined to account for manual review and for the introduction of other data sources and methodologies.

  • Each record receives a confidence score, which is the sum of the scores of all rules that fired for that record.
  • Higher-confidence patterns contribute more points. Values that fall within plausible ranges for a real stoploss provision add bonus points.
  • The higher the total score, the more trustworthy the extraction. Scores are used to prioritize manual review and drive QA.
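
The scoring described above can be sketched as a sum of per-rule points plus a bonus for each value inside a plausible range. The rule names, point values, bonus size, and ranges below are all hypothetical placeholders, not the values used in the pipeline.

```python
# Hypothetical points per rule: higher-confidence patterns contribute more.
RULE_POINTS = {
    "auto_note_pattern": 4,
    "keyword_anchor": 3,
    "dollar_value": 2,
    "percent_value": 2,
}

# Hypothetical plausible ranges for a real stoploss provision.
PLAUSIBLE_RANGES = {
    "threshold_dollars": (10_000, 5_000_000),
    "pct_of_charges": (0.05, 1.0),
}
BONUS_POINTS = 1


def confidence_score(fired_rules, values):
    """Sum the points of all fired rules, plus a bonus per plausible value."""
    score = sum(RULE_POINTS.get(rule, 0) for rule in fired_rules)
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = values.get(field)
        if value is not None and low <= value <= high:
            score += BONUS_POINTS
    return score


strong = confidence_score(
    ["keyword_anchor", "dollar_value", "percent_value"],
    {"threshold_dollars": 100_000.0, "pct_of_charges": 0.60},
)
weak = confidence_score(
    ["keyword_anchor", "dollar_value"],
    {"threshold_dollars": 50.0, "pct_of_charges": None},
)
print(strong, weak)
```

Records could then be sorted by score so that manual review starts with whichever end of the range the QA process prioritizes.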

Status

Phase 1 (MRF Extraction): In progress. We are mining stoploss provisions from hospital price transparency MRF files using a regex-based extraction pipeline.

Current table: tq_dev.internal_dev_provisions.provisions_stoploss_aggregated_2026_01_v3

Next steps:

  • Phase 2 (CC Validation) — Use claims/cost data to validate extracted provisions and refine classification
  • Phase 3 (Inference) — Fill coverage gaps using statistical inference from validated provisions and claims remit patterns
  • Expand scoping to track coverage gains across phases