Remits + Health System Scoping

Overview

This page documents our methodology for estimating the number of stoploss payer-provider combinations identifiable from remits data and the additional coverage gained by extrapolating known combinations across health systems.

Source	Raw	Realistic	Basis
Remits scoring	~3,000	~780	Remits-based scoring identified ~3,000 high-confidence payer-provider combinations. A sample of 50 cross-referenced against Clear Rates found only 26% had no plausible percent-of-charge explanation, giving a realistic estimate of ~780.
Health system extrapolation	+8,962	+~4,104	8,962 additional combinations inferred by extrapolating MRF-extracted stoploss provisions to other providers in the same health system. Discounted by the 45.8% health system sharing rate.
Total	~11,962	~4,884

Assumptions

Scoring

Rate range filter — rates outside 25%–95% are excluded as implausible stoploss values
Revenue code weighting — non-drug, non-implant revenue codes are weighted 3X vs. drug and implant codes
Cluster selection — when multiple rate clusters exist for a provider/payer, the highest-confidence one is selected
Year-over-year cap — rate increases >5% year-over-year break the cross-year signal
Normalized scores are relative, not absolute — a score of 10 means "most stoploss-like in this population," not "confirmed stoploss." If the dataset contains a large share of percent-of-charge contracts, the top scorers may simply be the best percent-of-charge contracts

Contract Type

Percent-of-charge ambiguity — flat percent-of-charge contracts are indistinguishable from stoploss in remits. Both produce a dominant flat rate consistent across years and rev codes
No threshold identification — the scoring identifies the likely stoploss rate but not the dollar threshold that triggers it
Second dollar coverage — the scoring is calibrated for first dollar stoploss

Health System Extrapolation

45.8% sharing rate — based on MRF stoploss data, 45.8% of health system/payer/network/setting groups with more than one provider have identical stoploss terms across all providers in the group. We apply this 45.8% rate to the 8,962 raw health system combinations to arrive at the ~4,104 realistic estimate

Dataset (inherited from remits pipeline)

$150K fallback threshold — when no MRF threshold exists, claims are filtered to total billed ≥ $150K
Unmatched providers and payers are dropped — lines that cannot be matched to the hospital spine or payer map are excluded
Payment ratio filter (10–90%) — lines with allowed / billed outside this range are dropped
Network-level threshold detail is lost — MRF thresholds are averaged across networks per provider/payer combo

Remits Scoring Methodology

Recent table: tq_dev.internal_dev_mmalhotra_provisions.provisions_stoploss_remits_2026_01_v5

Rate clusters are scored in two passes: within a single year, and then across years.

Within-Year Scoring

For each provider/payer/year combination:

Cluster rates — payment rates within 0.03 of each other (e.g., 64% and 66%) are treated as the same rate cluster.
Score each cluster — the cluster score is the number of distinct revenue codes paid at that rate multiplied by the total claim lines behind it.
Revenue code weighting — non-drug, non-implant revenue codes count 3x more than drug or implant revenue codes.

Cross-Year Scoring

Rate clusters are then compared across years:

Match tolerance — a slightly wider 0.05 tolerance is used (to account for annual contract increases).
Year multiplier — the number of years the rate appears in is multiplied into the score. A rate seen in all 4 years gets 4x the credit of a rate seen in only 1 year.
Selection — the highest-scoring rate cluster per provider/payer is selected as the final result.
Normalization — all scores are normalized to a 1–10 scale across the full dataset.

Scripts

Step 1 — Pull Raw Data

Pulls line-level remits data from remits_lines_by_year_2026_01, grouped by provider, payer, revenue code, year, and payment rate.

SQL

select
    provider_id,
    tq_payer_id,
    revenue_code,
    remit_year,
    line_perc_allowed,
    lines_paid_at_this_percentage,
    total_lines,
    round(lines_paid_at_this_percentage * 1.0 / total_lines, 2) as pct_lines
from tq_dev.internal_dev_mmalhotra_provisions.remits_lines_by_year_2026_01
order by 1, 2, 3, 4 asc

Step 2 — Score

Revenue code classification, within-year clustering, cross-year scoring, and confidence tier assignment.

Python

import numpy as np
import pandas as pd

# Revenue code classification
# Drug: pharmacy (025x), IV solutions (026x), drugs requiring detail (063x, 064x)
# Implant: medical/surgical supplies (027x)
# Other: all non-drug, non-implant codes
DRUG_PREFIXES = ('025', '026', '063', '064')
IMPLANT_PREFIXES = ('027',)

def classify_rev_code(rc):
    rc = str(rc)
    if rc.startswith(DRUG_PREFIXES):
        return 'drug'
    elif rc.startswith(IMPLANT_PREFIXES):
        return 'implant'
    else:
        return 'surgical'

TYPE_WEIGHTS = {'surgical': 3.0, 'drug': 1.0, 'implant': 1.0}

r['rev_code_type'] = r['revenue_code'].apply(classify_rev_code)
r['weighted_lines'] = r['lines_paid_at_this_percentage'] * r['rev_code_type'].map(TYPE_WEIGHTS)
r['surgical_rev_code'] = r['revenue_code'].where(r['rev_code_type'] == 'surgical')

# Filter to plausible stoploss rate range
r = r[r['line_perc_allowed'].between(0.25, 0.95)]

# Greedy 1D clustering: start a new cluster whenever the gap between consecutive rates exceeds tolerance
RATE_TOL = 0.03

def cluster_rates(rates, tol=RATE_TOL):
    sorted_rates = sorted(set(rates))
    label = {}
    cluster_start = sorted_rates[0]
    for rate in sorted_rates:
        if rate - cluster_start > tol:
            cluster_start = rate
        label[rate] = cluster_start
    return label

def assign_rate_bucket(group, tol=RATE_TOL):
    label_map = cluster_rates(group['line_perc_allowed'].values, tol=tol)
    group = group.copy()
    group['rate_bucket'] = group['line_perc_allowed'].map(label_map)
    return group

def scale_1_to_10(series):
    logged = np.log1p(series)
    mn, mx = logged.min(), logged.max()
    if mx == mn:
        return series * 0 + 10
    return ((logged - mn) / (mx - mn) * 9 + 1).round(1)

# Step 1: within-year clustering (0.03 tolerance)
r = r.groupby(['provider_id', 'tq_payer_id', 'remit_year'], group_keys=False).apply(assign_rate_bucket)

confidence = (
    r.groupby(['provider_id', 'tq_payer_id', 'remit_year', 'rate_bucket'])
    .agg(
        n_rev_codes=('revenue_code', 'nunique'),
        n_surgical_codes=('surgical_rev_code', 'nunique'),
        total_weighted_lines=('weighted_lines', 'sum'),
        total_lines=('lines_paid_at_this_percentage', 'sum'),
    )
    .reset_index()
    .rename(columns={'rate_bucket': 'line_perc_allowed'})
)
confidence['confidence'] = scale_1_to_10(
    confidence['n_rev_codes'] * np.log1p(confidence['total_weighted_lines'])
)

# Step 2: cross-year scoring (0.05 tolerance, n_years multiplier)
r2 = r.groupby(['provider_id', 'tq_payer_id'], group_keys=False).apply(
    lambda g: assign_rate_bucket(g, tol=0.05)
)
confidence2 = (
    r2.groupby(['provider_id', 'tq_payer_id', 'rate_bucket'])
    .agg(
        n_years=('remit_year', 'nunique'),
        n_rev_codes=('revenue_code', 'nunique'),
        n_surgical_codes=('surgical_rev_code', 'nunique'),
        total_weighted_lines=('weighted_lines', 'sum'),
        total_lines=('lines_paid_at_this_percentage', 'sum'),
    )
    .reset_index()
    .rename(columns={'rate_bucket': 'line_perc_allowed'})
)
confidence2['confidence'] = scale_1_to_10(
    confidence2['n_years'] * confidence2['n_rev_codes'] * np.log1p(confidence2['total_weighted_lines'])
)

# Step 3: select highest-scoring rate per provider/payer
best_rate = confidence2.sort_values(
    ['provider_id', 'tq_payer_id', 'confidence'], ascending=[True, True, False]
).groupby(['provider_id', 'tq_payer_id']).first().reset_index()

# Step 4: assign confidence tiers
bins = [0, 2, 4, 6, 8, 10]
labels = ['very low', 'low', 'medium', 'high', 'very high']
best_rate['confidence_tier'] = pd.cut(best_rate['confidence'], bins=bins, labels=labels)

Scoring Examples

Scores are normalized relative to the full dataset — a 10 means most stoploss-like in this population, not confirmed stoploss. Very high scores with large line counts (tens of thousands) are more likely percent-of-charge; smaller line counts with multi-year consistency and broad rev code coverage are stronger signals.

provider_id	payer_id	rate	years	rev codes	non-drug/implant codes	total lines	score	tier	notes
651	392	54%	4	112	96	67,692	10.0	very high	Large line count — possibly percent-of-charge
6546	522	37%	4	95	88	34,723	9.8	very high	Large line count — possibly percent-of-charge
2201	567	25%	4	91	76	21,890	9.7	very high	Large line count — possibly percent-of-charge
3520	705	49%	4	98	86	15,098	9.7	very high	Large line count — possibly percent-of-charge
6646	522	47%	4	93	82	11,860	9.6	very high	Large line count — possibly percent-of-charge
1164	111	61%	4	46	36	107	8.1	very high	Small line count, multi-year, broad rev code coverage — stronger signal
1088	42	76%	2	57	48	759	8.0	high	Two years reduces cross-year multiplier but still scores high

Clear Rates Validation

A sample of high-confidence remits combos was cross-referenced against Clear Rates to assess what share are likely stoploss vs. percent-of-charge.

Sample Findings (n=50)

Category	Count	%	Included in Realistic Estimate
Could be stoploss — no close % of charge match in Clear Rates	13	26%	Yes
Likely % of charge — rate matches common Clear Rates % of charge contract	8	16%	No
Not in Clear Rates — cannot determine	25	50%	No
Nothing > 1 rate score — weak signal	4	8%	No

Only combos where Clear Rates shows no plausible percent-of-charge explanation are carried forward. Combos not in Clear Rates are excluded.

"Could Be Stoploss" Examples

Selected from the 13 combos with no close percent-of-charge match in Clear Rates:

provider_id	payer_id	rate	years	rev codes	non-drug/implant codes	lines at rate	score	notes
4812	643	25%	4	66	58	1,097	8.9	Most common percent-of-charge in Clear Rates is 37%; also seeing 21%
5297	169	25%	4	39	32	462	8.2	Most common percent-of-charge contracts in Clear Rates are 59% and 53%
1225	354	31%	4	25	18	193	7.5	No percent-of-charge canonical contract methodologies in Clear Rates
5889	43	25%	4	28	24	67	7.5	No percent-of-charge canonical contract methodologies; seeing 25% non-canonical
2245	643	25%	3	38	32	49	7.5	Only 100% canonical percent-of-charge rates in Clear Rates
2376	272	50%	3	44	36	154	7.9	Most common percent-of-charge contracts in Clear Rates are 63% and 87%
1091	229	87%	2	32	29	39	6.8	Most common percent-of-charge in Clear Rates is 79%
1155	643	31%	2	46	38	142	7.4	Only 100% canonical percent-of-charge rates in Clear Rates
1564	643	37%	2	22	17	94	6.5	Most common percent-of-charge contracts in Clear Rates are 72% and 59%
976	958	31%	2	27	24	29	6.5	No canonical percent-of-charge contracts; top non-canonical is 33%
1204	174	25%	4	10	8	113	6.5	Only 100% percent-of-charge in Clear Rates
3036	151	25%	3	17	16	39	6.5	No percent-of-charge canonical contract methodologies in Clear Rates
1273	56	73%	1	35	30	35	6.1	No Clear Rates percent-of-charge rates close to this

Health System Extrapolation

Methodology

Starting from all known provider/payer combinations (from both remits and MRF-extracted provisions):

Find every health system that contains at least one provider from the known set.
For each health system / payer pair identified, find all other providers in that system where the combination is not already known.
Each of these becomes an inferred provider/payer combination.

This yields 8,962 raw additional combinations.

45.8% — of health system/payer/network/setting groups with more than one provider, 45.8% have identical stoploss terms across all providers in the group.

SQL

WITH expected_universe AS (
    -- Build the hospital provider universe and attach health system identity.
    SELECT
        p.provider_id,
        p.provider_name,
        hs.provider_id AS healthsystem_id,
        hs.provider_name AS healthsystem_name
    FROM tq_production.spines.spines_provider p
    INNER JOIN tq_production.spines.spines_provider_healthsystems hs ON
        hs.provider_id = p.provider_healthsystem_id
        AND p.provider_type = 'Hospital'
),
data AS (
    -- Attach stoploss rows to providers.
    SELECT
        eu.provider_id,
        eu.provider_name,
        eu.healthsystem_id,
        eu.healthsystem_name,
        s.payer_name,
        s.network_id,
        s.setting,
        s.stoploss_type,
        s.dollar_threshold,
        s.per_diem_threshold,
        s.per_diem_cap,
        s.percentage_reimbursement,
        s.per_diem_reimbursement,
        s.stoploss_threshold_type,
        s.stoploss_reimbursement_type,
        s.stoploss_cap_type,
        s.canonical_source_note
    FROM expected_universe eu
    LEFT JOIN tq_dev.internal_dev_mmalhotra_provisions.provisions_stoploss_aggregated_2026_01_v5 s ON
        s.provider_id = eu.provider_id
),
healthsystem_payers AS (
    -- Find all payers that appear anywhere within each health system
    -- for inpatient stoploss rows.
    -- Right now - this assumes that all payers have a contract with all providers at a health system
    SELECT DISTINCT
        d.healthsystem_id,
        d.healthsystem_name,
        d.payer_name
    FROM data d
    WHERE
        d.setting = 'Inpatient' AND
        d.payer_name IS NOT NULL
),
provider_payer_scaffold AS (
    -- Create the full set of provider + payer combinations
    -- implied by the health system.
    SELECT
        eu.healthsystem_id,
        eu.healthsystem_name,
        eu.provider_id,
        eu.provider_name,
        hp.payer_name
    FROM expected_universe eu
    INNER JOIN healthsystem_payers hp ON
        eu.healthsystem_id = hp.healthsystem_id
),
provider_payer_status AS (
    -- For each provider + payer combination, summarize whether there is
    -- any direct inpatient row with a non-null canonical source note.
    SELECT
        pps.healthsystem_id,
        pps.healthsystem_name,
        pps.provider_id,
        pps.provider_name,
        pps.payer_name,
        max(
            CASE
                WHEN d.setting = 'Inpatient'
                    AND d.canonical_source_note IS NOT NULL
                THEN 1
                ELSE 0
            END
        ) AS has_direct_stoploss,
        max(
            CASE
                WHEN d.setting = 'Inpatient' THEN 1
                ELSE 0
            END
        ) AS has_any_inpatient_row
    FROM provider_payer_scaffold pps
    LEFT JOIN data d ON
        pps.provider_id = d.provider_id AND
        pps.payer_name = d.payer_name
    GROUP BY
        pps.healthsystem_id,
        pps.healthsystem_name,
        pps.provider_id,
        pps.provider_name,
        pps.payer_name
),
healthsystem_payer_donor AS (
    -- Choose one donor example per health system + payer.
    SELECT
        x.healthsystem_id,
        x.healthsystem_name,
        x.payer_name,
        x.donor_provider_id,
        x.donor_provider_name,
        x.donor_network_id,
        x.donor_setting,
        x.donor_stoploss_type,
        x.donor_dollar_threshold,
        x.donor_per_diem_threshold,
        x.donor_per_diem_cap,
        x.donor_percentage_reimbursement,
        x.donor_per_diem_reimbursement,
        x.donor_stoploss_threshold_type,
        x.donor_stoploss_reimbursement_type,
        x.donor_stoploss_cap_type,
        x.donor_canonical_source_note
    FROM (
        SELECT
            d.healthsystem_id,
            d.healthsystem_name,
            d.payer_name,
            d.provider_id AS donor_provider_id,
            d.provider_name AS donor_provider_name,
            d.network_id AS donor_network_id,
            d.setting AS donor_setting,
            d.stoploss_type AS donor_stoploss_type,
            d.dollar_threshold AS donor_dollar_threshold,
            d.per_diem_threshold AS donor_per_diem_threshold,
            d.per_diem_cap AS donor_per_diem_cap,
            d.percentage_reimbursement AS donor_percentage_reimbursement,
            d.per_diem_reimbursement AS donor_per_diem_reimbursement,
            d.stoploss_threshold_type AS donor_stoploss_threshold_type,
            d.stoploss_reimbursement_type AS donor_stoploss_reimbursement_type,
            d.stoploss_cap_type AS donor_stoploss_cap_type,
            d.canonical_source_note AS donor_canonical_source_note,
            row_number() OVER (
                PARTITION BY
                    d.healthsystem_id,
                    d.payer_name
                ORDER BY
                    d.dollar_threshold DESC,
                    d.percentage_reimbursement DESC
            ) AS rn
        FROM data d
        WHERE
            d.setting = 'Inpatient' AND
            d.canonical_source_note IS NOT NULL AND
            d.payer_name IS NOT NULL
    ) x
    WHERE
        x.rn = 1
)
SELECT DISTINCT
    pps.healthsystem_id,
    pps.healthsystem_name,
    pps.provider_id AS recipient_provider_id,
    pps.provider_name AS recipient_provider_name,
    pps.payer_name AS recipient_payer_name,
    pps.has_any_inpatient_row,
    pps.has_direct_stoploss,
    d.donor_provider_id,
    d.donor_provider_name,
    d.payer_name AS inherited_payer_name,
    d.donor_network_id AS inherited_network_id,
    d.donor_canonical_source_note AS inherited_canonical_source_note,
    d.donor_stoploss_type AS inherited_stoploss_type,
    d.donor_dollar_threshold AS inherited_dollar_threshold,
    d.donor_per_diem_threshold AS inherited_per_diem_threshold,
    d.donor_per_diem_cap AS inherited_per_diem_cap,
    d.donor_percentage_reimbursement AS inherited_percentage_reimbursement,
    d.donor_per_diem_reimbursement AS inherited_per_diem_reimbursement,
    d.donor_stoploss_threshold_type AS inherited_stoploss_threshold_type,
    d.donor_stoploss_reimbursement_type AS inherited_stoploss_reimbursement_type,
    d.donor_stoploss_cap_type AS inherited_stoploss_cap_type
FROM provider_payer_status pps
INNER JOIN healthsystem_payer_donor d ON
    pps.healthsystem_id = d.healthsystem_id AND
    pps.payer_name = d.payer_name
WHERE
    pps.has_direct_stoploss = 0 AND
    pps.provider_id <> d.donor_provider_id

Overview​

Assumptions​

Scoring​

Contract Type​

Health System Extrapolation​

Dataset (inherited from remits pipeline)​

Remits Scoring Methodology​

Within-Year Scoring​

Cross-Year Scoring​

Scripts​

Step 1 — Pull Raw Data​

Step 2 — Score​

Scoring Examples​

Clear Rates Validation​

Sample Findings (n=50)​

"Could Be Stoploss" Examples​

Health System Extrapolation​

Methodology​

Health System Sharing Rate​

SQL​

On this page:

Overview

Assumptions

Scoring

Contract Type

Health System Extrapolation

Dataset (inherited from remits pipeline)

Remits Scoring Methodology

Within-Year Scoring

Cross-Year Scoring

Scripts

Step 1 — Pull Raw Data

Step 2 — Score

Scoring Examples

Clear Rates Validation

Sample Findings (n=50)

"Could Be Stoploss" Examples

Health System Extrapolation

Methodology

Health System Sharing Rate

SQL