Remits + Health System Scoping
Overview
This page documents our methodology for estimating the number of stoploss payer-provider combinations identifiable from remits data and the additional coverage gained by extrapolating known combinations across health systems.
| Source | Raw | Realistic | Basis |
|---|---|---|---|
| Remits scoring | ~3,000 | ~780 | Remits-based scoring identified ~3,000 high-confidence payer-provider combinations. A sample of 50 cross-referenced against Clear Rates found only 26% had no plausible percent-of-charge explanation, giving a realistic estimate of ~780. |
| Health system extrapolation | +8,962 | +~4,104 | 8,962 additional combinations inferred by extrapolating MRF-extracted stoploss provisions to other providers in the same health system. Discounted by the 45.8% health system sharing rate. |
| Total | ~11,962 | ~4,884 |
Assumptions
Scoring
- Rate range filter — rates outside 25%–95% are excluded as implausible stoploss values
- Revenue code weighting — non-drug, non-implant revenue codes are weighted 3X vs. drug and implant codes
- Cluster selection — when multiple rate clusters exist for a provider/payer, the highest-confidence one is selected
- Year-over-year cap — rate increases >5% year-over-year break the cross-year signal
- Normalized scores are relative, not absolute — a score of 10 means "most stoploss-like in this population," not "confirmed stoploss." If the dataset contains a large share of percent-of-charge contracts, the top scorers may simply be the best percent-of-charge contracts
Contract Type
- Percent-of-charge ambiguity — flat percent-of-charge contracts are indistinguishable from stoploss in remits. Both produce a dominant flat rate consistent across years and rev codes
- No threshold identification — the scoring identifies the likely stoploss rate but not the dollar threshold that triggers it
- Second dollar coverage — the scoring is calibrated for first dollar stoploss
Health System Extrapolation
- 45.8% sharing rate — based on MRF stoploss data, 45.8% of health system/payer/network/setting groups with more than one provider have identical stoploss terms across all providers in the group. We apply this 45.8% rate to the 8,962 raw health system combinations to arrive at the ~4,104 realistic estimate
Dataset (inherited from remits pipeline)
- $150K fallback threshold — when no MRF threshold exists, claims are filtered to total billed ≥ $150K
- Unmatched providers and payers are dropped — lines that cannot be matched to the hospital spine or payer map are excluded
- Payment ratio filter (10–90%) — lines with
allowed / billedoutside this range are dropped - Network-level threshold detail is lost — MRF thresholds are averaged across networks per provider/payer combo
Remits Scoring Methodology
Recent table: tq_dev.internal_dev_mmalhotra_provisions.provisions_stoploss_remits_2026_01_v5
Rate clusters are scored in two passes: within a single year, and then across years.
Within-Year Scoring
For each provider/payer/year combination:
- Cluster rates — payment rates within 0.03 of each other (e.g., 64% and 66%) are treated as the same rate cluster.
- Score each cluster — the cluster score is the number of distinct revenue codes paid at that rate multiplied by the total claim lines behind it.
- Revenue code weighting — non-drug, non-implant revenue codes count 3x more than drug or implant revenue codes.
Cross-Year Scoring
Rate clusters are then compared across years:
- Match tolerance — a slightly wider 0.05 tolerance is used (to account for annual contract increases).
- Year multiplier — the number of years the rate appears in is multiplied into the score. A rate seen in all 4 years gets 4x the credit of a rate seen in only 1 year.
- Selection — the highest-scoring rate cluster per provider/payer is selected as the final result.
- Normalization — all scores are normalized to a 1–10 scale across the full dataset.
Scripts
Step 1 — Pull Raw Data
Pulls line-level remits data from remits_lines_by_year_2026_01, grouped by provider, payer, revenue code, year, and payment rate.
SQL
select
provider_id,
tq_payer_id,
revenue_code,
remit_year,
line_perc_allowed,
lines_paid_at_this_percentage,
total_lines,
round(lines_paid_at_this_percentage * 1.0 / total_lines, 2) as pct_lines
from tq_dev.internal_dev_mmalhotra_provisions.remits_lines_by_year_2026_01
order by 1, 2, 3, 4 asc
Step 2 — Score
Revenue code classification, within-year clustering, cross-year scoring, and confidence tier assignment.
Python
import numpy as np
import pandas as pd
# Revenue code classification
# Drug: pharmacy (025x), IV solutions (026x), drugs requiring detail (063x, 064x)
# Implant: medical/surgical supplies (027x)
# Other: all non-drug, non-implant codes
DRUG_PREFIXES = ('025', '026', '063', '064')
IMPLANT_PREFIXES = ('027',)
def classify_rev_code(rc):
rc = str(rc)
if rc.startswith(DRUG_PREFIXES):
return 'drug'
elif rc.startswith(IMPLANT_PREFIXES):
return 'implant'
else:
return 'surgical'
TYPE_WEIGHTS = {'surgical': 3.0, 'drug': 1.0, 'implant': 1.0}
r['rev_code_type'] = r['revenue_code'].apply(classify_rev_code)
r['weighted_lines'] = r['lines_paid_at_this_percentage'] * r['rev_code_type'].map(TYPE_WEIGHTS)
r['surgical_rev_code'] = r['revenue_code'].where(r['rev_code_type'] == 'surgical')
# Filter to plausible stoploss rate range
r = r[r['line_perc_allowed'].between(0.25, 0.95)]
# Greedy 1D clustering: start a new cluster whenever the gap between consecutive rates exceeds tolerance
RATE_TOL = 0.03
def cluster_rates(rates, tol=RATE_TOL):
sorted_rates = sorted(set(rates))
label = {}
cluster_start = sorted_rates[0]
for rate in sorted_rates:
if rate - cluster_start > tol:
cluster_start = rate
label[rate] = cluster_start
return label
def assign_rate_bucket(group, tol=RATE_TOL):
label_map = cluster_rates(group['line_perc_allowed'].values, tol=tol)
group = group.copy()
group['rate_bucket'] = group['line_perc_allowed'].map(label_map)
return group
def scale_1_to_10(series):
logged = np.log1p(series)
mn, mx = logged.min(), logged.max()
if mx == mn:
return series * 0 + 10
return ((logged - mn) / (mx - mn) * 9 + 1).round(1)
# Step 1: within-year clustering (0.03 tolerance)
r = r.groupby(['provider_id', 'tq_payer_id', 'remit_year'], group_keys=False).apply(assign_rate_bucket)
confidence = (
r.groupby(['provider_id', 'tq_payer_id', 'remit_year', 'rate_bucket'])
.agg(
n_rev_codes=('revenue_code', 'nunique'),
n_surgical_codes=('surgical_rev_code', 'nunique'),
total_weighted_lines=('weighted_lines', 'sum'),
total_lines=('lines_paid_at_this_percentage', 'sum'),
)
.reset_index()
.rename(columns={'rate_bucket': 'line_perc_allowed'})
)
confidence['confidence'] = scale_1_to_10(
confidence['n_rev_codes'] * np.log1p(confidence['total_weighted_lines'])
)
# Step 2: cross-year scoring (0.05 tolerance, n_years multiplier)
r2 = r.groupby(['provider_id', 'tq_payer_id'], group_keys=False).apply(
lambda g: assign_rate_bucket(g, tol=0.05)
)
confidence2 = (
r2.groupby(['provider_id', 'tq_payer_id', 'rate_bucket'])
.agg(
n_years=('remit_year', 'nunique'),
n_rev_codes=('revenue_code', 'nunique'),
n_surgical_codes=('surgical_rev_code', 'nunique'),
total_weighted_lines=('weighted_lines', 'sum'),
total_lines=('lines_paid_at_this_percentage', 'sum'),
)
.reset_index()
.rename(columns={'rate_bucket': 'line_perc_allowed'})
)
confidence2['confidence'] = scale_1_to_10(
confidence2['n_years'] * confidence2['n_rev_codes'] * np.log1p(confidence2['total_weighted_lines'])
)
# Step 3: select highest-scoring rate per provider/payer
best_rate = confidence2.sort_values(
['provider_id', 'tq_payer_id', 'confidence'], ascending=[True, True, False]
).groupby(['provider_id', 'tq_payer_id']).first().reset_index()
# Step 4: assign confidence tiers
bins = [0, 2, 4, 6, 8, 10]
labels = ['very low', 'low', 'medium', 'high', 'very high']
best_rate['confidence_tier'] = pd.cut(best_rate['confidence'], bins=bins, labels=labels)
Scoring Examples
Scores are normalized relative to the full dataset — a 10 means most stoploss-like in this population, not confirmed stoploss. Very high scores with large line counts (tens of thousands) are more likely percent-of-charge; smaller line counts with multi-year consistency and broad rev code coverage are stronger signals.
| provider_id | payer_id | rate | years | rev codes | non-drug/implant codes | total lines | score | tier | notes |
|---|---|---|---|---|---|---|---|---|---|
| 651 | 392 | 54% | 4 | 112 | 96 | 67,692 | 10.0 | very high | Large line count — possibly percent-of-charge |
| 6546 | 522 | 37% | 4 | 95 | 88 | 34,723 | 9.8 | very high | Large line count — possibly percent-of-charge |
| 2201 | 567 | 25% | 4 | 91 | 76 | 21,890 | 9.7 | very high | Large line count — possibly percent-of-charge |
| 3520 | 705 | 49% | 4 | 98 | 86 | 15,098 | 9.7 | very high | Large line count — possibly percent-of-charge |
| 6646 | 522 | 47% | 4 | 93 | 82 | 11,860 | 9.6 | very high | Large line count — possibly percent-of-charge |
| 1164 | 111 | 61% | 4 | 46 | 36 | 107 | 8.1 | very high | Small line count, multi-year, broad rev code coverage — stronger signal |
| 1088 | 42 | 76% | 2 | 57 | 48 | 759 | 8.0 | high | Two years reduces cross-year multiplier but still scores high |
Clear Rates Validation
A sample of high-confidence remits combos was cross-referenced against Clear Rates to assess what share are likely stoploss vs. percent-of-charge.
Sample Findings (n=50)
| Category | Count | % | Included in Realistic Estimate |
|---|---|---|---|
| Could be stoploss — no close % of charge match in Clear Rates | 13 | 26% | Yes |
| Likely % of charge — rate matches common Clear Rates % of charge contract | 8 | 16% | No |
| Not in Clear Rates — cannot determine | 25 | 50% | No |
| Nothing > 1 rate score — weak signal | 4 | 8% | No |
Only combos where Clear Rates shows no plausible percent-of-charge explanation are carried forward. Combos not in Clear Rates are excluded.
"Could Be Stoploss" Examples
Selected from the 13 combos with no close percent-of-charge match in Clear Rates:
| provider_id | payer_id | rate | years | rev codes | non-drug/implant codes | lines at rate | score | notes |
|---|---|---|---|---|---|---|---|---|
| 4812 | 643 | 25% | 4 | 66 | 58 | 1,097 | 8.9 | Most common percent-of-charge in Clear Rates is 37%; also seeing 21% |
| 5297 | 169 | 25% | 4 | 39 | 32 | 462 | 8.2 | Most common percent-of-charge contracts in Clear Rates are 59% and 53% |
| 1225 | 354 | 31% | 4 | 25 | 18 | 193 | 7.5 | No percent-of-charge canonical contract methodologies in Clear Rates |
| 5889 | 43 | 25% | 4 | 28 | 24 | 67 | 7.5 | No percent-of-charge canonical contract methodologies; seeing 25% non-canonical |
| 2245 | 643 | 25% | 3 | 38 | 32 | 49 | 7.5 | Only 100% canonical percent-of-charge rates in Clear Rates |
| 2376 | 272 | 50% | 3 | 44 | 36 | 154 | 7.9 | Most common percent-of-charge contracts in Clear Rates are 63% and 87% |
| 1091 | 229 | 87% | 2 | 32 | 29 | 39 | 6.8 | Most common percent-of-charge in Clear Rates is 79% |
| 1155 | 643 | 31% | 2 | 46 | 38 | 142 | 7.4 | Only 100% canonical percent-of-charge rates in Clear Rates |
| 1564 | 643 | 37% | 2 | 22 | 17 | 94 | 6.5 | Most common percent-of-charge contracts in Clear Rates are 72% and 59% |
| 976 | 958 | 31% | 2 | 27 | 24 | 29 | 6.5 | No canonical percent-of-charge contracts; top non-canonical is 33% |
| 1204 | 174 | 25% | 4 | 10 | 8 | 113 | 6.5 | Only 100% percent-of-charge in Clear Rates |
| 3036 | 151 | 25% | 3 | 17 | 16 | 39 | 6.5 | No percent-of-charge canonical contract methodologies in Clear Rates |
| 1273 | 56 | 73% | 1 | 35 | 30 | 35 | 6.1 | No Clear Rates percent-of-charge rates close to this |
Health System Extrapolation
Methodology
Starting from all known provider/payer combinations (from both remits and MRF-extracted provisions):
- Find every health system that contains at least one provider from the known set.
- For each health system / payer pair identified, find all other providers in that system where the combination is not already known.
- Each of these becomes an inferred provider/payer combination.
This yields 8,962 raw additional combinations.
Health System Sharing Rate
45.8% — of health system/payer/network/setting groups with more than one provider, 45.8% have identical stoploss terms across all providers in the group.
SQL
SQL
WITH expected_universe AS (
-- Build the hospital provider universe and attach health system identity.
SELECT
p.provider_id,
p.provider_name,
hs.provider_id AS healthsystem_id,
hs.provider_name AS healthsystem_name
FROM tq_production.spines.spines_provider p
INNER JOIN tq_production.spines.spines_provider_healthsystems hs ON
hs.provider_id = p.provider_healthsystem_id
AND p.provider_type = 'Hospital'
),
data AS (
-- Attach stoploss rows to providers.
SELECT
eu.provider_id,
eu.provider_name,
eu.healthsystem_id,
eu.healthsystem_name,
s.payer_name,
s.network_id,
s.setting,
s.stoploss_type,
s.dollar_threshold,
s.per_diem_threshold,
s.per_diem_cap,
s.percentage_reimbursement,
s.per_diem_reimbursement,
s.stoploss_threshold_type,
s.stoploss_reimbursement_type,
s.stoploss_cap_type,
s.canonical_source_note
FROM expected_universe eu
LEFT JOIN tq_dev.internal_dev_mmalhotra_provisions.provisions_stoploss_aggregated_2026_01_v5 s ON
s.provider_id = eu.provider_id
),
healthsystem_payers AS (
-- Find all payers that appear anywhere within each health system
-- for inpatient stoploss rows.
-- Right now - this assumes that all payers have a contract with all providers at a health system
SELECT DISTINCT
d.healthsystem_id,
d.healthsystem_name,
d.payer_name
FROM data d
WHERE
d.setting = 'Inpatient' AND
d.payer_name IS NOT NULL
),
provider_payer_scaffold AS (
-- Create the full set of provider + payer combinations
-- implied by the health system.
SELECT
eu.healthsystem_id,
eu.healthsystem_name,
eu.provider_id,
eu.provider_name,
hp.payer_name
FROM expected_universe eu
INNER JOIN healthsystem_payers hp ON
eu.healthsystem_id = hp.healthsystem_id
),
provider_payer_status AS (
-- For each provider + payer combination, summarize whether there is
-- any direct inpatient row with a non-null canonical source note.
SELECT
pps.healthsystem_id,
pps.healthsystem_name,
pps.provider_id,
pps.provider_name,
pps.payer_name,
max(
CASE
WHEN d.setting = 'Inpatient'
AND d.canonical_source_note IS NOT NULL
THEN 1
ELSE 0
END
) AS has_direct_stoploss,
max(
CASE
WHEN d.setting = 'Inpatient' THEN 1
ELSE 0
END
) AS has_any_inpatient_row
FROM provider_payer_scaffold pps
LEFT JOIN data d ON
pps.provider_id = d.provider_id AND
pps.payer_name = d.payer_name
GROUP BY
pps.healthsystem_id,
pps.healthsystem_name,
pps.provider_id,
pps.provider_name,
pps.payer_name
),
healthsystem_payer_donor AS (
-- Choose one donor example per health system + payer.
SELECT
x.healthsystem_id,
x.healthsystem_name,
x.payer_name,
x.donor_provider_id,
x.donor_provider_name,
x.donor_network_id,
x.donor_setting,
x.donor_stoploss_type,
x.donor_dollar_threshold,
x.donor_per_diem_threshold,
x.donor_per_diem_cap,
x.donor_percentage_reimbursement,
x.donor_per_diem_reimbursement,
x.donor_stoploss_threshold_type,
x.donor_stoploss_reimbursement_type,
x.donor_stoploss_cap_type,
x.donor_canonical_source_note
FROM (
SELECT
d.healthsystem_id,
d.healthsystem_name,
d.payer_name,
d.provider_id AS donor_provider_id,
d.provider_name AS donor_provider_name,
d.network_id AS donor_network_id,
d.setting AS donor_setting,
d.stoploss_type AS donor_stoploss_type,
d.dollar_threshold AS donor_dollar_threshold,
d.per_diem_threshold AS donor_per_diem_threshold,
d.per_diem_cap AS donor_per_diem_cap,
d.percentage_reimbursement AS donor_percentage_reimbursement,
d.per_diem_reimbursement AS donor_per_diem_reimbursement,
d.stoploss_threshold_type AS donor_stoploss_threshold_type,
d.stoploss_reimbursement_type AS donor_stoploss_reimbursement_type,
d.stoploss_cap_type AS donor_stoploss_cap_type,
d.canonical_source_note AS donor_canonical_source_note,
row_number() OVER (
PARTITION BY
d.healthsystem_id,
d.payer_name
ORDER BY
d.dollar_threshold DESC,
d.percentage_reimbursement DESC
) AS rn
FROM data d
WHERE
d.setting = 'Inpatient' AND
d.canonical_source_note IS NOT NULL AND
d.payer_name IS NOT NULL
) x
WHERE
x.rn = 1
)
SELECT DISTINCT
pps.healthsystem_id,
pps.healthsystem_name,
pps.provider_id AS recipient_provider_id,
pps.provider_name AS recipient_provider_name,
pps.payer_name AS recipient_payer_name,
pps.has_any_inpatient_row,
pps.has_direct_stoploss,
d.donor_provider_id,
d.donor_provider_name,
d.payer_name AS inherited_payer_name,
d.donor_network_id AS inherited_network_id,
d.donor_canonical_source_note AS inherited_canonical_source_note,
d.donor_stoploss_type AS inherited_stoploss_type,
d.donor_dollar_threshold AS inherited_dollar_threshold,
d.donor_per_diem_threshold AS inherited_per_diem_threshold,
d.donor_per_diem_cap AS inherited_per_diem_cap,
d.donor_percentage_reimbursement AS inherited_percentage_reimbursement,
d.donor_per_diem_reimbursement AS inherited_per_diem_reimbursement,
d.donor_stoploss_threshold_type AS inherited_stoploss_threshold_type,
d.donor_stoploss_reimbursement_type AS inherited_stoploss_reimbursement_type,
d.donor_stoploss_cap_type AS inherited_stoploss_cap_type
FROM provider_payer_status pps
INNER JOIN healthsystem_payer_donor d ON
pps.healthsystem_id = d.healthsystem_id AND
pps.payer_name = d.payer_name
WHERE
pps.has_direct_stoploss = 0 AND
pps.provider_id <> d.donor_provider_id