Skip to main content

version 1

Link to PRD

Due: 1/23/2026

Final Table: tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_2025_10

Goal

Goal for v1 is to create a table that stores implant carveout percentages using payer MRF data AND hospital MRF data.

To incorporate hospital MRF data, we'll need to consider Clear Rates plan bridge usage. Also, hospital MRF will offer an opportunity for provision validation.

Scope

  • Data Source:
    • payer MRF (core_rates)
    • hospital MRF (hospital_rates), filtered using Clear Rates plan_bridge
  • Providers:
    • All Hospitals
  • Payers / Networks:
    • All payers and networks via Clear Rates, supplemented with those not in Clear Rates
  • Codes:
    • Revenue codes:
      • 0275, 0276, 0277, 0278
    • Billing codes:
      • C1713, C1716, C1760, C1762, C1763, C1764, C1767, C1772, C1776, C1777, C1778, C1779, C1874, C1876, C1881, C2615, C2627, L8600, L8610, L8680, L8699

Methodology

  1. Filter hospital_rates:
    1. plan_bridge (networks/payers):
      1. CTE unioning hospital plan_bridge with rc_plan_bridge.
      2. These tables can be joined directly to hospital_rates on the id field.
    2. billing_class / setting:
      1. billing_class = 'Facility' or null.
      2. Split setting on & delimiter into separate rows (e.g., "Inpatient & Outpatient" becomes two rows).
    3. billing_codes:
      1. v1 will just be Revenue Codes, v2 will grab HCPCS.
    4. billing_code_type:
      1. Exclude billing_code_type = 'APC'
    5. contract_methodology:
      1. contract_methodology = 'percent of total billed charges'
      2. v2 will include fee schedule / case rate / negotiated.
    6. payer_class_name:
      1. payer_class_name = 'Commercial'
    7. Exclude payer_id = 12
  2. Pre-Processing:
    1. percentage calculation (CASE evaluated in order):
      1. If negotiated_percentage < 1: assume decimal form, multiply by 100
      2. If negotiated_percentage > 100: exclude (set to NULL)
      3. If negotiated_dollar and gross_charge are both not NULL and > 0: calculate as (negotiated_dollar / gross_charge) * 100
      4. Else: use negotiated_percentage as-is
      5. Round to 2 decimal places. This was causing false rate differences.
    2. Generate network_id:
      1. If network exists in network_spine: use network_id from spine
      2. Else: generate fallback network_id using xxhash64(payer_id || plan_name)
  3. Analysis:
    1. For each payer, network, provider, compute the following summary statistics using the "percentage":
      1. mode (calculated where percentage BETWEEN 1 AND 101)
      2. median
      3. min
      4. max
      5. average
      6. count(distinct percentages)
    2. Determine thresholds for handling multiple different rates:
      1. if source of multiple rates is within contract-revenue_code, select the max()
      2. if source of multiple rates is between revenue_codes within a contract, hierarchy to prefer 0278
    3. Final output: each row represents a unique combination of payer_id, network_id, provider_id, billing_code, and setting.

SQL

Create base tables from Hospital Rates for each revenue code, then union them together. Each row represents a unique payer, network, provider, revenue_code, and setting combination.

tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_{rc}

Hospital Rates Base Table Code (per revenue code)
cld_version = "v2_3_1"
cld_subversion = "2025_10"
core_rate_month = "2025_10"

revenue_codes = ['275', '276', '277', '278']

for rc in revenue_codes:
query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_{rc} AS
WITH
plan_bridge AS (
SELECT network_name, id, provider_id, payer_id, network_type
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_rc_plan_bridge_{cld_subversion}
UNION ALL
SELECT network_name, id, provider_id, payer_id, network_type
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_plan_bridge_{cld_subversion}
),
network_spine AS (
SELECT payer_id, network_id, network_name, network_type
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_network_spine_{cld_subversion}
),
provider_spine AS (
SELECT provider_id
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_provider_spine_{cld_subversion}
WHERE provider_id IS NOT NULL
AND provider_type = 'Hospital'
)
SELECT
hr.payer_id,
hr.payer_name,
COALESCE(
ns.network_id,
from_big_endian_64(
xxhash64(
CAST(CAST(hr.payer_id AS VARCHAR) || hr.plan_name AS varbinary)
)
)
) AS network_id,
COALESCE(ns.network_name, hr.plan_name) AS network_name,
ns.network_name IS NOT NULL AS in_clear_rates,
hr.provider_id,
hr.revenue_code AS billing_code,
TRIM(u.setting_split) AS setting,
hr.id,
ROUND(
CASE
WHEN negotiated_percentage < 1 THEN negotiated_percentage * 100
WHEN negotiated_percentage > 100 THEN NULL
WHEN negotiated_dollar IS NOT NULL
AND gross_charge IS NOT NULL
AND negotiated_dollar > 0
AND gross_charge > 0
THEN (negotiated_dollar / gross_charge) * 100
ELSE negotiated_percentage
END,
2
) AS percentage
FROM glue.hospital_data.hospital_rates hr
LEFT JOIN plan_bridge pb
ON hr.id = pb.id
AND hr.provider_id = pb.provider_id
AND hr.payer_id = CAST(pb.payer_id AS INT)
LEFT JOIN network_spine ns
ON ns.payer_id = pb.payer_id
AND ns.network_name = pb.network_name
AND ns.network_type = pb.network_type
JOIN provider_spine prvs
ON prvs.provider_id = hr.provider_id
LEFT JOIN UNNEST(SPLIT(hr.setting, '&')) AS u(setting_split)
ON TRUE
WHERE
hr.billing_code_type != 'APC'
AND hr.revenue_code = '{rc}'
AND (hr.billing_class = 'Facility' OR hr.billing_class IS NULL)
AND hr.contract_methodology = 'percent of total billed charges'
AND hr.payer_id != 12
AND hr.payer_class_name = 'Commercial'
""", con=trino_conn)

Union all base tables together.

tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_ALL

Union Base Tables Code
query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_ALL AS
SELECT * FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_275
UNION ALL
SELECT * FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_276
UNION ALL
SELECT * FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_277
UNION ALL
SELECT * FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_278
""", con=trino_conn)

Calculate mode rates per contract.

tq_dev.internal_dev_mmalhotra.tmp_implant_mode_{cld_subversion}

Mode Calculation Code
query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.tmp_implant_mode_{cld_subversion} AS
SELECT
payer_id,
network_id,
provider_id,
max_by(percentage, cnt) AS mode_rate
FROM (
SELECT
payer_id,
network_id,
provider_id,
percentage,
COUNT(*) AS cnt
FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_ALL
WHERE percentage BETWEEN 1 AND 101
GROUP BY 1,2,3,4
)
GROUP BY 1,2,3
""", con=trino_conn)

Create final implant rates table with summary statistics.

tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_{cld_subversion}

Final Implant Rates Table Code
query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_{cld_subversion} AS
SELECT
CAST(hr.payer_id AS VARCHAR)
|| CAST(hr.network_id AS VARCHAR)
|| hr.provider_id
|| hr.billing_code AS unique_id,

CAST(hr.payer_id AS VARCHAR)
|| CAST(hr.network_id AS VARCHAR)
|| hr.provider_id AS contract_id,

hr.payer_id,
hr.payer_name,
hr.network_id,
hr.network_name,
hr.in_clear_rates,
hr.provider_id,
hr.billing_code,
hr.setting,

COUNT(DISTINCT hr.id) AS hospital_rate_id_count,
MIN(hr.percentage) AS min_rate,
MAX(hr.percentage) AS max_rate,
ROUND(AVG(hr.percentage), 2) AS avg_rate,
approx_percentile(hr.percentage, 0.5) AS median_rate,
ANY_VALUE(m.mode_rate) AS mode_rate,
COUNT(DISTINCT hr.percentage) AS num_rates

FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_ALL hr
LEFT JOIN tq_dev.internal_dev_mmalhotra.tmp_implant_mode_{cld_subversion} m
ON m.payer_id = hr.payer_id
AND m.network_id = hr.network_id
AND m.provider_id = hr.provider_id

GROUP BY
1,2,3,4,5,6,7,8,9,10
""", con=trino_conn)

Analysis

Q1: How often does a contract-revenue_code pair have multiple different rates?

There are 124,942 payer, network, provider, revenue code (275-278), setting combinations across 40,274 contracts. 75% of the time, this contract-revenue_code pair has ≤ 2 distinct percentages. However, there is a long tail with max of 8,250 distinct rates.

num_rates
count124942
num_contracts40274
p251
p501
p752
p905
p959
p9945
max8250
Code
df = pd.read_sql(f"""
SELECT
COUNT(*) AS num_objects,
COUNT(DISTINCT contract_id) AS num_contracts,
COUNT(DISTINCT billing_code) AS num_billing_codes,
CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.25)) AS INTEGER) AS p25,
CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.50)) AS INTEGER) AS p50,
CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.75)) AS INTEGER) AS p75,
CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.90)) AS INTEGER) AS p90,
CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.95)) AS INTEGER) AS p95,
CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.99)) AS INTEGER) AS p99,
MAX(num_rates) AS max_rates
FROM tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_{cld_subversion}
""", con=trino_conn)
df.T

Evaluate Samples with Multiple Rates:

The rate differences are often small (e.g., 38.00 vs 38.44), suggesting rounding precision may be the cause.

Code
df = pd.read_sql(f"""
WITH
samples AS (
SELECT *
FROM
tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_{cld_subversion}
WHERE num_rates > 1
ORDER BY RANDOM()
LIMIT 10
)
SELECT
*
FROM samples cr
""", con=trino_conn)
df

Q2: Does the distribution change when rounded to whole integers?

When rounding percentages to whole integers, the distribution improves significantly. 95% of contract-revenue_code pairs have ≤ 3 distinct rates (vs 9 without rounding). Max drops from 8,250 to 1,322.

num_rates
num_objects89476
p251
p501
p751
p902
p953
p9924
max1322
Code
df_int = pd.read_sql(f"""
WITH rounded_source AS (
SELECT
payer_id,
network_id,
provider_id,
billing_code,
ROUND(percentage, 0) AS percentage_int
FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_ALL
),
agg AS (
SELECT
payer_id,
network_id,
provider_id,
billing_code,
COUNT(DISTINCT percentage_int) AS num_rates
FROM rounded_source
GROUP BY 1,2,3,4
)
SELECT
COUNT(*) AS num_objects,
CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.25)) AS INTEGER) AS p25,
CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.50)) AS INTEGER) AS p50,
CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.75)) AS INTEGER) AS p75,
CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.90)) AS INTEGER) AS p90,
CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.95)) AS INTEGER) AS p95,
CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.99)) AS INTEGER) AS p99,
MAX(num_rates) AS max_rates
FROM agg
""", con=trino_conn)
df_int.T

Q3: How often do contracts contain multiple implant rev codes with different rates?

Only 57% of contracts have no rate difference across revenue codes. About 37% have a difference of 0-10 percentage points. This is much higher variability than v0 (which had 94% with no difference).

diff_bucketnum_contractspct_contracts
no_diff2283156.69
0-101483736.84
10-2516053.99
25-505091.26
50+740.18
unknown4191.04
Code
df_diff = pd.read_sql(f"""
WITH
df AS (
SELECT
contract_id,
min(avg_rate) AS min_rate,
max(avg_rate) AS max_rate,
max(avg_rate) - min(avg_rate) AS rate_diff
FROM tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_{cld_subversion}
GROUP BY 1
)
SELECT
CASE
WHEN rate_diff = 0 THEN 'no_diff'
WHEN rate_diff > 0 AND rate_diff <= 10 THEN '0-10'
WHEN rate_diff > 10 AND rate_diff <= 25 THEN '10-25'
WHEN rate_diff > 25 AND rate_diff <= 50 THEN '25-50'
WHEN rate_diff > 50 THEN '50+'
ELSE 'unknown'
END AS diff_bucket,
COUNT(*) AS num_contracts
FROM df
GROUP BY 1
ORDER BY
CASE
WHEN diff_bucket = 'no_diff' THEN 1
WHEN diff_bucket = '0-10' THEN 2
WHEN diff_bucket = '10-25' THEN 3
WHEN diff_bucket = '25-50' THEN 4
WHEN diff_bucket = '50+' THEN 5
ELSE 6
END
""", con=trino_conn)
df_diff['pct_contracts'] = 100 * df_diff['num_contracts'] / df_diff['num_contracts'].sum()
print(df_diff.to_markdown(index=False))

Q4: How does the rate diff distribution look with whole integer rounding?

When rounding to whole integers, 99% of contracts have no difference. This confirms that the rate variability in Q3 is largely due to decimal precision, not actual rate differences.

diff_bucketnum_contractspct_contracts
no_diff3985698.96
unknown4191.04
Code
df_diff_int = pd.read_sql(f"""
WITH
rounded_source AS (
SELECT
CAST(payer_id AS VARCHAR)
|| CAST(network_id AS VARCHAR)
|| provider_id AS contract_id,
ROUND(percentage, 0) AS percentage_int
FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_ALL
),
agg AS (
SELECT
contract_id,
ROUND(AVG(percentage_int), 2) AS avg_rate
FROM rounded_source
GROUP BY 1
),
df AS (
SELECT
contract_id,
min(avg_rate) AS min_rate,
max(avg_rate) AS max_rate,
max(avg_rate) - min(avg_rate) AS rate_diff
FROM agg
GROUP BY 1
)
SELECT
CASE
WHEN rate_diff = 0 THEN 'no_diff'
WHEN rate_diff > 0 AND rate_diff <= 10 THEN '0-10'
WHEN rate_diff > 10 AND rate_diff <= 25 THEN '10-25'
WHEN rate_diff > 25 AND rate_diff <= 50 THEN '25-50'
WHEN rate_diff > 50 THEN '50+'
ELSE 'unknown'
END AS diff_bucket,
COUNT(*) AS num_contracts
FROM df
GROUP BY 1
ORDER BY
CASE
WHEN diff_bucket = 'no_diff' THEN 1
WHEN diff_bucket = '0-10' THEN 2
WHEN diff_bucket = '10-25' THEN 3
WHEN diff_bucket = '25-50' THEN 4
WHEN diff_bucket = '50+' THEN 5
ELSE 6
END
""", con=trino_conn)
df_diff_int['pct_contracts'] = 100 * df_diff_int['num_contracts'] / df_diff_int['num_contracts'].sum()
print(df_diff_int.to_markdown(index=False))

Notes

  • payer_id = 12 is the unsorted payer ID so excluded it.
  • Schema changes from core_rates to hospital_rates:
    • billing_class = 'institutional' → billing_class = 'Facility' or NULL
    • negotiated_type → contract_methodology
    • product_network_label → plan_name
    • billing_code → revenue_code (separate column)
  • hospital_rates does not have service_code, so setting is used instead and split into separate rows for IP vs OP.
  • The mode filter uses BETWEEN 1 AND 101 to account for unusual negotiated_dollar / gross_charge calculated values.
  • Only Commercial payer_class is included.
  • Rounding was updated to 2 decimal places to address false rate differences.
  • There are placeholder 0.01 values in hospital_rates for compliance.

Open Items

  • May need to use historical hospital versioning to match all IDs.
  • COMPLETE: Need to determine if billing_code_types besides APC should be excluded (NF: anything not NULL or HCPCS).
  • COMPLETE: Revenue codes will need reformatting to match between datasets (e.g., '275' vs '0275').
  • COMPLETE: setting is NULL for payer data in v0, so will need to add logic when combining sources.
  • COMPLETE: ~~~~Negotiated percentages at 100 are currently included but may be updated to exclude (cap at 99).~~ Prioritized non-100 rates.
  • Individual hospital IDs are not surfaced in the final table, so will need a traceability table/feature once added to a DAG.
  • There is some duplication of unique_id because of the IP / OP split, need to add that to the hash. Besides these, there are a handful of duplicate unique_ids (0.17%) because of issues in hospital rates (null plan name, PPO not mapping in plan bridge). Added IP/OP to the hash, plus in_clear_rates as a bit at the end. Also, filtered out null plan names.