version 1

Due: 1/23/2026

Final Table: tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_2025_10

Goal

Goal for v1 is to create a table that stores implant carveout percentages using payer MRF data AND hospital MRF data.

To incorporate hospital MRF data, we'll need to consider Clear Rates plan bridge usage. Also, hospital MRF will offer an opportunity for provision validation.

Scope

Data Source:
- payer MRF (core_rates)
- hospital MRF (hospital_rates), filtered using Clear Rates plan_bridge
Providers:
- All Hospitals
Payers / Networks:
- All payers and networks via Clear Rates, supplemented with those not in Clear Rates
Codes:
- Revenue codes:
  - 0275, 0276, 0277, 0278
- ~~Billing codes:~~
  - ~~C1713, C1716, C1760, C1762, C1763, C1764, C1767, C1772, C1776, C1777, C1778, C1779, C1874, C1876, C1881, C2615, C2627, L8600, L8610, L8680, L8699~~

Methodology

Filter hospital_rates:
1. plan_bridge (networks/payers):
  1. CTE unioning hospital plan_bridge with rc_plan_bridge.
  2. These tables can be joined directly to hospital_rates on the id field.
2. billing_class / setting:
  1. billing_class = 'Facility' or null.
  2. Split setting on & delimiter into separate rows (e.g., "Inpatient & Outpatient" becomes two rows).
3. billing_codes:
  1. v1 will just be Revenue Codes, v2 will grab HCPCS.
4. billing_code_type:
  1. Exclude billing_code_type = 'APC'
5. contract_methodology:
  1. contract_methodology = 'percent of total billed charges'
  2. v2 will include fee schedule / case rate / negotiated.
6. payer_class_name:
  1. payer_class_name = 'Commercial'
7. Exclude payer_id = 12
Pre-Processing:
1. percentage calculation (CASE evaluated in order):
  1. If negotiated_percentage < 1: assume decimal form, multiply by 100
  2. If negotiated_percentage > 100: exclude (set to NULL)
  3. If negotiated_dollar and gross_charge are both not NULL and > 0: calculate as (negotiated_dollar / gross_charge) * 100
  4. Else: use negotiated_percentage as-is
  5. Round to 2 decimal places. This was causing false rate differences.
2. Generate network_id:
  1. If network exists in network_spine: use network_id from spine
  2. Else: generate fallback network_id using xxhash64(payer_id || plan_name)
Analysis:
1. For each payer, network, provider, compute the following summary statistics using the "percentage":
  1. mode (calculated where percentage BETWEEN 1 AND 101)
  2. median
  3. min
  4. max
  5. average
  6. count(distinct percentages)
2. Determine thresholds for handling multiple different rates:
  1. if source of multiple rates is within contract-revenue_code, select the max()
  2. if source of multiple rates is between revenue_codes within a contract, hierarchy to prefer 0278
3. Final output: each row represents a unique combination of payer_id, network_id, provider_id, billing_code, and setting.

SQL

Create base tables from Hospital Rates for each revenue code, then union them together. Each row represents a unique payer, network, provider, revenue_code, and setting combination.

tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_{rc}

Hospital Rates Base Table Code (per revenue code)

cld_version = "v2_3_1"
cld_subversion = "2025_10"
core_rate_month = "2025_10"

revenue_codes = ['275', '276', '277', '278']

for rc in revenue_codes:
    query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_{rc} AS
WITH
plan_bridge AS (
    SELECT network_name, id, provider_id, payer_id, network_type
    FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_rc_plan_bridge_{cld_subversion}
    UNION ALL
    SELECT network_name, id, provider_id, payer_id, network_type
    FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_plan_bridge_{cld_subversion}
),
network_spine AS (
    SELECT payer_id, network_id, network_name, network_type
    FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_network_spine_{cld_subversion}
),
provider_spine AS (
    SELECT provider_id
    FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_provider_spine_{cld_subversion}
    WHERE provider_id IS NOT NULL
      AND provider_type = 'Hospital'
)
SELECT
    hr.payer_id,
    hr.payer_name,
    COALESCE(
        ns.network_id,
        from_big_endian_64(
            xxhash64(
                CAST(CAST(hr.payer_id AS VARCHAR) || hr.plan_name AS varbinary)
            )
        )
    ) AS network_id,
    COALESCE(ns.network_name, hr.plan_name) AS network_name,
    ns.network_name IS NOT NULL AS in_clear_rates,
    hr.provider_id,
    hr.revenue_code AS billing_code,
    TRIM(u.setting_split) AS setting,
    hr.id,
    ROUND(
        CASE
            WHEN negotiated_percentage < 1 THEN negotiated_percentage * 100
            WHEN negotiated_percentage > 100 THEN NULL
            WHEN negotiated_dollar IS NOT NULL
                AND gross_charge IS NOT NULL
                AND negotiated_dollar > 0
                AND gross_charge > 0
            THEN (negotiated_dollar / gross_charge) * 100
            ELSE negotiated_percentage
        END,
        2
    ) AS percentage
FROM glue.hospital_data.hospital_rates hr
LEFT JOIN plan_bridge pb
    ON hr.id = pb.id
   AND hr.provider_id = pb.provider_id
   AND hr.payer_id = CAST(pb.payer_id AS INT)
LEFT JOIN network_spine ns
    ON ns.payer_id = pb.payer_id
   AND ns.network_name = pb.network_name
   AND ns.network_type = pb.network_type
JOIN provider_spine prvs
    ON prvs.provider_id = hr.provider_id
LEFT JOIN UNNEST(SPLIT(hr.setting, '&')) AS u(setting_split)
    ON TRUE
WHERE
    hr.billing_code_type != 'APC'
    AND hr.revenue_code = '{rc}'
    AND (hr.billing_class = 'Facility' OR hr.billing_class IS NULL)
    AND hr.contract_methodology = 'percent of total billed charges'
    AND hr.payer_id != 12
    AND hr.payer_class_name = 'Commercial'
""", con=trino_conn)

Union all base tables together.

tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_ALL

Union Base Tables Code

query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_ALL AS
SELECT * FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_275
UNION ALL
SELECT * FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_276
UNION ALL
SELECT * FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_277
UNION ALL
SELECT * FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_278
""", con=trino_conn)

Calculate mode rates per contract.

tq_dev.internal_dev_mmalhotra.tmp_implant_mode_{cld_subversion}

Mode Calculation Code

query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.tmp_implant_mode_{cld_subversion} AS
SELECT
    payer_id,
    network_id,
    provider_id,
    max_by(percentage, cnt) AS mode_rate
FROM (
    SELECT
        payer_id,
        network_id,
        provider_id,
        percentage,
        COUNT(*) AS cnt
    FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_ALL
    WHERE percentage BETWEEN 1 AND 101
    GROUP BY 1,2,3,4
)
GROUP BY 1,2,3
""", con=trino_conn)

Create final implant rates table with summary statistics.

tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_{cld_subversion}

Final Implant Rates Table Code

query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_{cld_subversion} AS
SELECT
    CAST(hr.payer_id AS VARCHAR)
        || CAST(hr.network_id AS VARCHAR)
        || hr.provider_id
        || hr.billing_code AS unique_id,

    CAST(hr.payer_id AS VARCHAR)
        || CAST(hr.network_id AS VARCHAR)
        || hr.provider_id AS contract_id,

    hr.payer_id,
    hr.payer_name,
    hr.network_id,
    hr.network_name,
    hr.in_clear_rates,
    hr.provider_id,
    hr.billing_code,
    hr.setting,

    COUNT(DISTINCT hr.id) AS hospital_rate_id_count,
    MIN(hr.percentage) AS min_rate,
    MAX(hr.percentage) AS max_rate,
    ROUND(AVG(hr.percentage), 2) AS avg_rate,
    approx_percentile(hr.percentage, 0.5) AS median_rate,
    ANY_VALUE(m.mode_rate) AS mode_rate,
    COUNT(DISTINCT hr.percentage) AS num_rates

FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_ALL hr
LEFT JOIN tq_dev.internal_dev_mmalhotra.tmp_implant_mode_{cld_subversion} m
  ON m.payer_id = hr.payer_id
 AND m.network_id = hr.network_id
 AND m.provider_id = hr.provider_id

GROUP BY
  1,2,3,4,5,6,7,8,9,10
""", con=trino_conn)

Analysis

Q1: How often does a contract-revenue_code pair have multiple different rates?

There are 124,942 payer, network, provider, revenue code (275-278), setting combinations across 40,274 contracts. 75% of the time, this contract-revenue_code pair has ≤ 2 distinct percentages. However, there is a long tail with max of 8,250 distinct rates.

	num_rates
count	124942
num_contracts	40274
p25	1
p50	1
p75	2
p90	5
p95	9
p99	45
max	8250

Code

df = pd.read_sql(f"""
SELECT
    COUNT(*) AS num_objects,
    COUNT(DISTINCT contract_id) AS num_contracts,
    COUNT(DISTINCT billing_code) AS num_billing_codes,
    CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.25)) AS INTEGER) AS p25,
    CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.50)) AS INTEGER) AS p50,
    CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.75)) AS INTEGER) AS p75,
    CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.90)) AS INTEGER) AS p90,
    CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.95)) AS INTEGER) AS p95,
    CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.99)) AS INTEGER) AS p99,
    MAX(num_rates) AS max_rates
FROM tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_{cld_subversion}
""", con=trino_conn)
df.T

Evaluate Samples with Multiple Rates:

The rate differences are often small (e.g., 38.00 vs 38.44), suggesting rounding precision may be the cause.

Code

df = pd.read_sql(f"""
WITH
samples AS (
    SELECT *
    FROM
        tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_{cld_subversion}
    WHERE num_rates > 1
    ORDER BY RANDOM()
    LIMIT 10
)
SELECT
    *
FROM samples cr
""", con=trino_conn)
df

Q2: Does the distribution change when rounded to whole integers?

When rounding percentages to whole integers, the distribution improves significantly. 95% of contract-revenue_code pairs have ≤ 3 distinct rates (vs 9 without rounding). Max drops from 8,250 to 1,322.

	num_rates
num_objects	89476
p25	1
p50	1
p75	1
p90	2
p95	3
p99	24
max	1322

Code

df_int = pd.read_sql(f"""
WITH rounded_source AS (
    SELECT
        payer_id,
        network_id,
        provider_id,
        billing_code,
        ROUND(percentage, 0) AS percentage_int
    FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_ALL
),
agg AS (
    SELECT
        payer_id,
        network_id,
        provider_id,
        billing_code,
        COUNT(DISTINCT percentage_int) AS num_rates
    FROM rounded_source
    GROUP BY 1,2,3,4
)
SELECT
    COUNT(*) AS num_objects,
    CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.25)) AS INTEGER) AS p25,
    CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.50)) AS INTEGER) AS p50,
    CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.75)) AS INTEGER) AS p75,
    CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.90)) AS INTEGER) AS p90,
    CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.95)) AS INTEGER) AS p95,
    CAST(ROUND(APPROX_PERCENTILE(num_rates, 0.99)) AS INTEGER) AS p99,
    MAX(num_rates) AS max_rates
FROM agg
""", con=trino_conn)
df_int.T

Q3: How often do contracts contain multiple implant rev codes with different rates?

Only 57% of contracts have no rate difference across revenue codes. About 37% have a difference of 0-10 percentage points. This is much higher variability than v0 (which had 94% with no difference).

diff_bucket	num_contracts	pct_contracts
no_diff	22831	56.69
0-10	14837	36.84
10-25	1605	3.99
25-50	509	1.26
50+	74	0.18
unknown	419	1.04

Code

df_diff = pd.read_sql(f"""
WITH
df AS (
    SELECT
        contract_id,
        min(avg_rate) AS min_rate,
        max(avg_rate) AS max_rate,
        max(avg_rate) - min(avg_rate) AS rate_diff
    FROM tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_{cld_subversion}
    GROUP BY 1
)
SELECT
    CASE
        WHEN rate_diff = 0 THEN 'no_diff'
        WHEN rate_diff > 0 AND rate_diff <= 10 THEN '0-10'
        WHEN rate_diff > 10 AND rate_diff <= 25 THEN '10-25'
        WHEN rate_diff > 25 AND rate_diff <= 50 THEN '25-50'
        WHEN rate_diff > 50 THEN '50+'
        ELSE 'unknown'
    END AS diff_bucket,
    COUNT(*) AS num_contracts
FROM df
GROUP BY 1
ORDER BY
    CASE
        WHEN diff_bucket = 'no_diff' THEN 1
        WHEN diff_bucket = '0-10' THEN 2
        WHEN diff_bucket = '10-25' THEN 3
        WHEN diff_bucket = '25-50' THEN 4
        WHEN diff_bucket = '50+' THEN 5
        ELSE 6
    END
""", con=trino_conn)
df_diff['pct_contracts'] = 100 * df_diff['num_contracts'] / df_diff['num_contracts'].sum()
print(df_diff.to_markdown(index=False))

Q4: How does the rate diff distribution look with whole integer rounding?

When rounding to whole integers, 99% of contracts have no difference. This confirms that the rate variability in Q3 is largely due to decimal precision, not actual rate differences.

diff_bucket	num_contracts	pct_contracts
no_diff	39856	98.96
unknown	419	1.04

Code

df_diff_int = pd.read_sql(f"""
WITH
rounded_source AS (
    SELECT
        CAST(payer_id AS VARCHAR)
            || CAST(network_id AS VARCHAR)
            || provider_id AS contract_id,
        ROUND(percentage, 0) AS percentage_int
    FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_{cld_subversion}_ALL
),
agg AS (
    SELECT
        contract_id,
        ROUND(AVG(percentage_int), 2) AS avg_rate
    FROM rounded_source
    GROUP BY 1
),
df AS (
    SELECT
        contract_id,
        min(avg_rate) AS min_rate,
        max(avg_rate) AS max_rate,
        max(avg_rate) - min(avg_rate) AS rate_diff
    FROM agg
    GROUP BY 1
)
SELECT
    CASE
        WHEN rate_diff = 0 THEN 'no_diff'
        WHEN rate_diff > 0 AND rate_diff <= 10 THEN '0-10'
        WHEN rate_diff > 10 AND rate_diff <= 25 THEN '10-25'
        WHEN rate_diff > 25 AND rate_diff <= 50 THEN '25-50'
        WHEN rate_diff > 50 THEN '50+'
        ELSE 'unknown'
    END AS diff_bucket,
    COUNT(*) AS num_contracts
FROM df
GROUP BY 1
ORDER BY
    CASE
        WHEN diff_bucket = 'no_diff' THEN 1
        WHEN diff_bucket = '0-10' THEN 2
        WHEN diff_bucket = '10-25' THEN 3
        WHEN diff_bucket = '25-50' THEN 4
        WHEN diff_bucket = '50+' THEN 5
        ELSE 6
    END
""", con=trino_conn)
df_diff_int['pct_contracts'] = 100 * df_diff_int['num_contracts'] / df_diff_int['num_contracts'].sum()
print(df_diff_int.to_markdown(index=False))

Notes

payer_id = 12 is the unsorted payer ID so excluded it.
Schema changes from core_rates to hospital_rates:
- billing_class = 'institutional' → billing_class = 'Facility' or NULL
- negotiated_type → contract_methodology
- product_network_label → plan_name
- billing_code → revenue_code (separate column)
hospital_rates does not have service_code, so setting is used instead and split into separate rows for IP vs OP.
The mode filter uses BETWEEN 1 AND 101 to account for unusual negotiated_dollar / gross_charge calculated values.
Only Commercial payer_class is included.
Rounding was updated to 2 decimal places to address false rate differences.
There are placeholder 0.01 values in hospital_rates for compliance.

Open Items

May need to use historical hospital versioning to match all IDs.
COMPLETE: ~~Need to determine if billing_code_types besides APC should be excluded (NF: anything not NULL or HCPCS).~~
COMPLETE: ~~Revenue codes will need reformatting to match between datasets (e.g., '275' vs '0275').~~
COMPLETE: ~~setting is NULL for payer data in v0, so will need to add logic when combining sources.~~
COMPLETE: ~~~~Negotiated percentages at 100 are currently included but may be updated to exclude (cap at 99).~~ Prioritized non-100 rates.
Individual hospital IDs are not surfaced in the final table, so will need a traceability table/feature once added to a DAG.
There is some duplication of unique_id because of the IP / OP split, need to add that to the hash. Besides these, there are a handful of duplicate unique_ids (0.17%) because of issues in hospital rates (null plan name, PPO not mapping in plan bridge). Added IP/OP to the hash, plus in_clear_rates as a bit at the end. Also, filtered out null plan names.

Goal​

Scope​

Methodology​

SQL​

Analysis​

Q1: How often does a contract-revenue_code pair have multiple different rates?​

Q2: Does the distribution change when rounded to whole integers?​

Q3: How often do contracts contain multiple implant rev codes with different rates?​

Q4: How does the rate diff distribution look with whole integer rounding?​

Notes​

Open Items​

On this page:

Goal

Scope

Methodology

SQL

Analysis

Q1: How often does a contract-revenue_code pair have multiple different rates?

Q2: Does the distribution change when rounded to whole integers?

Q3: How often do contracts contain multiple implant rev codes with different rates?

Q4: How does the rate diff distribution look with whole integer rounding?

Notes

Open Items