OPG Base Rate Tuning

Summary

Problems:

SEs have raised concerns about the accuracy of some OPG base rates. In particular:

some OPG rates appear worse than raw rates
some OPG rates are too low
some OPG rates are inferred from a very small sample of posted MRF rates

Solutions:

Tighten the conditions under which OPG base rates are imputed.
Implement a lower outlier bound for OP rates in general (effective starting v2.3.0)

Background

The OPG (Outpatient Procedure Grouper) base rate imputation tier uses payer-specific mappings from HCPCS codes to Outpatient Procedure Groups to infer the corresponding base rates. These mappings are sourced directly from our first-party data.

The CLD methodology for making these inferences is described in detail here.

The following fields are used in the Clear Rates pipeline to evaluate candidate base rates.

Definitions:

opg: the opg group name
opg_base_rate: the base rate for the opg group
- this field is only populated if the OPG imputation conditions are met
opg_candidate_base_rate: the candidate base rate for the opg group
- this field is always populated with the mode rate for the opg group
opg_n_freq: the frequency count of the mode rate for the opg group
opg_n_total: the total count of rates available for the opg group
opg_n_total_possible: the total possible count for the opg group
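As a rough illustration of how these fields relate to one another, the sketch below derives them from a list of posted rates for a single OPG. The helper name `summarize_opg_rates`, the sample rates, and the tie-breaking behaviour (the first-seen rate wins when two rates are equally frequent) are assumptions made for illustration, not the pipeline's actual implementation.

```python
from collections import Counter

def summarize_opg_rates(posted_rates, n_total_possible):
    """Hypothetical helper: derive the OPG candidate fields from posted rates."""
    counts = Counter(posted_rates)
    # most_common(1) yields the mode; ties resolve to the first-seen rate (assumption)
    mode_rate, n_freq = counts.most_common(1)[0]
    return {
        "opg_candidate_base_rate": mode_rate,      # always the mode rate
        "opg_n_freq": n_freq,                      # how often the mode occurs
        "opg_n_total": len(posted_rates),          # rates available for the OPG
        "opg_n_total_possible": n_total_possible,  # supplied by the OPG mapping
    }

# Made-up rates: three of four posted rates agree
summary = summarize_opg_rates([1200.0, 1200.0, 1200.0, 950.0], n_total_possible=10)
print(summary)
```

Note that `opg_base_rate` is deliberately absent here: it is only populated once the imputation conditions are met.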

Solution

Current Logic:

opg_base_rate is populated whenever opg_n_freq / opg_n_total > 0.8

The current logic results in some OPG base rates being inferred from a very small number of posted MRF rates. For example, if there are only 2 posted MRF rates for an OPG group, and both rates are the same, then the OPG base rate will be set to that rate, even though the sample size is very small. We used to have a condition that required opg_n_total > 15, but this was removed in a previous update.
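To make the small-sample problem concrete, here is a minimal sketch of the current condition. The function name `current_rule` is hypothetical; only the 0.8 threshold comes from the logic above.

```python
def current_rule(opg_n_freq, opg_n_total):
    """Current condition: the mode rate covers more than 80% of posted rates."""
    return opg_n_freq / opg_n_total > 0.8

# Two posted rates that happen to agree already pass the check.
two_matching_rates = current_rule(opg_n_freq=2, opg_n_total=2)
# The comparison is strict, so exactly 80% agreement does not pass.
exactly_eighty_percent = current_rule(opg_n_freq=4, opg_n_total=5)

print(two_matching_rates, exactly_eighty_percent)
```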

Proposed Logic: populate opg_base_rate if either of the following is true

opg_n_freq / opg_n_total > 0.8 AND opg_n_total > 100
opg_n_freq / opg_n_total_possible > 0.5

We should bring back the minimum sample size condition and raise the threshold to 100. This ensures that we have enough posted MRF rates to make a reliable inference. In addition, we should add a second condition that accepts the mode rate when its frequency count is more than 50% of the total possible count. This covers cases where the total count is small but the mode rate still accounts for most of the group. For example, some OPG groups have only 5 possible rates; if 3 of those rates are the same, the OPG base rate is set to that rate even though the sample is small.
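The two proposed conditions can be sketched as a single predicate. The function name `proposed_rule` and its keyword parameters are hypothetical; the default thresholds mirror the conditions listed above (0.8, 100, and 0.5), and the example inputs are made up.

```python
def proposed_rule(opg_n_freq, opg_n_total, opg_n_total_possible,
                  min_share=0.8, min_total=100, min_possible_share=0.5):
    """Sketch of the proposed conditions; either one is sufficient."""
    # Condition 1: strong agreement among posted rates AND a large enough sample
    enough_agreement = (opg_n_freq / opg_n_total > min_share
                        and opg_n_total > min_total)
    # Condition 2: the mode rate covers more than half of all possible codes
    covers_group = opg_n_freq / opg_n_total_possible > min_possible_share
    return enough_agreement or covers_group

small_sample = proposed_rule(2, 2, 400)     # fails both conditions
large_sample = proposed_rule(90, 101, 400)  # passes condition 1
small_group = proposed_rule(3, 3, 5)        # passes condition 2

print(small_sample, large_sample, small_group)
```

Keeping the thresholds as parameters makes it straightforward to try alternative values when evaluating the conditions.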

Analysis

The purpose of this analysis is to evaluate the following:

With the proposed logic, how many OPG base rates would we lose?
Are the proposed conditions reasonable?
- what is the distribution of opg_n_total_possible?
- what is the distribution of opg_n_freq / opg_n_total_possible?
Are the imputations reasonable?
Are imputed OPGs monotonic increasing?

1. Number of OPG Base Rates Lost

version	n_opg_imputed_rates
old	90324394
new	63009127

We would lose 90m - 63m = 27m OPG base rates
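The rounded figure above can be reproduced exactly from the two counts in the table. This is plain arithmetic on the reported numbers; the variable names are chosen for illustration only.

```python
old_count = 90_324_394  # rates imputed under the current rule
new_count = 63_009_127  # rates imputed under the proposed rule

lost = old_count - new_count
lost_share = lost / old_count

print(f"lost: {lost:,} ({lost_share:.1%} of current OPG base rates)")
```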

Code

import pandas as pd

# trino_conn is assumed to be an already-open Trino connection
df = pd.read_sql("""
SELECT COUNT(*) n_opg_imputed_rates, 'old' as version
FROM tq_dev.internal_dev_csong_cld_v2_2_2.tmp_int_imputations_derived_2025_09
WHERE 1.00 * opg_n_freq / opg_n_total > 0.8
UNION ALL
SELECT COUNT(*) n_opg_imputed_rates, 'new' as version
FROM tq_dev.internal_dev_csong_cld_v2_2_2.tmp_int_imputations_derived_2025_09
WHERE (1.00 * opg_n_freq / opg_n_total > 0.8 AND opg_n_total > 100)
OR (1.00 * opg_n_freq / opg_n_total_possible > 0.5)
""", con=trino_conn)

2. Are the Proposed Conditions Reasonable?

There are 66 unique OPG groups. The table below shows the distribution of opg_n_total_possible, the total possible number of codes per OPG group. The mean is about 440 codes per group.

	opg_n_total_possible
count	66
mean	440.303
std	429.128
min	3
1%	3
10%	22.5
20%	64
30%	155
40%	251
50%	317.5
60%	449
70%	566
80%	663
90%	985
95%	1187.75
99%	1845.45
max	2045

In general, we impute if more than 80% of reported codes share the same rate and there are more than 100 reported codes.

Alternatively, we also impute if more than 50% of the total possible codes have the same rate. For smaller groups (e.g. 10% of groups have roughly 22 or fewer possible codes), this requires only a handful of matching codes: 12 for a group with 22 possible codes.
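Because the comparison in the proposed condition is strict (> 0.5), exactly half of the possible codes is not enough. The sketch below computes the smallest frequency count that satisfies the condition for a few group sizes taken from the distribution table. The helper name `min_matching_codes` is hypothetical.

```python
def min_matching_codes(opg_n_total_possible, min_possible_share=0.5):
    """Smallest opg_n_freq with opg_n_freq / opg_n_total_possible > min_possible_share."""
    n = 0
    # Step up until the strict inequality holds; group sizes are small enough for a loop
    while n / opg_n_total_possible <= min_possible_share:
        n += 1
    return n

# Group sizes: the minimum (3), a small group (5), about the 10th percentile (22),
# and roughly the mean (440)
needed = {size: min_matching_codes(size) for size in (3, 5, 22, 440)}
print(needed)
```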

Isn't this just a guess?

How can we be sure that these conditions are reasonable? We could analyze the imputed rates and check whether "bad" imputations are related to opg_n_freq / opg_n_total or to opg_n_total_possible.

One approach is to tune the thresholds so that we minimize the number of non-monotonic groups (see section 4).
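A minimal sketch of such a tuning loop is shown below. The toy rows, the candidate threshold values, and the helper name `count_non_monotonic` are all hypothetical; a real run would read the rows from the imputations table and would also need to weigh how many base rates each setting discards.

```python
from itertools import product

# Hypothetical toy rows: (group_key, opg, candidate_rate, n_freq, n_total, n_total_possible)
rows = [
    ("A", 1, 100.0, 150, 160, 400),
    ("A", 2, 90.0, 5, 5, 400),       # small-sample rate that breaks the ordering
    ("A", 3, 300.0, 250, 260, 400),
    ("B", 1, 120.0, 130, 140, 400),
    ("B", 2, 200.0, 120, 130, 400),
]

def count_non_monotonic(rows, min_share, min_total, min_possible_share=0.5):
    """Count groups whose retained rates decrease anywhere along ascending OPG."""
    kept = {}
    for key, opg, rate, n_freq, n_total, n_possible in sorted(rows, key=lambda r: (r[0], r[1])):
        passes = ((n_freq / n_total > min_share and n_total > min_total)
                  or n_freq / n_possible > min_possible_share)
        if passes:
            kept.setdefault(key, []).append(rate)
    return sum(any(b < a for a, b in zip(rates, rates[1:])) for rates in kept.values())

# Try every combination of candidate thresholds and record the violation count
results = {
    (min_share, min_total): count_non_monotonic(rows, min_share, min_total)
    for min_share, min_total in product((0.8, 0.9), (0, 100))
}
for params, n_bad in results.items():
    print(params, n_bad)
```

In this toy data the minimum sample size is what removes the violation; raising the agreement threshold alone does not.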

3. Are the imputations reasonable?

United

The plot below shows box plots for United OPGs, where each unit represents a provider-network.

[Figure: box plots for United OPGs, one unit per provider-network]

Sample Providers

Here is a random sample of 10 provider-networks, color-coded by their slope.

Note one of the examples "dips" from OPG 3 to OPG 4, then goes back up for OPG 5. We should look for decreases like this, as they may indicate bad imputations.

[Figure: random sample of 10 provider-networks, color-coded by slope]

4. Are imputed OPGs monotonic increasing?

About 8% of payer-network-providers (1,970 of 25,512) have at least one violation where the base rate for an OPG group is lower than the base rate for the previous OPG group.

We should review these and remove them.

monotonic_increasing_groups	non_monotonic_groups
23542	1970

WITH filtered AS (
    SELECT 
        payer_id,
        network_id,
        provider_id,
        CAST(opg AS integer) AS opg,
        opg_candidate_base_rate
    FROM tq_dev.internal_dev_csong_cld_v2_2_2.tmp_int_imputations_derived_2025_09
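    WHERE
        -- Outer parentheses apply the opg < 8 filter to both imputation
        -- conditions (AND binds tighter than OR)
        (
            (
                1.0 * opg_n_freq / opg_n_total > 0.8
                AND opg_n_total > 100
            )
            OR (
                1.0 * opg_n_freq / opg_n_total_possible > 0.5
            )
        )
        AND CAST(opg AS integer) < 8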
),
with_violations AS (
    SELECT
        payer_id,
        network_id,
        provider_id,
        CASE 
            WHEN opg_candidate_base_rate < LAG(opg_candidate_base_rate) OVER (
                    PARTITION BY payer_id, network_id, provider_id
                    ORDER BY opg
                 )
            THEN 1
            ELSE 0
        END AS violation
    FROM filtered
),
group_flags AS (
    SELECT
        payer_id,
        network_id,
        provider_id,
        MAX(violation) AS has_violation
    FROM with_violations
    GROUP BY
        payer_id,
        network_id,
        provider_id
)
SELECT
    SUM(CASE WHEN has_violation = 0 THEN 1 ELSE 0 END) AS monotonic_increasing_groups,
    SUM(CASE WHEN has_violation = 1 THEN 1 ELSE 0 END) AS non_monotonic_groups
FROM group_flags
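Two properties of the LAG comparison in the query above are easy to miss: equal rates are not flagged, and each OPG is compared with the previous retained OPG, which is not necessarily the one numbered immediately below it. The sketch below mirrors that comparison in Python on made-up rates; the helper name `has_violation` is hypothetical.

```python
def has_violation(rates_by_opg):
    """Flag any rate strictly below the rate of the previous retained OPG."""
    # Sort by OPG number, like ORDER BY opg inside the window
    ordered = [rate for _, rate in sorted(rates_by_opg.items())]
    return any(curr < prev for prev, curr in zip(ordered, ordered[1:]))

ties = has_violation({1: 100.0, 2: 100.0, 3: 150.0})  # equal rates pass
gap = has_violation({1: 100.0, 3: 80.0})              # OPG 3 is compared with OPG 1

print(ties, gap)
```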

OPG Group Bounds

In the table below, we see the distribution of OPG group bounds for UHC hospitals.

p05 and p95 are the 5th and 95th percentiles of posted MRF rates for each OPG group.
med_p05 and med_p95 are the Medicare 5th and 95th percentiles for each OPG group.

[Figure: table of OPG group bounds for UHC hospitals]

Here are 20 example contracts:

[Figure: 20 example contracts]
