Skip to main content

version 1.3

Final Table: tq_dev.internal_dev_mmalhotra.implant_canonical_percentages_v1_3_validated_{cld_subversion}


ID Reference

The final table contains several ID columns at different levels of granularity.

IDGranularityConstructionPurpose
canonical_idpayer + network + provider + setting{payer_id}{network_id}{provider_id}{OP|IP}{1|0}Primary key of the final table. Represents a unique contract-setting combination, agnostic of billing code.
contract_idpayer + network + provider{payer_id}{network_id}{provider_id}Groups all settings (IP/OP) for a single payer-network-provider relationship.
unique_idpayer + network + provider + billing_code + setting + in_clear_rates{payer_id}{network_id}{provider_id}{billing_code}{OP|IP}{1|0}Identifies a specific rate at the billing code level. Appears as hospital_unique_id, payer_unique_id, and canonical_unique_id.
source_idindividual rate recordOriginal id from source tableLinks back to the specific row in core_rates or hospital_rates. Appears as hospital_source_id, payer_source_id, and canonical_source_id.

ID Hierarchy:

contract_id (broadest)
└── canonical_id (adds setting: IP or OP)
└── unique_id (adds billing_code: 275, 276, 277, or 278) and in_clear_rates flag
└── source_id (individual rate record)

Trailing Flags in IDs:

  • OP / IP — Setting suffix (Outpatient or Inpatient)
  • 1 / 0in_clear_rates flag (1 = network mapped to Clear Rates spine, 0 = unmapped)

Data Flow Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│ SOURCE DATA │
├──────────────────────────────────┬──────────────────────────────────────────┤
│ tq_production.public.core_rates │ glue.hospital_data.hospital_rates │
│ │ │
└──────────────────────────────────┴──────────────────────────────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 1: BASE TABLES │
│ Filter to RC 275-278, join spines, map service codes and setting to IP/OP │
├──────────────────────────────────┬──────────────────────────────────────────┤
│ tmp_implant_rates_cr_v1_3 │ tmp_implant_rates_hr_v1_3 │
│ (individual payer rates) │ (individual hospital rates) │
└──────────────────────────────────┴──────────────────────────────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 2: AGGREGATED TABLES │
│ Group by unique_id, compute min/max/avg/median/mode │
├──────────────────────────────────┬──────────────────────────────────────────┤
│ implant_rates_cr_v1_3 │ implant_rates_hr_v1_3 │
│ (aggregated payer rates) │ (aggregated hospital rates) │
└──────────────────────────────────┴──────────────────────────────────────────┘
│ │
└──────────────────┬───────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 3: CANONICAL TABLE │
│ Union sources → Validate across sources → Select best rate → Pivot |
│ Group by canonical_id |
├─────────────────────────────────────────────────────────────────────────────┤
│ implant_canonical_percentages_v1_3_validated │
│ (final output with hospital/payer/canonical columns) │
└─────────────────────────────────────────────────────────────────────────────┘

Step 1: Base Tables

1A: Core Rates

This table filters payer data to implant revenue codes ids, standardizes percentage values, and assigns networks based on their inclusion in Clear Rates. It also contains a setting field which is mapped via service code.

Input: tq_production.public_{core_rate_month}.core_rates

Output: tq_dev.internal_dev_mmalhotra.tmp_implant_rates_cr_v1_3_{cld_subversion}

Filtering Logic:

WHERE billing_code_type = 'RC'
AND billing_code IN ('0275', '0276', '0277', '0278')
AND billing_class = 'institutional'
AND negotiated_type = 'percentage'

Network Mapping:

  • Join to tmp_ref_network_spine via payer_id + payer_data_network
  • If match found: use spine's network_id and network_name, set in_clear_rates = TRUE
  • If no match: generate network_id via xxhash64(payer_id || product_network_label), use product_network_label as name, set in_clear_rates = FALSE

Percentage Normalization:

CASE
WHEN (
CASE WHEN negotiated_rate < 1 THEN negotiated_rate * 100 ELSE negotiated_rate END
) BETWEEN 1 AND 100
THEN ROUND(
CASE WHEN negotiated_rate < 1 THEN negotiated_rate * 100 ELSE negotiated_rate END,
2
)
ELSE NULL
END AS percentage
  • Rates < 1 are treated as decimals and multiplied by 100
  • Only rates in range [1, 100] are kept; others become NULL

Service Code → Setting Mapping:

SettingService Codes
Outpatient11 (Office), 22 (On Campus-Outpatient), 49 (Independent Clinic), 62 (Comprehensive Outpatient Rehab), 65 (ESRD Facility), 72 (Rural Health Clinic), 23 (ER), 24 (ASC), 20 (Urgent Care), 01 (Pharmacy), 81 (Laboratory), 10 (Telehealth)
Inpatient21 (Inpatient Hospital), 25 (Birthing Center), 34 (Hospice), 55 (Residential Substance Abuse), 61 (Comprehensive Inpatient Rehab)
BothNULL or blank service_code creates both IP and OP rows

unique_id Construction:

{payer_id}{network_id}{provider_id}{billing_code (no leading zeros)}{OP|IP}{1|0 for in_clear_rates}

1B: Hospital Rates

This table filters hospital data to implant revenue codes ids, standardizes percentage values (plus calculates percent from dollar and gross charge), and assigns networks based on their inclusion in Clear Rates. It also groups unmapped plans into a single plan.

Input: glue.hospital_data.hospital_rates

Output: tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_3_{cld_subversion}

Filtering Logic:

WHERE (billing_code_type = 'HCPCS' OR billing_code_type IS NULL)
AND revenue_code IN ('275', '276', '277', '278')
AND (billing_class = 'Facility' OR billing_class IS NULL)
AND contract_methodology = 'percent of total billed charges'
AND payer_id != 12
AND payer_class_name = 'Commercial'
AND (ns.network_id IS NOT NULL OR plan_name IS NOT NULL)

Network Mapping:

  • Join to tmp_ref_plan_bridge (both RC and standard) via id, provider_id, payer_id
  • Then join to tmp_ref_network_spine via payer_id, network_name, network_type
  • Fallback: generate network_id via xxhash64(payer_id || 'unmapped')

Percentage Derivation:

CASE
WHEN negotiated_percentage < 1 THEN negotiated_percentage * 100
WHEN negotiated_dollar IS NOT NULL
AND gross_charge IS NOT NULL
AND negotiated_dollar > 0
AND gross_charge > 0
THEN (negotiated_dollar / gross_charge) * 100
ELSE negotiated_percentage
END

This allows deriving percentage from dollar amounts when negotiated_percentage is not directly available.

Setting Mapping:

  • source_setting LIKE '%Outpatient%' → Outpatient
  • source_setting LIKE '%Inpatient%' → Inpatient
  • NULL/blank → Both IP and OP rows created

Step 2: Aggregated Tables

These tables aggregate to the unique_id and calculate various metrics for the ids in that grouping. Both sources follow the same aggregation pattern.

Grouping Key: unique_id (which encodes payer + network + provider + billing_code + setting + in_clear_rates)

Computed Metrics:

ColumnLogic
rate_id_countCOUNT(DISTINCT id)
min_rateMIN(percentage)
max_rateMAX(percentage)
avg_rateROUND(AVG(percentage), 2)
median_rateMiddle element of sorted array
mode_rateMost frequent percentage (via max_by(percentage, count))
num_ratesCOUNT(DISTINCT percentage)

Mode Calculation:

WITH mode_calc AS (
SELECT
unique_id,
max_by(percentage, cnt) AS mode_rate
FROM (
SELECT unique_id, percentage, COUNT(*) AS cnt
FROM tmp_table
WHERE percentage IS NOT NULL
GROUP BY 1, 2
)
GROUP BY 1
)

Median Calculation:

ELEMENT_AT(
ARRAY_SORT(ARRAY_AGG(percentage)),
CAST(CEIL(CARDINALITY(ARRAY_AGG(percentage)) / 2.0) AS INTEGER)
) AS median_rate

This is a true median calculation, so that we can directly join back to a source table id in the final results.

Step 3: Canonical Table

This step combines both sources, validates across them, and selects the canonical rate.

3.1 Union Sources

Both aggregated tables are unioned with:

  • Source marked as 'payer' or 'hospital'
  • Filter: WHERE num_rates > 0
  • canonical_id created by removing billing_code from unique_id:
    {payer_id}{network_id}{provider_id}{OP|IP}{1|0}

3.2 Cross-Source Validation

For each row, check if the other source has a matching record (same canonical_id + billing_code but different source).

Validation Condition (v1.3):

other.source IS NOT NULL
AND ABS(COALESCE(other.max_rate, 0) - COALESCE(c.max_rate, 0)) < 5
AND GREATEST(COALESCE(other.max_rate, 0), COALESCE(c.max_rate, 0)) < 100

A rate is validated if:

  1. The other source exists
  2. The difference is less than 5 percentage points
  3. Both rates are below 100%

This is computed separately for max_ratemax_validated and median_ratemedian_validated.

3.3 Best Rate Selection

Priority order (uses first non-NULL):

COALESCE(
CASE WHEN max_validated THEN max_rate END, -- 1. Validated max
CASE WHEN median_validated THEN median_rate END, -- 2. Validated median
CASE WHEN max_rate < 100 THEN max_rate END, -- 3. Unvalidated max < 100
CASE WHEN median_rate < 100 THEN median_rate END,-- 4. Unvalidated median < 100
max_rate, -- 5. Any max
median_rate -- 6. Any median
) AS best_rate

metric_used tracks which metric was selected ('max' or 'median').

is_validated = TRUE if either max_validated or median_validated is TRUE.

3.4 Ranking

Two ranking schemes are applied:

Within-source rank (source_rank): Best row per canonical_id + source Overall rank (overall_rank): Best row per canonical_id across all sources

Ranking criteria (in order):

  1. is_validated DESC — validated rates first
  2. (billing_code = '278' AND best_rate < 100) DESC — preferred code with valid rate
  3. (best_rate < 100) DESC — any valid rate
  4. (billing_code = '278') DESC — preferred code
  5. best_rate DESC — highest rate

3.5 Pivot and Output

The final table pivots to show:

  • Hospital columns: hospital_best_rate, hospital_unique_id, hospital_metric_used, hospital_billing_code
  • Payer columns: payer_best_rate, payer_unique_id, payer_metric_used, payer_billing_code
  • Canonical columns: canonical_rate, canonical_source, canonical_unique_id, canonical_billing_code, canonical_metric, canonical_validated
  • Source IDs: hospital_source_id, payer_source_id, canonical_source_id (joined back from tmp tables)

Output Schema

ColumnTypeDescription
canonical_idVARCHARPrimary key: payer + network + provider + setting + in_clear_rates
contract_idVARCHARpayer_id + network_id + provider_id
payer_idVARCHARPayer identifier
payer_nameVARCHARPayer name
network_idBIGINTNetwork identifier (from spine or generated)
network_nameVARCHARNetwork name
provider_idVARCHARProvider identifier
provider_nameVARCHARProvider name
settingVARCHAR'Inpatient' or 'Outpatient'
in_clear_ratesBOOLEANTRUE if network mapped to Clear Rates spine
hospital_best_rateDOUBLEBest rate from hospital source
hospital_unique_idVARCHARSource unique_id from hospital
hospital_metric_usedVARCHAR'max' or 'median'
hospital_billing_codeVARCHARRevenue code used (275-278)
payer_best_rateDOUBLEBest rate from payer source
payer_unique_idVARCHARSource unique_id from payer
payer_metric_usedVARCHAR'max' or 'median'
payer_billing_codeVARCHARRevenue code used (275-278)
canonical_rateDOUBLEFinal selected rate
canonical_sourceVARCHAR'hospital' or 'payer'
canonical_unique_idVARCHARSource unique_id for canonical
canonical_billing_codeVARCHARRevenue code used
canonical_metricVARCHAR'max' or 'median'
canonical_validatedBOOLEANTRUE if cross-source validated
hospital_source_idVARCHAROriginal rate ID from hospital_rates
payer_source_idVARCHAROriginal rate ID from core_rates
canonical_source_idVARCHARSource ID for the canonical rate

SQL

Step 1A: Core Rates Base Table
query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.tmp_implant_rates_cr_v1_3_{cld_subversion} AS

WITH network_spine AS (
SELECT
payer_id,
network_id,
network_name,
t.payer_data_network
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_network_spine_{cld_subversion},
UNNEST(payer_data_network) AS t(payer_data_network)
),

provider_spine AS (
SELECT
provider_id,
provider_name
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_provider_spine_{cld_subversion}
WHERE provider_id IS NOT NULL
AND provider_type = 'Hospital'
),

core_rates_base AS (
SELECT
cr.payer_id,
cr.payer_name,
prvs.provider_name,
COALESCE(
ns.network_id,
from_big_endian_64(xxhash64(CAST(cr.payer_id || cr.product_network_label AS varbinary)))
) AS network_id,
COALESCE(ns.network_name, cr.product_network_label) AS network_name,
ns.network_name IS NOT NULL AS in_clear_rates,
cr.provider_id,
cr.billing_code,
cr.service_code AS source_setting,
cr.id,
CASE
WHEN (
CASE
WHEN negotiated_rate < 1 THEN negotiated_rate * 100
ELSE negotiated_rate
END
) BETWEEN 1 AND 100
THEN ROUND(
CASE
WHEN negotiated_rate < 1 THEN negotiated_rate * 100
ELSE negotiated_rate
END,
2
)
ELSE NULL
END AS percentage
FROM tq_production.public_{core_rate_month}.core_rates cr
LEFT JOIN network_spine ns
ON ns.payer_id = cr.payer_id
AND ns.payer_data_network = cr.product_network_label
JOIN provider_spine prvs
ON prvs.provider_id = cr.provider_id
WHERE cr.billing_code_type = 'RC'
AND cr.billing_code IN ('0275', '0276', '0277', '0278')
AND cr.billing_class = 'institutional'
AND cr.negotiated_type = 'percentage'
),

op_rows AS (
SELECT
*,
'Outpatient' AS setting
FROM core_rates_base
WHERE source_setting LIKE '%11%'
OR source_setting LIKE '%22%'
OR source_setting LIKE '%49%'
OR source_setting LIKE '%62%'
OR source_setting LIKE '%65%'
OR source_setting LIKE '%72%'
OR source_setting LIKE '%23%'
OR source_setting LIKE '%24%'
OR source_setting LIKE '%20%'
OR source_setting LIKE '%01%'
OR source_setting LIKE '%81%'
OR source_setting LIKE '%10%'
OR source_setting IS NULL
OR TRIM(source_setting) = ''
),

ip_rows AS (
SELECT
*,
'Inpatient' AS setting
FROM core_rates_base
WHERE source_setting LIKE '%21%'
OR source_setting LIKE '%25%'
OR source_setting LIKE '%34%'
OR source_setting LIKE '%55%'
OR source_setting LIKE '%61%'
OR source_setting IS NULL
OR TRIM(source_setting) = ''
),

core_rates AS (
SELECT * FROM op_rows
UNION ALL
SELECT * FROM ip_rows
)

SELECT
CAST(payer_id AS VARCHAR)
|| CAST(network_id AS VARCHAR)
|| provider_id
|| LTRIM(billing_code, '0')
|| CASE
WHEN setting = 'Outpatient' THEN 'OP'
WHEN setting = 'Inpatient' THEN 'IP'
ELSE setting
END
|| CASE WHEN in_clear_rates THEN '1' ELSE '0' END
AS unique_id,
CAST(payer_id AS VARCHAR)
|| CAST(network_id AS VARCHAR)
|| provider_id
AS contract_id,
payer_id,
payer_name,
network_id,
network_name,
in_clear_rates,
provider_id,
provider_name,
LTRIM(billing_code, '0') AS billing_code,
setting,
id,
percentage
FROM core_rates
""", con=trino_conn)
Step 1B: Hospital Rates Base Table
query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_3_{cld_subversion} AS

WITH plan_bridge AS (
SELECT
network_name,
id,
provider_id,
payer_id,
network_type
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_rc_plan_bridge_{cld_subversion}
UNION ALL
SELECT
network_name,
id,
provider_id,
payer_id,
network_type
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_plan_bridge_{cld_subversion}
),

network_spine AS (
SELECT
payer_id,
network_id,
network_name,
network_type
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_network_spine_{cld_subversion}
),

provider_spine AS (
SELECT
provider_id,
provider_name
FROM tq_dev.internal_dev_csong_cld_{cld_version}.tmp_ref_provider_spine_{cld_subversion}
WHERE provider_id IS NOT NULL
AND provider_type = 'Hospital'
),

hospital_rates_base AS (
SELECT
hr.payer_id,
hr.payer_name,
COALESCE(
ns.network_id,
from_big_endian_64(xxhash64(CAST(CAST(hr.payer_id AS VARCHAR) || 'unmapped' AS varbinary)))
) AS network_id,
COALESCE(ns.network_name, 'unmapped') AS network_name,
ns.network_name IS NOT NULL AS in_clear_rates,
hr.provider_id,
prvs.provider_name,
hr.revenue_code AS billing_code,
hr.setting AS source_setting,
hr.id,
CASE
WHEN (
CASE
WHEN negotiated_percentage < 1
THEN negotiated_percentage * 100
WHEN negotiated_dollar IS NOT NULL
AND gross_charge IS NOT NULL
AND negotiated_dollar > 0
AND gross_charge > 0
THEN (negotiated_dollar / gross_charge) * 100
ELSE negotiated_percentage
END
) BETWEEN 1 AND 100
THEN ROUND(
CASE
WHEN negotiated_percentage < 1
THEN negotiated_percentage * 100
WHEN negotiated_dollar IS NOT NULL
AND gross_charge IS NOT NULL
AND negotiated_dollar > 0
AND gross_charge > 0
THEN (negotiated_dollar / gross_charge) * 100
ELSE negotiated_percentage
END,
2
)
ELSE NULL
END AS percentage
FROM glue.hospital_data.hospital_rates hr
LEFT JOIN plan_bridge pb
ON hr.id = pb.id
AND hr.provider_id = pb.provider_id
AND hr.payer_id = CAST(pb.payer_id AS INT)
LEFT JOIN network_spine ns
ON ns.payer_id = pb.payer_id
AND ns.network_name = pb.network_name
AND ns.network_type = pb.network_type
JOIN provider_spine prvs
ON prvs.provider_id = hr.provider_id
WHERE (hr.billing_code_type = 'HCPCS' OR hr.billing_code_type IS NULL)
AND hr.revenue_code IN ('275', '276', '277', '278')
AND (hr.billing_class = 'Facility' OR hr.billing_class IS NULL)
AND hr.contract_methodology = 'percent of total billed charges'
AND hr.payer_id != 12
AND hr.payer_class_name = 'Commercial'
AND (ns.network_id IS NOT NULL OR plan_name IS NOT NULL)
),

op_rows AS (
SELECT
*,
'Outpatient' AS setting
FROM hospital_rates_base
WHERE source_setting LIKE '%Outpatient%'
OR source_setting IS NULL
OR TRIM(source_setting) = ''
),

ip_rows AS (
SELECT
*,
'Inpatient' AS setting
FROM hospital_rates_base
WHERE source_setting LIKE '%Inpatient%'
OR source_setting IS NULL
OR TRIM(source_setting) = ''
),

hospital_rates AS (
SELECT * FROM op_rows
UNION ALL
SELECT * FROM ip_rows
)

SELECT
CAST(payer_id AS VARCHAR)
|| CAST(network_id AS VARCHAR)
|| provider_id
|| billing_code
|| CASE
WHEN setting = 'Outpatient' THEN 'OP'
WHEN setting = 'Inpatient' THEN 'IP'
ELSE setting
END
|| CASE WHEN in_clear_rates THEN '1' ELSE '0' END
AS unique_id,
CAST(payer_id AS VARCHAR)
|| CAST(network_id AS VARCHAR)
|| provider_id
AS contract_id,
payer_id,
payer_name,
network_id,
network_name,
in_clear_rates,
provider_id,
provider_name,
billing_code,
setting,
id,
percentage
FROM hospital_rates
""", con=trino_conn)
Step 2A: Aggregated Core Rates
query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.implant_rates_cr_v1_3_{cld_subversion} AS

WITH mode_calc AS (
SELECT
unique_id,
max_by(percentage, cnt) AS mode_rate
FROM (
SELECT
unique_id,
percentage,
COUNT(*) AS cnt
FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_cr_v1_3_{cld_subversion}
WHERE percentage IS NOT NULL
GROUP BY 1, 2
)
GROUP BY 1
)

SELECT
cr.unique_id,
cr.contract_id,
cr.payer_id,
cr.payer_name,
cr.network_id,
cr.network_name,
cr.in_clear_rates,
cr.provider_id,
cr.provider_name,
cr.billing_code,
cr.setting,
COUNT(DISTINCT cr.id) AS rate_id_count,
MIN(cr.percentage) AS min_rate,
MAX(cr.percentage) AS max_rate,
ROUND(AVG(cr.percentage), 2) AS avg_rate,
ELEMENT_AT(
ARRAY_SORT(ARRAY_AGG(cr.percentage)),
CAST(CEIL(CARDINALITY(ARRAY_AGG(cr.percentage)) / 2.0) AS INTEGER)
) AS median_rate,
ANY_VALUE(m.mode_rate) AS mode_rate,
COUNT(DISTINCT cr.percentage) AS num_rates
FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_cr_v1_3_{cld_subversion} cr
LEFT JOIN mode_calc m
ON m.unique_id = cr.unique_id
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
""", con=trino_conn)
Step 2B: Aggregated Hospital Rates
query.execute_query(f"""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_3_{cld_subversion} AS

WITH mode_calc AS (
SELECT
unique_id,
max_by(percentage, cnt) AS mode_rate
FROM (
SELECT
unique_id,
percentage,
COUNT(*) AS cnt
FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_3_{cld_subversion}
WHERE percentage IS NOT NULL
GROUP BY 1, 2
)
GROUP BY 1
)

SELECT
hr.unique_id,
hr.contract_id,
hr.payer_id,
hr.payer_name,
hr.network_id,
hr.network_name,
hr.in_clear_rates,
hr.provider_id,
hr.provider_name,
hr.billing_code,
hr.setting,
COUNT(DISTINCT hr.id) AS rate_id_count,
MIN(hr.percentage) AS min_rate,
MAX(hr.percentage) AS max_rate,
ROUND(AVG(hr.percentage), 2) AS avg_rate,
ELEMENT_AT(
ARRAY_SORT(ARRAY_AGG(hr.percentage)),
CAST(CEIL(CARDINALITY(ARRAY_AGG(hr.percentage)) / 2.0) AS INTEGER)
) AS median_rate,
ANY_VALUE(m.mode_rate) AS mode_rate,
COUNT(DISTINCT hr.percentage) AS num_rates
FROM tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_3_{cld_subversion} hr
LEFT JOIN mode_calc m
ON m.unique_id = hr.unique_id
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
""", con=trino_conn)
Step 3: Combined Canonical Table
from jinja2 import Template

sources = ['hospital', 'payer']
metrics = ['max', 'median']
source_fields = ['best_rate', 'unique_id', 'metric_used', 'billing_code']

STEP_3_TEMPLATE = Template("""
CREATE OR REPLACE TABLE tq_dev.internal_dev_mmalhotra.implant_canonical_percentages_v1_3_validated_{{ cld_subversion }} AS

WITH combined AS (
SELECT
unique_id,
contract_id,
payer_id,
payer_name,
network_id,
network_name,
in_clear_rates,
provider_id,
provider_name,
billing_code,
setting,
max_rate,
median_rate,
'payer' AS source,
payer_id
|| CAST(network_id AS VARCHAR)
|| provider_id
|| CASE
WHEN setting = 'Outpatient' THEN 'OP'
WHEN setting = 'Inpatient' THEN 'IP'
ELSE setting
END
|| CASE WHEN in_clear_rates THEN '1' ELSE '0' END
AS canonical_id
FROM tq_dev.internal_dev_mmalhotra.implant_rates_cr_v1_3_{{ cld_subversion }}
WHERE num_rates > 0

UNION ALL

SELECT
unique_id,
contract_id,
CAST(payer_id AS VARCHAR) AS payer_id,
payer_name,
network_id,
network_name,
in_clear_rates,
provider_id,
provider_name,
billing_code,
setting,
max_rate,
median_rate,
'hospital' AS source,
CAST(payer_id AS VARCHAR)
|| CAST(network_id AS VARCHAR)
|| provider_id
|| CASE
WHEN setting = 'Outpatient' THEN 'OP'
WHEN setting = 'Inpatient' THEN 'IP'
ELSE setting
END
|| CASE WHEN in_clear_rates THEN '1' ELSE '0' END
AS canonical_id
FROM tq_dev.internal_dev_mmalhotra.implant_rates_hr_v1_3_{{ cld_subversion }}
WHERE num_rates > 0
),

with_validation AS (
SELECT
c.*,
{%- for metric in metrics %}
other.source IS NOT NULL
AND ABS(COALESCE(other.{{ metric }}_rate, 0) - COALESCE(c.{{ metric }}_rate, 0)) < 5
AND GREATEST(COALESCE(other.{{ metric }}_rate, 0), COALESCE(c.{{ metric }}_rate, 0)) < 100
AS {{ metric }}_validated{{ "," if not loop.last else "" }}
{%- endfor %}
FROM combined c
LEFT JOIN combined other
ON other.canonical_id = c.canonical_id
AND other.billing_code = c.billing_code
AND other.source != c.source
),

with_best_rate AS (
SELECT
*,
COALESCE(
CASE WHEN max_validated THEN max_rate END,
CASE WHEN median_validated THEN median_rate END,
CASE WHEN max_rate < 100 THEN max_rate END,
CASE WHEN median_rate < 100 THEN median_rate END,
max_rate,
median_rate
) AS best_rate,
CASE
WHEN max_validated THEN 'max'
WHEN median_validated THEN 'median'
WHEN max_rate < 100 THEN 'max'
WHEN median_rate < 100 THEN 'median'
WHEN max_rate IS NOT NULL THEN 'max'
ELSE 'median'
END AS metric_used,
COALESCE(max_validated, FALSE) OR COALESCE(median_validated, FALSE) AS is_validated
FROM with_validation
),

ranked AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY canonical_id, source
ORDER BY
is_validated DESC,
(billing_code = '278' AND best_rate < 100) DESC,
(best_rate < 100) DESC,
(billing_code = '278') DESC,
best_rate DESC
) AS source_rank,
ROW_NUMBER() OVER (
PARTITION BY canonical_id
ORDER BY
is_validated DESC,
(billing_code = '278' AND best_rate < 100) DESC,
(best_rate < 100) DESC,
(billing_code = '278') DESC,
best_rate DESC
) AS overall_rank
FROM with_best_rate
),

pivoted AS (
SELECT
canonical_id,
ANY_VALUE(contract_id) AS contract_id,
ANY_VALUE(payer_id) AS payer_id,
ANY_VALUE(payer_name) AS payer_name,
ANY_VALUE(network_id) AS network_id,
ANY_VALUE(network_name) AS network_name,
ANY_VALUE(provider_id) AS provider_id,
ANY_VALUE(provider_name) AS provider_name,
ANY_VALUE(setting) AS setting,
ANY_VALUE(in_clear_rates) AS in_clear_rates,
{%- for src in sources %}
{%- for field in source_fields %}
MAX(CASE WHEN source = '{{ src }}' AND source_rank = 1 THEN {{ field }} END) AS {{ src }}_{{ field }},
{%- endfor %}
{%- endfor %}
MAX(CASE WHEN overall_rank = 1 THEN best_rate END) AS canonical_rate,
MAX(CASE WHEN overall_rank = 1 THEN source END) AS canonical_source,
MAX(CASE WHEN overall_rank = 1 THEN unique_id END) AS canonical_unique_id,
MAX(CASE WHEN overall_rank = 1 THEN billing_code END) AS canonical_billing_code,
MAX(CASE WHEN overall_rank = 1 THEN metric_used END) AS canonical_metric,
MAX(CASE WHEN overall_rank = 1 THEN is_validated END) AS canonical_validated
FROM ranked
WHERE source_rank = 1
OR overall_rank = 1
GROUP BY canonical_id
)

SELECT
p.*,
ANY_VALUE(hr_source.id) AS hospital_source_id,
ANY_VALUE(cr_source.id) AS payer_source_id,
CASE
WHEN p.canonical_source = 'hospital' THEN ANY_VALUE(hr_source.id)
ELSE ANY_VALUE(cr_source.id)
END AS canonical_source_id
FROM pivoted p
LEFT JOIN tq_dev.internal_dev_mmalhotra.tmp_implant_rates_hr_v1_3_{{ cld_subversion }} hr_source
ON hr_source.unique_id = p.hospital_unique_id
AND hr_source.percentage = p.hospital_best_rate
LEFT JOIN tq_dev.internal_dev_mmalhotra.tmp_implant_rates_cr_v1_3_{{ cld_subversion }} cr_source
ON cr_source.unique_id = p.payer_unique_id
AND cr_source.percentage = p.payer_best_rate
GROUP BY AUTO
""")

step_3_sql = STEP_3_TEMPLATE.render(
cld_subversion=cld_subversion, sources=sources, metrics=metrics, source_fields=source_fields
)
query.execute_query(step_3_sql, con=trino_conn)