Skip to main content

SQL Reference

This page documents every SQL statement executed by the spines_physician_groups DAG. All Jinja template parameters have been rendered with their default values so the queries are easy to read and run directly against Trino.

Default parameter values used throughout:

ParameterValue
schematq_dev.internal_dev_csong_graph
sub_version2025_07
jaccard_threshold0.3
min_containment_threshold0.5
intersection_count50
payer_mrf_month_for_ein_bridge2025_07

DAG Flow

start_dag
|
v
[data_preparation]
Step 0: Clean Provider Types
Step 1: Physician Groups
Step 2: TIN-HCP Bridge
Step 3: PAC-HCP Bridge
Step 4: HCO-HCP Bridge
Step 5: Health System-HCO Bridge
|
v
[hcp_similarity_analysis] (all 4 run in parallel)
Step 6a: PAC-HCO Similarity
Step 6b: PAC-TIN Similarity
Step 6c: PAC-PAC Similarity
Step 6d: HCO-TIN Similarity
|
v
[analysis]
Step 7: PAC Connections
Step 8: NetworkX Graph (Python)
Step 9: PAC to Health System Associations (Python/NetworkX)
Step 10: Final Output (Python merges)
|
v
[spines] (physicians >> physician_groups; rest in parallel)
Step 11: Physicians
Step 12: Physician Groups Spine
Step 13: Additional NPIs
Step 14: Associated TINs
Step 15: Associated Health Systems
Step 16: Associated Facilities
|
v
[qa_tests]
|
v
[promotion]
|
v
end_dag

Phase 1: Data Preparation

Step 0: Clean Provider Types

Joins raw PECOS reassignment data with the provider-type mapping table to produce a cleaned table with a provider_type column.

Output table: tq_dev.internal_dev_csong_graph.pecos_data_clean_provider_types_2025_07

CREATE TABLE tq_dev.internal_dev_csong_graph.pecos_data_clean_provider_types_2025_07 AS
SELECT df.*, pt.provider_type
FROM tq_raw.cms.pecos_provider_revalidation_reassignment df
JOIN tq_dev.internal_dev_csong_graph.pecos_provider_type_mappings pt
ON df.individual_specialty_description = pt.individual_specialty_description

Step 1: Physician Groups

Builds the core physician groups table from PECOS and Definitive Healthcare data. Each group gets a hashed provider_id, primary specialty classification, and per-provider-type counts.

The 14 provider types are:

  1. Family / Internal Medicine
  2. Hospitalist
  3. Emergency Medicine
  4. Radiologist
  5. Specialist
  6. Nurse / Physician Assistant
  7. Anesthesia & Pain
  8. Behavioral Health
  9. Women's Health / Ob-Gyn
  10. Therapy & Rehab
  11. Dental & Oral
  12. Optometry
  13. Allied / Alternative
  14. Other

Output table: tq_dev.internal_dev_csong_graph.physician_groups_2025_07

CREATE TABLE tq_dev.internal_dev_csong_graph.physician_groups_2025_07 AS
WITH
pg_npis AS (
SELECT
enrollment_id, national_provider_identifier
FROM tq_raw.cms.pecos_provider_revalidation org
WHERE
organization_name IS NOT NULL
AND enrollment_specialty = 'Clinic/Group Practice'
),
bucket_counts AS (
SELECT
group_pac_id,
ARRAY_AGG(DISTINCT national_provider_identifier) as primary_npis,
MIN(group_legal_business_name) AS group_legal_business_name,
ARRAY_DISTINCT(ARRAY_AGG(group_state_code)) AS group_state_code,
COUNT_IF(provider_type = 'Family / Internal Medicine') AS total_family_internal_medicine,
COUNT_IF(provider_type = 'Hospitalist') AS total_hospitalist,
COUNT_IF(provider_type = 'Emergency Medicine') AS total_emergency_medicine,
COUNT_IF(provider_type = 'Radiologist') AS total_radiologist,
COUNT_IF(provider_type = 'Specialist') AS total_specialist,
COUNT_IF(provider_type = 'Nurse / Physician Assistant') AS total_nurse_physician_assistant,
COUNT_IF(provider_type = 'Anesthesia & Pain') AS total_anesthesia_pain,
COUNT_IF(provider_type = 'Behavioral Health') AS total_behavioral_health,
COUNT_IF(provider_type = 'Women''s Health / Ob-Gyn') AS total_womens_health_ob_gyn,
COUNT_IF(provider_type = 'Therapy & Rehab') AS total_therapy_rehab,
COUNT_IF(provider_type = 'Dental & Oral') AS total_dental_oral,
COUNT_IF(provider_type = 'Optometry') AS total_optometry,
COUNT_IF(provider_type = 'Allied / Alternative') AS total_allied_alternative,
COUNT_IF(provider_type = 'Other') AS total_other,

ARRAY[
COUNT_IF(provider_type = 'Family / Internal Medicine'),
COUNT_IF(provider_type = 'Hospitalist'),
COUNT_IF(provider_type = 'Emergency Medicine'),
COUNT_IF(provider_type = 'Radiologist'),
COUNT_IF(provider_type = 'Specialist'),
COUNT_IF(provider_type = 'Nurse / Physician Assistant'),
COUNT_IF(provider_type = 'Anesthesia & Pain'),
COUNT_IF(provider_type = 'Behavioral Health'),
COUNT_IF(provider_type = 'Women''s Health / Ob-Gyn'),
COUNT_IF(provider_type = 'Therapy & Rehab'),
COUNT_IF(provider_type = 'Dental & Oral'),
COUNT_IF(provider_type = 'Optometry'),
COUNT_IF(provider_type = 'Allied / Alternative'),
COUNT_IF(provider_type = 'Other')
] AS provider_type_counts_array,

ARRAY[
'Family / Internal Medicine',
'Hospitalist',
'Emergency Medicine',
'Radiologist',
'Specialist',
'Nurse / Physician Assistant',
'Anesthesia & Pain',
'Behavioral Health',
'Women''s Health / Ob-Gyn',
'Therapy & Rehab',
'Dental & Oral',
'Optometry',
'Allied / Alternative',
'Other'
] AS provider_type_buckets_array
FROM tq_dev.internal_dev_csong_graph.pecos_data_clean_provider_types_2025_07 p
LEFT JOIN pg_npis pgn
ON p.group_enrollment_id = pgn.enrollment_id
GROUP BY
group_pac_id
),
primary_state AS (
SELECT
group_pac_id,
group_state_code,
state_count,
ROW_NUMBER() OVER (PARTITION BY group_pac_id ORDER BY state_count DESC) AS rn
FROM (
SELECT
group_pac_id,
group_state_code,
COUNT(*) AS state_count
FROM tq_dev.internal_dev_csong_graph.pecos_data_clean_provider_types_2025_07
GROUP BY
group_pac_id, group_state_code
)
),
state_counts AS (
SELECT
group_pac_id,
SUM(CAST(group_reassignments_and_physician_assistants AS INT)) as total_reassignments
FROM (
SELECT DISTINCT group_pac_id, group_state_code, group_reassignments_and_physician_assistants
FROM tq_dev.internal_dev_csong_graph.pecos_data_clean_provider_types_2025_07
)
GROUP BY
group_pac_id
),
pecos AS (
SELECT
CAST(from_big_endian_64(xxhash64(CAST(
bc.group_pac_id
AS varbinary
))) AS VARCHAR) as provider_id,
bc.group_pac_id,
bc.group_legal_business_name as provider_name,
bc.primary_npis,
ps.group_state_code as provider_state,
bc.group_state_code as provider_all_states,
CASE
WHEN (ARRAY_MAX(provider_type_counts_array) > total_reassignments / 2) AND (provider_type_buckets_array[ARRAY_POSITION(provider_type_counts_array, ARRAY_MAX(provider_type_counts_array))] != 'Nurse / Physician Assistant')
THEN provider_type_buckets_array[ARRAY_POSITION(provider_type_counts_array, ARRAY_MAX(provider_type_counts_array))]
WHEN (ARRAY_MAX(provider_type_counts_array) = total_reassignments) AND (provider_type_buckets_array[ARRAY_POSITION(provider_type_counts_array, ARRAY_MAX(provider_type_counts_array))] = 'Nurse / Physician Assistant')
THEN provider_type_buckets_array[ARRAY_POSITION(provider_type_counts_array, ARRAY_MAX(provider_type_counts_array))]
WHEN (ARRAY_MAX(provider_type_counts_array) > total_reassignments / 2) AND (provider_type_buckets_array[ARRAY_POSITION(provider_type_counts_array, ARRAY_MAX(provider_type_counts_array))] = 'Nurse / Physician Assistant')
THEN provider_type_buckets_array[ARRAY_POSITION(provider_type_counts_array, ARRAY_SORT(provider_type_counts_array, (x, y) -> IF(x < y, 1, IF(x = y, 0, -1)))[2])]
ELSE 'Broad Group'
END AS primary_specialty,
ARRAY_MAX(provider_type_counts_array) AS primary_specialty_count,
sc.total_reassignments,
total_family_internal_medicine,
total_hospitalist,
total_emergency_medicine,
total_radiologist,
total_specialist,
total_nurse_physician_assistant,
total_anesthesia_pain,
total_behavioral_health,
total_womens_health_ob_gyn,
total_therapy_rehab,
total_dental_oral,
total_optometry,
total_allied_alternative,
total_other,
'PECOS' as provider_source,
CURRENT_TIMESTAMP AS created_at,
CURRENT_TIMESTAMP AS updated_at
FROM bucket_counts bc
JOIN state_counts sc
ON bc.group_pac_id = sc.group_pac_id
LEFT JOIN primary_state ps
ON bc.group_pac_id = ps.group_pac_id AND ps.rn = 1
),
definitive AS (
SELECT
CAST(from_big_endian_64(xxhash64(CAST(
group_practice_pac_id
AS varbinary
))) AS VARCHAR) as group_pac_id,
group_practice_pac_id AS group_pac_id,
hospital_name AS provider_name,
ARRAY[CAST(npi_number AS VARCHAR)] AS primary_npi,
hq_state AS provider_state,
NULL AS provider_all_states,
main_specialty_group AS primary_specialty,
NULL AS primary_specialty_count,
NULL AS total_reassignments,
NULL AS total_family_internal_medicine,
NULL AS total_hospitalist,
NULL AS total_emergency_medicine,
NULL AS total_radiologist,
NULL AS total_specialist,
NULL AS total_nurse_physician_assistant,
NULL AS total_anesthesia_pain,
NULL AS total_behavioral_health,
NULL AS total_womens_health_ob_gyn,
NULL AS total_therapy_rehab,
NULL AS total_dental_oral,
NULL AS total_optometry,
NULL AS total_allied_alternative,
NULL AS total_other,
'Definitive' AS provider_source,
CURRENT_TIMESTAMP AS created_at,
CURRENT_TIMESTAMP AS updated_at
FROM tq_production.definitive_healthcare.physician_groups_overview
WHERE group_practice_pac_id IS NOT NULL
AND group_practice_pac_id NOT IN (
SELECT group_pac_id
FROM pecos
)
)
SELECT *
FROM pecos
UNION ALL
SELECT *
FROM definitive

Step 2: TIN-HCP Bridge

Links physician group NPIs to their TINs (Tax Identification Numbers) from MRF compressed-providers data. Filters to NPIs that belong to a known physician group and requires at least 4 distinct payers.

Output table: tq_dev.internal_dev_csong_graph.tin_hcp_bridge_2025_07

CREATE TABLE tq_dev.internal_dev_csong_graph.tin_hcp_bridge_2025_07 AS
WITH
physician_group_npis AS (
SELECT individual_npi
FROM tq_dev.internal_dev_csong_graph.pecos_data_clean_provider_types_2025_07
WHERE group_pac_id IN (
SELECT group_pac_id
FROM tq_dev.internal_dev_csong_graph.physician_groups_2025_07
)
)
SELECT
tin,
npi as hcp_npi,
count(distinct payer_id) as n_payers
FROM tq_production.public_2025_07.compressed_providers cp
WHERE EXISTS (
SELECT 1
FROM physician_group_npis pd
WHERE pd.individual_npi = cp.npi
)
GROUP BY
tin,
npi
HAVING count(distinct payer_id) > 3

Step 3: PAC-HCP Bridge

Links PECOS group PAC IDs to individual provider NPIs using PECOS reassignment data.

Output table: tq_dev.internal_dev_csong_graph.pac_hcp_bridge_2025_07

CREATE TABLE IF NOT EXISTS tq_dev.internal_dev_csong_graph.pac_hcp_bridge_2025_07 AS
SELECT
group_pac_id,
individual_npi as hcp_npi,
enrollment_specialty as physician_primary_specialty,
'PECOS' AS source
FROM tq_raw.cms.pecos_provider_revalidation_reassignment pecos
FULL JOIN tq_raw.cms.pecos_provider_revalidation org
ON org.enrollment_id = pecos.group_enrollment_id
WHERE group_pac_id IN (
SELECT group_pac_id
FROM tq_dev.internal_dev_csong_graph.physician_groups_2025_07
)

Step 4: HCO-HCP Bridge

Links Healthcare Organizations (HCOs) to Healthcare Professionals (HCPs) using Komodo medical encounter data. Joins headers and service lines to find HCP-HCO pairs that share encounters.

Output table: tq_dev.internal_dev_csong_graph.hco_hcp_bridge_2025_07

CREATE TABLE tq_dev.internal_dev_csong_graph.hco_hcp_bridge_2025_07 AS
WITH

physician_group_npis AS (
SELECT individual_npi
FROM tq_dev.internal_dev_csong_graph.pecos_data_clean_provider_types_2025_07
WHERE group_pac_id IN (
SELECT group_pac_id
FROM tq_dev.internal_dev_csong_graph.physician_groups_2025_07
)
),

npi_group_hco_1_npi AS (
SELECT
encounter_key,
hco_1_npi AS npi
FROM tq_intermediate.external_komodo.medical_headers
WHERE hco_1_npi IS NOT NULL
AND statement_from > DATE('2023-01-01')
),
npi_group_hcp_1_npi AS (
SELECT
encounter_key,
hcp_1_npi AS npi
FROM tq_intermediate.external_komodo.medical_headers
WHERE hcp_1_npi IS NOT NULL
AND statement_from > DATE('2023-01-01')
AND hcp_1_npi IN (SELECT individual_npi FROM physician_group_npis)
),
npi_group_hcp_2_npi AS (
SELECT
encounter_key,
hcp_2_npi AS npi
FROM tq_intermediate.external_komodo.medical_headers
WHERE hcp_2_npi IS NOT NULL
AND statement_from > DATE('2023-01-01')
AND hcp_2_npi IN (SELECT individual_npi FROM physician_group_npis)
),
npi_group_hcp_3_npi AS (
SELECT
encounter_key,
hcp_3_npi AS npi
FROM tq_intermediate.external_komodo.medical_headers
WHERE hcp_3_npi IS NOT NULL
AND statement_from > DATE('2023-01-01')
AND hcp_3_npi IN (SELECT individual_npi FROM physician_group_npis)
),
npi_group_hcp_4_npi AS (
SELECT
encounter_key,
hcp_4_npi AS npi
FROM tq_intermediate.external_komodo.medical_headers
WHERE hcp_4_npi IS NOT NULL
AND statement_from > DATE('2023-01-01')
AND hcp_4_npi IN (SELECT individual_npi FROM physician_group_npis)
),
npi_group_hcp_s_1_npi AS (
SELECT
encounter_key,
hcp_s_1_npi AS npi
FROM tq_intermediate.external_komodo.medical_service_lines
WHERE hcp_s_1_npi IS NOT NULL
AND service_from > DATE('2023-01-01')
AND hcp_s_1_npi IN (SELECT individual_npi FROM physician_group_npis)
),
npi_group_hcp_s_2_npi AS (
SELECT
encounter_key,
hcp_s_2_npi AS npi
FROM tq_intermediate.external_komodo.medical_service_lines
WHERE hcp_s_2_npi IS NOT NULL
AND service_from > DATE('2023-01-01')
AND hcp_s_2_npi IN (SELECT individual_npi FROM physician_group_npis)
),
npi_group_hco_s_1_npi AS (
SELECT
encounter_key,
hco_s_1_npi AS npi
FROM tq_intermediate.external_komodo.medical_service_lines
WHERE hco_s_1_npi IS NOT NULL
AND service_from > DATE('2023-01-01')
),

final_union AS (
SELECT DISTINCT *
FROM (
SELECT * FROM npi_group_hco_1_npi
UNION ALL
SELECT * FROM npi_group_hcp_1_npi
UNION ALL
SELECT * FROM npi_group_hcp_2_npi
UNION ALL
SELECT * FROM npi_group_hcp_3_npi
UNION ALL
SELECT * FROM npi_group_hcp_4_npi
UNION ALL
SELECT * FROM npi_group_hcp_s_1_npi
UNION ALL
SELECT * FROM npi_group_hcp_s_2_npi
UNION ALL
SELECT * FROM npi_group_hco_s_1_npi
)
),

with_npi_type AS (
SELECT
fu.encounter_key,
fu.npi,
pd.npi_type,
pd.state
FROM final_union fu
INNER JOIN (
SELECT npi, ANY_VALUE(entity_type_code) AS npi_type, ANY_VALUE(provider_mailing_state) AS state
FROM tq_production.spines.spines_provider_cms_nppes
GROUP BY npi
) pd
ON fu.npi = pd.npi
WHERE pd.npi_type IN ('1','2')
),

npi_hcp AS (
SELECT *
FROM with_npi_type
WHERE npi_type = '1'
),
npi_hco AS (
SELECT *
FROM with_npi_type
WHERE npi_type = '2'
AND npi IN (
SELECT npi
FROM tq_production.spines.spines_provider_hospitals
UNION
SELECT npi
FROM tq_production.spines.spines_provider_hospitals_additional_npis
)
)
SELECT
npi_hcp.npi AS hcp_npi,
ANY_VALUE(npi_hcp.state) AS hcp_state,
npi_hco.npi AS hco_npi,
ANY_VALUE(npi_hco.state) AS hco_state,
count(DISTINCT npi_hcp.encounter_key) AS n_encounters
FROM npi_hcp
JOIN npi_hco
ON npi_hcp.encounter_key = npi_hco.encounter_key
GROUP BY npi_hcp.npi, npi_hco.npi

Step 5: Health System-HCO Bridge

Maps health systems to their HCO NPIs by joining the spines provider table with the hospitals additional NPIs table. Only includes providers that belong to a known health system.

Output table: tq_dev.internal_dev_csong_graph.hs_hco_bridge_2025_07

CREATE TABLE tq_dev.internal_dev_csong_graph.hs_hco_bridge_2025_07 AS
SELECT
DISTINCT
cr.provider_healthsystem_id as health_system_id,
cr.provider_healthsystem_parent_id as health_system_parent_id,
cr.provider_healthsystem_name as health_system_name,
cr.provider_healthsystem_parent_name as health_system_parent_name,
cr.provider_state as state,
upt.npi as hco_npi
FROM tq_production.spines.spines_provider cr
JOIN tq_production.spines.spines_provider_hospitals_additional_npis upt
ON cr.provider_id = CAST(upt.provider_id AS VARCHAR)
WHERE provider_healthsystem_id IS NOT NULL

Phase 2: HCP Similarity Analysis

Similarity Template

All four similarity tables use the same Jinja template (hcp_similarity.sql). The template computes set-overlap metrics (Jaccard similarity, containment) between two bridge tables by comparing their HCP NPI arrays. The template accepts:

  • left_table / right_table -- which bridge tables to compare
  • left_id / right_id -- the entity ID columns from each bridge
  • table_name -- the output table name (without version suffix)

For each pair of entities that share at least one HCP NPI, it computes:

  • count_intersection -- number of shared NPIs
  • count_union -- total unique NPIs across both
  • jaccard -- Jaccard similarity index
  • containment_l / containment_r -- containment ratios (intersection / left or right set size)
  • hcp_npi_intersection -- the actual shared NPI array

Step 6a: PAC-HCO Similarity

Compares PAC groups (from pac_hcp_bridge) with HCOs (from hco_hcp_bridge) to find physician groups that share doctors with hospitals.

Output table: tq_dev.internal_dev_csong_graph.pac_hco_similarity_2025_07

CREATE TABLE tq_dev.internal_dev_csong_graph.pac_hco_similarity_2025_07 AS
WITH
t1 AS (
SELECT
group_pac_id,
array_agg(DISTINCT hcp_npi) as npi_array
FROM tq_dev.internal_dev_csong_graph.pac_hcp_bridge_2025_07

GROUP BY 1
),
t2 AS (
SELECT
hco_npi,
ARRAY_AGG(DISTINCT hcp_npi) as npi_array
FROM tq_dev.internal_dev_csong_graph.hco_hcp_bridge_2025_07

WHERE n_encounters > 0

GROUP BY 1
)
SELECT
t1.group_pac_id as group_pac_id_l,
t2.hco_npi as hco_npi_r,
CARDINALITY(ARRAY_INTERSECT(t1.npi_array, t2.npi_array)) AS count_intersection,
CARDINALITY(ARRAY_UNION(t1.npi_array, t2.npi_array)) AS count_union,
CARDINALITY(t1.npi_array) AS count_l,
CARDINALITY(t2.npi_array) AS count_r,
1.000 * CARDINALITY(ARRAY_INTERSECT(t1.npi_array, t2.npi_array)) /
CARDINALITY(ARRAY_UNION(t1.npi_array, t2.npi_array)) AS jaccard,
1.000 * CARDINALITY(ARRAY_INTERSECT(t1.npi_array, t2.npi_array)) /
CARDINALITY(t1.npi_array) AS containment_l,
1.000 * CARDINALITY(ARRAY_INTERSECT(t1.npi_array, t2.npi_array)) /
CARDINALITY(t2.npi_array) AS containment_r,
ARRAY_INTERSECT(t1.npi_array, t2.npi_array) AS hcp_npi_intersection
FROM t1
CROSS JOIN t2
WHERE ARRAYS_OVERLAP(t1.npi_array, t2.npi_array) = True
AND t1.group_pac_id != t2.hco_npi

Step 6b: PAC-TIN Similarity

Compares PAC groups with TINs to find physician groups that share doctors with the same tax entities.

Output table: tq_dev.internal_dev_csong_graph.pac_tin_similarity_2025_07

CREATE TABLE tq_dev.internal_dev_csong_graph.pac_tin_similarity_2025_07 AS
WITH
t1 AS (
SELECT
group_pac_id,
array_agg(DISTINCT hcp_npi) as npi_array
FROM tq_dev.internal_dev_csong_graph.pac_hcp_bridge_2025_07

GROUP BY 1
),
t2 AS (
SELECT
tin,
ARRAY_AGG(DISTINCT hcp_npi) as npi_array
FROM tq_dev.internal_dev_csong_graph.tin_hcp_bridge_2025_07

WHERE hcp_npi IN (
SELECT distinct hcp_npi
FROM tq_dev.internal_dev_csong_graph.pac_hcp_bridge_2025_07
)
AND n_payers > 3

GROUP BY 1
)
SELECT
t1.group_pac_id as group_pac_id_l,
t2.tin as tin_r,
CARDINALITY(ARRAY_INTERSECT(t1.npi_array, t2.npi_array)) AS count_intersection,
CARDINALITY(ARRAY_UNION(t1.npi_array, t2.npi_array)) AS count_union,
CARDINALITY(t1.npi_array) AS count_l,
CARDINALITY(t2.npi_array) AS count_r,
1.000 * CARDINALITY(ARRAY_INTERSECT(t1.npi_array, t2.npi_array)) /
CARDINALITY(ARRAY_UNION(t1.npi_array, t2.npi_array)) AS jaccard,
1.000 * CARDINALITY(ARRAY_INTERSECT(t1.npi_array, t2.npi_array)) /
CARDINALITY(t1.npi_array) AS containment_l,
1.000 * CARDINALITY(ARRAY_INTERSECT(t1.npi_array, t2.npi_array)) /
CARDINALITY(t2.npi_array) AS containment_r,
ARRAY_INTERSECT(t1.npi_array, t2.npi_array) AS hcp_npi_intersection
FROM t1
CROSS JOIN t2
WHERE ARRAYS_OVERLAP(t1.npi_array, t2.npi_array) = True
AND t1.group_pac_id != t2.tin

Step 6c: PAC-PAC Similarity

Self-join of PAC groups to find physician groups that share doctors with each other.

Output table: tq_dev.internal_dev_csong_graph.pac_pac_similarity_2025_07

CREATE TABLE tq_dev.internal_dev_csong_graph.pac_pac_similarity_2025_07 AS
WITH
t1 AS (
SELECT
group_pac_id,
array_agg(DISTINCT hcp_npi) as npi_array
FROM tq_dev.internal_dev_csong_graph.pac_hcp_bridge_2025_07

GROUP BY 1
),
t2 AS (
SELECT
group_pac_id,
ARRAY_AGG(DISTINCT hcp_npi) as npi_array
FROM tq_dev.internal_dev_csong_graph.pac_hcp_bridge_2025_07

GROUP BY 1
)
SELECT
t1.group_pac_id as group_pac_id_l,
t2.group_pac_id as group_pac_id_r,
CARDINALITY(ARRAY_INTERSECT(t1.npi_array, t2.npi_array)) AS count_intersection,
CARDINALITY(ARRAY_UNION(t1.npi_array, t2.npi_array)) AS count_union,
CARDINALITY(t1.npi_array) AS count_l,
CARDINALITY(t2.npi_array) AS count_r,
1.000 * CARDINALITY(ARRAY_INTERSECT(t1.npi_array, t2.npi_array)) /
CARDINALITY(ARRAY_UNION(t1.npi_array, t2.npi_array)) AS jaccard,
1.000 * CARDINALITY(ARRAY_INTERSECT(t1.npi_array, t2.npi_array)) /
CARDINALITY(t1.npi_array) AS containment_l,
1.000 * CARDINALITY(ARRAY_INTERSECT(t1.npi_array, t2.npi_array)) /
CARDINALITY(t2.npi_array) AS containment_r,
ARRAY_INTERSECT(t1.npi_array, t2.npi_array) AS hcp_npi_intersection
FROM t1
CROSS JOIN t2
WHERE ARRAYS_OVERLAP(t1.npi_array, t2.npi_array) = True
AND t1.group_pac_id != t2.group_pac_id

Step 6d: HCO-TIN Similarity

Compares HCOs (from hco_hcp_bridge) with TINs (from tin_hcp_bridge) to find hospitals that share doctors with the same tax entities.

Output table: tq_dev.internal_dev_csong_graph.hco_tin_similarity_2025_07

CREATE TABLE tq_dev.internal_dev_csong_graph.hco_tin_similarity_2025_07 AS
WITH
t1 AS (
SELECT
hco_npi,
array_agg(DISTINCT hcp_npi) as npi_array
FROM tq_dev.internal_dev_csong_graph.hco_hcp_bridge_2025_07

GROUP BY 1
),
t2 AS (
SELECT
tin,
ARRAY_AGG(DISTINCT hcp_npi) as npi_array
FROM tq_dev.internal_dev_csong_graph.tin_hcp_bridge_2025_07

WHERE hcp_npi IN (
SELECT distinct hcp_npi
FROM tq_dev.internal_dev_csong_graph.hco_hcp_bridge_2025_07
)
AND n_payers > 3

GROUP BY 1
)
SELECT
t1.hco_npi as hco_npi_l,
t2.tin as tin_r,
CARDINALITY(ARRAY_INTERSECT(t1.npi_array, t2.npi_array)) AS count_intersection,
CARDINALITY(ARRAY_UNION(t1.npi_array, t2.npi_array)) AS count_union,
CARDINALITY(t1.npi_array) AS count_l,
CARDINALITY(t2.npi_array) AS count_r,
1.000 * CARDINALITY(ARRAY_INTERSECT(t1.npi_array, t2.npi_array)) /
CARDINALITY(ARRAY_UNION(t1.npi_array, t2.npi_array)) AS jaccard,
1.000 * CARDINALITY(ARRAY_INTERSECT(t1.npi_array, t2.npi_array)) /
CARDINALITY(t1.npi_array) AS containment_l,
1.000 * CARDINALITY(ARRAY_INTERSECT(t1.npi_array, t2.npi_array)) /
CARDINALITY(t2.npi_array) AS containment_r,
ARRAY_INTERSECT(t1.npi_array, t2.npi_array) AS hcp_npi_intersection
FROM t1
CROSS JOIN t2
WHERE ARRAYS_OVERLAP(t1.npi_array, t2.npi_array) = True
AND t1.hco_npi != t2.tin

Phase 3: Analysis

Step 7: PAC Connections

Aggregates PAC-to-HCO and PAC-to-TIN similarity results, filtering by Jaccard/containment thresholds and minimum intersection count. Produces a single row per PAC with its HCO neighbors, TIN neighbors, and combined HCP neighbor arrays.

Output table: tq_dev.internal_dev_csong_graph.pac_connections_2025_07

CREATE TABLE tq_dev.internal_dev_csong_graph.pac_connections_2025_07 AS
WITH
hco_hcp_bridge AS (
SELECT hco_npi, ARRAY_AGG(DISTINCT hcp_npi) as hco_hcp_npis
FROM tq_dev.internal_dev_csong_graph.hco_hcp_bridge_2025_07
GROUP BY 1
),
tin_hcp_bridge AS (
SELECT tin, ARRAY_AGG(DISTINCT hcp_npi) as tin_hcp_npis
FROM tq_dev.internal_dev_csong_graph.tin_hcp_bridge_2025_07
GROUP BY 1
),
hco AS (
SELECT
group_pac_id_l as pac_id,
ARRAY_AGG(hco_npi_r) as hco_npi,
ARRAY_DISTINCT(FLATTEN(ARRAY_AGG(hco_hcp_npis))) as hco_hcp_npis
FROM tq_dev.internal_dev_csong_graph.pac_hco_similarity_2025_07 hco
LEFT JOIN hco_hcp_bridge
ON hco.hco_npi_r = hco_hcp_bridge.hco_npi
WHERE
(
jaccard > 0.3
OR
(
CASE
WHEN containment_l > containment_r THEN containment_l > 0.5
WHEN containment_r > containment_l THEN containment_r > 0.5
ELSE False
END
)
)
AND count_intersection > 50
GROUP BY 1
),
tin AS (
SELECT
group_pac_id_l as pac_id,
ARRAY_AGG(tin_r) as tin,
ARRAY_DISTINCT(FLATTEN(ARRAY_AGG(tin_hcp_npis))) as tin_hcp_npis
FROM tq_dev.internal_dev_csong_graph.pac_tin_similarity_2025_07 tin
LEFT JOIN tin_hcp_bridge
ON tin.tin_r = tin_hcp_bridge.tin
WHERE (
jaccard > 0.3
OR (
CASE
WHEN containment_l > containment_r THEN containment_l > 0.5
WHEN containment_r > containment_l THEN containment_r > 0.5
ELSE False
END
)
)
AND count_intersection > 50
GROUP BY 1
)
SELECT
CAST(from_big_endian_64(xxhash64(CAST(
COALESCE(hco.pac_id, tin.pac_id)
AS varbinary
))) AS VARCHAR) as provider_id,
COALESCE(hco.pac_id, tin.pac_id) as pac_id,
pg.provider_name AS pac_name,
hco.hco_npi as hco_neighbors,
tin.tin as tin_neighbors,
hco.hco_hcp_npis,
tin.tin_hcp_npis,
ARRAY_UNION(COALESCE(hco.hco_hcp_npis, ARRAY[]), COALESCE(tin.tin_hcp_npis, ARRAY[])) as hcp_neighbors
FROM hco
FULL OUTER JOIN tin
ON hco.pac_id = tin.pac_id
LEFT JOIN tq_dev.internal_dev_csong_graph.physician_groups_2025_07 pg
ON COALESCE(hco.pac_id, tin.pac_id) = pg.group_pac_id

Step 8: NetworkX Graph

This step builds a NetworkX graph in Python by querying four edge sets from Trino. The graph is serialized as a pickle file and uploaded to S3.

The four SQL queries executed by graph_utils.py get_sql_queries():

Query 1 -- PAC-HCO edges:

SELECT pac_id, hco_neighbor as hco
FROM tq_dev.internal_dev_csong_graph.pac_connections_2025_07,
unnest(hco_neighbors) as t(hco_neighbor)
WHERE hco_neighbor IS NOT NULL

Query 2 -- PAC-TIN edges:

SELECT pac_id, tin_neighbor as tin
FROM tq_dev.internal_dev_csong_graph.pac_connections_2025_07,
unnest(tin_neighbors) as t(tin_neighbor)
WHERE tin_neighbor IS NOT NULL

Query 3 -- HCO-TIN edges (filtered by similarity thresholds):

SELECT hco_npi_l as hco, tin_r as tin
FROM tq_dev.internal_dev_csong_graph.hco_tin_similarity_2025_07
WHERE (
jaccard > 0.3
OR
(
CASE
WHEN containment_l > containment_r THEN containment_l > 0.5
WHEN containment_r > containment_l THEN containment_r > 0.5
ELSE False
END
)
)
AND count_intersection > 50

Query 4 -- Health System-HCO edges:

SELECT health_system_id, health_system_name, state, hco_npi as hco
FROM tq_dev.internal_dev_csong_graph.hs_hco_bridge_2025_07
WHERE health_system_id IS NOT NULL AND hco_npi IS NOT NULL

After executing these queries, the Python code:

  1. Adds edges with edge_type attributes (pac_hco, pac_tin, hco_tin, hs_hco)
  2. Sets node_type attributes on all nodes based on prefixes (pac, hco, tin, health_system)
  3. Saves the graph to S3 as provider_network_graph_2025_07.pkl

Step 9: PAC to Health System Associations

This is a Python/NetworkX step with no SQL template. The PACToHSFinder class loads the serialized graph from S3 and performs depth-first search from each PAC node to find paths to health system nodes.

Allowed path patterns:

  • PAC -> HCO -> Health System
  • PAC -> TIN -> HCO -> Health System

Output table: tq_dev.internal_dev_csong_graph.pac_to_hs_associations_2025_07

Output schema:

ColumnType
pac_idVARCHAR
hs_idVARCHAR
graph_based_labelVARCHAR (Single Affiliated or Multiple Affiliated)
path_idxINTEGER
path_nodesARRAY(VARCHAR)
node_typesARRAY(VARCHAR)
edge_typesARRAY(VARCHAR)
patternVARCHAR (e.g. PAC->hco->health_system)
hs_nodes_strVARCHAR (comma-separated list of all HS IDs for the PAC)

Step 10: Final Output

This step loads four datasets from Trino and merges them in Python (pandas) to produce the final pg_hs_affiliations table.

Query 1 -- Load PAC associations:

SELECT pac_id, hs_nodes_str, graph_based_label, path_nodes, edge_types, pattern
FROM tq_dev.internal_dev_csong_graph.pac_to_hs_associations_2025_07

Query 2 -- Load Ryan's health systems reference data:

SELECT health_system_id, pecos_id, health_system_name
FROM tq_intermediate.cld_utils.cld_pg_ids_rkost_2025_06_18

Query 3 -- Load PAC names and metadata:

SELECT
group_pac_id,
provider_name,
provider_state,
primary_specialty,
total_reassignments
FROM tq_dev.internal_dev_csong_graph.physician_groups_2025_07
WHERE group_pac_id IS NOT NULL

Query 4 -- Load health system names:

SELECT health_system_id, health_system_name
FROM tq_dev.internal_dev_csong_graph.hs_hco_bridge_2025_07

Merge logic:

  1. Outer join PAC associations with Ryan's health systems on pac_id = pecos_id
  2. Left join with PAC names on pac_id = group_pac_id
  3. Left join with health system names on hs_nodes_str = health_system_id
  4. Fill missing health_system_name from Ryan's data; fill missing pac_id from Ryan's pecos_id
  5. Deduplicate on (pac_id, hs_nodes_str)

Output table: tq_dev.internal_dev_csong_graph.pg_hs_affiliations_2025_07

Output schema:

ColumnType
pac_idVARCHAR
provider_nameVARCHAR
total_reassignmentsDOUBLE
primary_specialtyVARCHAR
provider_stateVARCHAR
health_system_nameVARCHAR
hs_nodes_strVARCHAR
path_nodesARRAY(VARCHAR)
edge_typesARRAY(VARCHAR)
patternVARCHAR
graph_based_labelVARCHAR

Phase 4: Spines

Step 11: Physicians

Links physician groups to individual HCPs using PECOS data only (Definitive is excluded as overly inclusive). Adds taxonomy details for each physician.

Output table: tq_dev.internal_dev_csong_graph.spines_provider_physician_groups_physicians_2025_07

CREATE TABLE tq_dev.internal_dev_csong_graph.spines_provider_physician_groups_physicians_2025_07 AS
WITH
taxonomy_details AS (
SELECT
code,
taxonomy_grouping,
taxonomy_classification,
taxonomy_specialization
FROM (
SELECT
taxonomy_code as code,
taxonomy_grouping,
taxonomy_classification,
taxonomy_specialization,
ROW_NUMBER() OVER (PARTITION BY taxonomy_code ORDER BY taxonomy_nucc_date DESC) AS rank
FROM tq_production.spines.spines_provider_nucc_taxonomy_codes
)
WHERE rank = 1
),
physician_primary_specialty_bridge AS (
SELECT hcp_npi, ANY_VALUE(physician_primary_specialty) as physician_primary_specialty
FROM tq_dev.internal_dev_csong_graph.pac_hcp_bridge_2025_07
GROUP BY hcp_npi
)
SELECT
DISTINCT
CAST(from_big_endian_64(xxhash64(CAST(
group_pac_id
AS varbinary
))) AS VARCHAR) as provider_id,
group_pac_id,
hcp_npi,
CAST(from_big_endian_64(xxhash64(CAST(
hcp_npi
AS varbinary
))) AS VARCHAR) as physician_provider_id,
'PECOS' as source,
ppsb.physician_primary_specialty,
spc.primary_taxonomy_code,
td.taxonomy_grouping AS primary_taxonomy_grouping,
td.taxonomy_classification AS primary_taxonomy_classification,
td.taxonomy_specialization AS primary_taxonomy_specialization
FROM tq_dev.internal_dev_csong_graph.pac_hcp_bridge_2025_07 p
LEFT JOIN physician_primary_specialty_bridge ppsb USING (hcp_npi)
LEFT JOIN tq_production.spines.spines_provider_cms_nppes spc ON hcp_npi = spc.npi
LEFT JOIN taxonomy_details td ON spc.primary_taxonomy_code = td.code

Step 12: Physician Groups Spine

The main physician groups spine table. Joins the base physician groups table with physicians, taxonomy, specialty, and Definitive Healthcare geographic data. This step depends on Step 11 (physicians) completing first.

Output table: tq_dev.internal_dev_csong_graph.spines_provider_physician_groups_2025_07

CREATE TABLE tq_dev.internal_dev_csong_graph.spines_provider_physician_groups_2025_07 AS
WITH
physicians AS (
SELECT
provider_id,
ARRAY_AGG(DISTINCT hcp_npi) AS hcp_npi_array
FROM tq_dev.internal_dev_csong_graph.spines_provider_physician_groups_physicians_2025_07
GROUP BY 1
),
top_specialties AS (
SELECT
group_pac_id,
individual_specialty_description as provider_specialty
FROM (
SELECT
group_pac_id,
individual_specialty_description,
COUNT(*) AS specialty_count,
ROW_NUMBER() OVER (
PARTITION BY group_pac_id
ORDER BY COUNT(*) DESC
) AS rn
FROM tq_raw.cms.pecos_provider_revalidation_reassignment
WHERE individual_specialty_description IN (
SELECT individual_specialty_description
FROM tq_dev.internal_dev_csong_graph.pecos_provider_type_mappings
WHERE provider_type = 'Specialist'
)
GROUP BY group_pac_id, individual_specialty_description
)
WHERE rn = 1
),
definitive AS (
SELECT
group_practice_pac_id AS group_pac_id,
hq_address AS provider_address,
hq_city AS provider_city,
hq_zip_code AS provider_zip5,
SUBSTRING(hq_zip_code, 1, 3) AS provider_zip3,
hq_county AS provider_county,
SUBSTRING(cbsa.cbsa_id, 1, 5) AS provider_cbsa_code,
cbsa.cbsa_name AS provider_cbsa_name,
hospital_id AS dh_hospital_id,
website AS provider_website,
hq_latitude AS provider_latitude,
hq_longitude AS provider_longitude
FROM tq_production.definitive_healthcare.physician_groups_overview pgo
LEFT JOIN tq_production.spines.spines_geo_cbsa cbsa
ON SUBSTRING(pgo.cbsa_code, 1, 5) = cbsa.cbsa_id
),
top_taxonomy AS (
SELECT
provider_id,
primary_taxonomy_code,
primary_taxonomy_grouping,
primary_taxonomy_classification,
primary_taxonomy_specialization
FROM (
SELECT
provider_id,
primary_taxonomy_code,
primary_taxonomy_grouping,
primary_taxonomy_classification,
primary_taxonomy_specialization,
COUNT(*) AS taxonomy_count,
ROW_NUMBER() OVER (
PARTITION BY provider_id
ORDER BY COUNT(*) DESC
) AS rn
FROM tq_dev.internal_dev_csong_graph.spines_provider_physician_groups_physicians_2025_07
WHERE primary_taxonomy_code IS NOT NULL
GROUP BY provider_id, primary_taxonomy_code, primary_taxonomy_grouping,
primary_taxonomy_classification, primary_taxonomy_specialization
)
WHERE rn = 1
)
SELECT
provider_id,
primary_npis,
group_pac_id,
provider_source,
provider_name,
provider_state,
provider_all_states,
'Physician Group' AS provider_type,
CASE
WHEN provider_source = 'PECOS' THEN 'Highly Likely'
WHEN provider_source = 'Definitive' AND primary_specialty = 'Pediatric Medicine' THEN 'Likely'
WHEN provider_source = 'Definitive' THEN 'Possible'
END AS provider_type_rating,
primary_specialty as provider_subtype,
CASE
WHEN primary_specialty = 'Specialist'
THEN provider_specialty
END as provider_specialty,
primary_specialty_count as provider_subtype_count,
hcp_npi_array,
provider_address,
provider_city,
provider_zip5,
provider_zip3,
provider_county,
provider_cbsa_code,
provider_cbsa_name,
dh_hospital_id,
provider_website,
provider_latitude,
provider_longitude,
primary_taxonomy_code,
primary_taxonomy_grouping,
primary_taxonomy_classification,
primary_taxonomy_specialization,
total_reassignments,
total_family_internal_medicine,
total_hospitalist,
total_emergency_medicine,
total_radiologist,
total_specialist,
total_nurse_physician_assistant,
total_anesthesia_pain,
total_behavioral_health,
total_womens_health_ob_gyn,
total_therapy_rehab,
total_dental_oral,
total_optometry,
total_allied_alternative,
total_other,
created_at,
updated_at
FROM tq_dev.internal_dev_csong_graph.physician_groups_2025_07
LEFT JOIN physicians USING (provider_id)
LEFT JOIN top_taxonomy USING (provider_id)
LEFT JOIN top_specialties USING (group_pac_id)
LEFT JOIN definitive USING (group_pac_id)

Step 13: Additional NPIs

Links physician group PAC IDs to their NPIs and associated NPIs from Definitive Healthcare. Excludes NPIs that belong to Hospital or ASC provider types.

Output table: tq_dev.internal_dev_csong_graph.spines_provider_physician_groups_additional_npis_2025_07

CREATE TABLE tq_dev.internal_dev_csong_graph.spines_provider_physician_groups_additional_npis_2025_07 AS
WITH
definitive AS (
SELECT
group_practice_pac_id AS pac_id,
CAST(npi_number AS VARCHAR) AS npi,
'Definitive' as npi_source
FROM tq_production.definitive_healthcare.physician_groups_overview
),
definitive_associated AS (
SELECT
p2.group_practice_pac_id AS pac_id,
CAST(p.npi_number AS VARCHAR) AS npi,
'Definitive' as npi_source
FROM tq_production.definitive_healthcare.physician_groups_associated_npi_numbers p
JOIN tq_production.definitive_healthcare.physician_groups_associated_group_practice_pac_ids p2
ON p.hospital_id = p2.hospital_id
),
unioned AS (
SELECT
CAST(from_big_endian_64(xxhash64(CAST(
pac_id
AS varbinary
))) AS VARCHAR) as provider_id,
npi,
npi_source,
True as primary_npi,
CURRENT_TIMESTAMP AS created_at,
CURRENT_TIMESTAMP AS updated_at
FROM definitive
UNION
SELECT
CAST(from_big_endian_64(xxhash64(CAST(
pac_id
AS varbinary
))) AS VARCHAR) as provider_id,
npi,
npi_source,
False as primary_npi,
CURRENT_TIMESTAMP AS created_at,
CURRENT_TIMESTAMP AS updated_at
FROM definitive_associated
),
deduplicated AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY provider_id, npi
ORDER BY primary_npi DESC
) AS rn
FROM unioned
)
SELECT
u.provider_id,
u.npi,
u.npi_source,
spc.primary_taxonomy_code,
spc.primary_taxonomy_grouping,
u.primary_npi,
u.created_at,
u.updated_at
FROM deduplicated u
LEFT JOIN tq_production.spines.spines_provider_cms_nppes spc
ON u.npi = spc.npi
LEFT JOIN (
SELECT npi
FROM tq_production.spines.spines_provider
WHERE provider_type IN ('Hospital', 'ASC')
) sp
ON u.npi = sp.npi
WHERE sp.npi IS NULL
AND u.rn = 1

Step 14: Associated TINs

Links physician groups to TINs via PAC-TIN similarity, filtered by Jaccard/containment thresholds. Includes an additional filter allowing smaller intersections if they represent at least 25% of the group's total reassignments.

Output table: tq_dev.internal_dev_csong_graph.spines_provider_physician_groups_associated_tins_2025_07

CREATE TABLE tq_dev.internal_dev_csong_graph.spines_provider_physician_groups_associated_tins_2025_07 AS
SELECT
CAST(from_big_endian_64(xxhash64(CAST(
group_pac_id_l
AS varbinary
))) AS VARCHAR) as provider_id,
group_pac_id_l as group_pac_id,
tin_r as tin,
'Turquoise' as tin_source,
count_intersection,
count_union,
jaccard,
containment_l as containment_pg,
containment_r as containment_tin,
count_l as count_pg,
count_r as count_tin,
hcp_npi_intersection,
total_reassignments,
CURRENT_TIMESTAMP AS created_at,
CURRENT_TIMESTAMP AS updated_at
FROM tq_dev.internal_dev_csong_graph.pac_tin_similarity_2025_07 pts
LEFT JOIN tq_dev.internal_dev_csong_graph.physician_groups_2025_07 pg
ON pts.group_pac_id_l = pg.group_pac_id
WHERE
(
jaccard > 0.3
OR
(
CASE
WHEN containment_l > containment_r THEN containment_l > 0.5
WHEN containment_r > containment_l THEN containment_r > 0.5
ELSE False
END
)
) AND (
count_intersection > 50
OR
count_intersection >= total_reassignments / 4
)

Step 15: Associated Health Systems

Links physician groups to health systems by unnesting the comma-separated hs_nodes_str from the final affiliations table.

Output table: tq_dev.internal_dev_csong_graph.spines_provider_physician_groups_associated_health_systems_2025_07

CREATE TABLE tq_dev.internal_dev_csong_graph.spines_provider_physician_groups_associated_health_systems_2025_07 AS
SELECT
CAST(from_big_endian_64(xxhash64(CAST(
pac_id
AS varbinary
))) AS VARCHAR) as provider_id,
pac_id as group_pac_id,
t.hs_id as health_system_id,
graph_based_label as graph_based_affiliation_type,
path_nodes,
edge_types,
pattern,
CURRENT_TIMESTAMP AS created_at,
CURRENT_TIMESTAMP AS updated_at
FROM tq_dev.internal_dev_csong_graph.pg_hs_affiliations_2025_07,
UNNEST(SPLIT(hs_nodes_str, ', ')) as t(hs_id)

Step 16: Associated Facilities

Links physician groups to hospital facilities via PAC-HCO similarity, filtered by the same Jaccard/containment thresholds. Resolves facility provider_id from both the hospitals spine and additional NPIs table.

Output table: tq_dev.internal_dev_csong_graph.spines_provider_physician_groups_associated_facilities_2025_07

CREATE TABLE tq_dev.internal_dev_csong_graph.spines_provider_physician_groups_associated_facilities_2025_07 AS
SELECT
DISTINCT
CAST(from_big_endian_64(xxhash64(CAST(
p.group_pac_id_l
AS varbinary
))) AS VARCHAR) as provider_id,
p.group_pac_id_l as group_pac_id,
p.hco_npi_r as facility_npi,
COALESCE(h.provider_id, an.provider_id) as facility_provider_id
FROM tq_dev.internal_dev_csong_graph.pac_hco_similarity_2025_07 p
LEFT JOIN tq_production.spines.spines_provider_hospitals h
ON h.npi = p.hco_npi_r
LEFT JOIN tq_production.spines.spines_provider_hospitals_additional_npis an
ON an.npi = p.hco_npi_r
LEFT JOIN tq_dev.internal_dev_csong_graph.physician_groups_2025_07 pg
ON p.group_pac_id_l = pg.group_pac_id
WHERE COALESCE(h.provider_id, an.provider_id) IS NOT NULL
AND
(
jaccard > 0.3
OR
(
CASE
WHEN containment_l > containment_r THEN containment_l > 0.5
WHEN containment_r > containment_l THEN containment_r > 0.5
ELSE False
END
)
) AND (
count_intersection > 50
OR
count_intersection >= total_reassignments / 4
)