Skip to main content

Pipeline DAG

The pipeline is implemented as the Airflow DAG data_science/ssp/ssp_pricing. The sections below describe its stages, its parameters, and the task-level topology that ties them together.


DAG Stages

Six Airflow TaskGroups run in sequence. Each stage is data-dependent on those above it, with branch-level wiring shown under Cross-stage topology.

#Stage (TaskGroup)PurposeTasks
1pre_buildSSP groupings + RII tier multipliers, per-line-code average units, 1%-sampled encounter volumessp_groupings, rii_tier_mapping, avg_line_code_units_inst_enc, avg_line_code_units_all_enc, avg_line_code_units, encounter_matches_op, encounter_matches_ip, encounter_volume
2claimsAnchor-encounter discovery, claims-derived institutional + professional line codes, revenue-code family proportionsop_anchors, ip_anchors, excluded, discover_institutional, discover_professional, combined, rc_family
3line_codesEnriched institutional + professional line-code tables, service-type classification, ancillary encounter proportions, NCCI groupinginstitutional, service_types, ancillary, professional, ncci
4fee_schedulesInstitutional + professional fee schedules, RC-family allocation, combined SSP / subcategory prices, multiple-procedure insertsinst_fee, rc_alloc, prof_fee, prices_sub, prices_ssp, combo
5exportVersioned export_* tables ready for downstream productsexport_ssp_descriptions, export_ssp_prices, export_facility_rc_families, export_professional_line_items, export_professional_conveners, export_sub_category, export_all_line_items, export_metadata
6latestPer-version upserts into generic-library latest_* tables — each task deletes rows for the current ssp_version then re-inserts from the matching export_*latest_ssp_descriptions, latest_ssp_prices, latest_facility_rc_families, latest_professional_line_items, latest_professional_conveners, latest_sub_category, latest_all_line_items, latest_metadata

Each stage writes to {schema}.<name>_<table_version> (default: _v3). No cross-schema writes.


DAG Parameters

All pipeline constants are exposed as Airflow DAG params. Defaults live in __init__.py and are passed to every SQL template via Jinja.

ParamDefaultMeaning
schematq_dev.internal_dev_csong_sspTarget schema for all intermediate + export tables
schema_cldtq_dev.internal_dev_csong_cld_v2_4_3Clear Rates source schema for commercial benchmarks
table_versionv3Suffix on every output table (e.g. institutional_fee_schedule_v3)
ssp_versionv3Written into export_metadata; tags rows in every latest_* table
pipeline_date2026_04_13Written into export_metadata
base_rate500Denominator for relative weights
encounter_threshold0.3Minimum encounter-association rate for claims-derived line codes
assistant_surgeon_factor0.16Assistant surgeon price = 16% of primary surgeon price
assistant_nonsurgeon_factor0.136Assistant non-surgeon price = 13.6% of primary surgeon price
crna_supervised_factor0.50Supervised CRNA price = 50% of full anesthesia fee
anesthesia_exclusion_codes'99152','99153'HCPCS codes excluded from the anesthesia-reference-pricing match
labpath_radiology_codes33-code listHCPCS codes force-classified as Lab/Path or Radiology

All factors mirror CMS payment policy (assistant surgeon, CRNA medical-direction split). encounter_threshold filters infrequent co-billed codes; base_rate is the normalizing constant for weight output.


Cross-stage topology

start
└── pre_build.ssp_groupings
├── pre_build.rii_tier_mapping
├── pre_build.encounter_matches_op ──┐
├── pre_build.encounter_matches_ip ──┤
│ └─► pre_build.encounter_volume ─► export.ssp_descriptions
├── pre_build.avg_line_code_units_inst_enc
│ └── pre_build.avg_line_code_units_all_enc
│ └── pre_build.avg_line_code_units ─┐
│ │
├── claims.op_anchor_encounters ─┐ │
├── claims.ip_anchor_encounters ─┤ │
│ ├── claims.excluded_line_codes
│ │ └── claims.discover_institutional
│ │ └── claims.discover_professional
│ │ └── claims.combined ◄─────┘
│ │ └── line_codes.institutional
│ │ └── line_codes.service_types
│ │ └── line_codes.ancillary
│ │ └── line_codes.professional
│ │ └── line_codes.ncci
│ └── claims.rc_family
│ └── fee_schedules.rc_alloc
│ └── fee_schedules.professional (also needs ncci)
│ └── fee_schedules.prices_sub
│ └── fee_schedules.prices_ssp
│ └── fee_schedules.combo
│ └── export.*
│ └── latest.*
│ └── end
└── line_codes.institutional → fee_schedules.inst_fee → fee_schedules.rc_alloc

Key dependencies worth calling out:

  • line_codes.professional needs both the ancillary encounter proportions (Stage 3) and the institutional RC-family allocation from Stage 4. The Makefile in the legacy repo ran stages in this order; the DAG preserves it via an explicit rc_alloc >> professional edge.
  • fee_schedules.professional requires ncci (for NCCI-aware aggregation) — the edge is explicit in the DAG.
  • export_ssp_descriptions needs both encounter_volume (sampled per-SSP volumes) and fee_schedules.combo (combo SSPs need to be in the combined tables before they show up in descriptions).

Table dependency matrix

Each row is an output table; columns mark its direct upstream reads.

Outputssp_groupingsavg_line_code_unitsop/ip_anchor_encounterssupplemented_sub_package_contentsrevenue_code_family_proportionsinstitutional_line_codesancillary_encounter_proportionsssp_line_code_service_typesprofessional_line_codesprofessional_line_code_ncci_groupsinstitutional_fee_scheduleinstitutional_rc_family_allocationprofessional_fee_schedulecombined_subcategory_fee_schedulecombined_ssp_fee_schedule
rii_tier_mappingX
avg_line_code_units_inst_encX
avg_line_code_units_all_encXX
avg_line_code_units
ssp_encounter_matches_op/ipX
ssp_encounter_volume
ssp_op_anchor_encountersX
ssp_ip_anchor_encountersX
excluded_line_codes
manual_institutional_line_codesXX
manual_professional_line_codesXXX
supplemented_sub_package_contentsXX
revenue_code_family_proportionsXX
institutional_line_codesXXX
ssp_line_code_service_typesXX
ancillary_encounter_proportionsXX
professional_line_codesXXXX
professional_line_code_ncci_groupsX
institutional_fee_scheduleX
institutional_rc_family_allocationXX
professional_fee_scheduleXX
combined_subcategory_fee_scheduleXXXX
combined_ssp_fee_scheduleXXXXX
combo_ssps (INSERTs into combined_*)XX
export_ssp_descriptionsXXX
export_ssp_pricesXX
export_facility_rc_familiesX
export_professional_line_itemsXX
export_professional_convenersXX
export_sub_categoryX
export_all_line_items(reads export_professional_line_items)
export_metadata(uses pipeline params only)

combo_ssps is an INSERT into combined_ssp_fee_schedule and combined_subcategory_fee_schedule — it mutates the output tables in place rather than producing a separate artifact.


Entity relationships

Join keys used across the most important tables:

AJoin key(s)B
ssp_groupingscode = base_codesupplemented_sub_package_contents
ssp_groupingscode = anchor_codemanual_institutional_line_codes, manual_professional_line_codes
ssp_groupingsssp_grouperrevenue_code_family_proportions
supplemented_sub_package_contentsbase_code, line_codeinstitutional_line_codes
supplemented_sub_package_contentsbase_code, line_code, fee_typeprofessional_line_codes
institutional_line_codesssp_grouper, revenue_code_familyinstitutional_rc_family_allocation
institutional_rc_family_allocationssp_grouper, sub_category, pos, provider_idinstitutional_fee_schedule, combined_subcategory_fee_schedule
professional_line_codesssp_grouper, service_type, line_codeprofessional_line_code_ncci_groups
institutional_fee_schedulessp_grouper, sub_category, pos, provider_idcombined_subcategory_fee_schedule
professional_fee_schedulesamecombined_subcategory_fee_schedule
ssp_descriptions (xwalk)old_ssp_grouper, sub_categorycombined_subcategory_fee_schedule (new-ID attachment)
combined_ssp_fee_scheduleprovider_id, ssp_grouper = 'GA.0.colonoscopy' vs = 'GA.0.egd'combo SSP insert self-join

Source tables

External inputs read by the pipeline (none are written to).

SSP definitions

SourcePurpose
{schema}.ssp_initial_pilot_codes_and_xwalk_v3Raw SSP crosswalk — the input to ssp_groupings
{schema}.rii_code_tiersPer-DRG intensity tiers for RII-based multipliers
{schema}.ssp_descriptions_v3New-ID crosswalk (ssp_grouper_id, sub_category_id, descriptions) attached in the combined and export tables

Medicare reference pricing

SourcePurpose
tq_production.reference_external.ipps_reference_pricingInpatient Medicare rates by MS-DRG (IPPS)
tq_production.reference_external.opps_reference_pricingOutpatient Medicare rates by APC (OPPS)
tq_production.reference_external.physician_reference_pricingProfessional Medicare rates by HCPCS × state (MPFS)
tq_production.reference_external.clinical_laboratory_reference_pricingNational CLFS fallback
tq_production.reference_internal.anesthesia_reference_pricingAnesthesia Medicare rates by HCPCS × state
tq_production.reference_external.asp_reference_pricingAverage Sales Price — identifies drug carve-out codes

Provider and service spines

SourcePurpose
tq_production.spines.spines_provider_hospitalsprovider_id → provider_state for state-level Medicare rate selection
tq_production.spines.spines_services(+_relationships, +_clinical_categories)Clinical category labels used by the service-type waterfall
{schema}.services_spines_cleanedCleaned spines copy with ccs_category labels used by the institutional and service-type logic

Commercial benchmarks

SourcePurpose
{schema_cld}.prod_combined_abridgedClear Rates median canonical rates — Hospital PPO for institutional, Physician Group / Professional for professional
tq_dev.internal_dev_csong_sandbox.acute_care_largeEncounter volumes used as rate weights in institutional aggregation

Claims data (for anchor discovery and association rates)

SourcePurpose
{schema}.medical_headers_bothInstitutional + professional claim headers, partitioned by claim_pos and year
{schema}.medical_service_lines_bothService-line details (procedure code, revenue code, units, charges)
{schema}.komodo_op_max_pay_lineHighest-paying line per OP encounter — used as anchor
tq_intermediate.external_komodo.ipps_grouper_drgsDRG assignment per IP encounter

Coding references

SourcePurpose
tq_production.reference_legacy.ref_aapc_cpthierarchyAAPC CPT hierarchy — identifies implant / device / prosthetic codes
tq_production.reference_legacy.ref_cms_ncci_ptp_edits_practitioner_v2CMS NCCI PTP edits — mutually exclusive code pairs
{schema}.revenue_codesRevenue-code dictionary (specific code → family + description)
tq_intermediate.cld_utils.rc_to_hcpcs_cross_validated_75HCPCS → revenue-code-family fallback crosswalk
{schema}.allowed_by_rc_family_v3Komodo-derived allowed-amount association rates per RC family (used for the allowed_allocation output)