Pipeline DAG
The pipeline is implemented as the Airflow DAG
data_science/ssp/ssp_pricing.
The sections below describe its stages, its parameters, and the
task-level topology that ties them together.
DAG Stages
Six Airflow TaskGroups run in sequence. Each stage is data-dependent
on those above it, with branch-level wiring shown under
Cross-stage topology.
| # | Stage (TaskGroup) | Purpose | Tasks |
|---|---|---|---|
| 1 | pre_build | SSP groupings + RII tier multipliers, per-line-code average units, 1%-sampled encounter volume | ssp_groupings, rii_tier_mapping, avg_line_code_units_inst_enc, avg_line_code_units_all_enc, avg_line_code_units, encounter_matches_op, encounter_matches_ip, encounter_volume |
| 2 | claims | Anchor-encounter discovery, claims-derived institutional + professional line codes, revenue-code family proportions | op_anchors, ip_anchors, excluded, discover_institutional, discover_professional, combined, rc_family |
| 3 | line_codes | Enriched institutional + professional line-code tables, service-type classification, ancillary encounter proportions, NCCI grouping | institutional, service_types, ancillary, professional, ncci |
| 4 | fee_schedules | Institutional + professional fee schedules, RC-family allocation, combined SSP / subcategory prices, multiple-procedure inserts | inst_fee, rc_alloc, prof_fee, prices_sub, prices_ssp, combo |
| 5 | export | Versioned export_* tables ready for downstream products | export_ssp_descriptions, export_ssp_prices, export_facility_rc_families, export_professional_line_items, export_professional_conveners, export_sub_category, export_all_line_items, export_metadata |
| 6 | latest | Per-version upserts into generic-library latest_* tables — each task deletes rows for the current ssp_version then re-inserts from the matching export_* | latest_ssp_descriptions, latest_ssp_prices, latest_facility_rc_families, latest_professional_line_items, latest_professional_conveners, latest_sub_category, latest_all_line_items, latest_metadata |
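The delete-then-reinsert behavior of the latest stage can be sketched as follows. This is a minimal illustration using an in-memory SQLite database, not the pipeline's SQL: the helper `refresh_latest`, the `price` column, and the assumption that each latest_* table carries an `ssp_version` column ahead of the export columns are all hypothetical.

```python
import sqlite3


def refresh_latest(conn, export_table, latest_table, ssp_version, columns):
    """Replace the rows tagged with ssp_version in latest_table by the rows of export_table."""
    col_list = ", ".join(columns)
    with conn:  # both statements commit together, or roll back together on error
        conn.execute(
            f"DELETE FROM {latest_table} WHERE ssp_version = ?", (ssp_version,)
        )
        conn.execute(
            f"INSERT INTO {latest_table} (ssp_version, {col_list}) "
            f"SELECT ?, {col_list} FROM {export_table}",
            (ssp_version,),
        )


# Hypothetical two-column tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE export_ssp_prices (ssp_grouper TEXT, price REAL)")
conn.execute(
    "CREATE TABLE latest_ssp_prices (ssp_version TEXT, ssp_grouper TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO latest_ssp_prices VALUES (?, ?, ?)",
    [("v2", "GA.0.egd", 900.0), ("v3", "GA.0.egd", 950.0)],
)
conn.execute("INSERT INTO export_ssp_prices VALUES ('GA.0.egd', 1000.0)")

# Running the refresh twice leaves the same rows as running it once.
refresh_latest(conn, "export_ssp_prices", "latest_ssp_prices", "v3", ["ssp_grouper", "price"])
refresh_latest(conn, "export_ssp_prices", "latest_ssp_prices", "v3", ["ssp_grouper", "price"])

rows = conn.execute(
    "SELECT ssp_version, ssp_grouper, price FROM latest_ssp_prices ORDER BY ssp_version"
).fetchall()
```

Rows for other versions (here `v2`) are left untouched, which is what makes the step safe to re-run for a single ssp_version.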
Each stage writes to {schema}.<name>_<table_version> (default:
_v3). No cross-schema writes.
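As a small illustration of the naming rule above, the helper below builds a fully qualified output table name. The function name `qualified_table` is hypothetical; in the pipeline itself the substitution is done by Jinja inside the SQL templates.

```python
def qualified_table(schema: str, name: str, table_version: str = "v3") -> str:
    """Return {schema}.<name>_<table_version> for one output table."""
    return f"{schema}.{name}_{table_version}"


example = qualified_table("tq_dev.internal_dev_csong_ssp", "institutional_fee_schedule")
# example == "tq_dev.internal_dev_csong_ssp.institutional_fee_schedule_v3"
```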
DAG Parameters
All pipeline constants are exposed as Airflow DAG params. Defaults
live in
__init__.py
and are passed to every SQL template via Jinja.
| Param | Default | Meaning |
|---|---|---|
| schema | tq_dev.internal_dev_csong_ssp | Target schema for all intermediate + export tables |
| schema_cld | tq_dev.internal_dev_csong_cld_v2_4_3 | Clear Rates source schema for commercial benchmarks |
| table_version | v3 | Suffix on every output table (e.g. institutional_fee_schedule_v3) |
| ssp_version | v3 | Written into export_metadata; tags rows in every latest_* table |
| pipeline_date | 2026_04_13 | Written into export_metadata |
| base_rate | 500 | Denominator for relative weights |
| encounter_threshold | 0.3 | Minimum encounter-association rate for claims-derived line codes |
| assistant_surgeon_factor | 0.16 | Assistant surgeon price = 16% of primary surgeon price |
| assistant_nonsurgeon_factor | 0.136 | Assistant non-surgeon price = 13.6% of primary surgeon price |
| crna_supervised_factor | 0.50 | Supervised CRNA price = 50% of full anesthesia fee |
| anesthesia_exclusion_codes | '99152','99153' | HCPCS codes excluded from the anesthesia-reference-pricing match |
| labpath_radiology_codes | 33-code list | HCPCS codes force-classified as Lab/Path or Radiology |
The three price factors mirror CMS payment policy (assistant surgeon,
CRNA medical-direction split). encounter_threshold drops co-billed
codes whose encounter-association rate falls below it; base_rate is
the normalizing constant (denominator) for relative-weight output.
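The arithmetic implied by these parameters can be sketched as below. The function names are hypothetical, the inclusive comparison for the threshold is an assumption based on the word "minimum", and weight = price / base_rate is a reading of "denominator for relative weights" rather than a quote from the pipeline SQL.

```python
# Defaults copied from the parameter table above.
BASE_RATE = 500
ENCOUNTER_THRESHOLD = 0.3
ASSISTANT_SURGEON_FACTOR = 0.16
ASSISTANT_NONSURGEON_FACTOR = 0.136
CRNA_SUPERVISED_FACTOR = 0.50


def relative_weight(price: float, base_rate: float = BASE_RATE) -> float:
    """Express a price as a multiple of the base rate."""
    return price / base_rate


def assistant_surgeon_price(primary_surgeon_price: float) -> float:
    """Assistant surgeon price as a fixed share of the primary surgeon price."""
    return primary_surgeon_price * ASSISTANT_SURGEON_FACTOR


def assistant_nonsurgeon_price(primary_surgeon_price: float) -> float:
    """Assistant non-surgeon price as a fixed share of the primary surgeon price."""
    return primary_surgeon_price * ASSISTANT_NONSURGEON_FACTOR


def supervised_crna_price(full_anesthesia_fee: float) -> float:
    """Supervised CRNA price as a fixed share of the full anesthesia fee."""
    return full_anesthesia_fee * CRNA_SUPERVISED_FACTOR


def keep_line_code(association_rate: float, threshold: float = ENCOUNTER_THRESHOLD) -> bool:
    """Keep a claims-derived line code only if it meets the minimum rate (assumed inclusive)."""
    return association_rate >= threshold


weight = relative_weight(750.0)           # 750 / 500
crna = supervised_crna_price(800.0)       # half of the full anesthesia fee
assistant = assistant_surgeon_price(1000.0)
```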
Cross-stage topology
start
└── pre_build.ssp_groupings
├── pre_build.rii_tier_mapping
├── pre_build.encounter_matches_op ──┐
├── pre_build.encounter_matches_ip ──┤
│ └─► pre_build.encounter_volume ─► export.ssp_descriptions
├── pre_build.avg_line_code_units_inst_enc
│ └── pre_build.avg_line_code_units_all_enc
│ └── pre_build.avg_line_code_units ─┐
│ │
├── claims.op_anchor_encounters ─┐ │
├── claims.ip_anchor_encounters ─┤ │
│ ├── claims.excluded_line_codes
│ │ └── claims.discover_institutional
│ │ └── claims.discover_professional
│ │ └── claims.combined ◄─────┘
│ │ └── line_codes.institutional
│ │ └── line_codes.service_types
│ │ └── line_codes.ancillary
│ │ └── line_codes.professional
│ │ └── line_codes.ncci
│ └── claims.rc_family
│ └── fee_schedules.rc_alloc
│ └── fee_schedules.professional (also needs ncci)
│ └── fee_schedules.prices_sub
│ └── fee_schedules.prices_ssp
│ └── fee_schedules.combo
│ └── export.*
│ └── latest.*
│ └── end
└── line_codes.institutional → fee_schedules.inst_fee → fee_schedules.rc_alloc
Key dependencies worth calling out:
- line_codes.professional needs both the ancillary encounter proportions (Stage 3) and the institutional RC-family allocation from Stage 4. The Makefile in the legacy repo ran stages in this order; the DAG preserves it via an explicit rc_alloc >> professional edge.
- fee_schedules.professional requires ncci (for NCCI-aware aggregation); the edge is explicit in the DAG.
- export_ssp_descriptions needs both encounter_volume (sampled per-SSP volumes) and fee_schedules.combo (combo SSPs need to be in the combined tables before they show up in descriptions).
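The dependencies called out above can be checked with a small topological sort. This sketch uses Python's standard `graphlib` rather than Airflow, and encodes only the edges named in this list, read literally; the full DAG has many more edges.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks that must finish before it.
# In Airflow notation each entry corresponds to `upstream >> task`.
upstream = {
    "line_codes.professional": {"line_codes.ancillary", "fee_schedules.rc_alloc"},
    "fee_schedules.professional": {"line_codes.ncci"},
    "export_ssp_descriptions": {"pre_build.encounter_volume", "fee_schedules.combo"},
}

# static_order() yields every task after all of its upstream tasks,
# and raises graphlib.CycleError if the edges contain a cycle.
order = list(TopologicalSorter(upstream).static_order())
position = {task: i for i, task in enumerate(order)}
```

Any valid run order must satisfy `position[upstream_task] < position[task]` for every edge, which is the property the explicit DAG edges guarantee.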
Table dependency matrix
Each row is an output table; columns mark its direct upstream reads.
| Output | ssp_groupings | avg_line_code_units | op/ip_anchor_encounters | supplemented_sub_package_contents | revenue_code_family_proportions | institutional_line_codes | ancillary_encounter_proportions | ssp_line_code_service_types | professional_line_codes | professional_line_code_ncci_groups | institutional_fee_schedule | institutional_rc_family_allocation | professional_fee_schedule | combined_subcategory_fee_schedule | combined_ssp_fee_schedule |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rii_tier_mapping | X | ||||||||||||||
| avg_line_code_units_inst_enc | X | ||||||||||||||
| avg_line_code_units_all_enc | X | X | |||||||||||||
| avg_line_code_units | |||||||||||||||
| ssp_encounter_matches_op/ip | X | ||||||||||||||
| ssp_encounter_volume | |||||||||||||||
| ssp_op_anchor_encounters | X | ||||||||||||||
| ssp_ip_anchor_encounters | X | ||||||||||||||
| excluded_line_codes | |||||||||||||||
| manual_institutional_line_codes | X | X | |||||||||||||
| manual_professional_line_codes | X | X | X | ||||||||||||
| supplemented_sub_package_contents | X | X | |||||||||||||
| revenue_code_family_proportions | X | X | |||||||||||||
| institutional_line_codes | X | X | X | ||||||||||||
| ssp_line_code_service_types | X | X | |||||||||||||
| ancillary_encounter_proportions | X | X | |||||||||||||
| professional_line_codes | X | X | X | X | |||||||||||
| professional_line_code_ncci_groups | X | ||||||||||||||
| institutional_fee_schedule | X | ||||||||||||||
| institutional_rc_family_allocation | X | X | |||||||||||||
| professional_fee_schedule | X | X | |||||||||||||
| combined_subcategory_fee_schedule | X | X | X | X | |||||||||||
| combined_ssp_fee_schedule | X | X | X | X | X | ||||||||||
| combo_ssps (INSERTs into combined_*) | X | X | |||||||||||||
| export_ssp_descriptions | X | X | X | ||||||||||||
| export_ssp_prices | X | X | |||||||||||||
| export_facility_rc_families | X | ||||||||||||||
| export_professional_line_items | X | X | |||||||||||||
| export_professional_conveners | X | X | |||||||||||||
| export_sub_category | X | ||||||||||||||
| export_all_line_items | (reads export_professional_line_items) | ||||||||||||||
| export_metadata | (uses pipeline params only) |
combo_ssps is an INSERT into combined_ssp_fee_schedule and combined_subcategory_fee_schedule: it mutates the output tables in place rather than producing a separate artifact.
Entity relationships
Join keys used across the most important tables:
| A | Join key(s) | B |
|---|---|---|
| ssp_groupings | code = base_code | supplemented_sub_package_contents |
| ssp_groupings | code = anchor_code | manual_institutional_line_codes, manual_professional_line_codes |
| ssp_groupings | ssp_grouper | revenue_code_family_proportions |
| supplemented_sub_package_contents | base_code, line_code | institutional_line_codes |
| supplemented_sub_package_contents | base_code, line_code, fee_type | professional_line_codes |
| institutional_line_codes | ssp_grouper, revenue_code_family | institutional_rc_family_allocation |
| institutional_rc_family_allocation | ssp_grouper, sub_category, pos, provider_id | institutional_fee_schedule, combined_subcategory_fee_schedule |
| professional_line_codes | ssp_grouper, service_type, line_code | professional_line_code_ncci_groups |
| institutional_fee_schedule | ssp_grouper, sub_category, pos, provider_id | combined_subcategory_fee_schedule |
| professional_fee_schedule | same | combined_subcategory_fee_schedule |
| ssp_descriptions (xwalk) | old_ssp_grouper, sub_category | combined_subcategory_fee_schedule (new-ID attachment) |
| combined_ssp_fee_schedule | provider_id, ssp_grouper = 'GA.0.colonoscopy' vs = 'GA.0.egd' | combo SSP insert self-join |
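The self-join in the last row can be illustrated with an in-memory SQLite table. Only the join keys come from the table above; the `price` column and the sample rows are hypothetical, and the rule that turns a matched pair into a combo price is not stated here, so the sketch stops at finding the pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE combined_ssp_fee_schedule "
    "(provider_id TEXT, ssp_grouper TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO combined_ssp_fee_schedule VALUES (?, ?, ?)",
    [
        ("p1", "GA.0.colonoscopy", 1200.0),
        ("p1", "GA.0.egd", 900.0),
        ("p2", "GA.0.colonoscopy", 1100.0),  # no EGD row, so p2 yields no pair
    ],
)

# Self-join on provider_id: one side filtered to colonoscopy, the other to EGD.
pairs = conn.execute(
    """
    SELECT c.provider_id, c.price AS colonoscopy_price, e.price AS egd_price
    FROM combined_ssp_fee_schedule AS c
    JOIN combined_ssp_fee_schedule AS e
      ON e.provider_id = c.provider_id
    WHERE c.ssp_grouper = 'GA.0.colonoscopy'
      AND e.ssp_grouper = 'GA.0.egd'
    ORDER BY c.provider_id
    """
).fetchall()
```

Only providers with a row for both groupers appear in `pairs`, so a combo row would be inserted only where both component prices exist.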
Source tables
External inputs read by the pipeline (none are written to).
SSP definitions
| Source | Purpose |
|---|---|
| {schema}.ssp_initial_pilot_codes_and_xwalk_v3 | Raw SSP crosswalk; the input to ssp_groupings |
| {schema}.rii_code_tiers | Per-DRG intensity tiers for RII-based multipliers |
| {schema}.ssp_descriptions_v3 | New-ID crosswalk (ssp_grouper_id, sub_category_id, descriptions) attached in the combined and export tables |
Medicare reference pricing
| Source | Purpose |
|---|---|
| tq_production.reference_external.ipps_reference_pricing | Inpatient Medicare rates by MS-DRG (IPPS) |
| tq_production.reference_external.opps_reference_pricing | Outpatient Medicare rates by APC (OPPS) |
| tq_production.reference_external.physician_reference_pricing | Professional Medicare rates by HCPCS × state (MPFS) |
| tq_production.reference_external.clinical_laboratory_reference_pricing | National CLFS fallback |
| tq_production.reference_internal.anesthesia_reference_pricing | Anesthesia Medicare rates by HCPCS × state |
| tq_production.reference_external.asp_reference_pricing | Average Sales Price; identifies drug carve-out codes |