Getting Started

This guide shows the fastest path to the current subway-access workflow.

Install

pip install subway-access

For the full plotting and network layer, install the all extra:

pip install "subway-access[all]"

For local development:

make install-dev

Fetch a real snapshot

subway-access fetch-snapshot --geography borough --value Manhattan --cache-dir cache/manhattan

This writes a reusable local cache bundle to cache/manhattan, which the later steps read from.

Analyze the cached snapshot

subway-access analyze-snapshot --cache-dir cache/manhattan --output-dir artifacts/manhattan
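
If you want to script the two CLI steps above, the following is a minimal sketch. It uses only the commands and flags shown in this guide; the helper names build_commands and run_pipeline are hypothetical and not part of subway-access. By default it only prints the commands (a dry run), so it is safe to execute before the package is installed.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(borough: str, cache_dir: Path, output_dir: Path) -> list[list[str]]:
    """Return the fetch and analyze invocations from this guide as argv lists."""
    fetch = [
        "subway-access", "fetch-snapshot",
        "--geography", "borough",
        "--value", borough,
        "--cache-dir", str(cache_dir),
    ]
    analyze = [
        "subway-access", "analyze-snapshot",
        "--cache-dir", str(cache_dir),
        "--output-dir", str(output_dir),
    ]
    return [fetch, analyze]


def run_pipeline(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing step, so analysis never
            # runs against a cache that failed to download.
            subprocess.run(argv, check=True)


commands = build_commands("Manhattan", Path("cache/manhattan"), Path("artifacts/manhattan"))
run_pipeline(commands)  # dry run: prints the two commands without executing them
```

Pass dry_run=False to actually run both steps in order.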

Use the Python API

from pathlib import Path

from subway_access import analysis, models, pipeline

snapshot = pipeline.fetch_study_area_snapshot(
    models.AccessibilityQuery(geography="borough", value="Manhattan"),
    cache_dir=Path("cache/manhattan"),
)
catchments = analysis.generate_catchments(
    snapshot.stations,
    models.CatchmentRequest(minutes=10),
)
scores = analysis.score_accessibility(
    snapshot.stations,
    catchments,
    snapshot.demographics,
)
reliability = analysis.compute_reliability(
    snapshot.stations,
    snapshot.outages,
    models.TimeWindow(days=30),
)
gaps = analysis.analyze_gaps(scores)
print(len(gaps.records), len(reliability.records))

Add the network layer

The advanced path builds on the same cached snapshot:

1. Fetch or reuse the official MTA + ACS cache.
2. Build or reuse a cached OSM walking graph for the same study area.
3. Compare Euclidean coverage to network-based accessibility.
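
To see why the comparison in the last step matters, here is a toy sketch that measures the same origin-destination pair both ways. Everything in it is hypothetical: the coordinates, the three-node graph, and the helper names are made up for illustration and do not use subway-access or OSM data. The 800 m radius matches the catchment radius used later in this guide.

```python
import heapq
import math

# Hypothetical toy data: planar coordinates in meters and a tiny walking graph.
coords = {
    "station": (0.0, 0.0),
    "corner": (600.0, 0.0),
    "home": (600.0, 500.0),
}
# Undirected edges: (node_a, node_b, street length in meters)
edges = [("station", "corner", 600.0), ("corner", "home", 500.0)]


def euclidean(a: str, b: str) -> float:
    """Straight-line distance between two named points."""
    (x1, y1), (x2, y2) = coords[a], coords[b]
    return math.hypot(x2 - x1, y2 - y1)


def network_distance(source: str, target: str) -> float:
    """Dijkstra shortest-path length over the toy graph."""
    adjacency: dict[str, list[tuple[str, float]]] = {}
    for a, b, length in edges:
        adjacency.setdefault(a, []).append((b, length))
        adjacency.setdefault(b, []).append((a, length))
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, length in adjacency.get(node, []):
            candidate = dist + length
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return math.inf  # target unreachable


radius_m = 800.0
straight = euclidean("station", "home")       # hypot(600, 500), about 781 m
walked = network_distance("station", "home")  # 600 + 500 = 1100 m
print(f"Euclidean: {straight:.0f} m, inside radius: {straight <= radius_m}")
print(f"Network:   {walked:.0f} m, inside radius: {walked <= radius_m}")
```

In this toy case the straight-line test counts the home as covered while the walking path does not, which is the kind of gap the network layer is meant to surface.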

Current methodology

The current flow is intentionally explicit and reproducible:

1. Select a real study area through nyc-geo-toolkit.
2. Fetch official MTA and Census records into a local cache.
3. Load those cached records back into typed in-memory datasets.
4. Generate Euclidean walk catchments from a fixed walking speed.
5. Optionally compare that baseline against cached OSM walking graphs.
6. Compute need, reliability, and gap metrics.

This is a documented first pass, not a claim of full routing realism.
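
The Euclidean catchment step reduces to simple arithmetic: radius = walking speed × minutes, followed by a straight-line distance test. The sketch below illustrates that idea only. The 80 m/min walking speed (about 4.8 km/h) is an assumption chosen so that a 10-minute walk gives the 800 m radius used elsewhere in this guide; the speed subway-access actually uses is not stated here. The function names and coordinates are hypothetical.

```python
import math

WALK_SPEED_M_PER_MIN = 80.0  # assumed value (about 4.8 km/h), not taken from subway-access
EARTH_RADIUS_M = 6_371_000.0


def catchment_radius_m(minutes: float, speed_m_per_min: float = WALK_SPEED_M_PER_MIN) -> float:
    """Euclidean catchment radius for a fixed walking speed."""
    return minutes * speed_m_per_min


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_catchment(station: tuple[float, float], point: tuple[float, float], minutes: float) -> bool:
    """True when the point lies within the straight-line walk radius of the station."""
    return haversine_m(*station, *point) <= catchment_radius_m(minutes)


# Placeholder coordinates, not real station or tract locations.
station = (40.7500, -73.9900)
near = (40.7550, -73.9900)  # 0.005 degrees north, roughly 556 m away
far = (40.7600, -73.9900)   # 0.010 degrees north, roughly 1112 m away

print(catchment_radius_m(10))             # 800.0
print(in_catchment(station, near, 10))    # True
print(in_catchment(station, far, 10))     # False
```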

Research workflow (temporal panel + causal estimators)

On top of the baseline snapshot/score flow, subway-access ships a research-oriented surface for changes over time: how accessibility, gap, and coverage metrics evolve as stations gain elevators year over year. The core primitives live in subway_access.temporal:

from subway_access.temporal import (
    build_panel_dataset,
    build_upgrade_timeline,
    build_distance_weights,
)

# Build a (tract × year) panel from cached snapshots + known upgrade years.
# `snapshot` is the object returned by pipeline.fetch_study_area_snapshot(...)
# in the Python API example above.
# `known_upgrades` maps station_id -> upgrade year; populate it from
# `load_known_upgrades[_from_dir]` or your own data source.
known_upgrades = {"S1": 2019, "S2": 2021}
timeline = build_upgrade_timeline(snapshot.stations, known_upgrades=known_upgrades)

# `vintage_estimates` is dict[year, dict[tract_id, dict[field, value]]].
# `station_locations` is dict[station_id, (lat, lon)].
# Both must be supplied by the caller; they are not defined in this snippet.
panel = build_panel_dataset(
    vintage_estimates,
    station_locations,
    timeline,
    catchment_radius_meters=800.0,  # about a 0.5-mile walk radius
)
treatment_obs = panel.treatment_group()
control_obs = panel.control_group()
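
The two caller-supplied inputs are plain nested dictionaries. As a rough illustration of the shapes described in the comments above, the sketch below builds placeholder values and runs a basic sanity check. Every ID, field name, and number is invented for illustration, and check_inputs is a hypothetical helper, not part of subway_access.temporal.

```python
# Placeholder inputs: every ID, field name, and number here is invented for
# illustration and does not come from subway-access, the MTA, or the Census.
VintageEstimates = dict[int, dict[str, dict[str, float]]]
StationLocations = dict[str, tuple[float, float]]

vintage_estimates: VintageEstimates = {
    2018: {"tract_A": {"population": 1000.0}, "tract_B": {"population": 2000.0}},
    2022: {"tract_A": {"population": 1100.0}, "tract_B": {"population": 1900.0}},
}
station_locations: StationLocations = {
    "S1": (40.7500, -73.9900),  # (lat, lon)
    "S2": (40.7600, -73.9800),
}


def check_inputs(estimates: VintageEstimates, locations: StationLocations) -> set[str]:
    """Check coordinate ranges and return the tract IDs present in every year."""
    for station_id, (lat, lon) in locations.items():
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(f"{station_id}: expected (lat, lon), got {(lat, lon)}")
    tract_sets = [set(tracts) for tracts in estimates.values()]
    return set.intersection(*tract_sets) if tract_sets else set()


common_tracts = check_inputs(vintage_estimates, station_locations)
print(sorted(common_tracts))  # ['tract_A', 'tract_B']
```

Whether build_panel_dataset requires every tract to appear in every year is not stated in this guide; the intersection is shown only as a convenient check before building a panel.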

With the panel built, you can either:

- run a hand-rolled DiD / OLS / Moran's I pipeline (no optional extras are needed; numpy suffices), or
- plug the panel into factor-factory for peer-reviewed causal estimators (TWFE, Sun-Abraham, synthetic control, RDD, spatial autocorrelation) behind a single Panel + Engine contract, then render jellycell tearsheets from the results.
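
As a rough idea of what the hand-rolled route involves, here is a minimal 2×2 difference-in-differences sketch. It is not the project's pipeline: the outcome values are made up, the row layout is an assumption rather than the panel's actual record type, and it uses only the standard library for portability (the same arithmetic works with numpy arrays).

```python
from statistics import fmean

# Hypothetical outcomes (for example, an accessibility score), for illustration only.
# Each row: (group, period, outcome)
observations = [
    ("treated", "pre", 0.40), ("treated", "pre", 0.44),
    ("treated", "post", 0.58), ("treated", "post", 0.62),
    ("control", "pre", 0.50), ("control", "pre", 0.54),
    ("control", "post", 0.55), ("control", "post", 0.59),
]


def cell_mean(group: str, period: str) -> float:
    """Mean outcome for one group in one period."""
    return fmean(y for g, p, y in observations if g == group and p == period)


# Change over time within each group.
treated_change = cell_mean("treated", "post") - cell_mean("treated", "pre")
control_change = cell_mean("control", "post") - cell_mean("control", "pre")

# DiD estimate: the treated group's change net of the control group's change.
did = treated_change - control_change
print(f"treated change: {treated_change:.2f}")  # 0.18
print(f"control change: {control_change:.2f}")  # 0.05
print(f"DiD estimate:   {did:.2f}")             # 0.13
```

The estimate is only interpretable as a treatment effect under the usual parallel-trends assumption, which this sketch does not test.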

The full worked example lives in examples/accessibility-change-over-time/. It is a real research artifact with a 48 KB APA-formatted case study, 15 figures, 6 tables, and an engine-audit appendix. For a minimal RDD recipe, see examples/factor-factory-rdd-walkthrough/.

For integration details, see factor-factory integration.