Data Methodology

This page documents how PollutionLookup.com collects, joins, and serves EPA pollution data. If you are citing this data in academic research, journalism, or policy work, please reference this page and the original federal sources described below.

Data Sources

EPA Superfund National Priorities List (NPL)

The Superfund program, created by the Comprehensive Environmental Response, Compensation, and Liability Act of 1980 (CERCLA), gives EPA authority to clean up the nation's most contaminated sites. The National Priorities List is the subset of sites eligible for long-term, federally led cleanup. Our database includes 1,380 sites drawn from EPA's Superfund Site Information portal and the SEMS (Superfund Enterprise Management System) public extracts.

Source: epa.gov/superfund, SEMS public data

EPA Toxics Release Inventory (TRI)

TRI was established by the Emergency Planning and Community Right-to-Know Act of 1986. Facilities in covered industries that exceed reporting thresholds must annually disclose releases and off-site transfers of about 770 listed chemicals. Our database includes 23,331 facilities and 233,079 individual chemical-year release records pulled from the TRI Basic Data Files.

Source: TRI Basic Data Files

EPA ECHO (Enforcement and Compliance History Online)

ECHO is EPA's flagship enforcement database. It covers facilities regulated under the Clean Air Act, Clean Water Act (NPDES permits), and RCRA (hazardous waste), and tracks inspections, formal enforcement actions, penalties, and compliance status. Our database includes 285,063 ECHO facilities pulled from the ECHO Exporter files.

Source: ECHO Data Downloads

Join Strategy

EPA publishes these datasets separately and does not provide a fully joined version. We use two mechanisms to unify records so one search can return Superfund, TRI, and ECHO data together:

FRS Registry ID: EPA's Facility Registry Service assigns a stable identifier to every regulated facility. When Superfund, TRI, and ECHO records share the same FRS Registry ID, we join them as the same physical facility.
Spatial proximity: When FRS linkage is missing (common for older TRI records and some Superfund sites), we fall back to bounding-box filtering plus haversine distance. The /api/nearby endpoint queries the raw lat/lng indexes rather than relying on the join tables.

This is intentionally conservative: we do not merge records without high-confidence linkage. You may occasionally see the same physical site listed more than once if EPA has not reconciled its own identifiers.
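
The spatial fallback can be illustrated with a minimal Python sketch. It is not the site's actual implementation: the function names, the record layout (dicts with "lat" and "lng" keys), and the kilometre radius are assumptions made for the example. A cheap bounding-box test discards most candidates, and the haversine formula then gives the great-circle distance for the remainder.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two lat/lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounding_box(lat, lng, radius_km):
    """Return (min_lat, max_lat, min_lng, max_lng) enclosing the search circle.

    Assumes the radius is small relative to the Earth and ignores
    antimeridian wrap-around, which a production query would need to handle.
    """
    ang = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(ang)
    ratio = math.sin(ang) / max(math.cos(math.radians(lat)), 1e-12)
    # Near the poles the circle can span every longitude.
    dlng = 180.0 if ratio >= 1 else math.degrees(math.asin(ratio))
    return lat - dlat, lat + dlat, lng - dlng, lng + dlng


def nearby(records, lat, lng, radius_km):
    """Return (distance_km, record) pairs within radius_km, nearest first."""
    min_lat, max_lat, min_lng, max_lng = bounding_box(lat, lng, radius_km)
    hits = []
    for rec in records:
        # Cheap prefilter: skip anything outside the box.
        if not (min_lat <= rec["lat"] <= max_lat and min_lng <= rec["lng"] <= max_lng):
            continue
        dist = haversine_km(lat, lng, rec["lat"], rec["lng"])
        if dist <= radius_km:
            hits.append((dist, rec))
    hits.sort(key=lambda pair: pair[0])
    return hits
```

In a database, the bounding-box comparison is the part that can use ordinary indexes on the latitude and longitude columns, which is why it runs before the more expensive trigonometric check.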

Processing Pipeline

Download: Raw files are pulled directly from EPA: ECHO Exporter CSV, TRI Basic Data Files, and Superfund SEMS extracts.
Filter: Records without valid U.S. coordinates are excluded from the map-facing tables but retained in the raw tables so text search still works.
Normalize: State codes, county names, and program flags are standardized. Coordinates are bounded to plausible U.S. values.
Join: FRS Registry IDs are used to match ECHO, TRI, and Superfund records. Where a TRI facility has a matching ECHO facility, we copy the TRI release totals onto the ECHO record so a single facility page shows both compliance and release history.
Roll up: The state_summary, county_summary, and chemical_summary tables are built via SQL aggregation, so per-page queries are fast without in-memory caching.
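
The filter, normalize, join, and roll-up steps can be sketched against an in-memory SQLite database. Everything below is illustrative rather than the production schema: the table and column names other than state_summary, the coordinate bounds, the Registry IDs, and the sample rows are all assumptions made for the example.

```python
import sqlite3

# Rough, hypothetical bounds for "plausible U.S. values"; the real
# pipeline's limits are not documented here.
LAT_MIN, LAT_MAX = 17.5, 71.6
LNG_MIN, LNG_MAX = -180.0, -64.5

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE echo_facility (
    registry_id   TEXT PRIMARY KEY,  -- FRS Registry ID
    name          TEXT,
    state         TEXT,
    lat           REAL,
    lng           REAL,
    tri_total_lbs REAL               -- filled in by the join step
);
CREATE TABLE tri_release (
    registry_id TEXT, year INTEGER, chemical TEXT, pounds REAL
);
CREATE TABLE echo_map (registry_id TEXT PRIMARY KEY, lat REAL, lng REAL);
""")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO echo_facility VALUES (?, ?, ?, ?, ?, NULL)",
    [
        ("110000000001", "Plant A", "CA", 34.05, -118.25),
        ("110000000002", "Plant B", "ca ", 0.0, 0.0),  # bad coordinates, messy state code
        ("110000000003", "Plant C", "TX", 29.76, -95.37),
    ],
)
conn.executemany(
    "INSERT INTO tri_release VALUES (?, ?, ?, ?)",
    [
        ("110000000001", 2021, "Toluene", 100.0),
        ("110000000001", 2022, "Toluene", 50.0),
        ("110000000003", 2022, "Benzene", 25.0),
    ],
)

# Filter: only rows with plausible coordinates reach the map-facing table;
# the raw table keeps every row.
conn.execute(
    "INSERT INTO echo_map SELECT registry_id, lat, lng FROM echo_facility "
    "WHERE lat BETWEEN ? AND ? AND lng BETWEEN ? AND ?",
    (LAT_MIN, LAT_MAX, LNG_MIN, LNG_MAX),
)

# Normalize: standardize state codes.
conn.execute("UPDATE echo_facility SET state = UPPER(TRIM(state))")

# Join: copy TRI totals onto ECHO rows that share a Registry ID.
# Facilities with no TRI match keep NULL rather than 0.
conn.execute("""
UPDATE echo_facility
SET tri_total_lbs = (
    SELECT SUM(pounds) FROM tri_release AS t
    WHERE t.registry_id = echo_facility.registry_id
)
""")

# Roll up: precompute per-state aggregates for fast page queries.
conn.execute("""
CREATE TABLE state_summary AS
SELECT state,
       COUNT(*) AS facility_count,
       COALESCE(SUM(tri_total_lbs), 0) AS tri_total_lbs
FROM echo_facility
GROUP BY state
""")

map_ids = [r[0] for r in conn.execute("SELECT registry_id FROM echo_map ORDER BY registry_id")]
summary = {r[0]: (r[1], r[2]) for r in conn.execute("SELECT * FROM state_summary ORDER BY state")}
```

Leaving the TRI total as NULL for unmatched facilities keeps "no TRI record linked" distinguishable from "reported zero releases", which matters given the limitations described below.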

Known Limitations

TRI is self-reported. A facility that believes it is below the reporting thresholds may under-report or not report at all. TRI covers only listed chemicals at larger facilities, so its totals should be read as a lower bound on industrial chemical pollution.
Penalty totals lag violations. A facility can be in significant noncompliance for quarters before any monetary penalty is assessed. A $0 penalty total does not mean the facility was compliant.
Coordinate accuracy varies. Superfund and ECHO coordinates are generally accurate. Some older TRI facilities have been placed at city centroids rather than specific street addresses.
State-level data gaps. Many CWA and RCRA permits are administered by state agencies that do not always report enforcement actions back to EPA promptly, so recent state actions may be missing.
Brownfields data is not yet ingested. A future update will add EPA's Brownfields dataset from ACRES (Assessment, Cleanup and Redevelopment Exchange System).

Update Schedule

ECHO publishes updated data weekly. TRI is updated once per year (typically summer) with the prior calendar year's releases. Superfund information is updated as EPA changes site status. Our database is rebuilt from the latest sources on a regular basis. Most recent build: 2026-04-10.

Citation

If you use this data in research or publications, we suggest:

PollutionLookup.com. (2026). U.S. pollution lookup database: combined EPA Superfund, TRI, and ECHO enforcement data. Retrieved from https://pollutionlookup.com/methodology

We also encourage citing the original EPA source datasets directly — Superfund NPL, TRI Basic Data Files, and ECHO Exporter — alongside this page.

Downloads & API

State-level CSV downloads are linked from each state page. Raw endpoints:

/api/download/{state}/echo.csv — e.g. /api/download/california/echo.csv
/api/download/{state}/superfund.csv — e.g. /api/download/california/superfund.csv

For programmatic access see the JSON API documentation.
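
As a rough sketch, the CSV endpoints listed above could be fetched from Python as follows. The host is taken from the citation URL on this page; the helper names, the UTF-8 decoding, and the presence of a header row in the CSV are assumptions. The state slug is passed through unchanged, since only the form shown in the examples (california) is documented here.

```python
import csv
import io
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pollutionlookup.com"  # host taken from the citation URL
DATASETS = ("echo", "superfund")          # the two endpoints listed above


def download_url(state_slug, dataset):
    """Build the CSV download URL for a state slug such as 'california'."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {DATASETS}")
    return f"{BASE_URL}/api/download/{quote(state_slug, safe='')}/{dataset}.csv"


def fetch_rows(state_slug, dataset, timeout=30):
    """Download a CSV and return its rows as dicts.

    Assumes a UTF-8 file whose first line is a header row.
    """
    with urllib.request.urlopen(download_url(state_slug, dataset), timeout=timeout) as resp:
        text = resp.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

For example, download_url("california", "echo") produces the URL for the first endpoint example above, and fetch_rows("california", "superfund") would return one dict per row of the Superfund file.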