Transparency in how EPWForge generates, transforms, and presents climate and weather file data.
EPWForge generates simulation-ready weather files for building energy modeling using physically consistent methods based on CMIP6 climate projections, ERA5 reanalysis, and ASHRAE-aligned workflows.
The result is distribution-aware, physically coherent weather data that improves confidence in peak load estimation, HVAC sizing, and resilience analysis.
Each percentile corresponds to a single climate model realization across all variables, preserving coherent weather states and avoiding physically impossible combinations created by variable-by-variable ranking.
ASHRAE-style design values are computed directly from morphed 8760-hour datasets — not by applying offsets to historical design conditions.
Multiple EPW files per scenario are generated using individual CMIP6 model deltas, enabling explicit evaluation of inter-model uncertainty.
Weather files can be generated for any location on Earth using ERA5-based datasets — not limited to predefined weather stations.
The dashboard surfaces UTCI heat-stress carpets, solar resource under extreme events, and ASHRAE design-condition trajectories from 2030 to 2100 across every SSP scenario — so designs can be stress-tested directly against morphed weather without leaving the tool.
Baseline Weather (TMY / ERA5)
+
CMIP6 Climate Deltas
↓
Morphing + QA/QC + Ensembles
↓
EPW Files • Design Conditions • Extreme Events
EPWForge departs from conventional weather file workflows in several important ways:
Percentiles are assigned at the model level. All variables at a given percentile originate from the same CMIP6 model realization, preserving internally consistent relationships between temperature, humidity, solar radiation, and wind. Approaches that rank each variable independently can produce physically impossible weather states (for example, very high temperature paired with very low solar radiation drawn from different models).
Future ASHRAE design values are derived directly from the full 8760-hour morphed dataset, ensuring consistency between extreme values and coincident conditions. This is materially different from delta-only methods that simply offset historical design values.
Each CMIP6 model produces a distinct EPW file, allowing simulation across the full ensemble to quantify uncertainty in loads, energy use, and thermal comfort.
In contrast to traditional delta-only approaches, EPWForge operates on full hourly distributions rather than adjusting summary statistics — preserving extremes, coincident conditions, and inter-variable relationships throughout the morphing process.
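The difference between model-level and variable-by-variable percentiles can be sketched in a few lines. This is a minimal illustration, not EPWForge's implementation: the ranking metric (a single `tas` delta per model), the nearest-rank rule, the model names, and every number below are assumptions made for the example.

```python
import math

# Hypothetical per-model deltas; names and values are illustrative only.
deltas = {
    "model_a": {"tas": 1.8, "hurs": -1.0, "rsds": 1.01},
    "model_b": {"tas": 2.6, "hurs": -2.5, "rsds": 1.03},
    "model_c": {"tas": 3.4, "hurs": -0.5, "rsds": 0.98},
    "model_d": {"tas": 2.1, "hurs": -3.0, "rsds": 1.02},
    "model_e": {"tas": 2.9, "hurs": -1.5, "rsds": 0.99},
}

def pick_model(deltas, percentile, rank_key="tas"):
    """Name of the model at `percentile` (0-100) when models are ranked
    by one scalar, using the nearest-rank method."""
    ranked = sorted(deltas, key=lambda name: deltas[name][rank_key])
    rank = max(1, math.ceil(percentile / 100 * len(ranked)))
    return ranked[rank - 1]

def model_level(deltas, percentile):
    """Every variable comes from the single selected model."""
    return dict(deltas[pick_model(deltas, percentile)])

def variable_level(deltas, percentile):
    """Contrast case: each variable is ranked on its own, so the result
    can combine values from different models."""
    variables = next(iter(deltas.values()))
    return {
        var: deltas[pick_model(deltas, percentile, rank_key=var)][var]
        for var in variables
    }
```

With these toy numbers, `model_level(deltas, 90)` returns one model's complete set of deltas, while `variable_level(deltas, 90)` pairs the warmest model's temperature delta with a solar ratio from a different model, a combination no single model produced.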
TMY data is sourced from Climate.OneBuilding.Org, derived from ASHRAE IWEC2, TMY3, and TMYx datasets. Using the de facto reference for the energy modeling community ensures compatibility and reproducibility of simulation results across projects and firms.
Infrastructure: EPWForge maintains a local mirror — GuzzStations — of 74,693 EPW files covering 17,137 stations, audited monthly against the upstream OneBuilding spreadsheet so requests never depend on third-party uptime. Catalog and library are kept in 1:1 sync; dead entries from OneBuilding's own spreadsheet drift are pruned from our station picker so users never click into a file that doesn't exist.
AMY data is generated from the ECMWF ERA5 reanalysis, which provides hourly, gap-free global coverage from the mid-20th century to present. ERA5 integrates millions of observations — surface stations, radiosondes, satellites, aircraft, and ocean buoys — into a physically consistent atmospheric state at every timestep, so all variables are internally consistent rather than spliced from disparate sources. This eliminates the missing-data and instrument-gap issues common to traditional station-based AMY files, and extends coverage to any location on Earth.
As with all gridded datasets, ERA5 represents regional atmospheric conditions and may not fully capture localized effects such as complex terrain, coastal microclimates, or dense urban environments. EPWForge addresses the urban case through its UHI adjustment system.
ERA5's coarse grid resolution (25 km) is a known weakness for solar radiation and near-surface wind, both of which benefit from satellite-derived or higher-resolution sources where they exist. EPWForge runs a multi-source router that prefers the best available source per variable per location, falling back to ERA5 when no superior source is available.
Active sources: ERA5 (global, all variables), ERA5-Land (global land surfaces, 9 km), NSRDB (US satellite-derived solar, 4 km). Sources in acquisition: CM SAF SARAH-3 (Europe / Africa / Middle East satellite-derived solar, 5 km — order placed), NREL WIND Toolkit and NEWA (high-resolution wind). Lineage of which source supplied each variable is recorded in the EPW header COMMENTS field.
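The routing idea (best source first, ERA5 as the fallback) can be sketched as a preference table plus a coverage check. This is only an illustration: the `PREFERENCE` table, the `covers` function, and the rough bounding box used for NSRDB are hypothetical and do not describe EPWForge's actual coverage logic.

```python
# Hypothetical preference table: best source first, ERA5 as global fallback.
PREFERENCE = {
    "ghi": ["NSRDB", "ERA5"],
    "dni": ["NSRDB", "ERA5"],
    "dhi": ["NSRDB", "ERA5"],
    "tas": ["ERA5-Land", "ERA5"],
    "hurs": ["ERA5-Land", "ERA5"],
    "sfcWind": ["ERA5"],
}

def covers(source, lat, lon, is_land):
    """Rough coverage test. The NSRDB bounding box is an assumption
    (approximate contiguous US), not the dataset's real footprint."""
    if source == "ERA5":
        return True
    if source == "ERA5-Land":
        return is_land
    if source == "NSRDB":
        return 24.0 <= lat <= 50.0 and -125.0 <= lon <= -66.0
    return False

def route(variable, lat, lon, is_land=True):
    """Return the first preferred source that covers the location."""
    for source in PREFERENCE.get(variable, ["ERA5"]):
        if covers(source, lat, lon, is_land):
            return source
    return "ERA5"

def lineage(variables, lat, lon, is_land=True):
    """Build a source -> variables mapping suitable for a header comment."""
    result = {}
    for var in variables:
        result.setdefault(route(var, lat, lon, is_land), []).append(var)
    return result
```

Keeping the preference order in data rather than in branching code means a newly acquired source only needs a table entry and a coverage rule.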
EPWForge generates TMYx weather files for arbitrary locations using the Finkelstein–Schafer statistic in accordance with ISO 15927-4 — the same methodology used by OneBuilding for their published TMYx datasets — with multiple period variants available to match published TMYx products.
Validation against several hundred OneBuilding TMYx stations across all major climate zones shows strong agreement in annual temperature and solar radiation. Deviations are primarily observed in complex terrain and high-latitude regions, consistent with the known limitations of gridded reanalysis datasets.
Future weather files are generated using a Belcher et al. (2005) mean-shift methodology paired with an ERA5-derived climatological diurnal anomaly profile for the stretch term — a hybrid consistent with how modern production tools handle morphing (UKCP18 local projections, NOAA Atlas 14 for sub-daily disaggregation, WeatherShift commercial). Pure Belcher-classic stretches against the specific baseline EPW's hourly deviation from its monthly mean, which can over-amplify outlier days and over-distort low-DTR tropical baselines. EPWForge replaces that stretch target with the 45-year ERA5 climatological diurnal anomaly profile for the same grid cell, giving a smoother and more physically defensible diurnal redistribution. Climate change deltas remain CMIP6-derived and are applied across all variables to preserve inter-variable relationships.
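To make the distinction concrete, here is a toy comparison of the classic shift-plus-stretch against a stretch applied to a fixed climatological diurnal profile. It is a sketch under stated assumptions: the function names, the cosine-shaped 24-hour anomaly, the 10-day "month", and the delta values are all invented for illustration, and the stretch factor follows the usual Belcher form (change in diurnal range divided by baseline diurnal range).

```python
import math

def stretch_factor(d_tmax, d_tmin, base_tmax_mean, base_tmin_mean):
    """Belcher-style stretch: change in diurnal range over baseline range."""
    return (d_tmax - d_tmin) / (base_tmax_mean - base_tmin_mean)

def morph_classic(t_hourly, d_mean, alpha):
    """Shift by the monthly delta and stretch each hour's deviation from
    the baseline monthly mean, so outlier days are amplified the most."""
    mean = sum(t_hourly) / len(t_hourly)
    return [t + d_mean + alpha * (t - mean) for t in t_hourly]

def morph_hybrid(t_hourly, d_mean, alpha, clim_anomaly_24):
    """Shift by the monthly delta, but stretch a fixed 24-value
    climatological diurnal anomaly (mean ~0) instead of the file's own
    deviation. Assumes index 0 of t_hourly is hour 0 of a day."""
    return [t + d_mean + alpha * clim_anomaly_24[i % 24]
            for i, t in enumerate(t_hourly)]

# Hypothetical inputs: a 4 K-amplitude diurnal profile peaking at 15:00.
clim = [4.0 * math.cos(2 * math.pi * (h - 15) / 24) for h in range(24)]
normal_day = [20.0 + a for a in clim]
hot_day = [32.0 + a for a in clim]            # one outlier day in the baseline
month = normal_day * 9 + hot_day              # 10-day toy "month"

alpha = stretch_factor(3.0, 2.0, 24.0, 16.0)  # (3 - 2) / (24 - 16) = 0.125
classic = morph_classic(month, 2.5, alpha)
hybrid = morph_hybrid(month, 2.5, alpha, clim)
```

Both variants raise the monthly mean by the same 2.5 K, but at the outlier day's 15:00 peak the classic form adds more than the hybrid does, because it stretches the outlier's full deviation from the monthly mean rather than only the climatological diurnal swing.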
Supported scenarios:
Each scenario is available across ten time horizons (2030, 2035, 2040, 2045, 2050, 2060, 2070, 2080, 2090, 2100) and seven percentile bands (P5–P95). The 5-year cadence through 2050 serves projects where authorities having jurisdiction (AHJs) are starting to require 5-year future-projection submittals for net-zero compliance. The 2100 horizon uses a 10-year averaging window (2091–2100) because the underlying CMIP6 SSP runs end at 2100.
The CMIP6 climate-deltas dataset extracts 11 variables across 34 models per scenario, per horizon, per cell. Ten of these flow into the EPW through morphEPW with the convention noted; the eleventh (DTR derived from tasmax/tasmin) drives the diurnal-stretch term:
tas: additive monthly shift + diurnal stretch (Belcher + ERA5 anomaly template).
hurs: additive %-point shift, clamped 1–100%. Dew point recomputed from morphed T and RH via Magnus.
rsds: multiplicative ratio applied uniformly to GHI / DNI / DHI. Luminance fields (illuminance, zenith luminance) scale by the same factor to preserve photometric consistency.
sfcWind: multiplicative ratio clamped to ±5% per the "global stilling" note.
rlds (new in v5): additive W/m² shift applied to the horizontal infrared field. Replaces what would otherwise be a Brunt/Berdahl-Martin sky-emissivity reconstruction; the direct CMIP6 delta is cleaner physics.
pr (v5): multiplicative ratio applied to liquid precipitation depth. Duration in hours is left unchanged (multiplying duration is wrong dimensional analysis).
psl (v5): additive Pa shift applied to atmospheric pressure, clamped to physically plausible bounds (50–110 kPa).
clt (v5): additive %-point shift applied to both total and opaque sky cover (same shift, preserving opaque ⊂ total), clamped 0–10 tenths.
prsn (v5): multiplicative ratio applied to snow depth as a proxy. See Snow physics limitation in the Limitations section for the caveat on this approach.
Per-variable ensemble sizes vary because not every CMIP6 model publishes every variable in Pangeo. The API surfaces this transparently via model_counts: for a typical mid-latitude SSP2-4.5 query you'll see tas: 33, hurs: 29, prsn: 19, with narrower ensembles for the harder-to-publish fields. The wider variables drive the mean climate shift; the narrower ones still represent a credible multi-model signal.
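As an example of one of these conventions, the humidity step (additive shift, clamp, then dew point from the Magnus formula) might look like the sketch below. The Magnus coefficients shown are one commonly used set for saturation over water; which coefficients EPWForge uses is not stated here, so treat them and the function names as assumptions.

```python
import math

# Magnus coefficients over liquid water (a common choice; assumed here).
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # degrees C

def morph_rh(rh_pct, delta_pct_points):
    """Additive %-point shift, clamped to the 1-100% range."""
    return min(100.0, max(1.0, rh_pct + delta_pct_points))

def dew_point_c(t_c, rh_pct):
    """Dew point (deg C) from dry-bulb (deg C) and relative humidity (%)."""
    gamma = math.log(rh_pct / 100.0) + MAGNUS_A * t_c / (MAGNUS_B + t_c)
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)

def morph_hour(t_c, rh_pct, delta_t, delta_rh):
    """Morph one hour and recompute dew point from the morphed T and RH,
    so the three fields stay psychrometrically consistent."""
    t_new = t_c + delta_t
    rh_new = morph_rh(rh_pct, delta_rh)
    return t_new, rh_new, dew_point_c(t_new, rh_new)
```

Recomputing dew point from the morphed temperature and humidity, rather than shifting it independently, guarantees it never exceeds the dry-bulb temperature.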
SSP5-8.5 — peak fossil-fuel use, no policy response, ~4.4 °C of warming by 2100 — is not recommended for standard design work. The IPCC AR6 and the upcoming CMIP7 generation place this trajectory at low likelihood; observed emissions and policy momentum have moved well below it, so treating it as a routine "worst case" would inflate the design envelope. We nonetheless offer it as an explicit, opt-in extreme stress-test scenario — a "what breaks first" upper bound for resilience and worst-case analysis, clearly labelled as such throughout the UI. For defensible design conditions, use SSP3-7.0 paired with P90 or P95 percentile bands. Historical comparison data that referenced SSP5-8.5 (e.g. the EnergyModelIQ V2 building-energy simulation results) remains available as a frozen reference.
CMIP6 morphing in EPWForge operates at two distinct temporal resolutions, and it's worth distinguishing them when reading the methods that follow:
The split is deliberate. Daily-extreme metrics empirically embed both the thermodynamic background warming and the synoptic-regime contribution (per the Cohen 2026 framework); applying them to the tails avoids over-correcting the mean climate, which is already handled by the monthly delta. Sub-monthly delta fields for the mean climate path are on the research roadmap; see Diurnal-cycle climate deltas in Ongoing R&D.
Surface wind speed is the most uncertain variable in the CMIP6 ensemble — inter-model spread on future mean wind change typically exceeds the ensemble-mean signal, and observed mean wind has been declining over much of North America, Europe, and East Asia (the “global stilling” trend documented by Zeng 2019). To avoid amplifying this noisy signal, EPWForge clamps the CMIP6 wind multiplicative factor to ±5% per month regardless of the raw ensemble value. The clamp is symmetric so it doesn't bias the morph in either direction, and acknowledges that wind under warming is genuinely an open question rather than something CMIP6 ensembles resolve cleanly.
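The clamp itself is simple enough to state in code. A minimal sketch, assuming the monthly factor is expressed as a ratio around 1.0 (the function names are illustrative):

```python
def clamp_wind_ratio(raw_ratio, limit=0.05):
    """Limit a monthly multiplicative wind factor to 1 +/- limit."""
    return min(1.0 + limit, max(1.0 - limit, raw_ratio))

def morph_wind(speeds_ms, raw_ratio):
    """Scale hourly wind speeds (m/s) by the clamped monthly factor."""
    ratio = clamp_wind_ratio(raw_ratio)
    return [speed * ratio for speed in speeds_ms]
```

A raw ensemble factor of 1.12 is therefore applied as 1.05, a factor of 0.80 as 0.95, and anything already inside the band passes through unchanged.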
Design day values (heating 99.6%/99%, cooling 0.4%/1%/2% dry bulb and wet bulb, evaporation, dehumidification) are computed directly from the 8760-hour dataset using ASHRAE Fundamentals methodology. Mean coincident values are derived from percentile exceedance subsets, ensuring physically meaningful pairings between dry-bulb temperature and coincident variables.
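The percentile-and-coincident idea can be illustrated with a simplified sketch. It is not the ASHRAE Fundamentals procedure in full (which works from binned frequency distributions); it uses a nearest-rank exceedance threshold and averages the coincident variable over the exceedance subset, and the function names are hypothetical.

```python
def exceedance_threshold(values, pct):
    """Value that is met or exceeded by roughly `pct` percent of hours
    (nearest-rank on a descending sort)."""
    ordered = sorted(values, reverse=True)
    n_exceed = max(1, round(len(ordered) * pct / 100.0))
    return ordered[n_exceed - 1]

def cooling_design(dry_bulb, wet_bulb, pct=0.4):
    """Design dry bulb at `pct` exceedance, plus the mean wet bulb over
    the hours at or above that dry bulb (mean coincident wet bulb)."""
    threshold = exceedance_threshold(dry_bulb, pct)
    coincident = [wb for db, wb in zip(dry_bulb, wet_bulb) if db >= threshold]
    return threshold, sum(coincident) / len(coincident)
```

Because the coincident value is averaged only over hours that actually reach the design dry bulb, the pairing reflects conditions that occurred together in the hourly record. A heating value such as 99.6% follows from the same threshold function with `pct=99.6`.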
Handbook preservation for unmodified TMY downloads: When you download a baseline TMYx file with no UHI, no events, no smoke, and no future-climate morphing applied, the published ASHRAE Handbook of Fundamentals (Chapter 14 Climatic Design Information) values are passed through verbatim into both the EPW DESIGN CONDITIONS header and the .ddy file. This matches what auditors expect for ASHRAE 90.1 code compliance — the Handbook values come from a 30-year reanalysis at each station, which is more authoritative than computing percentiles from a single 8760-hour TMY. For any modified download (UHI on, events, morphed scenarios, AMY), values are recomputed from the modified hourly data — Handbook values no longer apply once the climate has been modified. The .stat file's Comment 1 line states which path was taken.
Custom-pin and off-station design conditions: For locations not in the OneBuilding catalog (custom-pin lat/lon, or stations between published OneBuilding stations), design conditions are computed from a corrected long-term hourly record. Two corrections are applied before the percentile math:
The percentiles themselves (heating 99.6%/99%, cooling 0.4%/1%/2%, etc.) are always computed from the full multi-year corrected hourly record, not from the 8760-hour TMY composite. This matters because typical-year synthesis optimizes the bulk distribution, not tail extremes.
Cooling 1% Dry Bulb
Historical TMY: 32.1 °C
EPWForge (morphed): 35.4 °C ← derived from morphed hourly distribution
Delta-only method: 33.6 °C ← historical + ΔT only
Illustrative values shown — actual results vary by location and percentile. The point: morphing the full hourly distribution captures shifts in extremes that simple offsets miss.
Undisturbed ground temperatures at any depth are computed using a published two-harmonic analytical model that extends the classical Kusuda–Achenbach approach with improved seasonal asymmetry. Inputs are derived from TMY surface air temperature data.
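For orientation, the classical single-harmonic Kusuda–Achenbach form that the two-harmonic model extends can be written as below. This sketch shows only the classical baseline, not EPWForge's extended model; the default soil diffusivity is an assumed typical value, and the parameter names are illustrative.

```python
import math

def ground_temp_c(depth_m, day_of_year, t_mean_c, amplitude_c,
                  day_of_min, diffusivity_m2_per_day=0.05):
    """Classical Kusuda-Achenbach undisturbed ground temperature.

    t_mean_c     annual mean surface temperature (deg C)
    amplitude_c  annual surface temperature amplitude, half peak-to-peak (K)
    day_of_min   day of year of the minimum surface temperature
    diffusivity  soil thermal diffusivity in m^2/day (assumed default)
    """
    damping = math.sqrt(math.pi / (365.0 * diffusivity_m2_per_day))
    lag_days = depth_m / 2.0 * math.sqrt(
        365.0 / (math.pi * diffusivity_m2_per_day))
    phase = 2.0 * math.pi / 365.0 * (day_of_year - day_of_min - lag_days)
    return t_mean_c - amplitude_c * math.exp(-depth_m * damping) * math.cos(phase)
```

The exponential term damps the annual swing with depth and the lag term delays it, so deep soil sits close to the annual mean while the surface follows the full seasonal cycle.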
Most weather stations are located at airports or rural outskirts, which can be several °C cooler than dense urban cores at night. EPWForge offers UHI presets aligned with the Stewart & Oke (2012) Local Climate Zone framework. Diurnal profiles are applied to dry bulb temperature, dew point, and surface wind, with magnitudes consistent with observed ranges in the urban climatology literature. For known urban locations (major metros) we auto-apply a default UHI preset; the chosen preset and all applied modifications are disclosed in the downloaded .stat and .pvsyst headers so the data lineage stays auditable.
Every download is a zip bundle containing five companion files that mirror what climate.onebuilding.org ships alongside their static TMYx files — with the critical difference that ours are regenerated from the morphed weather data, so a future-climate or UHI'd scenario gets matching design conditions, statistics, and PV-tool inputs that reflect the modified climate. Nobody else does this for any of these formats.
SizingPeriod:DesignDay blocks for ASHRAE 99.6% / 99% heating and 0.4% / 1% / 2% cooling conditions. Consumed by EnergyPlus, OpenStudio, IES VE, eQUEST.
A single morphed weather file is deterministic and cannot capture inter-model climate uncertainty. EPWForge generates per-model climate ensembles using real CMIP6 model deltas: multiple individual model EPWs per scenario, each representing a distinct projection. Engineers can run simulations across all members to understand the range of possible outcomes (peak loads, annual EUI, overheating hours) rather than relying on a single point estimate. This approach aligns with emerging best practice from NREL and LBNL for climate-resilient design.
Ensemble generation may take several seconds depending on location and scenario.
Typical meteorological years deliberately exclude extreme events, but resilience analysis requires understanding building performance under heat waves, cold snaps, and humidity events. EPWForge maintains a global extreme events database derived from ERA5 reanalysis, with statistically fitted return-period intensities at every grid cell. Events can be stitched into baseline or future weather files for direct analysis of questions like “if grid power fails during a 25-year heat wave in 2050, does the building remain habitable?”
Event intensity is exposed as a 1–10 slider per event type, mapped piecewise to an anomaly multiplier on the historical event:
Sliders 8–10 are gated behind an explicit stress_test=true flag on the API and the MCP server. These produce events more extreme than anything in the observational record and are appropriate for resilience studies (“what fails first?”) but not for HVAC sizing or code compliance — downstream simulators may also exceed their psychrometric bounds at extreme intensities.
When a user requests an event of duration N days, EPWForge maps the historical event's available days across the requested duration via a day-level time-warp: stitched day k draws from source event day floor(k × source_days / stitched_days), preserving the natural rise-peak-fall shape and each day's diurnal cycle. The peak day still appears for ~1–2 stitched days; surrounding days carry the natural shoulder pattern.
This replaces an older “peak-day-cycling” approach that repeated the single hottest / coldest day for the full event duration. The cycling produced sustained extremes that were physically implausible and inflated stitched temperatures by ~10°F relative to a natural event shape — particularly noticeable when stacked with SSP morphing and high percentile bands.
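The day-level time-warp reduces to one integer expression per stitched day. A minimal sketch, assuming day indices are 0-based and each source day is carried as an opaque record (for example, 24 hourly anomalies); the function names are illustrative:

```python
def warp_source_days(source_days, stitched_days):
    """For each stitched day k (0-based), the source event day it draws
    from: floor(k * source_days / stitched_days)."""
    return [k * source_days // stitched_days for k in range(stitched_days)]

def stitch_event(source_event_days, stitched_days):
    """Stretch or compress a list of per-day records to the requested
    duration while keeping the original day order."""
    indices = warp_source_days(len(source_event_days), stitched_days)
    return [source_event_days[i] for i in indices]
```

Stretching a 5-day event to 10 days yields the index sequence 0, 0, 1, 1, 2, 2, 3, 3, 4, 4: each source day, including the peak, appears twice, and the rise-peak-fall order is preserved.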
Two compound pairings are supported with co-driven physics:
The 0.5 physical blend reflects that compound extremes are not fully additive — the atmospheric circulations driving them are partially correlated rather than independent.
When an SSP scenario is active, event intensity sliders auto-populate using factors from the IPCC AR6 Atlas (Phase 4): TXx (annual peak temperature) for heat events, TNn (annual minimum) for cold. The factor reflects how much extreme events scale faster (or slower) than the mean climate signal at each location and horizon.
Cold-family floor. AR6 evidence suggests cold extremes warm faster than the mean (dampening future severity), but the recent observational record (Texas 2021, polar-vortex disruption events documented in Cohen 2026) doesn't yet support this cleanly. EPWForge defaults cold-family events to slider 5 under SSP (no future dampening, no false amplification); users can manually override either way.
Improbability indicator. Each generated file carries an improbability score (1–10) computed from the joint rarity of active dials (SSP × year × percentile × UHI × event intensity × smoke). The UI surfaces this as a header warning when the combined settings drift into stress-test (≥4) or exploratory (≥7) territory.
Events are stitched at the climatologically peak window of the baseline file, not at the historical event's calendar date. Heat-family events (heat waves, hot-humid events) are inserted at the hottest 14-day window of the baseline file; cold-family events (cold snaps, cold-windy) at the coldest. Smoke overlays auto-align to whichever event drives the season.
We do this for three reasons:
What we keep from the historical event: the anomaly profile shape and the relative magnitudes across temperature, humidity, wind, and solar. What we replace: the calendar slot. The historical reference dates remain visible in the UI tooltip as provenance.
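Locating the insertion window can be sketched as a sliding-window search over daily means. The scoring rule (mean of daily-mean dry bulb), the lack of wrap-around across the year boundary, and the function name are assumptions for this example; the text above specifies only that the hottest or coldest 14-day window is used.

```python
def peak_window_start(hourly_temps, window_days=14, hottest=True):
    """Index (0-based day) where the hottest or coldest `window_days`
    window begins, scored by the sum of daily mean temperatures.
    Assumes whole days of 24 values and at least `window_days` days."""
    n_days = len(hourly_temps) // 24
    daily = [sum(hourly_temps[d * 24:(d + 1) * 24]) / 24.0
             for d in range(n_days)]
    window_sum = sum(daily[:window_days])
    best_sum, best_start = window_sum, 0
    for start in range(1, n_days - window_days + 1):
        # Slide the window by one day: add the new day, drop the old one.
        window_sum += daily[start + window_days - 1] - daily[start - 1]
        better = window_sum > best_sum if hottest else window_sum < best_sum
        if better:
            best_sum, best_start = window_sum, start
    return best_start
```

Calling it with `hottest=True` gives the slot for heat-family events and `hottest=False` the slot for cold-family events.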
Wildfire smoke increasingly drives building HVAC sizing in fire-prone regions but is absent from conventional EPW files. EPWForge models smoke impact via a CAMS-AOD climatology (aerosol optical depth, 86,896 global grid cells, 2003–2025 record) and applies it physically to the relevant EPW fields when active.
Coefficients are mid-points of published literature ranges; site-specific tuning (by albedo, vegetation, aerosol chemistry) is not currently applied. Smoke events are auto-aligned in time to whichever event drives the season (smoke onset overlapping the peak of an active heat event, for instance).
Smoke severity auto-amplifies under SSP scenarios using a v1 proxy methodology, pending acquisition of wildfire-specific projection data (Touma 2021, Burke 2023, Wang 2025). For each of 46 AR6 reference regions, the peak-AOD intensity factor is derived as:
smoke_anomaly = heatwave_anomaly × biome_factor
    heatwave_anomaly: from AR6 TXx
    biome_factor: region-specific, ∈ [0.6, 1.2]
intensity_factor ≈ biome_factor × 1.7 ⇒ intensity_factor ∈ [1.0, 2.0] (floor: no decrease; ceiling: 2× peak AOD)
Biome factor tiers (centered on a baseline temperate response):
The floor=1.0 guarantees smoke severity never decreases below historical climatology even in fuel-limited regions. The ceiling=2.0 caps peak AOD at 2× the historical worst-case event for the location (e.g., Bay Area 2020 AOD ~7 → max ~14 under SSP3-7.0 2090 P95 in Western N. America), well within observed physical bounds.
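The floor and ceiling amount to a clamp on the scaled biome factor. A minimal sketch using the constants quoted above (the function names are illustrative):

```python
def smoke_intensity_factor(biome_factor, scale=1.7, floor=1.0, ceiling=2.0):
    """Peak-AOD intensity factor: biome_factor * scale, limited to
    [floor, ceiling]."""
    return min(ceiling, max(floor, biome_factor * scale))

def amplified_peak_aod(historical_peak_aod, biome_factor):
    """Peak AOD under an SSP scenario for a given regional biome factor."""
    return historical_peak_aod * smoke_intensity_factor(biome_factor)
```

At the top of the biome range (1.2) the raw product is 2.04, which the ceiling trims to 2.0, so a historical peak AOD of 7 becomes 14. At the bottom (0.6) the product is about 1.02, just above the floor.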
Sources informing tier assignment: Abatzoglou & Williams 2016 (PNAS) on fire-weather TXx supra-linearity; Bowman et al. 2020 (Nat Rev Earth Env) global fire-climate review; IPCC AR6 WG1 Ch. 12 regional fire-prone projections.
This is a v1 proxy methodology — explicit because we'd rather show our work than silently apply a flat global multiplier. Plan: replace per-region biome factors with wildfire-projection-specific intensity factors from published CMIP6 smoke-day / burned-area projections once we've sourced and audited those datasets.
EPWForge undergoes ongoing automated validation across the data pipeline, the Belcher morphing implementation, and the EnergyPlus simulation outputs of pre-computed reference cases. Recent audits confirm:
Validation includes cross-checks against published TMYx datasets and internal consistency checks across all generated variables.
The lapse and per-station diurnal corrections used for custom-pin design conditions are validated against the published OneBuilding 2025 ASHRAE Handbook values across 16,906 reference stations:
Generated EPW and DDY files are runtime-tested in EnergyPlus 25.2 to confirm they load and simulate without errors.
Detailed audit reports are maintained internally and are available to enterprise customers on request.
EPWForge uses methods that are standard practice in the field, but all modelling tools carry inherent limitations users should understand when applying results to design decisions.
prsn): a useful proxy but not a full accumulation × melt model. A proper treatment would track snow water equivalent through a degree-day melt model and couple albedo to snow depth. In practice this rarely matters for building energy simulation: most BEMs read ground temperature (which we compute correctly via Kusuda) and surface-specific albedos defined in the IDF rather than the EPW's air-side snow fields. For applications where the snow signal materially matters (slab-on-grade with snow coverage effects, ground-source heat pumps with frozen-soil regimes), treat the morphed snow depth as indicative rather than authoritative.
These limitations are inherent to global-to-local downscaling. For critical design decisions, results should be validated against primary data sources and applicable local code requirements. EPWForge is designed to make these limitations explicit while providing the most physically consistent weather inputs practical for simulation workflows.
EPWForge is under active development. We continuously evaluate new datasets, climate-science advances, and modeling techniques as they emerge — refining the underlying methodology when peer-reviewed work or new operational data warrants. Specific roadmap items shift as the science evolves and customer use cases sharpen what's worth investing in.
If your project depends on a specific capability — a particular climate scenario, geographic coverage, validation against a dataset you trust — reach out and we'll prioritize accordingly.
Belcher, S.E., Hacker, J.N. & Powell, D.S. (2005). “Constructing design weather data for future climates.” Building Services Engineering Research and Technology, 26(1), 49–61.
Hersbach, H. et al. (2020). “The ERA5 global reanalysis.” Quarterly Journal of the Royal Meteorological Society, 146(730), 1999–2049.
Stewart, I.D. & Oke, T.R. (2012). “Local Climate Zones for urban ecosystem studies.” Bulletin of the American Meteorological Society, 93(12), 1879–1900.
Eyring, V. et al. (2016). “Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization.” Geoscientific Model Development, 9(5), 1937–1958.
IPCC. (2021). “Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change” (AR6 WG1) — including the AR6 Atlas regional CMIP6 indices used for Phase 4 event amplification factors.
Van Vuuren, D.P. et al. (2026). “The Scenario Model Intercomparison Project (ScenarioMIP) for CMIP7.” Geoscientific Model Development, 19, 2627–2656.
Cohen, J. et al. (2026). Polar-vortex disruption and cold-extreme trends under continued Arctic warming. Used as the basis for our conservative cold-family floor in AR6 Phase 4 auto-fill (no future dampening applied).
EPWForge composes each hourly EPW field from the dataset best suited to that variable. The exact per-variable provenance is recorded in every output file's COMMENTS 1 line and its companion .stat file's Comment 1.
8,784 hours of overlap between our multi-source synthesis and NOAA SURFRAD ground-truth measurements. Lineage active for this run: ERA5 (pr/pressure/rlds/sfcWind) + ERA5-Land (hurs/tas) + NSRDB (dhi/dni/ghi)
| Variable | Monthly RMSE | Bias | Correlation |
|---|---|---|---|
| tas (°C) | 0.96 | +0.70 | 0.997 |
| hurs (%) | 7.16 | -5.33 | 0.725 |
| sfcWind (m/s) | 1.01 | -0.85 | 0.941 |
| pressure (hPa) | 0.71 | +0.69 | 0.995 |
| ghi (W/m²) | 4.70 | +3.91 | 1.000 |
| dni (W/m²) | 19.17 | +18.65 | 0.998 |
| dhi (W/m²) | 7.65 | -6.66 | 0.990 |
Validated against 7 SURFRAD sites across the continental US (BON / TBL / DRA / FPK / GWN / PSU / SXF); error statistics are reported per station for 2020.
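The three statistics in the table can be computed from paired series as sketched below. Two things are assumed here because the page does not state them: bias is taken as synthesized minus observed, and the inputs are whatever paired values the comparison uses (how the "monthly" aggregation is done is not specified).

```python
import math

def error_stats(predicted, observed):
    """Return (rmse, bias, pearson_r) for two equal-length series.
    Bias sign convention (predicted - observed) is an assumption."""
    n = len(predicted)
    diffs = [p - o for p, o in zip(predicted, observed)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o)
              for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return rmse, bias, cov / math.sqrt(var_p * var_o)
```

Reading the three together is useful: a series with high correlation but a large bias, like the humidity and DNI rows, tracks the observed pattern well while sitting consistently above or below it.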