Temporal Aggregation for Time-Series Geodata

Q: Should I compute spectral indices before or after temporal aggregation?

Always compute indices (NDVI, EVI, etc.) before temporal aggregation. Aggregating raw reflectance bands first and deriving indices afterward introduces mathematical bias because index formulas are non-linear — e.g., mean(NIR)/mean(Red) does not equal mean(NIR/Red).

Q: What frequency alias should I use for monthly resampling in xarray?

Use 'ME' (Month End) in xarray >= 2023.10 and pandas >= 2.2. The older 'M' alias is deprecated and raises FutureWarning. For calendar-month boundaries aligned to the first of each month, use 'MS' (Month Start).

Q: How do I avoid training/inference feature mismatch when aggregation windows differ?

Centralise all aggregation logic in a shared utility function and pass the frequency alias as a validated parameter. Never hardcode 'monthly' in one place and '30 days' in another — they produce different timestamps and silent feature drift.

Q: When is forward-fill safe for filling temporal gaps?

Forward-fill is appropriate only for variables with high temporal autocorrelation (soil moisture, land surface temperature) and gaps shorter than three observation periods. Avoid forward-fill for variables that can change rapidly (precipitation, flood extent) or for any binary/categorical mask layer.

Problem Framing

Raw satellite imagery, IoT sensor streams, and climate reanalysis grids arrive at cadences that almost never match the temporal resolution required for model training or inference. A Sentinel-2 tile revisits a site every five days under cloud-free conditions; a soil-moisture model might need monthly composites. Naively thinning the stack by sampling every Nth observation discards observations unevenly and bakes calendar artifacts directly into the feature vector. Systematic temporal aggregation — resampling and summarising the full stack into fixed-frequency statistical composites — is the correct solution, but it is non-trivial in production because cloud contamination, sensor dropouts, and timezone drift can silently corrupt every downstream statistic.

This topic sits inside Spatial Feature Engineering for Machine Learning, which covers the full chain from raw raster ingest to model-ready feature tables. Temporal aggregation is the step that bridges volatile daily observations and the stable monthly or seasonal features that supervised models generalise from.

Prerequisites & Environment Setup

Install exact versions to guarantee reproducibility. The ME frequency alias requires pandas 2.2+ and xarray 2023.10+; older releases use the deprecated M alias that will raise FutureWarning and will be removed.

pip install \
  "xarray>=2023.10" \
  "rioxarray>=0.15" \
  "geopandas>=1.0" \
  "pandas>=2.2" \
  "numpy>=1.26" \
  "dask>=2024.1" \
  "zarr>=2.17" \
  "pydantic>=2.5" \
  "s3fs>=2024.1"

System requirements:

GDAL 3.8+ (for rioxarray CRS operations on cloud-hosted COGs)
PROJ 9.3+ (CF-convention datum shifts)
Minimum 32 GB RAM for continental-scale stacks; configure a Dask LocalCluster or submit to a managed cluster (Coiled, Saturn Cloud) for petabyte workloads
Object storage: S3 or GCS bucket with HTTP range-request support for lazy Zarr reads

All time coordinates must follow ISO 8601 with explicit UTC designation per CF Conventions. Mixed timezone offsets silently shift observation timestamps and produce aggregation buckets that span incorrect calendar intervals.

Step-by-Step Implementation

Step 1 — Load and validate the multi-temporal stack

Open the Zarr archive with explicit chunking tuned to your storage block size. Validate the time dimension before any computation to surface gaps or timezone ambiguity early.

import xarray as xr
import pandas as pd
import numpy as np
from pathlib import Path


def load_and_validate(
    dataset_path: str,
    chunk_dims: dict | None = None,
) -> xr.Dataset:
    """
    Open a multi-temporal Zarr archive, validate the time dimension,
    and normalise timestamps to UTC-naive for consistent resampling.
    """
    if chunk_dims is None:
        chunk_dims = {"time": 1, "y": 512, "x": 512}

    ds = xr.open_zarr(dataset_path, chunks=chunk_dims, consolidated=True)

    if "time" not in ds.dims:
        raise ValueError(
            "Dataset must contain a 'time' coordinate dimension. "
            f"Found dimensions: {list(ds.dims)}"
        )

    # Normalise to UTC-naive so resample() produces deterministic month boundaries.
    if ds.time.dt.tz is not None:
        ds = ds.assign_coords(
            time=ds.time.dt.tz_convert("UTC").dt.tz_localize(None)
        )

    # Detect large temporal gaps that would invalidate monthly composites.
    time_vals = pd.DatetimeIndex(ds.time.values)
    gaps = time_vals[1:] - time_vals[:-1]
    max_gap_days = gaps.max().days
    if max_gap_days > 30:
        import warnings
        warnings.warn(
            f"Largest temporal gap is {max_gap_days} days. "
            "Monthly composites over this gap will have zero valid observations.",
            stacklevel=2,
        )

    return ds


# Sanity check
ds = load_and_validate("s3://my-bucket/sentinel2-daily.zarr")
assert "time" in ds.dims, "Time dimension missing after load"
print(f"Loaded {len(ds.time)} time steps spanning "
      f"{ds.time.values[0]} – {ds.time.values[-1]}")

Step 2 — Apply pre-aggregation cloud masking

Masking must happen before any resampling. Aggregating unmasked pixels and then discarding them afterward inflates observation counts and biases the mean toward cloud-contaminated values. Use the scene classification layer (SCL for Sentinel-2) or a dedicated QA band to null invalid pixels.

Before computing the index layers, apply Raster Band Math and Index Calculation to derive NDVI, EVI, and other indices from the clean masked reflectance — not from the raw stack.

def apply_cloud_mask(
    ds: xr.Dataset,
    scl_var: str = "SCL",
    valid_classes: tuple[int, ...] = (4, 5, 6),  # vegetation, bare soil, water
) -> xr.Dataset:
    """
    Null pixels whose Scene Classification Layer value is not in valid_classes.
    Returns a Dataset with invalid pixels set to NaN for all non-SCL variables.
    """
    if scl_var not in ds.data_vars:
        raise KeyError(
            f"Expected SCL variable '{scl_var}' not found. "
            f"Available: {list(ds.data_vars)}"
        )

    scl = ds[scl_var]
    # Build boolean valid mask; True where pixel is usable.
    valid_mask = xr.zeros_like(scl, dtype=bool)
    for cls in valid_classes:
        valid_mask = valid_mask | (scl == cls)

    masked_vars = {}
    for var in ds.data_vars:
        if var == scl_var:
            continue
        masked_vars[var] = ds[var].where(valid_mask)

    return ds.assign(masked_vars)


ds_masked = apply_cloud_mask(ds, scl_var="SCL")
# Quick check: masked fraction should be < 70% per tile on average
cloud_fraction = float(ds_masked["ndvi"].isnull().mean())
print(f"Masked pixel fraction (NDVI): {cloud_fraction:.1%}")
assert cloud_fraction < 0.95, "Cloud masking removed >95% of pixels — check SCL classes"

Step 3 — Resample to fixed temporal frequency

xarray.DataArray.resample() returns a Resample object; each statistical reduction is a separate method call. Passing a dict of functions directly via .reduce() is not supported. Compute observation count alongside the standard statistics — it becomes a first-class feature for weighting downstream models and for filtering months with insufficient valid observations.

def resample_and_summarise(
    ds: xr.Dataset,
    variables: list[str],
    freq: str = "ME",
    min_obs: int = 3,
) -> xr.Dataset:
    """
    Resample masked variables to `freq` and compute mean, std, min, max,
    and valid-observation count. Months with fewer than min_obs valid
    pixels are set to NaN.

    Parameters
    ----------
    ds : xr.Dataset
        Cloud-masked dataset with NaN for invalid pixels.
    variables : list[str]
        Variable names to aggregate.
    freq : str
        Pandas/xarray frequency alias. Use 'ME' for month-end (xarray >= 2023.10).
    min_obs : int
        Minimum valid observations required; periods below this are nulled.
    """
    missing = [v for v in variables if v not in ds.data_vars]
    if missing:
        raise KeyError(f"Variables not found in dataset: {missing}")

    agg_parts = []
    for var in variables:
        da = ds[var]
        r = da.resample(time=freq)

        obs_count = r.count(dim="time").rename(f"{var}_obs_count")
        da_mean = r.mean(dim="time").rename(f"{var}_mean")
        da_std = r.std(dim="time").rename(f"{var}_std")
        da_min = r.min(dim="time").rename(f"{var}_min")
        da_max = r.max(dim="time").rename(f"{var}_max")

        # Null any period where valid observations fall below threshold.
        sufficient = obs_count >= min_obs
        da_mean = da_mean.where(sufficient)
        da_std = da_std.where(sufficient)

        merged = xr.merge([da_mean, da_std, da_min, da_max, obs_count])
        agg_parts.append(merged)

    aggregated = xr.merge(agg_parts)
    aggregated.attrs.update({
        "temporal_aggregation_freq": freq,
        "variables_aggregated": variables,
        "min_obs_threshold": min_obs,
        "processing_engine": "xarray+dask",
    })
    return aggregated


aggregated = resample_and_summarise(
    ds_masked,
    variables=["ndvi", "lst"],
    freq="ME",
    min_obs=3,
)
print(aggregated)

Step 4 — Handle temporal gaps via interpolation

For short gaps under three observation periods, linear interpolation along the time axis recovers missing monthly composites without introducing large bias. For variables with high temporal autocorrelation — soil moisture, land surface temperature — forward-fill (ffill) is acceptable for gaps up to two periods. Never interpolate binary or categorical masks; restore them from source or leave them null.

def fill_temporal_gaps(
    ds: xr.Dataset,
    method: str = "linear",
    max_gap_periods: int = 2,
) -> xr.Dataset:
    """
    Interpolate NaN periods along the time axis.

    Parameters
    ----------
    method : str
        'linear' for continuous variables, 'ffill' for high-autocorrelation vars.
    max_gap_periods : int
        Maximum consecutive NaN periods to fill; longer gaps remain NaN.
    """
    filled = {}
    for var in ds.data_vars:
        if var.endswith("_obs_count"):
            filled[var] = ds[var]  # never interpolate observation counts
            continue
        if method == "linear":
            filled[var] = ds[var].interpolate_na(
                dim="time",
                method="linear",
                max_gap=max_gap_periods,
            )
        elif method == "ffill":
            filled[var] = ds[var].ffill(dim="time", limit=max_gap_periods)
        else:
            raise ValueError(f"Unsupported fill method: '{method}'")

    return ds.assign(filled)


aggregated_filled = fill_temporal_gaps(aggregated, method="linear", max_gap_periods=2)

When aggregating point sensors or administrative-unit statistics, Vector Proximity and Buffer Generation ensures irregular point locations align with raster grids before this interpolation step, preventing spatial misalignment from propagating into the temporal axis.

Step 5 — Register aggregated features in the feature store

Attach schema validation and lineage metadata before writing. A Pydantic model gates ingestion, blocking out-of-range statistics that indicate a masking failure upstream.

from pydantic import BaseModel, field_validator
import zarr


class TemporalFeatureSchema(BaseModel):
    feature_name: str
    temporal_resolution: str
    aggregation_type: str
    valid_range: tuple[float, float]
    missingness_threshold: float = 0.15

    @field_validator("valid_range")
    @classmethod
    def check_range_order(cls, v: tuple[float, float]) -> tuple[float, float]:
        if v[0] >= v[1]:
            raise ValueError(
                f"Lower bound {v[0]} must be strictly less than upper bound {v[1]}."
            )
        return v

    @field_validator("missingness_threshold")
    @classmethod
    def check_threshold(cls, v: float) -> float:
        if not 0.0 <= v <= 1.0:
            raise ValueError("missingness_threshold must be in [0, 1].")
        return v


def register_features(
    ds: xr.Dataset,
    output_path: str,
    schemas: list[TemporalFeatureSchema],
    pipeline_version: str = "1.0.0",
) -> None:
    """
    Validate each feature variable against its schema, then write
    the aggregated dataset to a consolidated Zarr store.
    """
    import datetime

    schema_map = {s.feature_name: s for s in schemas}

    for schema in schemas:
        if schema.feature_name not in ds.data_vars:
            raise KeyError(
                f"Feature '{schema.feature_name}' not found in aggregated dataset."
            )
        da = ds[schema.feature_name]
        missingness = float(da.isnull().mean())
        if missingness > schema.missingness_threshold:
            raise ValueError(
                f"Feature '{schema.feature_name}' missingness {missingness:.1%} "
                f"exceeds threshold {schema.missingness_threshold:.1%}."
            )
        lo, hi = schema.valid_range
        out_of_range = float(((da < lo) | (da > hi)).mean())
        if out_of_range > 0.01:
            raise ValueError(
                f"Feature '{schema.feature_name}': {out_of_range:.1%} of values "
                f"are outside valid range [{lo}, {hi}]."
            )

    ds.attrs.update({
        "pipeline_version": pipeline_version,
        "aggregation_timestamp": datetime.datetime.utcnow().isoformat(),
        "source_dataset": "sentinel2-daily.zarr",
    })
    ds.to_zarr(output_path, mode="w", consolidated=True, compute=True)
    print(f"Registered {len(schemas)} feature(s) to {output_path}")


register_features(
    aggregated_filled,
    output_path="s3://my-bucket/features/monthly-composites.zarr",
    schemas=[
        TemporalFeatureSchema(
            feature_name="ndvi_mean",
            temporal_resolution="ME",
            aggregation_type="mean",
            valid_range=(-0.2, 1.0),
            missingness_threshold=0.10,
        ),
        TemporalFeatureSchema(
            feature_name="lst_mean",
            temporal_resolution="ME",
            aggregation_type="mean",
            valid_range=(220.0, 340.0),  # Kelvin range for land surface temperature
            missingness_threshold=0.15,
        ),
    ],
)

For detailed monthly calendar alignment and observation-count thresholds, see Aggregating Daily Satellite Data to Monthly Features.

Verification & Testing

After writing to the feature store, validate the output against expected temporal and statistical properties before using features in model training.

def verify_temporal_output(output_path: str, expected_months: int) -> None:
    """
    Load the registered feature store and assert structural and
    statistical correctness.
    """
    result = xr.open_zarr(output_path, consolidated=True)

    # 1. Temporal completeness
    assert len(result.time) == expected_months, (
        f"Expected {expected_months} monthly timesteps, "
        f"got {len(result.time)}"
    )

    # 2. Calendar alignment — all timestamps should be month-end boundaries
    times = pd.DatetimeIndex(result.time.values)
    is_month_end = times == times + pd.offsets.MonthEnd(0)
    assert is_month_end.all(), (
        f"Non-month-end timestamps detected: {times[~is_month_end].tolist()}"
    )

    # 3. NDVI mean plausibility check
    ndvi_mean = float(result["ndvi_mean"].mean())
    assert -0.2 <= ndvi_mean <= 1.0, (
        f"NDVI mean {ndvi_mean:.3f} is outside valid range"
    )

    # 4. Missingness is within acceptable bounds
    missingness = float(result["ndvi_mean"].isnull().mean())
    assert missingness < 0.20, (
        f"NDVI missingness {missingness:.1%} exceeds 20% threshold"
    )

    print(
        f"Verification passed: {expected_months} monthly timesteps, "
        f"NDVI mean = {ndvi_mean:.3f}, missingness = {missingness:.1%}"
    )


verify_temporal_output(
    "s3://my-bucket/features/monthly-composites.zarr",
    expected_months=24,  # 2 years of monthly composites
)

For the broader context of how temporal features slot into normalisation and model-ready scaling, see Feature Scaling for Geospatial Inputs.

Troubleshooting & Common Errors

FutureWarning: 'M' is deprecated, use 'ME' Raised by xarray >= 2023.10 and pandas >= 2.2. Replace every freq="M" argument with freq="ME" for month-end or freq="MS" for month-start boundaries. The deprecated alias will be removed in a future release.

ValueError: Cannot reindex or align along dimension 'time' with duplicates Two observations share identical timestamps after UTC normalisation. Deduplicate with ds.sel(time=~ds.indexes['time'].duplicated()) before resampling.

KeyError: 'time' after open_zarr The Zarr archive was written without encoding the time dimension, or the time variable is stored as int64 without units. Re-encode using xr.coding.times.encode_cf_datetime() or confirm the source pipeline writes CF-compliant metadata.

Silent all-NaN monthly composites despite valid source data Cloud masking eliminated every observation in that month. Check ndvi_obs_count for those timesteps — if it is zero, the SCL valid-class list is too restrictive, or the source acquisition genuinely has total cloud cover. Lower the min_obs threshold only after inspecting the raw stack visually.

MemoryError when calling .compute() on a large Dask graph Chunk boundaries are misaligned with the underlying Zarr block size. Rechunk the dataset before resampling: ds = ds.chunk({"time": 1, "y": 256, "x": 256}). Call .persist() only after masking and filtering to avoid materialising unused variables.

Training and inference statistics diverge despite identical code Training uses calendar months (ME), but an inference wrapper was written with a rolling 30-day window. The two produce different timestamps and different pixel sets. Centralise the frequency alias in a shared configuration constant and reference it from both contexts.

Performance Optimisation

Chunk alignment: Match Dask chunk boundaries to the underlying Zarr block size (zarr.open(path).chunks). Misaligned chunks trigger redundant range requests on object storage and inflate memory pressure by 3–5×.

Compressor choice: Use zstd with shuffle=True for floating-point geospatial arrays. zstd delivers 15–25% better compression ratios than lz4 on NDVI and LST arrays while maintaining comparable read throughput.

Lazy graph pruning: Filter variables and apply cloud masks before calling .persist(). Premature materialisation forces Dask to load the entire time axis for variables you will discard.

Parallel I/O: Set consolidated=True on both read and write to batch all Zarr metadata into a single JSON fetch per store, reducing cold-open latency from tens of seconds to under one second on S3.

Storage reduction: Aggregating daily 10 m Sentinel-2 composites to monthly summaries reduces storage by 85–92% while preserving predictive signal for most supervised learning tasks. For petabyte-scale pipelines, combine monthly aggregation with Spatial Lag and Neighborhood Statistics at the monthly cadence rather than daily to avoid multiplicative blowup.

FAQ

Should I compute spectral indices before or after temporal aggregation?

Always compute indices before temporal aggregation. Aggregating raw reflectance bands first and deriving indices afterward introduces mathematical bias because index formulas are non-linear. For NDVI: mean(NIR) / mean(Red) does not equal mean(NIR / Red). Applying Raster Band Math and Index Calculation before the resample step is the correct order.

What frequency alias should I use for monthly resampling in xarray?

Use "ME" (Month End) in xarray >= 2023.10 with pandas >= 2.2. The older "M" alias is deprecated and raises FutureWarning. For composites anchored to the first of each month use "MS" (Month Start). Mixing the two in different pipeline stages creates timestamp collisions that silently corrupt joins.

How do I avoid training/inference feature mismatch when aggregation windows differ?

Define the frequency alias as a single constant in a shared configuration module and import it in every pipeline component. Never hardcode "monthly" in one place and "30D" in another — the two produce different timestamps, different pixel counts, and silent feature drift that only surfaces when predictions diverge in production.

When is forward-fill safe for filling temporal gaps?

Forward-fill is safe for variables with high temporal autocorrelation (soil moisture, land surface temperature) across gaps of at most two observation periods. Avoid it for variables that can change rapidly (precipitation, flood extent, burn severity) and never apply it to any binary or categorical mask layer.

Aggregating Daily Satellite Data to Monthly Features — calendar alignment, observation-count thresholds, and cloud masking for Sentinel-2
Raster Band Math and Index Calculation — compute NDVI, EVI, and custom indices before temporal summarisation
Feature Scaling for Geospatial Inputs — normalise and scale monthly composites before model training
Spatial Lag and Neighborhood Statistics — extend monthly composites with spatial autocorrelation features
Vector Proximity and Buffer Generation — align point sensors with raster grids before temporal aggregation

Part of: Spatial Feature Engineering for Machine Learning