About Data

Across the Americas, independent scientists measure the flows of carbon between land and atmosphere, using a technique called eddy covariance, and then contribute their data to the AmeriFlux Network. The AmeriFlux Network works with scientists to standardize, quality check, and process data into common forms that the scientific community can use to examine crucial linkages between ecosystem processes and climate responses.

The following documents describe various aspects of data within the AmeriFlux Network:

  • Data Overview (below)
  • Data Variables explains the data format conventions used within the network.
  • How to Upload/Download Data contains information for submitting and retrieving data.
  • Data Processing Levels describes the data products generated by the network.
  • Data Processing Pipelines describes the QA/QC and value-added processing performed by the network.
  • BADM provides information on how to submit Biological, Ancillary, Disturbance and Metadata (BADM) information that supports the eddy covariance data.


Data Overview

Data contributed to the AmeriFlux Network are complex and diverse. Ecosystem-level field sites acquire continuous measurements from a large number of sensors at high temporal resolution, which can result in large quantities of data. Data can be collected and processed at different time resolutions, in different units, and at different quality levels (e.g., reflecting different filtering criteria or gap-filling). Data can also be derived through process-based modeling activities.

The data collected at individual sites are sent to an archive operated by the network. Network data processing involves:

  • performing uniform data quality checks,
  • transforming original measurements from individual sensors into ecologically or micro-meteorologically meaningful quantities, and
  • generating high-quality, standardized datasets.

The resulting datasets are made available to the scientific community.
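As a toy illustration of the three processing steps listed above, the sketch below runs a range check on readings from two replicate air temperature sensors and then averages them into a single series. The function names, the plausibility bounds, and the choice of averaging as the transformation are assumptions for illustration only; they are not the network's actual QA/QC rules, which the Data Processing Pipelines document describes.

```python
from statistics import mean

# Assumed plausibility bounds for air temperature in degrees C
# (illustrative only, not AmeriFlux thresholds).
TA_MIN, TA_MAX = -60.0, 60.0


def quality_check(values):
    """Uniform QC step: replace missing or out-of-range readings with None."""
    return [v if v is not None and TA_MIN <= v <= TA_MAX else None for v in values]


def combine_sensors(*sensor_series):
    """Transformation step: average replicate sensors at each timestep, skipping gaps."""
    combined = []
    for readings in zip(*sensor_series):
        valid = [r for r in readings if r is not None]
        combined.append(mean(valid) if valid else None)
    return combined


def process(*sensor_series):
    """Run QC on every sensor, then merge them into one standardized series."""
    return combine_sensors(*(quality_check(s) for s in sensor_series))


print(process([20.0, 21.0, 999.0], [22.0, None, 23.0]))  # [21.0, 21.0, 23.0]
```

Keeping QC separate from the transformation mirrors the order of the steps above: readings are screened individually before any combination, so one faulty sensor does not contaminate the merged value.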

Within the AmeriFlux Network, a data variable is defined as a set of magnitudes or values of a physically or abstractly meaningful quantity, e.g., air temperature or quality level. A data variable label is the name assigned to identify and describe data within a dataset, e.g., TA, TA_F, TA_QC. Data variables can be measured directly by a sensor or generated by data processing operations, which transform one or more data variables into new data variables. Data processing operations can be grouped into stages, and the data generated at each stage define the data processing levels.
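Labels such as TA_F and TA_QC can be read as a base name followed by trailing qualifiers. The sketch below splits labels that way, assuming for illustration that `_F` marks a gap-filled variable and `_QC` a quality flag. The `QUALIFIERS` table and the function names are hypothetical; the Data Variables document is the authoritative source for the network's naming conventions.

```python
# Hypothetical subset of suffix qualifiers; meanings are assumed for illustration.
QUALIFIERS = {
    "F": "gap-filled",
    "QC": "quality flag",
}


def parse_label(label):
    """Split a variable label into (base name, list of trailing qualifiers)."""
    tokens = label.split("_")
    qualifiers = []
    # Peel recognized qualifiers off the right-hand end so that a base name
    # which itself contains an underscore (e.g. SW_IN) stays intact.
    while len(tokens) > 1 and tokens[-1] in QUALIFIERS:
        qualifiers.insert(0, tokens.pop())
    return "_".join(tokens), qualifiers


def describe(label):
    """Return a human-readable description of a label."""
    base, qualifiers = parse_label(label)
    if not qualifiers:
        return base
    return f"{base} ({', '.join(QUALIFIERS[q] for q in qualifiers)})"


for name in ("TA", "TA_F", "TA_QC"):
    print(name, "->", describe(name))
# TA -> TA
# TA_F -> TA (gap-filled)
# TA_QC -> TA (quality flag)
```

Parsing from the right, rather than splitting at the first underscore, is the design choice that keeps multi-part base names whole; extending the sketch to other qualifiers only requires adding entries to the table.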

To make use of a given data variable, several aspects should be easily identifiable:

  • Spatial representativeness: the area being represented by a variable (e.g., sensor surroundings or tower footprint);
  • Measured / derived: whether a data variable is measured directly or derived from measured data through data processing operations or execution of models;
  • Filtered / unfiltered: whether a data variable has been filtered for quality purposes or not;
  • Single / multiple sensors: whether the data variable originated from a single sensor or from a combination of sensors (of either the same type or different types);
  • Process knowledge: whether the processing applied to a data variable includes any ecosystem process understanding that could violate data independence requirements when the variable is used for model validation and parameterization.

Data variable labels, data processing levels, and the BADM associated with a data variable are the main mechanisms for identifying these characteristics in a dataset.
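One way to keep the characteristics listed above attached to a variable in code is a small record type. The sketch below is a hypothetical structure, not an AmeriFlux or BADM schema; the field names, the `Origin` enum, and the example values are assumptions for illustration. Its last method encodes the data-independence concern raised in the "Process knowledge" bullet.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    MEASURED = "measured"  # measured directly by a sensor
    DERIVED = "derived"    # produced by processing operations or models


@dataclass(frozen=True)
class VariableProfile:
    """Hypothetical record of the characteristics a data user should be able to identify."""
    label: str
    spatial_representativeness: str  # e.g. "sensor surroundings" or "tower footprint"
    origin: Origin
    filtered: bool                   # filtered for quality purposes?
    sensor_count: int                # 1 = single sensor, >1 = combination of sensors
    uses_process_knowledge: bool     # processing embeds ecosystem process understanding

    def is_multi_sensor(self) -> bool:
        return self.sensor_count > 1

    def independent_for_model_validation(self) -> bool:
        """False when the processing could violate data independence requirements."""
        return not self.uses_process_knowledge


# Illustrative values only.
ta = VariableProfile(
    label="TA",
    spatial_representativeness="sensor surroundings",
    origin=Origin.MEASURED,
    filtered=False,
    sensor_count=1,
    uses_process_knowledge=False,
)
print(ta.origin.value, ta.is_multi_sensor(), ta.independent_for_model_validation())
# measured False True
```

Making the record frozen reflects that these characteristics describe how a variable was produced and should not change after the fact.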

The standardized datasets facilitate synthesis of earth science information across measurement types, methods of collection, and ecosystems. Scientists use these datasets to assess responses and feedbacks of terrestrial ecosystems to the environment, including changes in climate, land use, and extreme events such as droughts, storms, or wildfire.