Data Variables

This document describes variable labels and file formatting used for continuously sampled data within the AmeriFlux and European Fluxes databases. Agreement on a common and shared system to name and organize the variables collected is important to data sharing across networks.

Continuously sampled data are defined as observations that are reported at regular time intervals, generally daily or more frequently, over a given time period. The time interval between two sequential values is always the same.

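The rules described in this document cover the following:

  • Temporal representativeness and timestamps (Section 1)
  • Data variable base names (Section 2)
  • Data variable qualifiers (Section 3)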

The rules generally apply to the various steps involved in the data life cycle within the network data system: from data uploads by the tower team to centralized processing and quality assessment / quality control (QA/QC), to the data distributed to final users. Rules specific to particular aspects of the measurement life cycle are noted. See also Half-hourly / Hourly Data Upload Format and Uploading High-Frequency Data.

1. Temporal representativeness and timestamps

Two forms of reporting the time associated with data are needed:

  • Data files using daily, monthly, and yearly resolutions use a single timestamp variable: TIMESTAMP. For these types of files, the temporal resolution of the data matches the temporal resolution of the timestamp. For instance, a single timestamp with daily resolution is sufficient to unambiguously identify the interval represented by a daily aggregate, e.g., 20150728.

  • Data files in half-hourly, hourly, and weekly resolutions use two timestamp variables, TIMESTAMP_START and TIMESTAMP_END, to mark the start and end of the reporting interval. In these types of files, the temporal resolution of the data differs from that of the timestamp. For instance, a single timestamp with minute resolution (e.g., 201507281730) used to identify a half-hour period can be interpreted in different ways: 5:00pm to 5:30pm, 5:30pm to 6:00pm, or even 5:15pm to 5:45pm. While various conventions can be used to eliminate ambiguity, we have found the use of these two timestamp variables to be the most straightforward.

Below are examples of resolutions using a single TIMESTAMP variable as well as resolutions using both TIMESTAMP_START and TIMESTAMP_END.

  • sample half-hourly data file (both timestamps):

              TIMESTAMP_START,TIMESTAMP_END,CO2,...
              201507281700,201507281730,391.1,...
              201507281730,201507281800,391.8,...
              ...


  • sample hourly data file (both timestamps):

              TIMESTAMP_START,TIMESTAMP_END,CO2,...
              201507281700,201507281800,391.1,...
              201507281800,201507281900,391.8,...
              ...


  • sample daily data file (single timestamp):

              TIMESTAMP,CO2,...
              20150728,391.1,...
              20150729,392.8,...
              ...


  • sample weekly data file (both timestamps):

              TIMESTAMP_START,TIMESTAMP_END,CO2,...
              20150701,20150707,391.1,...
              20150708,20150714,391.8,...
              20150715,20150721,390.9,...
              20150722,20150728,392.0,...
              ...


  • sample monthly data file (single timestamp):

              TIMESTAMP,CO2,...
              201507,391.1,...
              201508,392.8,...
              ...


  • sample yearly data file (single timestamp):

              TIMESTAMP,CO2,...
              2014,388.1,...
              2015,392.8,...
              ...
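
As a rough illustration of the two timestamp forms, the sketch below parses a timestamp string according to its length (4, 6, 8, or 12 digits, as in the samples above) and checks that the start/end pairs of a half-hourly file are equally long and contiguous. The names parse_timestamp and check_intervals are hypothetical and are not part of any network tool.

```python
from datetime import datetime, timedelta

# Assumed mapping from string length to format, inferred from the samples above.
_FORMATS = {4: "%Y", 6: "%Y%m", 8: "%Y%m%d", 12: "%Y%m%d%H%M"}


def parse_timestamp(text):
    """Parse a YYYY, YYYYMM, YYYYMMDD, or YYYYMMDDHHMM timestamp string."""
    try:
        fmt = _FORMATS[len(text)]
    except KeyError:
        raise ValueError(f"unsupported timestamp length: {text!r}") from None
    return datetime.strptime(text, fmt)


def check_intervals(rows, expected):
    """Return True if every (start, end) pair spans `expected` and rows are contiguous."""
    previous_end = None
    for start_text, end_text in rows:
        start, end = parse_timestamp(start_text), parse_timestamp(end_text)
        if end - start != expected:
            return False
        if previous_end is not None and start != previous_end:
            return False
        previous_end = end
    return True


half_hourly = [("201507281700", "201507281730"), ("201507281730", "201507281800")]
print(check_intervals(half_hourly, timedelta(minutes=30)))
```

The contiguity check assumes that each TIMESTAMP_START equals the previous TIMESTAMP_END, which holds for the half-hourly and hourly samples. The weekly sample uses day-resolution timestamps with an inclusive end date, so it would need a different check.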

Timestamp column ordering (text-based files only)

For text file data representations (i.e., CSV formatted), timestamps are always in the first column(s) of the file.
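
A minimal sketch of how this ordering rule could be checked on a CSV header row. The function name timestamp_columns_ok is hypothetical, and the check only looks at the leading column(s).

```python
import csv
import io


def timestamp_columns_ok(header):
    """Return True if the timestamp variable(s) occupy the first column(s)."""
    if header[:2] == ["TIMESTAMP_START", "TIMESTAMP_END"]:
        return True
    return header[:1] == ["TIMESTAMP"]


# Illustrative file content, shortened from the half-hourly sample above.
sample = "TIMESTAMP_START,TIMESTAMP_END,CO2\n201507281700,201507281730,391.1\n"
header = next(csv.reader(io.StringIO(sample)))
print(timestamp_columns_ok(header))
```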

Time zone convention

Time is reported in local standard time (i.e., without Daylight Saving Time). The time zone is specified using the Site General Information BADM¹ for the site.
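
Because Daylight Saving Time is never applied, a site's offset from UTC is constant all year. The sketch below converts a local-standard-time timestamp to UTC under that assumption. The function name and the example offset of -8 hours are hypothetical; the real offset would come from the site's BADM.

```python
from datetime import datetime, timedelta, timezone


def local_standard_to_utc(text, utc_offset_hours):
    """Convert a YYYYMMDDHHMM local-standard-time stamp to an aware UTC datetime."""
    # A fixed-offset zone models standard time: no DST transitions are ever applied.
    standard_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(text, "%Y%m%d%H%M").replace(tzinfo=standard_zone)
    return local.astimezone(timezone.utc)


# Hypothetical site at UTC-8; the offset stays -8 in July because DST is ignored.
print(local_standard_to_utc("201507281730", -8).isoformat())
```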

Missing data

Missing data are reported using -9999 as the replacement value.² Data for all days in a leap year, including February 29, are reported.

¹ Biological, Ancillary, Disturbance and Metadata.
² Other values such as -6999, N/A, and NaN are not acceptable as an indication of a missing value.
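
The missing-value and leap-year rules can be sketched as follows. All function names are hypothetical. The reader rejects NaN and non-numeric markers such as N/A, but it cannot tell a marker like -6999 apart from a legitimate measurement, so that case still needs a manual check.

```python
import calendar
import math

MISSING = -9999  # the only accepted missing-value marker


def to_upload_value(value):
    """Map None or NaN to -9999; pass other numbers through unchanged."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return MISSING
    return value


def from_file_value(text):
    """Parse one CSV field; return None for -9999 and raise on unsupported markers."""
    number = float(text)  # raises ValueError for strings such as "N/A"
    if math.isnan(number):
        raise ValueError("NaN is not an accepted missing-value marker")
    return None if number == MISSING else number


def expected_records(year, per_day=48):
    """Number of records in a full year; per_day=48 corresponds to half-hourly data."""
    return (366 if calendar.isleap(year) else 365) * per_day


print(to_upload_value(float("nan")), from_file_value("-9999"), expected_records(2016))
```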

2. Data Variable: Base names

Base names indicate fundamental quantities that are either measured or calculated / derived. They can also indicate quantified quality information.

Table 1. Base names for data variable labels

Name Description Units


3. Data Variable: Qualifiers

Qualifiers are suffixes appended to variable base names that provide additional information about the variable. For example, the _F qualifier in the variable label TS_F indicates that soil temperature (TS) has been gap-filled by the network.

Multiple qualifiers can be added, and they must follow the order in which they are presented here.

In general, qualifiers are applied at the network level (network teams only) and should not be used in data uploads by tower teams. Exceptions are noted in the qualifier descriptions below.

3.1. Qualifiers: General

General qualifiers indicate additional information about a variable.

3.1.1. _PI (Provided by PI / tower team)

  • Use: network team only
  • Details: _PI indicates a variable that has been QA/QC filtered or gap-filled by the tower team, independently of network QA/QC or gap-filling processing.

3.1.2. _QC (Quality control flag)

  • Use: network team only
  • Details: _QC reports quality checks resulting from standard and centralized QA/QC of the data.

3.1.3. _F (Gap-filled variable)

  • Use: tower team or network team
  • Details: _F indicates that the variable has been gap-filled.

3.1.4. _IU (Instrument units)

  • Use: tower team or network team
  • Details: _IU indicates that the variable uses instrument units (e.g., counts, mV, absorbance) instead of standard units (e.g., mm, degC, µmol mol⁻¹). In general, this qualifier is used only in high-frequency data uploads and should be discussed with the network team before use.

3.2. Qualifiers: Theme, Methods, and Uncertainty

Placeholder for theme, methods, and uncertainty related qualifiers.

This will be their position in the order of suffixes to variable labels.

These qualifiers are currently being defined along with the post-processing results.


3.3. Qualifiers: Positional (_H_V_R)

Positional qualifiers indicate the relative positions of observations at the site. For example, observations can be measured at different points in space (e.g., along a vertical profile or in different positions within the horizontal plane) or measured at the same position using two or more sensors (replicates). Position qualifiers are appended to a variable base name. The actual sensor position is reported along with the corresponding position qualifier in the BADM Instrument Ops template.³

³ Note: the indices may be reassigned by the network team in released data products. Any such change will be based on positions described in the BADM and feedback from tower teams.


3.3.1. _H_V_R (Three-index positional qualifier)

  • Use: tower team and network team
  • Details: The three components of the qualifier are indices that indicate an observation’s spatial position. In other words, the indices describe the position of a sensor relative to other sensors that measure the same variable within a site. They are not measurements of distances. The letters H, V, and R are to be replaced with integer values to represent:

  • Horizontal position (H): Use of the same H index indicates the same position within the horizontal plane among variables with the same base name. For example, observations that have the same variable base name and are arranged in a vertical profile would have the same H index. Note: variables with different base names could have different H indices even if located in the same physical location.

  • Vertical position (V): Use of the same V index indicates the same position along the vertical axis among variables with the same base name. Indices must be in order, starting with the highest position. For example, V = 1 for the highest air temperature sensor or the shallowest soil temperature sensor in a profile. The indices are assigned on the basis of relative position, separately for each vertical profile.

  • Replicate (R): The R index indicates that the variable is measured in the same position (both H and V) as another sensor. Two co-located sensors are considered “replicates” if the difference in observations is due to separate instrumentation or a different measurement technique. Spatial variability is never represented with different R indices. Whether a difference counts as spatial variability or replication depends on the variable. For example, two radiometers measuring incoming radiation that are spaced 1 meter apart horizontally could be considered replicates, while two soil water content sensors at 1 meter horizontal spacing may have different spatial positions (different H indices).


Example:

Two profiles of soil temperature in two different horizontal positions: Profile 1 has four sensors at -2, -5, -10, and -30 cm, and Profile 2 has three sensors, one at -5 cm and two at -30 cm (e.g., different sensor models). The codes will be:

Sensor                               Code
Profile 1, -2 cm                     TS_1_1_1
Profile 1, -5 cm                     TS_1_2_1
Profile 1, -10 cm                    TS_1_3_1
Profile 1, -30 cm                    TS_1_4_1
Profile 2, -5 cm                     TS_2_1_1
Profile 2, -30 cm, sensor model A    TS_2_2_1
Profile 2, -30 cm, sensor model B    TS_2_2_2
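
The index assignment in this example can be sketched in code. The function below is a hypothetical helper, not a network tool. It assumes that the caller has already decided which sensors share a horizontal position and that sensors with the same H index and height / depth are replicates, a decision that in practice depends on the variable.

```python
from collections import defaultdict


def assign_position_codes(base_name, sensors):
    """Return one label per sensor, given (h_index, height_cm) pairs.

    Sensors sharing both h_index and height_cm are treated as replicates
    and numbered in the order in which they appear.
    """
    heights = defaultdict(set)
    for h_index, height_cm in sensors:
        heights[h_index].add(height_cm)

    # Within each profile, rank heights from highest to lowest, so V = 1 is the top.
    v_index = {
        h: {height: rank for rank, height in enumerate(sorted(hs, reverse=True), start=1)}
        for h, hs in heights.items()
    }

    replicate_count = defaultdict(int)
    labels = []
    for h_index, height_cm in sensors:
        replicate_count[(h_index, height_cm)] += 1
        r_index = replicate_count[(h_index, height_cm)]
        labels.append(f"{base_name}_{h_index}_{v_index[h_index][height_cm]}_{r_index}")
    return labels


# The seven soil temperature sensors of the example, as (profile, depth in cm).
sensors = [(1, -2), (1, -5), (1, -10), (1, -30), (2, -5), (2, -30), (2, -30)]
print(assign_position_codes("TS", sensors))
```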


Adding sensors:

  • When a new sensor is added in the horizontal plane, a new value of the H qualifier is added.
  • When a new height / depth is added in an existing vertical profile, the entire profile can be renumbered to be in sequential order. Alternatively, a new index number can be used (even if not in the correct order). Metadata describing the new position or renumbered profile should be reported in a BADM Instrument Ops template. If a new index number is used out of the correct order, the network team will rename the entire profile sequentially. If a position is not measured in a given year, its values for that year are filled with -9999.
  • For AmeriFlux sites, see Data Variable: Qualifiers (Section 5) at Half-Hourly / Hourly Data Upload Format for additional instructions.

Example:

Continuing the example above, two new sensors are added. One is added in a new horizontal position at -30 cm depth, forming the new Profile 3. The other sensor is added to the existing Profile 2 at -20 cm depth. The codes become:

Sensor                               Code
Profile 1, -2 cm                     TS_1_1_1
Profile 1, -5 cm                     TS_1_2_1
Profile 1, -10 cm                    TS_1_3_1
Profile 1, -30 cm                    TS_1_4_1
Profile 2, -5 cm                     TS_2_1_1
Profile 2, -20 cm                    TS_2_2_2
Profile 2, -30 cm, sensor model A    TS_2_3_1
Profile 2, -30 cm, sensor model B    TS_2_3_2
Profile 3, -30 cm                    TS_3_1_1

Note: The entire Profile 2 is renumbered to accommodate the new sensor (TS_2_2_2) that is positioned between existing sensors above and below.

3.4. Qualifiers: Aggregation

Data from individual sensors may be aggregated by the network team using variable base names, position qualifiers, metadata, and discussion with the tower team. It is possible for tower teams to upload their preferred aggregations as well, using the aggregation qualifiers as described below. However, AmeriFlux prefers that individual sensor data are uploaded over aggregated values.

3.4.1. _H_V_A (Aggregation of replicates)

  • Use: network team only
  • Details: If replicates can be aggregated, they are averaged, and the result is reported with the R index of the _H_V_R position qualifier replaced with the letter A, i.e., _H_V_A. Continuing the example above, if TS_2_3_1 and TS_2_3_2 can be averaged, the result will be named TS_2_3_A. The standard deviation and number of samples can also be reported as TS_2_3_A_SD and TS_2_3_A_N (see the _SD and _N descriptions below).
  • Note: H and V are replaced with numerical indices, while A is used as is.

3.4.2. _# (Aggregation layer index)

  • Use: tower team or network team
  • Details: Variables with the same base name and the same height / depth but different horizontal positions can be aggregated. This aggregation across a horizontal plane represents the footprint at a given layer. The _# qualifier is replaced by a numerical index indicating the layer’s relative height / depth position.
  • Note: An aggregated layer index may not be needed for variables that are representative of the tower footprint, either through aggregation or spatial resolution (see the note in the example after 3.4.4). There are a few exceptions, such as soil temperature, where the qualifier is always needed to indicate layer depth.

3.4.3. _SD (Standard deviation – spatial variability)

  • Use: network team only
  • Details: Standard deviation of an aggregated variable. The _SD qualifier must be used in conjunction with an aggregation of replicates or aggregation layer index.

3.4.4. _N (Number of samples – spatial variability)

  • Use: network team only
  • Details: Number of samples in the aggregated variable. The _N qualifier must be used in conjunction with an aggregation of replicates or aggregation layer index.


Example:

Continuing the examples above, variables measured by sensors located at different positions within the horizontal plane but at a “similar” height / depth can be averaged. The aggregated layer variable qualifier (_#) indicates the sequential horizontal planes, with 1 indicating the highest layer position.

TS_1 = TS_1_1_1 (sensor at -2 cm)
TS_2 = TS_1_2_1 & TS_2_1_1 (sensors at -5 cm)
TS_3 = TS_1_3_1 (sensor at -10 cm)
TS_4 = TS_2_2_2 (sensor at -20 cm)
TS_5 = TS_1_4_1 & TS_2_3_A & TS_3_1_1 (sensors at -30 cm)

Note: TS_2_3_A in layer 5 is the aggregated value of replicate sensors in Profile 2 located at -30 cm depth, as indicated by the _H_V_A qualifier.

If a specific layer (_#) has two or more sensors, additional variables are also created. The standard deviation between sensors is identified with _SD. The number of sensors in the layer is identified with _N. In the case above, this would happen for TS_2 and TS_5, producing TS_2_SD, TS_2_N, TS_5_SD and TS_5_N.

Note: If a variable is not measured along a vertical profile, the _# qualifier is not used. For example, if there is only one radiation sensor measuring SW_IN, SW_IN_1 is not created. Similarly, if there are PPFD sensors at different heights below the canopy measuring PPFD_BC_IN, they can be averaged and the standard deviation calculated. In that case the _# qualifier is not used, and the variables are named directly PPFD_BC_IN and PPFD_BC_IN_SD.
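
A minimal sketch of how the mean, _SD, and _N values of one aggregated layer could be computed for a single time step. The function name is hypothetical, and three choices here are assumptions that this document does not specify: -9999 values are excluded before averaging, the sample standard deviation (n - 1 denominator) is used, and the standard deviation is reported as -9999 when fewer than two valid values remain.

```python
from statistics import mean, stdev

MISSING = -9999


def aggregate_layer(values):
    """Return (mean, sd, n) across the sensors of one layer at one time step."""
    valid = [v for v in values if v != MISSING]  # assumption: skip missing values
    n = len(valid)
    if n == 0:
        return MISSING, MISSING, 0
    # Assumption: sample standard deviation, which is undefined for a single value.
    sd = stdev(valid) if n > 1 else MISSING
    return mean(valid), sd, n


# Illustrative values (degC) for TS_1_4_1, TS_2_3_A, and TS_3_1_1, i.e. layer TS_5.
ts_5, ts_5_sd, ts_5_n = aggregate_layer([10.0, 11.0, 12.0])
print(ts_5, ts_5_sd, ts_5_n)
```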


3.5. Order of Qualifiers

When multiple qualifiers are used, qualifiers are ordered as follows:
  1. General qualifiers (as ordered in Section 3.1)
  2. Position qualifiers or aggregation qualifiers (as ordered in Section 3.3 or Section 3.4)

Example:
Variable        Explanation
TA_F_1_1_1      Air temperature (TA), gap-filled by network (_F), at horizontal position 1, vertical position 1, and replicate 1 (_1_1_1).
FC_PI_F_1_1_A   Carbon dioxide (CO2) flux (FC), gap-filled by tower team (_PI_F), aggregated value of replicate sensors at horizontal position 1 and vertical position 1 (_1_1_A).
P_IU_1          Precipitation (P) in instrument units (_IU), e.g., mV, at aggregate layer 1 (_1).
TS_2_3_A_SD     Standard deviation (_SD) for soil temperature (TS) at horizontal position 2 and vertical position 3, aggregated across replicate sensors (_2_3_A).
TS_5_N          Number of samples (_N) for soil temperature (TS) aggregated into layer 5 (_5).