Format QA/QC Tests

Format QA/QC tests are used in the AmeriFlux BASE QA/QC processing pipeline to assess whether uploaded files comply with the required FP-In format (also known as the Half-Hourly/Hourly Data Upload Format).

Each test is listed below with its description and an example of the message it reports.

Test: Any problems reading file?
Description: If the uploaded file is malformed, it cannot be read.
Example: Error reading data from the file.

Test: Is Filename Format valid?
Description: Checks the uploaded filename against the FP-In filename format.
Example: These filename components are not in the standard AmeriFlux format: optional parameter included (will be removed in autocorrected file)

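A minimal Python sketch of a filename check follows. It assumes the FP-In filename layout `<SITE_ID>_<RESOLUTION>_<TS_START>_<TS_END>[_<OPTIONAL>].csv`, with RESOLUTION either HH or HR and 12-digit YYYYMMDDHHMM timestamps; that layout, the regex, the function name `check_filename`, and the `US-Xxx` site ID are illustrative assumptions, not the pipeline's actual code.

```python
import re

# Assumed FP-In filename layout (hypothetical regex, for illustration only):
#   <SITE_ID>_<RESOLUTION>_<TS_START>_<TS_END>[_<OPTIONAL>].csv
FILENAME_RE = re.compile(
    r"^(?P<site>[A-Z]{2}-[A-Za-z0-9]{3})"  # site ID, e.g. US-Xxx
    r"_(?P<res>HH|HR)"                      # half-hourly or hourly
    r"_(?P<ts_start>\d{12})"                # YYYYMMDDHHMM
    r"_(?P<ts_end>\d{12})"                  # YYYYMMDDHHMM
    r"(?:_(?P<optional>[^.]+))?"            # optional parameter
    r"\.csv$"
)

def check_filename(name):
    """Return (filename components or None, list of problem messages)."""
    m = FILENAME_RE.match(name)
    if m is None:
        return None, ["filename does not match the assumed FP-In pattern"]
    parts = m.groupdict()
    problems = []
    if parts["optional"] is not None:
        problems.append("optional parameter included "
                        "(will be removed in autocorrected file)")
    return parts, problems
```

Returning the parsed components lets the later filename-versus-data timestamp comparison reuse `ts_start` and `ts_end` without re-parsing.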
Test: Do filename time components match file time period?
Description: The TIMESTAMP_START value in the first data row must match the ts_start component of the filename, and the TIMESTAMP_END value in the last data row must match the ts_end component of the filename.
Example: TIMESTAMP_START 199912312330 does not match filename ts_start 20000101000 time.

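The comparison described above can be sketched as follows, assuming the data rows have already been read into dictionaries keyed by variable name and that timestamps are compared as strings. The function name `check_time_components` is hypothetical.

```python
def check_time_components(ts_start_name, ts_end_name, rows):
    """Compare filename timestamps with the first and last data rows.

    rows: list of dicts holding 'TIMESTAMP_START' and 'TIMESTAMP_END' strings.
    """
    problems = []
    first_start = rows[0]["TIMESTAMP_START"]  # first data row
    last_end = rows[-1]["TIMESTAMP_END"]      # last data row
    if first_start != ts_start_name:
        problems.append(f"TIMESTAMP_START {first_start} does not match "
                        f"filename ts_start {ts_start_name} time.")
    if last_end != ts_end_name:
        problems.append(f"TIMESTAMP_END {last_end} does not match "
                        f"filename ts_end {ts_end_name} time.")
    return problems
```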
Test: Any invalid Missing-Value Formats?
Description: Looks for common non-standard missing-value formats, including -6999, NaN, NA, and empty values (missing values must be indicated with -9999). Reports the variables in which invalid missing values are found, with the number of occurrences in parentheses.
Example: Missing values are not indicated with -9999 for these variables (number of timestamps): TA (2); FC (41); TS_1_1_1 (12)

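A minimal sketch of the per-variable count, assuming rows are lists of raw string values in header order and that token matching is exact and case-sensitive (a simplifying assumption). `invalid_missing_counts` and `report` are hypothetical names.

```python
from collections import Counter

# Non-standard missing-value tokens named in the test description.
INVALID_MISSING = {"-6999", "NaN", "NA", ""}

def invalid_missing_counts(header, rows):
    """Count non-standard missing values per variable (column)."""
    counts = Counter()
    for row in rows:
        for name, value in zip(header, row):
            if value.strip() in INVALID_MISSING:
                counts[name] += 1
    return counts

def report(counts):
    """Format counts as 'NAME (n)' entries separated by semicolons."""
    parts = [f"{name} ({n})" for name, n in counts.items()]
    return ("Missing values are not indicated with -9999 for these "
            "variables (number of timestamps): " + "; ".join(parts))
```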
Test: Are Timestamp variables as expected?
Description: TIMESTAMP_START and TIMESTAMP_END must be in columns 1 and 2. If they are not, this check reports the variables found in those columns instead.
Example: These unexpected variables were found in columns 1 & 2 instead of TIMESTAMP_START and TIMESTAMP_END: YEAR, DAY

Test: Are Timestamp variables present?
Description: Looks for TIMESTAMP_START and TIMESTAMP_END in any column. If one or both are missing, this check reports the missing variable(s).
Example: Expected timestamp variable(s) TIMESTAMP_END is / are missing.

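The two timestamp-variable checks (position in columns 1 and 2, and presence anywhere in the header) can be sketched together, assuming the header is a list of variable names. The function name `timestamp_column_messages` is hypothetical; the message strings mirror the examples.

```python
EXPECTED_TS = ["TIMESTAMP_START", "TIMESTAMP_END"]

def timestamp_column_messages(header):
    """Report messages for the timestamp presence and position checks."""
    messages = []
    # Presence: the variables may appear in any column.
    missing = [v for v in EXPECTED_TS if v not in header]
    if missing:
        messages.append("Expected timestamp variable(s) "
                        + ", ".join(missing) + " is / are missing.")
    # Position: they must occupy columns 1 and 2, in that order.
    if header[:2] != EXPECTED_TS:
        messages.append("These unexpected variables were found in columns 1 & 2 "
                        "instead of TIMESTAMP_START and TIMESTAMP_END: "
                        + ", ".join(header[:2]))
    return messages
```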
Test: Is all Data Missing?
Description: If no data are present in the file, this test reports an informational message. During Data QA/QC, we combine files to create the entire data record. For each timestamp, the most recent value received that has passed Format QA/QC is used. Thus, values in recently uploaded files overwrite those in previously uploaded files for the same time period, even if the newer value is a missing value (-9999).
Example: All 20 data variables found in the file have only missing values. Previously uploaded data with the same time period will be overwritten.

Test: Any Variables with ALL Data Missing?
Description: Reports variables with all missing values (-9999) as an informational message. During Data QA/QC, we combine files to create the entire data record. For each timestamp, the most recent value received that has passed Format QA/QC is used. Thus, values in recently uploaded files overwrite those in previously uploaded files for the same time period, even if the newer value is a missing value (-9999).
Example: These variables have all data missing: TA_1; TS_1_1_1. Previously uploaded data with the same time period will be overwritten.

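Both all-missing checks can be sketched with one helper, assuming rows are lists of raw strings in header order and that any value parsing to -9999 (for example "-9999.0") counts as missing. The names `is_missing`, `all_missing_variables`, and `all_missing_messages` are hypothetical.

```python
TIMESTAMP_VARS = {"TIMESTAMP_START", "TIMESTAMP_END"}

def is_missing(value):
    """True if value parses as the standard missing value -9999."""
    try:
        return float(value) == -9999.0
    except ValueError:
        return False

def all_missing_variables(header, rows):
    """Data variables whose every value is -9999 (columns matched by position)."""
    return [name for col, name in enumerate(header)
            if name not in TIMESTAMP_VARS
            and all(is_missing(row[col]) for row in rows)]

def all_missing_messages(header, rows):
    """Whole-file message if every data variable is empty, else per-variable."""
    n_data = sum(1 for name in header if name not in TIMESTAMP_VARS)
    empty = all_missing_variables(header, rows)
    if n_data and len(empty) == n_data:
        return [f"All {n_data} data variables found in the file "
                "have only missing values."]
    if empty:
        return ["These variables have all data missing: "
                + "; ".join(empty) + "."]
    return []
```

Because newer values overwrite older ones for the same timestamps, these messages are informational warnings rather than rejections.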
Test: Are primary flux Data Variables present?
Description: In most cases, at least one of the primary flux variables FC, LE, or H must be present in the uploaded file.
Example: None of the primary flux variables (FC, LE, H) are present.

Test: Are any standard AmeriFlux Data Variable names present?
Description: Files are not accepted unless they contain at least a few data variables named in the AmeriFlux FP-In format.
Example: No data variables in the standard AmeriFlux format are present.

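The primary-flux and standard-name checks can be sketched together. The sketch assumes a variable's base name is the longest underscore-delimited prefix found in a list of known base names; `KNOWN_BASE_NAMES` below is a tiny hypothetical subset for illustration only (the authoritative list is Data Variables: Base names), and `base_name` and `flux_and_standard_checks` are hypothetical names.

```python
PRIMARY_FLUX = {"FC", "LE", "H"}
# Tiny hypothetical subset of FP-In base names, for illustration only.
KNOWN_BASE_NAMES = {"FC", "LE", "H", "TA", "TS", "NEE", "PREC", "SW_IN", "USTAR"}

def base_name(var):
    """Longest underscore-delimited prefix of var that is a known base name."""
    parts = var.split("_")
    for n in range(len(parts), 0, -1):
        candidate = "_".join(parts[:n])
        if candidate in KNOWN_BASE_NAMES:
            return candidate
    return None

def flux_and_standard_checks(header):
    """Messages for missing primary fluxes and missing standard names."""
    messages = []
    bases = {base_name(v) for v in header} - {None}
    if not bases & PRIMARY_FLUX:
        messages.append("None of the primary flux variables (FC, LE, H) are present.")
    if not bases:
        messages.append("No data variables in the standard AmeriFlux format are present.")
    return messages
```

Matching on the longest prefix handles base names that themselves contain underscores, such as SW_IN in SW_IN_1_1_1.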
Test: Are Data Variable names in correct format?
Description: Checks that variable names comply with the AmeriFlux FP-In format. See Data Variables: Base names for a list of variable base names. Note that variable names should not be submitted with the “_PI” qualifier.
Example: These variable names are not in standard AmeriFlux format: TSOIL, NRad, FC_PI. They will not be included in the standard AmeriFlux data products. Non-standard variables will be saved for a non-standard data product that will be available in the future.

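A rough sketch of a name-format check follows. It assumes a deliberately simplified qualifier grammar after the base name, namely an optional “_F” followed by an optional positional qualifier (_H_V_R or a single _#). The real FP-In rules cover more qualifiers than this. `KNOWN_BASE_NAMES` is a hypothetical subset, and `nonstandard_names` is a hypothetical name.

```python
import re

# Tiny hypothetical subset of FP-In base names, for illustration only.
KNOWN_BASE_NAMES = {"FC", "LE", "H", "TA", "TS", "NEE", "PREC", "SW_IN", "USTAR"}
TIMESTAMP_VARS = {"TIMESTAMP_START", "TIMESTAMP_END"}
# Simplified, assumed qualifier grammar after the base name: [_F][_H_V_R | _#]
QUALIFIERS = re.compile(r"(?:_F)?(?:_\d+_\d+_\d+|_\d+)?")

def base_name(var):
    """Longest underscore-delimited prefix of var that is a known base name."""
    parts = var.split("_")
    for n in range(len(parts), 0, -1):
        candidate = "_".join(parts[:n])
        if candidate in KNOWN_BASE_NAMES:
            return candidate
    return None

def nonstandard_names(header):
    """Variables with an unknown base name or unrecognized qualifiers."""
    bad = []
    for var in header:
        if var in TIMESTAMP_VARS:
            continue
        base = base_name(var)
        if base is None or not QUALIFIERS.fullmatch(var[len(base):]):
            bad.append(var)  # e.g. TSOIL (no base), FC_PI (_PI not allowed)
    return bad
```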
Test: Any Variables suspected gap-fill?
Description: Reports, as an informational message, variables that have no missing values, so that you can confirm they are not gap-filled. If the variables are gap-filled, use the “_F” variable qualifier. Gap-filled versions of variables are accepted, but non-filled data must be submitted for the primary flux variables (FC, LE, H). Please also consider submitting non-filled data for all other variables.
Example: These variables are suspected to be gap-filled because they have no missing values: NEE, PREC

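A minimal sketch of the gap-fill heuristic, assuming rows are lists of raw strings in header order. Skipping names that already carry an “_F” qualifier is an assumption made here, as is the simple test for that qualifier. `suspected_gap_filled` is a hypothetical name.

```python
TIMESTAMP_VARS = {"TIMESTAMP_START", "TIMESTAMP_END"}

def is_missing(value):
    """True if value parses as the standard missing value -9999."""
    try:
        return float(value) == -9999.0
    except ValueError:
        return False

def suspected_gap_filled(header, rows):
    """Variables with no -9999 values that are not already marked _F."""
    flagged = []
    for col, name in enumerate(header):
        # Assumption: a bare "F" after the base name is the gap-fill qualifier.
        if name in TIMESTAMP_VARS or "F" in name.split("_")[1:]:
            continue
        if not any(is_missing(row[col]) for row in rows):
            flagged.append(name)
    return flagged
```

The heuristic only identifies candidates. A complete, non-filled record would be flagged as well, which is why the message asks for confirmation instead of rejecting the file.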
Test: Are quotes found in all variable names?
Description: Detects the use of quotes around variable names. Quotes around variable names or data values are not permitted.
Example: All variable names have quotes. Quotes are not permitted in the standard AmeriFlux format.

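The sketch below reads the raw header line instead of using Python's `csv` module, because `csv.reader` strips surrounding quotes and would hide the problem. It assumes a comma-delimited header and treats both double and single quotes as quoting (an assumption). `header_quote_report` is hypothetical, and the wording for the partial case is invented for illustration.

```python
def header_quote_report(header_line):
    """Report quoted variable names in a raw, comma-delimited header line."""
    names = header_line.rstrip("\r\n").split(",")
    quoted = [n for n in names
              if len(n) >= 2 and n[0] == n[-1] and n[0] in "\"'"]
    if quoted and len(quoted) == len(names):
        return ("All variable names have quotes. "
                "Quotes are not permitted in the standard AmeriFlux format.")
    if quoted:
        # Hypothetical wording for the case where only some names are quoted.
        return "These variable names have quotes: " + ", ".join(quoted)
    return None
```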
Test: Are non-filled data present for primary flux, gap-filled Data Variables?
Description: Primary flux variables (FC, FCH4, LE, H) must be submitted without gap-filling. Gap-filled data can be submitted in addition to the non-filled data.
Example: These primary flux variables are marked gap-filled: FC_F_1_1_1, LE_F_1_1_1. Corresponding non-filled data could not be identified and must also be submitted.

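One way to look for the non-filled counterpart is to drop the “_F” qualifier and check whether the resulting name is in the header, so that FC_F_1_1_1 maps to FC_1_1_1. This is a minimal sketch under that assumption. `missing_nonfilled` is a hypothetical name.

```python
PRIMARY_FLUX = {"FC", "FCH4", "LE", "H"}

def missing_nonfilled(header):
    """Gap-filled primary flux variables whose non-filled version is absent."""
    names = set(header)
    lacking = []
    for var in header:
        parts = var.split("_")
        if parts[0] in PRIMARY_FLUX and "F" in parts[1:]:
            rest = parts[1:]
            rest.remove("F")  # drop the gap-fill qualifier only
            nonfilled = "_".join([parts[0]] + rest)
            if nonfilled not in names:
                lacking.append(var)
    return lacking
```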
Test: Any duplicate Variable names?
Description: Duplicate variable names are not allowed. We temporarily rename each duplicate by adding a “_d#” suffix so that the remaining Format QA/QC tests can be completed to identify other issues. The temporary names may be referenced in other tests.
Example: Duplicate variable names are present and are temporarily renamed as follows for Format QA/QC reporting: 1 duplicate instance of FC is temporarily renamed FC_d1.

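The temporary “_d#” renaming can be sketched as follows. The sketch assumes the first occurrence keeps its name and later occurrences are numbered from 1, which is consistent with the FC_d1 example. `rename_duplicates` is a hypothetical name.

```python
from collections import Counter

def rename_duplicates(header):
    """Append _d1, _d2, ... to repeated variable names.

    Returns (renamed header, {original name: [temporary names]}).
    """
    counts = Counter()
    renamed, mapping = [], {}
    for name in header:
        counts[name] += 1
        if counts[name] == 1:
            renamed.append(name)  # first occurrence keeps its name
        else:
            temp = f"{name}_d{counts[name] - 1}"
            renamed.append(temp)
            mapping.setdefault(name, []).append(temp)
    return renamed, mapping
```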
Test: Are Timestamps in correct format?
Description: Timestamps must use the AmeriFlux FP-In format (YYYYMMDDHHMM). A common error is that timestamp values are stored as floats (e.g., YYYYMMDDHHMM.00), whereas they should be integers or text. This issue can be autocorrected.
Example: 275 timestamps in TIMESTAMP_START have invalid format (YYYYMMDDHHMM is standard AmeriFlux format).

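A minimal sketch of validating and autocorrecting timestamp strings follows. It assumes only the float case (trailing zero decimals) is correctable. Any other invalid value, such as scientific notation that has typically lost trailing digits, is returned as `None`. `is_valid_ts` and `fix_timestamps` are hypothetical names.

```python
import re
from datetime import datetime

FLOAT_TS = re.compile(r"^(\d{12})\.0+$")  # e.g. 201901010030.00

def is_valid_ts(value):
    """True for a 12-digit YYYYMMDDHHMM string that is a real date and time."""
    if len(value) != 12 or not value.isdigit():
        return False
    try:
        datetime.strptime(value, "%Y%m%d%H%M")
    except ValueError:
        return False
    return True

def fix_timestamps(values):
    """Return (corrected values, number of invalid values).

    Float-formatted timestamps are corrected by dropping the zero decimals;
    other invalid values become None (not autocorrectable).
    """
    fixed, n_invalid = [], 0
    for raw in values:
        value = raw.strip()
        if is_valid_ts(value):
            fixed.append(value)
            continue
        n_invalid += 1
        m = FLOAT_TS.match(value)
        if m and is_valid_ts(m.group(1)):
            fixed.append(m.group(1))
        else:
            fixed.append(None)  # e.g. 2.01901E+11: digits already lost
    return fixed, n_invalid
```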
Test: Any Timestamp duplicates?
Description: Reports duplicated timestamp values. We attempt to autocorrect this issue by removing the duplicate’s entire data row. Missing values (-9999) may be used to fill any resulting time gaps.
Example: 4 duplicate timestamps found in TIMESTAMP_START.

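Removing duplicated rows can be sketched as below. Keeping the first occurrence of each TIMESTAMP_START and dropping later ones is an assumption, since the description does not say which copy is kept. `drop_duplicate_timestamps` is a hypothetical name.

```python
def drop_duplicate_timestamps(rows, col=0):
    """Remove whole rows whose timestamp (column col) was already seen.

    Returns (kept rows, number of duplicates removed).
    """
    seen = set()
    kept, n_dup = [], 0
    for row in rows:
        ts = row[col]
        if ts in seen:
            n_dup += 1  # drop the duplicate's entire data row
            continue
        seen.add(ts)
        kept.append(row)
    return kept, n_dup
```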
Test: Is Timestamp resolution OK?
Description: Reports inconsistencies in timestamp resolution between rows, as well as within a row (i.e., between the TIMESTAMP_START and TIMESTAMP_END values in the same row). We attempt to autocorrect this issue by removing the entire erroneous data row. Missing values (-9999) may be used to fill any resulting time gaps.
Examples:
3 timestamps in TIMESTAMP_START have invalid resolution HH within or between rows
3 timestamps in TIMESTAMP_END have invalid resolution HH within or between rows

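The remove-then-fill correction can be sketched for sorted rows. The sketch makes three assumptions: HH means a 30-minute step and HR a 60-minute step, a row is erroneous if its START-to-END span differs from the step, and a row is erroneous if its start is not a positive whole number of steps after the last kept row. Gaps left behind are filled with -9999 rows. `clean_and_fill` is a hypothetical name.

```python
from datetime import datetime, timedelta

STEP = {"HH": timedelta(minutes=30), "HR": timedelta(hours=1)}
FMT = "%Y%m%d%H%M"

def parse(ts):
    return datetime.strptime(ts, FMT)

def clean_and_fill(rows, res="HH"):
    """Drop rows with inconsistent resolution, then fill gaps with -9999.

    rows: sorted tuples (TIMESTAMP_START, TIMESTAMP_END, *values).
    Returns (filled rows, number of rows removed).
    """
    step = STEP[res]
    kept, removed = [], 0
    for row in rows:
        start, end = parse(row[0]), parse(row[1])
        ok = end - start == step                   # within-row check
        if ok and kept:
            gap = start - parse(kept[-1][0])       # between-row check
            ok = gap > timedelta(0) and gap % step == timedelta(0)
        if ok:
            kept.append(row)
        else:
            removed += 1

    filled = []
    for row in kept:
        if filled:
            t = parse(filled[-1][0]) + step
            while t < parse(row[0]):
                # Missing interval: insert a row of -9999 values.
                filler = ("-9999",) * (len(row) - 2)
                filled.append((t.strftime(FMT), (t + step).strftime(FMT)) + filler)
                t += step
        filled.append(row)
    return filled, removed
```

Accepting whole-number multiples of the step separates ordinary data gaps, which can be filled, from misaligned rows, which are removed.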
Test: Timestamp problem encountered.
Description: Reports that timestamp tests could not be completed.
Example: These Format QA/QC assessments could not be completed: Do filename time components match file time period? Is Timestamp resolution OK? Any Timestamp duplicates?

Test: File Conversion Successful?
Description: Reports an issue encountered while extracting the contents of a .zip or .7z file.
Example: File with zip extension does not appear to contain any files.

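For the .zip case, a sketch using Python's standard `zipfile` module follows. The standard library cannot read .7z archives, so that case would need a third-party package and is not shown. `list_archive_members` is a hypothetical name, and reusing the example messages for these two failure modes is an assumption.

```python
import io
import zipfile

def list_archive_members(data):
    """Return (file names inside a .zip archive, error message or None)."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = [info.filename for info in zf.infolist() if not info.is_dir()]
    except zipfile.BadZipFile:
        return None, "File could not be converted/extracted to csv."
    if not names:
        return None, "File with zip extension does not appear to contain any files."
    return names, None
```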
Test: Autocorrections that can be attempted if failed issues are addressed in replacement file.
Description: Reports issues that we can attempt to correct automatically once the blocking issues are corrected in a replacement file.
Examples:
Changed dat extension to CSV.
Fixed invalid variable name TIMESTAMP_END with TIMESTAMP_END: whitespace removed
Fixed invalid variable name FC_1_1_1_F with FC_F_1_1_1: qualifiers re-ordered

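The three example autocorrections can be sketched as follows. The sketch assumes that a trailing “_F” placed after numeric positional qualifiers should move in front of them, and that any non-.csv extension is renamed to .csv. `autocorrect_name` and `autocorrect_extension` are hypothetical names, not the pipeline's functions.

```python
import os

def autocorrect_name(var):
    """Return (corrected variable name, list of correction notes)."""
    notes = []
    fixed = var.strip()
    if fixed != var:
        notes.append("whitespace removed")
    parts = fixed.split("_")
    if len(parts) > 2 and parts[-1] == "F":
        body = parts[:-1]
        i = len(body)
        while i > 1 and body[i - 1].isdigit():  # find start of positional run
            i -= 1
        if i < len(body):
            fixed = "_".join(body[:i] + ["F"] + body[i:])
            notes.append("qualifiers re-ordered")
    return fixed, notes

def autocorrect_extension(filename):
    """Rename a non-.csv extension to .csv; return (name, message or None)."""
    root, ext = os.path.splitext(filename)
    if ext.lower() != ".csv":
        return root + ".csv", f"Changed {ext.lstrip('.')} extension to CSV."
    return filename, None
```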
Test: Issues that cannot be autocorrected.
Description: Reports issues that were found and cannot be corrected.
Examples:
Timestamps are in scientific notation and cannot be fixed
File could not be converted/extracted to csv.

Data QA/QC Test Modules

Data QA/QC is part of the AmeriFlux BASE QA/QC processing pipeline and evaluates the quality of data. Data QA/QC test modules assess units and sign conventions, timestamp alignments, trends, step changes, outliers based on site-specific historical ranges, multivariate comparisons, diurnal/seasonal patterns, USTAR (i.e., friction velocity) filtering, and variable availability. Read more details for the test modules: