Format QA/QC tests are used in the AmeriFlux BASE QA/QC processing pipeline to assess the compliance of uploaded files with the required FP-In format (a.k.a., Half-Hourly / Hourly Data Upload Format).
Test | Description | Example |
---|---|---|
Any problems reading file? | If the uploaded file is malformed, it cannot be read. | Error reading data from the file. |
Is Filename Format valid? | Checks the uploaded filename against the FP-In filename format. | These filename components are not in the standard AmeriFlux format: optional parameter included (will be removed in autocorrected file) |
Do filename time components match file time period? | The TIMESTAMP_START value in the first data row must match the ts_start component of the filename. The TIMESTAMP_END value in the last data row must match the ts_end component of the filename. | TIMESTAMP_START 199912312330 does not match filename ts_start 200001010000 time. |
Any invalid Missing-Value Formats? | Looks for common invalid missing-value formats, including -6999, NaN, NA, and empty values. Reports the names of the variables in which invalid missing values are found, with the number of occurrences in parentheses. | Missing values are not indicated with -9999 for these variables (number of timestamps): TA (2); FC (41); TS_1_1_1 (12) |
Are Timestamp variables as expected? | TIMESTAMP_START and TIMESTAMP_END must be in columns 1 & 2. If they are not, this check reports the variables that are found. | These unexpected variables were found in columns 1 & 2 instead of TIMESTAMP_START and TIMESTAMP_END: YEAR, DAY |
Are Timestamp variables present? | Looks for TIMESTAMP_START and TIMESTAMP_END in any column. If one or both are missing, this check reports the missing variable. | Expected timestamp variable(s) TIMESTAMP_END is / are missing. |
Is all Data Missing? | If there is no data present in the file, this test reports an informational message. During Data QA/QC we combine files to create the entire data record. For each timestamp, the most recent value received that has passed Format QA/QC is used. Thus, values in recently uploaded files will overwrite those in previously uploaded files (for the same time period) even if the newer value is a missing value (-9999). | All 20 data variables found in the file have only missing values. Previously uploaded data with the same time period will be overwritten. |
Any Variables with ALL Data Missing? | Reports variables with all missing values (-9999) as an informational message. During Data QA/QC we combine files to create the entire data record. For each timestamp, the most recent value received that has passed Format QA/QC is used. Thus, values in recently uploaded files will overwrite those in previously uploaded files (for the same time period) even if the newer value is a missing value (-9999). | These variables have all data missing: TA_1; TS_1_1_1. Previously uploaded data with the same time period will be overwritten. |
Are primary flux Data Variables present? | In most cases, we require at least one of the primary flux variables FC, LE, or H to be present in the uploaded file. | None of the primary flux variables (FC, LE, H) are present. |
Are any standard AmeriFlux Data Variable names present? | Files are not accepted if they do not contain a few data variables in the AmeriFlux FP-In format. | No data variables in the standard AmeriFlux format are present. |
Are Data Variable names in correct format? | Checks compliance of variable names with AmeriFlux FP-In format. See Data Variables: Base names for a list of variable base names. Note that variable names should not be submitted with the “_PI” qualifier. | These variable names are not in standard AmeriFlux format: TSOIL, NRad, FC_PI;. They will not be included in the standard AmeriFlux data products. Non-standard variables will be saved for a non-standard data product that will be available in future. |
Any Variables suspected gap-fill? | Reports variables that have no missing values as an informational message to confirm that the variables are not gap-filled. If the variables are gap-filled, use the “_F” variable qualifier. While gap-filled versions of variables are accepted, non-filled data must be submitted for primary flux variables (FC, LE, H). Please also consider submitting non-filled data for all other variables. | These variables are suspected to be gap-filled because they have no missing values: NEE, PREC |
Are quotes found in all variable names? | Detects the use of quotes around variable names. Quotes around variable names or data values are not permitted. | All variable names have quotes. Quotes are not permitted in the standard AmeriFlux format. |
Are non-filled data present for primary flux, gap-filled Data Variables? | Primary flux variables (FC, FCH4, LE, H) must be submitted without gap-filling. Gap-filled data can be submitted in addition to the non-filled data. | These primary flux variables are marked gap-filled: FC_F_1_1_1, LE_F_1_1_1. Corresponding non-filled data could not be identified and must also be submitted. |
Any duplicate Variable names? | Duplicate variable names are not allowed. We temporarily rename the variable by adding a “_d#” suffix so that the remaining Format QA/QC tests can be completed to identify other issues. The temporary names may be referenced in other tests. | Duplicate variable names are present and are temporarily renamed as follows for Format QA/QC reporting: 1 duplicate instance of FC is temporarily renamed FC_d1. |
Are Timestamps in correct format? | AmeriFlux FP-In format must be used for timestamps. A common error is that timestamp values are treated as floats (e.g., YYYYMMDDHHMM.00), whereas they should be integers or text. This issue can be autocorrected. | 275 timestamps in TIMESTAMP_START have invalid format (YYYYMMDDHHMM is standard AmeriFlux format). |
Any Timestamp duplicates? | Reports duplicated timestamp values. We attempt to autocorrect this issue by removing the duplicate’s entire data row. Gap-filling with missing values (-9999) may be done to fill any resulting time gaps. | 4 duplicate timestamps found in TIMESTAMP_START. |
Is Timestamp resolution OK? | Reports inconsistencies in the timestamp resolution between rows of timestamp values, as well as within a row (i.e., between TIMESTAMP_START and TIMESTAMP_END values in the same row). We attempt to autocorrect this issue by removing the entire erroneous data row. Gap-filling with missing values (-9999) may be done to fill any resulting time gaps. | 3 timestamps in TIMESTAMP_START have invalid resolution HH within or between rows. 3 timestamps in TIMESTAMP_END have invalid resolution HH within or between rows. |
Timestamp problem encountered. | Reports that timestamp tests could not be completed. | These Format QA/QC assessments could not be completed: Do filename time components match file time period? Is Timestamp resolution OK? Any Timestamp duplicates? |
File Conversion Successful? | Reports an issue while extracting contents of a .zip or .7z file. | File with zip extension does not appear to contain any files. |
Autocorrections that can be attempted if failed issues are addressed in replacement file. | Reports issues that we can attempt to automatically correct if blocking issues are corrected in a replacement file. See examples. | Changed dat extension to CSV. Fixed invalid variable name TIMESTAMP_END with TIMESTAMP_END: whitespace removed. Fixed invalid variable name FC_1_1_1_F with FC_F_1_1_1: qualifiers re-ordered. |
Issues that cannot be autocorrected. | Reports issues that were found and cannot be corrected. | Timestamps are in scientific notation and cannot be fixed. File could not be converted/extracted to csv. |
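
Several of the checks in the table above can be run locally before uploading a file. The following is a minimal sketch, not the AmeriFlux implementation: the function name `check_format`, the structure of the returned findings, and the exact set of invalid missing-value markers are assumptions made for illustration. It covers only the timestamp column positions, the YYYYMMDDHHMM timestamp format, duplicate TIMESTAMP_START values, and invalid missing-value formats.

```python
import csv
import io
import re
from collections import Counter

# Assumed set of invalid missing-value markers (compared case-insensitively).
# The table names -6999, NaN, NA, and empty values; -9999 is the valid marker.
INVALID_MISSING = {"-6999", "nan", "na", ""}
TIMESTAMP_RE = re.compile(r"\d{12}")  # YYYYMMDDHHMM


def check_format(text):
    """Return a dict of findings for CSV content given as a string (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    rows = [[value.strip() for value in row] for row in reader if row]

    findings = {}

    # TIMESTAMP_START and TIMESTAMP_END must be in columns 1 and 2.
    findings["timestamp_columns_ok"] = header[:2] == ["TIMESTAMP_START", "TIMESTAMP_END"]

    # Count TIMESTAMP_START values that are not exactly 12 digits,
    # e.g. float-like values such as "200001010100.00".
    starts = [row[0] for row in rows]
    findings["invalid_timestamp_start"] = sum(
        1 for ts in starts if not TIMESTAMP_RE.fullmatch(ts)
    )

    # Count duplicated TIMESTAMP_START values (occurrences beyond the first).
    counts = Counter(starts)
    findings["duplicate_timestamp_start"] = sum(n - 1 for n in counts.values() if n > 1)

    # Count invalid missing-value markers per data variable.
    invalid_missing = {}
    for col, name in enumerate(header[2:], start=2):
        n = sum(
            1 for row in rows
            if col < len(row) and row[col].lower() in INVALID_MISSING
        )
        if n:
            invalid_missing[name] = n
    findings["invalid_missing"] = invalid_missing
    return findings


sample = """TIMESTAMP_START,TIMESTAMP_END,TA,FC
200001010000,200001010030,NaN,1.5
200001010030,200001010100,-9999,NA
200001010030,200001010100,21.0,
200001010100.00,200001010130.00,-6999,2.0
"""
print(check_format(sample))
```

For this sample, the sketch flags one float-like timestamp, one duplicate timestamp, and two invalid missing values each for TA and FC. It does not attempt any of the autocorrections described in the table.
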
Data QA/QC Test Modules
Data QA/QC is part of the AmeriFlux BASE QA/QC processing pipeline and evaluates the quality of data. Data QA/QC test modules assess units and sign conventions, timestamp alignments, trends, step changes, outliers based on site-specific historical ranges, multivariate comparisons, diurnal/seasonal patterns, USTAR (i.e., friction velocity) filtering, and variable availability. Read more details for the test modules:
- Timestamp Alignment
- Physical Range
- Multivariate Comparison
- Seasonal-Diurnal Pattern
- USTAR Filtering
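
The Format QA/QC table notes that, during Data QA/QC, files are combined so that the most recent value received for each timestamp is used, even when that value is the missing-value marker (-9999). A minimal sketch of that rule follows. The function name `combine_uploads` and the nested-dictionary layout are hypothetical, and the sketch assumes the overwrite is applied per variable, which the description does not state explicitly.

```python
def combine_uploads(uploads):
    """Combine uploads into one record (hypothetical helper).

    `uploads` is a list ordered from oldest to most recent upload. Each upload
    maps a TIMESTAMP_START string to a dict of {variable name: value}.
    """
    record = {}
    for upload in uploads:
        for timestamp, values in upload.items():
            # Newer values replace older ones, including -9999 (missing).
            record.setdefault(timestamp, {}).update(values)
    return record


older = {"200001010000": {"TA": 20.5, "FC": 1.2}}
newer = {"200001010000": {"TA": -9999}, "200001010030": {"TA": 21.0}}
combined = combine_uploads([older, newer])
print(combined)
```

Here the earlier TA value of 20.5 at 200001010000 is replaced by -9999 from the newer upload, which is why a replacement file should contain real values wherever they exist in previously uploaded files.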