In September 2020, the Ameriflux Management Project Tech Team hosted a panel discussion on best practices for collecting flux observations. The first topic of discussion focused on techniques to ensure high-quality data and to minimize data gaps. High-quality data means accurate, precise, and continuous observations with no large data gaps. High-quality data provides a richer dataset from which we can make insightful interpretations with less uncertainty. As lead panelist Russ Scott emphasized, take pride in collecting the highest quality data you can because you can’t go back in time to fix a problem.
Here are two approaches that can help you collect high-quality data and minimize data gaps: automate data visualization for post-visit data QA/QC, and take advantage of digital notes for field work and data processing.
Check your data early and often – automate data visualization
High-quality data depends on checking your data early and often. Honestly, it can be time consuming to do this task after every site visit or remote data download. If you find yourself going through a convoluted series of steps to check your data, you will not do it often (enough). Instead, put in the time early-on to automate your data QA/QC process. If you can make it easy on yourself to check your data, you will do it often. Frequent data checking will greatly help you produce high-quality data with minimal data gaps.
At our flux sites, our data QA/QC strategy is to automate data visualization. The key is to generate informative plots with the fewest possible steps. A great first step is to write a simple script that plots the most useful data columns as a time-series figure. It doesn’t have to be fancy. You can also add 1:1 plots of variables that act as reasonable proxies, such as data from pyranometer and PAR sensors.
More graphs are not necessarily better, either. There is a balance between too few graphs and too many graphs: if you have too few graphs and only plot CO2 and CH4 fluxes but don’t plot CO2 signal strength and CH4 RSSI, then you may miss deleterious decreases in signal strengths. This is what happened to me in grad school – I lost several months of flux data over the summer from our semi-arid creosote site because I didn’t check the CO2 signal strength. When I finally set up a ladder to check the sensor at the end of summer, there was one big splat of old bird poop covering the lower window of the LI-7500 sensor head. On the other hand, if you’re plotting every single column, trying to sort through too many graphs may be overwhelming and you may miss something simple. For example, you probably don’t need to check CH4 sensor temperature and pressure data every time if you have redundant temperature and pressure measurements.
Our lab uses MATLAB scripts and an in-house website for our data visualization. Simple scripts could be written in any language of choice (Python, R, …). If you’re downloading data remotely, Campbell Scientific LoggerNet also has tools for real-time and historical data plotting. Jonathan Thom of the ChEAS Core Site Cluster (Chequamegon Ecosystem-Atmosphere Study) wrote an informative blog post about using influxDB and Grafana for real-time data viewing after using LoggerNet Linux to collect data from their stations. If the community has other recommendations for easy data visualization tools, please share them in the comment section of this blog.
Beyond data visualization, you can also write scripts to calculate data statistics as a quick check. Our scripts check for minimum, maximum, and variance of selected variables, and will flag a variable if these statistics are outside of the expected bounds. We check for large variance because an unmoving value or a noisy signal from a broken sensor can still be within its expected range. We once found a typo in our data acquisition script where we were measuring soil moisture sensors every 10 hours instead of every 10 minutes.
Finally, beyond the data itself, you can also keep an eye on (or write a script to automatically flag) the number of files, the number of records per file, and the file sizes to make sure they are within expectations. For example, every 2 weeks when I visit our sites, I should be downloading about 600 lines of half-hourly data. If I only see 400 lines of data coming in, I know something is wrong.
Digital notes – who, where, when, what, why
The whole research process involves copious notes, from recording field observations and equipment swaps, to data processing and analysis notes. Detailed notes help us track field issues that need to be addressed promptly, reducing sensor downtime. A thorough record is also beneficial for personnel and succession planning when site maintenance is inevitably handed over to another staff member.
A thorough record includes documenting who, where, when, and what happened. But also importantly, why. Why did you remove the soil moisture sensor? Did it need to be calibrated? Was the sensor cable chewed by critters? Were you just moving this sensor to another site that needed a soil moisture sensor? This level of detail helps you manage your sites and your sensors, and it will also help future data users interpret the data from that time frame. “Removed soil sensor (sn 11235) and brought back to lab because of chewed cable” and “Removed soil sensor (sn 11235) and installed at XX site to replace broken sensor there” both provide data users with much more information about the data from that particular soil sensor than if you just noted “Removed soil sensor.” Document serial numbers and firmware version, sensors installed or removed at YY depth, changes in wiring or datalogger program (use version control when possible), and changes in measurement location.
Most of us probably take handwritten notes in the field because it is faster, more portable, and somewhat more resistant to field hazards such as water, mud, and unexpected drops. Although it takes time to transfer these handwritten notes to a digital format, digital notes come with many advantages.
Digital search is a powerful tool you can’t access with a stack of notebooks, one that you and future site researchers and technicians can take advantage of. Here are some ways to make the most of your digital notes:
- Host the notes online, e.g., on Box, Google Drive, or Dropbox. Anyone will be able to look it up, even from a field site if you have cellular connectivity. It’s also easy to add to any time you are at a computer, or perhaps even using your phone. You don’t need to dig up any specific notebook.
- Transcribe your handwritten notes soon after the field visit, while the visit is still fresh in your mind.
- Add photos or plots to your notes when possible. This is much easier to do for digital notes over analog notes.
- Use a standard, consistent format for certain, repeated phrases. This will make it easier to search for. For example, if you’re having a problem with your CO2 analyzer with serial number 75H-2176, you can search for “sn 75H-2176” to track the history of this analyzer and find out the last time it was cleaned, calibrated, or otherwise maintained without also needing to search for “serial number,” “serial #,” “serial no.,” “s/n,” etc. Another example would be to use “YYYY-MM-DD” for dates. Using a standard, consistent format makes it easier for machines to read if, down the road, you want to use a pattern-recognition algorithm (e.g., regular expressions) to find certain words or phrases.
Beyond harnessing the power of digital search, digital notes also have other advantages. Taking time to type your notes can also help you organize and flesh out your notes. This will help future staff from needing to decipher your handwritten bullet points with your unique shorthand. Easily accessed and easily searchable digital notes help you and future staff learn from your mistakes and others’ mistakes and experience.
Data processing notes – why digital?
In addition to field notes, we also keep our processing notes in an online, digital format. This helps everybody in the group keep track of current processing status across our multiple sites. When a new student or postdoc joins the lab and takes over a particular site, they have quick access to past processing notes and can see past and current concerns. It is also easy to add plots and figures to these processing notes.
We hope these two approaches of automated data visualization and digital field notes and data processing notes can help you collect high-quality data with minimal data gaps. If you have any suggestions or feedback, please post in the comment section below or email me.