Abstract
Eddy covariance measurements are often subject to missing values, or gaps, in the data record. Methods to fill short gaps are well established, but robustly filling gaps longer than a few weeks remains a challenge. Marginal Distribution Sampling (MDS) is a standard gap-filling method, but its effectiveness for long gaps (> 30 days) is limited. We compared the performance of a machine learning algorithm, eXtreme Gradient Boosting (XGB), against MDS using artificial scenarios with various gap lengths and locations. We gap-filled half-hourly CO₂ flux from a temperate deciduous forest, Bartlett Experimental Forest, from 2010 to 2022. Whereas the standard implementation of MDS uses a narrowly prescribed set of predictor variables, with XGB we were able to include additional variables. The Green Chromatic Coordinate (GCC), derived from PhenoCam imagery, and diffuse photosynthetic photon flux density emerged as two of the three most important predictor variables. In a randomized 10-fold cross-validation test, the root mean square error (RMSE) of XGB was 9.5 % lower than that of MDS, and the R² was 2.7 % higher. XGB outperformed MDS for both daytime and nighttime data across different seasons. However, annual integrals of net ecosystem exchange (NEE) varied across methods, with weaker annual net carbon uptake, by -110 ± 74 g C m⁻² yr⁻¹, for XGB compared to MDS (214 ± 11 g C m⁻² yr⁻¹). In artificial gap experiments, when trained using the 13-year data record, XGB reliably filled gaps, showing little change in RMSE for gaps up to 240 days. In contrast, the performance of MDS steadily decreased as gap length increased, and MDS was unable to fill gaps longer than 2 months. In summary, XGB demonstrates excellent performance as an alternative method to MDS, providing reliable predictions of temperate deciduous forest carbon fluxes under different gap length and location scenarios. Implementation of XGB is facilitated by easy-to-use software packages.
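The artificial gap experiments described above can be illustrated with a minimal sketch: remove a block of known half-hourly values, fill it, and score the filled values against the withheld observations using RMSE and R². Everything here is an assumption made for illustration. The flux series is synthetic, the function names are hypothetical, and the filler is a simple mean diurnal cycle used only as a stand-in; it is neither MDS nor XGB, and any model that predicts the withheld values could be substituted for it.

```python
import math
import random

STEPS_PER_DAY = 48  # half-hourly records


def make_artificial_gap(series, start, gap_days):
    """Return a copy with gap_days of values removed from index start, plus the removed indices."""
    end = min(start + gap_days * STEPS_PER_DAY, len(series))
    removed = list(range(start, end))
    gapped = list(series)
    for i in removed:
        gapped[i] = None
    return gapped, removed


def rmse(obs, pred):
    """Root mean square error between withheld observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def fill_with_mean_diurnal_cycle(gapped):
    """Placeholder filler (assumption): mean of the remaining values at the same half-hour of day."""
    sums = [0.0] * STEPS_PER_DAY
    counts = [0] * STEPS_PER_DAY
    for i, v in enumerate(gapped):
        if v is not None:
            sums[i % STEPS_PER_DAY] += v
            counts[i % STEPS_PER_DAY] += 1
    return [
        v if v is not None else sums[i % STEPS_PER_DAY] / counts[i % STEPS_PER_DAY]
        for i, v in enumerate(gapped)
    ]


# Synthetic flux (assumption): constant respiration, daytime uptake, Gaussian noise.
random.seed(0)
n_days = 120
flux = []
for i in range(n_days * STEPS_PER_DAY):
    hour = (i % STEPS_PER_DAY) / 2.0
    daytime_uptake = -10.0 * max(0.0, math.sin(math.pi * (hour - 6.0) / 12.0))
    flux.append(2.0 + daytime_uptake + random.gauss(0.0, 1.0))

# Score the placeholder filler for several artificial gap lengths at one location.
for gap_days in (7, 30, 60):
    gapped, removed = make_artificial_gap(flux, start=20 * STEPS_PER_DAY, gap_days=gap_days)
    filled = fill_with_mean_diurnal_cycle(gapped)
    obs = [flux[i] for i in removed]
    pred = [filled[i] for i in removed]
    print(gap_days, round(rmse(obs, pred), 2), round(r_squared(obs, pred), 2))
```

Because the synthetic series has no seasonal trend, the placeholder's error will not grow with gap length the way a real method's might; the sketch shows only the mechanics of withholding data and scoring the fill.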