Uncertainties in model projections of carbon cycling in terrestrial ecosystems stem from inaccurate parameterization of the processes a model includes (endogenous uncertainties) and from processes or drivers the model does not account for (exogenous uncertainties). Here, we assess endogenous and exogenous uncertainties using a model-data fusion framework benchmarked with an artificial neural network (ANN). We use 18 years of eddy-covariance carbon flux data from Harvard Forest, where ecosystem carbon uptake has doubled over the measurement period, along with 15 ancillary ecological data sets related to the carbon cycle. We test how well different combinations of these data constrain projections of a process-based carbon cycle model, both against the measured decadal trend and under long-term future climate change. High-frequency eddy-covariance data alone are insufficient to constrain model projections at annual or longer time steps. Projections of carbon cycling under future climate change, in particular, depend strongly on which data are used to constrain the model. Using aggregated flux budgets in conjunction with the ancillary data sets greatly reduced endogenous uncertainties in long-term projections of future carbon stocks and fluxes. The data-informed model, however, poorly reproduced interannual variability in net ecosystem carbon exchange and biomass increments, and it did not reproduce the long-term trend. We further use the model-data fusion framework and the ANN to show that the long-term doubling of the rate of carbon uptake at Harvard Forest cannot be explained by meteorological drivers and is instead driven by changes during the growing season. By integrating all available data in the model-data fusion framework, we show that the observed trend can be reproduced only with temporal changes in model parameters.
Together, the results show that exogenous uncertainty is the dominant source of uncertainty in future projections from a data-informed process-based model.