AmeriFlux principal investigators are quickly adopting the new AmeriFlux CC-BY-4.0 Data License, which allows their sites’ AmeriFlux data to be shared under the widely used Creative Commons Attribution 4.0 license (CC-BY-4.0). This license allows anyone to use the data as long as attribution is provided. It is more open than the historical AmeriFlux data policy, which has been renamed the AmeriFlux Legacy Data Policy. In an interview, Deb Agarwal, Data Lead for the AmeriFlux Management Project (AMP), explains how the new CC-BY-4.0 license helps both AmeriFlux site teams and the larger scientific community.
Q: Why is the AmeriFlux Management Project (AMP) offering CC-BY-4.0?
DA: As we have grown as a network, we have outgrown parts of the Legacy policy and what it makes feasible. When we had 80 sites, saying “you must contact them all” felt doable: hard, but doable. When you talk about 400 sites, I think you have left the realm of what you can reasonably expect a person to do. That makes many high-profile uses of AmeriFlux data difficult. If someone wants to use the data from 400 AmeriFlux sites to compare to a climate model, do an analysis, and publish results, they don’t want to have to contact all 400 sites first. As the network grows, we need simpler logistics for data users.
In addition, we have run into problems working with other networks, like NEON, where our licensing was more restrictive than theirs. They were hesitant to include their data in the AmeriFlux set under the Legacy policy.
Q: Why did AMP decide to provide a CC-BY-4.0 data use license?
DA: Offering CC-BY-4.0 licensing recognizes that the fields of data publication and data sharing have evolved since the AmeriFlux network was started. One of the most important aspects of data sharing is how we license our data. Over time, many groups that had built custom licenses have been shifting to more broadly accepted ones. The vast majority have gone to the Creative Commons standard licenses because they are widely used, accepted, and understood. Our AmeriFlux community and institutions embrace open data sharing. With several widely accepted data sharing licenses available, it makes sense to move to one of them.
CC-BY-4.0 is very close to the AmeriFlux Legacy policy in that it requires appropriate attribution of the data. However, it doesn’t require contacting the data producer. For researchers using data under CC-BY-4.0, this reduces the risk of running into last-minute problems when publishing their research results. We will still track downloads.
Q: What are the advantages of using CC-BY-4.0?
DA: The new license will make AmeriFlux data more compatible with other flux networks (e.g., ICOS, OzFlux, and NEON) and more FAIR (Findable, Accessible, Interoperable, and Reusable). It is similar to the FLUXNET2015 CC-BY-4.0 data policy. Journal publishers (e.g., Nature, AGU/JGR Biogeosciences, ESA, Elsevier) are moving toward requiring that any data cited in a journal article be held in a FAIR repository and be shared under a recognized open license. Funding agencies, such as the European Union, DOE, NSF, and other US federal funding agencies, as well as several universities, now encourage or require open data sharing. (Editor note: US universities have gathered together to create a Data Management Plan tool that incorporates funding agency data sharing requirements.) In addition, CC-BY-4.0 makes AmeriFlux data products compatible with DOE’s ESS data repository, ESS-DIVE.
We already have DOIs (Digital Object Identifiers) for AmeriFlux data products; each site’s data product has its own unique identifier. Citing data DOIs makes the use of the data in publications and reports trackable. The mechanisms for tracking data citations are evolving rapidly and will become robust and fully functional in the next year. We are working with the digital libraries community and other data repositories to develop citation and tracking standards. This will let data providers clearly see the impact of their data without having to search individual publications for their Site ID. This is very important to us, and the effort applies to Legacy data as well. We are also working out how to cite data from 400 sites at once, another step toward easier citation in large synthesis papers that use hundreds of data products.
Site teams are ready now to elect CC-BY-4.0. When FLUXNET2015 was published in 2020, 206 sites elected to release their data under CC-BY-4.0; only 6 stayed with the historical custom licensing. So far 355 AmeriFlux site teams have elected CC-BY-4.0, out of a total of 551 AmeriFlux sites. By site count that is about 64%, but those sites represent roughly 75% of the site-years of published data.
Site PIs should know that only data from sites agreeing to CC-BY-4.0 will be included in the upcoming ONEFlux processing. In addition, AMP will prioritize support, such as site visits, for sites that agree to share their data under CC-BY-4.0. With limited resources, we want to make sure our support goes first to sites that share their data more openly with the scientific community.
Q: Can data shared under the different policies be used together?
DA: Here is how it works: a subset of AmeriFlux data is available under CC-BY-4.0, and all AmeriFlux data is available under Legacy. If you choose only data from sites in the CC-BY-4.0 subset, you follow the CC-BY-4.0 requirements. If you choose any data outside that subset, you follow the AmeriFlux Legacy Data Policy for the entire dataset.
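To make the combination rule concrete, here is a minimal Python sketch of the decision a data user faces when assembling a download. It is illustrative only: the `License` enum, the `governing_policy` function, and the site IDs and license assignments below are hypothetical and are not part of the AmeriFlux data portal or any AmeriFlux software. It assumes, as described in this interview, that each site has a single license covering all of its data.

```python
from enum import Enum


class License(Enum):
    """Hypothetical labels for the two AmeriFlux data policies."""
    CC_BY_4_0 = "CC-BY-4.0"
    LEGACY = "AmeriFlux Legacy Data Policy"


def governing_policy(selected_sites, site_licenses):
    """Return the policy that governs a combined selection of sites.

    Because each site has one license for all of its data, a lookup by
    site ID is enough. An unknown site ID raises KeyError.
    """
    if not selected_sites:
        raise ValueError("select at least one site")
    # CC-BY-4.0 applies only if every selected site is in the CC-BY-4.0 subset.
    if all(site_licenses[site] is License.CC_BY_4_0 for site in selected_sites):
        return License.CC_BY_4_0
    # Any Legacy site puts the entire combined dataset under Legacy terms.
    return License.LEGACY


# Hypothetical site IDs and license choices, for illustration only.
example_licenses = {
    "US-Aa1": License.CC_BY_4_0,
    "US-Bb2": License.CC_BY_4_0,
    "US-Cc3": License.LEGACY,
}

print(governing_policy(["US-Aa1", "US-Bb2"], example_licenses).value)
print(governing_policy(["US-Aa1", "US-Cc3"], example_licenses).value)
```

A single Legacy site is enough to put the whole selection under the Legacy policy, which is why the sketch checks that every selected site is CC-BY-4.0 rather than counting how many are.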
The portal for downloading data products shared under CC-BY-4.0 will be available soon; you’ll be able to use it by the 2021 AmeriFlux Meeting. Data users will then be able to see which data products are available under each policy and select what they want.
Each site chooses one license, either CC-BY-4.0 or Legacy, for all of its data. Once a site chooses CC-BY-4.0, it cannot go back to Legacy. We did not make the license specific to each data year because that would add great complexity to maintaining the system.
Q: Will new sites then automatically use the CC-BY-4.0 license?
DA: We considered that, and we decided that sites joining the AmeriFlux network can choose Legacy if they want. Of course, they can choose CC-BY-4.0 right away, and we hope that they do.
A site’s data will still be covered by Legacy when used in combination with other sites’ data that use Legacy. As long as some sites in AmeriFlux are under Legacy, researchers using data that combine Legacy and CC-BY-4.0 licenses will treat all of the data as being under the Legacy data policy terms.
CC-BY-4.0 license terms are compatible with Legacy terms, just less restrictive. Legacy places additional requirements on the data user, whereas CC-BY-4.0 turns them into recommendations. Legacy says you must contact the site PI with information on what you’re working on. Under CC-BY-4.0, we recommend that you do so, but it’s not a requirement.
Q: Will there be a sunset for the Legacy policy? Will the license convert to CC-BY-4.0, say, in 10 years?
DA: The problem is, who do you ask? Who makes such a decision? The AmeriFlux Management Project? The AmeriFlux Science Steering Committee? I don’t know the answer to that. We have discussed it; we have wrestled with this issue since 2014. But this decision belongs to the site team.
Q: CC-BY-4.0 does not differentiate between noncommercial and commercial use. Why allow commercial use as well?
DA: Limiting commercial use had unintended and unfortunate consequences. It means that the manufacturers of the instruments used at AmeriFlux sites can’t use the data even to improve the performance of their own systems. Some manufacturers pointed this out to us early on, so a commercial-use restriction was not included in the Legacy policy either. It’s very constraining in ways you never expect.
Q: How will CC-BY-4.0 affect future data use?
DA: In our experience, CC-BY-4.0 is a preferred license. The license is easy to understand. So we expect that over time data users will increasingly choose data shared under CC-BY-4.0. We already see that the product from the ONEFlux processing is a preferred product, and the first step to being included in the ONEFlux processing is to agree to CC-BY-4.0 for the site’s data. As our DOI citation tracking tools improve, we will be able to measure whether there is a preference for data shared under CC-BY-4.0. Another incentive over time will be that publishers will increasingly expect papers to use data from repositories that share data under widely accepted open licenses.
Q: What else should AmeriFlux site PIs know about next steps?
DA: We will announce the first sites included in our ONEFlux production run in the next few months. If sites want to be included in the following ONEFlux run to be released six months later, they will need to choose CC-BY-4.0 for their site’s data. Then they need to submit model and height information for their sensors, and provide variable aggregation information. We expect site PIs will be eager to be part of the ONEFlux production runs.
I really feel that moving to CC-BY-4.0 is a sea change for AmeriFlux. Once we can count citations and show site PIs citation lists for their datasets, they’ll see the evidence. Tracking dataset citations is the next big step, and we expect that to work within a year.