Toward more data publication of long‐term ecological observations
Abstract
Data papers, such as those published by Ecological Research, encourage the retrieval and archiving of valuable unpublished, undigitized ecological observational data. However, scientists remain hesitant to submit their data to such forums. In this perspective paper, we describe lessons learned from the Long‐Term Ecological Research, the Global Biodiversity Information Facility and marine biological databases and discuss how data sharing and publication are both powerful and important for ecological research. Our aim is to encourage readers to submit their unpublished, undigitized ecological observational data so that the data can be archived, published and used by other researchers to advance knowledge in the field of ecology. Coupling data sharing and syntheses with the development of innovative informatics would allow ecology to enter the realm of big science and provide seeds for a new and robust agenda of future ecological studies.
Abbreviations
- GBIF: Global Biodiversity Information Facility
- JAMSTEC: Japan Agency for Marine‐Earth Science and Technology
- LTER: Long‐Term Ecological Research
- OBIS: Ocean Biogeographic Information System
- TEAMS: Tohoku Ecosystem‐Associated Marine Sciences
1 THE NEED FOR DATA PAPERS
The important scientific mission of archiving and publishing observational data is being assisted by remarkable developments in information technology (e.g., cloud servers and data archiving systems on the internet) and governmental open‐science strategies and policies (e.g., in Japan; Allagnat, Allin, Baynes, Hrynaszkiewicz, & Lucraft, 2019; https://www.natureasia.com/en/info/press-releases/detail/8734; accessed November 2, 2019 and Europe; https://ec.europa.eu/research/openscience/index.cfm; accessed November 2, 2019). Observational data networks such as the International Long‐Term Ecological Research (ILTER) (https://www.ilter.network; accessed November 2, 2019), Global Earth Observation System of Systems (GEOSS; https://www.earthobservations.org/geoss.php; accessed November 2, 2019) and the Group on Earth Observations Biodiversity Observation Network (GEO BON; https://geobon.org; accessed November 2, 2019) have been playing a leading role in the archiving and publication of observational data. GEOSS is a set of comprehensively coordinated in situ and satellite Earth observations and information and processing systems (https://www.earthobservations.org/geoss.php; accessed November 2, 2019). GEO BON facilitates improvement of acquisition, coordination and delivery of biodiversity change data and related services to decision makers and scientific communities (https://www.earthobservations.org/activity.php?id=128; https://geobon.org; accessed November 2, 2019). In addition, peer‐reviewed data papers are another powerful medium for scientists to publish and share their observational data.
Many ecological observations begin as handwritten field notes. Before these data can be analyzed and the results published in a paper, however, scientists must digitize and archive the handwritten records, in most cases by using spreadsheet software on personal computers. Peer‐reviewed journals generally do not publish all the observational data gathered as part of a study, meaning that the only way for other scientists to access the data is to request them from the author(s). This limitation hinders analyses from other viewpoints. As a result, the data owners are often the only scientists who decide how the observational data are used and who recognize their significance.
A large quantity of unpublished observational data faces the risk of disappearing for several reasons, such as data storage hardware failure and the retirement and/or death of the data owners. This crisis is especially acute for data owners and technicians who plan to retire but have a large quantity of undigitized data (e.g., recorded on paper and analog photographs). Such observational data may be innovative, unique and useful for progressive research, business opportunities, engineering projects or teaching practice. Secondary and tertiary data are often not reported in original papers and their existence is therefore not evident to readers. In conjunction with other data sources, however, archived long‐term, continuous observational data can provide an accurate record of ecosystem changes over time, such as succession (D'Amato et al., 2017; Vellend, Brown, Kharouba, McCune, & Myers‐Smith, 2013). Memorandums and pictures in old field notes are valuable information for those trying to restore the previous landscape of an ecosystem. Accurately, precisely, and carefully observed data are quite valuable for future use by various scientists who were not involved in the original research and publication of those data.
To deepen our understanding of the trends of ecosystem changes and their drivers (including climate change, natural and anthropogenic disturbances and the interaction between anthropogenic activities and ecosystem variables), we require long‐term, continuous environmental and ecological observations across multiple regional points at a spatially broad scale. From such observations, a team of researchers tends to publish original research papers that report novel patterns and causes of ecosystem changes. However, data papers, which simply report the original data rather than scientific findings, have three advantages in promoting both temporally and spatially large‐scale observations.
First, compared to research papers, data papers are free from publication bias. Publishers tend to select research papers that statistically support a given scientific hypothesis versus those showing no such support (Johnson, 1999), despite the fact that such negative‐support data are equally important for fair meta‐analyses. Second, data papers can capture the “standard” state of an ecosystem. In ecosystems, temporally or spatially rare events often play a definitive role in ecological dynamics. However, one would not be able to tell how rare those events are without a vast accumulation of normal background data. Third, data papers may provide data that people will later find real scientific value in. For example, a flora record from 1930–1931 is now being used to detect the effect of global warming on mountain biodiversity (Klanderud & Birks, 2003).
It is not easy for any single individual to conduct temporally and spatially large‐scale observations. For this reason, the scientific community relies on a spirit of reciprocity in which each investigator publishes his or her observational data and permits others to access them. Through such data sharing, the use of the data is expanded to allow for novel scientific research based on the observational data. In some cases, however, the observational data that were obtained after time‐consuming, laborious and costly efforts are published without respectful acknowledgements of the data providers or are used inappropriately; in other cases, the data are used without properly evaluating the original data acquisition methods.
To demonstrate how observational data can be successfully shared and used, we describe lessons learned from (a) the Long‐Term Ecological Research (LTER), (b) the Global Biodiversity Information Facility (GBIF) and (c) marine biological databases. These lessons cover only a part of the existing ecological observation networks and databases. However, our activities as international or regional leaders in each of these observation networks and databases allow us to provide concrete and valuable information. We hope to encourage readers to submit their observational data to publications such as Ecological Research as data papers. Our aim is to promote the continuous retrieval, archiving and publication of ecological observational data and to advance their analysis and evaluation through the utilization of published open data. The scientific concept of this paper is summarized in Figure 1.

2 LESSONS FROM LTER
LTER is a powerful tool that allows researchers to analyze dynamic and complex features of ecosystem structures and functions from the perspective of broad temporal and spatial scales. The U.S. LTER, funded by the U.S. National Science Foundation, was established in 1980 as a national network of site‐based and long‐term research in the United States. Numerous outstanding studies in the fields of ecology, biodiversity and environmental sciences have been published based on data from the U.S. LTER over the last few decades (https://lternet.edu; accessed November 2, 2019). The LTER network has since expanded to a global scale as the International LTER network, which includes more than 40 member networks and countries and more than 800 individual sites around the world (Mirtl et al., 2018). The network serves as a useful global research platform for various cross‐site comparisons, integrated analyses and experiments based on an array of research questions and hypotheses (e.g., Djukic et al., 2018; Shibata et al., 2015; Watanabe et al., 2019).
The archiving and sharing of observational data gathered at the LTER sites are critical for long‐term and cross‐site analyses leading to a general understanding of complex ecosystem features (Vanderbilt et al., 2015). For example, Mitchell et al. (1996) found a significant impact of winter climate extremes on riverine nitrogen exports from forest catchments based on stream nitrate concentrations measured over 10 years at multiple experimental sites in the northeastern United States. Groffman et al. (2018) analyzed long‐term continuous observations of nitrogen budgets and chemical compositions in precipitation and stream water collected over 50 years in Hubbard Brook Experimental Forest, a U.S. LTER site; the analyses revealed a long‐term shift of ecosystem nutrient limitations under changing climate and atmospheric deposition.
Matsuzaki, Suzuki, Kadoya, Nakagawa, and Takamura (2018) recently clarified the bottom‐up linkages between primary production, zooplankton and fishes in a shallow, hypereutrophic lake in Japan based on long‐term monitoring of Lake Kasumigaura, a JaLTER site. Aguilos et al. (2014) reported long‐term changes of the carbon budget more than 10 years after clear‐cutting and plantation initiation in a northern forest based on eddy‐covariance CO2 flux measurements and intensive ground observations made at the Teshio Experimental Forest, another JaLTER site. The long‐term storage of historical records, such as forestry inventory data, is also very useful when trying to predict long‐term changes of forest structures. For example, Yoshida and Noguchi (2009) noted the vulnerability to strong winds of major tree species in a northern Japanese mixed forest by analyzing the historical document archive and recent intensive monitoring at a JaLTER site.
These important studies highlight the need for sustainable and reliable data archives and the sharing of various types of observational data to promote further research in the future. The ILTER maintains a global open database of information on the LTER sites around the world, the Dynamic Ecological Information Management System - Site and dataset registry (DEIMS‐SDR; https://deims.org; accessed November 2, 2019), which includes the site characteristics of each LTER site, observation sensors and dataset information. The current contents of the DEIMS‐SDR are dominated by LTER in Europe (https://www.lter-europe.net; accessed November 2, 2019) and need further expansion, except for the site information, which fully covers all ILTER members (Mirtl et al., 2018). In addition, LTER databases are available at various governance levels, such as the member country level (e.g., JaLTER Metacat service; http://db.cger.nies.go.jp/JaLTER/; accessed November 2, 2019, LTER Network Data Portal in the United States; https://portal.lternet.edu/nis/home.jsp; accessed November 2, 2019, Environmental Change Network; http://www.ecn.ac.uk; accessed November 2, 2019, South African Environmental Observation Network [SAEON]; http://www.saeon.ac.za/data-portal-access; accessed November 2, 2019 and others) and the regional level (e.g., Data and information management for the LTER‐Europe network; https://www.lter-europe.net/lter-europe/data; accessed November 2, 2019), and through linkage to other external data portals (e.g., Data Integration and Analysis System Program [DIAS] in Japan; https://www.diasjp.net/en/; accessed November 2, 2019, Data Observation Network for Earth [DataONE; https://www.dataone.org; accessed November 2, 2019] for U.S.‐LTER, LTER in Europe, TERN [https://www.tern.org.au; accessed November 2, 2019] and Chinese Ecosystem Research Network [CERN; http://www.cern.ac.cn/0index/index.asp; accessed November 2, 2019]).
3 LESSONS FROM GBIF
Data integration and the establishment of web‐based databases are key for promoting data sharing and reuse because they provide greater data availability for users (Osawa, 2019). LTER provides a web‐based database that integrates ecological data, and several more such databases have been established recently. The GBIF is a central biodiversity database project (https://www.gbif.org; accessed November 2, 2019) (Edwards, 2000) that contains more than 1,000,000,000 biodiversity records, including occurrences, taxonomic names, species profiles and records from ecological fieldwork, and users can derive and use these records as open data (described in detail later). After data are deposited in the database, they can be accessed from all over the world through the internet, which, like a data paper, increases the visibility of the provider's work.
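As an illustration of how such open records can be reached programmatically, the following minimal sketch builds a request URL for the public GBIF occurrence search API. The helper name `build_occurrence_query`, the example species and the country filter are assumptions chosen for illustration and are not part of the original text; the sketch only constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Public GBIF REST endpoint for occurrence searches.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_occurrence_query(scientific_name, country=None, limit=20):
    """Return a GBIF occurrence-search URL for one taxon (hypothetical helper)."""
    params = {"scientificName": scientific_name, "limit": limit}
    if country is not None:
        # Two-letter ISO 3166-1 country code, e.g. "JP" for Japan.
        params["country"] = country
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


# Illustrative query: occurrence records of Japanese beech reported from Japan.
url = build_occurrence_query("Fagus crenata", country="JP", limit=5)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the matching records as JSON; paging through large result sets and citing the downloaded data remain the responsibility of the user.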
Importantly, researchers can publish their ecological observations both as a data paper and in such databases simultaneously. For example, the data of some data papers published in Ecological Research are released both within the Ecological Research Data Paper Archives and the GBIF database (e.g., Fukasawa, Mishima, Yoshioka, Kumada, & Totsu, 2017; Osawa, 2013; Voraphab, Hanboonsong, Kobori, Ikeda, & Osawa, 2015). Likewise, researchers can publish their data and original articles simultaneously, as encouraged by Osawa (2019). However, a data paper should include all the available data, rather than just those described in the original article (Osawa, 2017). This simultaneous release of data can promote an author's reputation, employment opportunities, standing at work and ability to secure funding (Costello, 2009; Costello, Michener, Gahegan, Zhang, & Bourne, 2013). Thus, making data broadly available could provide several benefits to researchers—both data providers and users.
To promote broad data sharing and reuse, data licensing for the exploitation of intellectual property is a critical issue, because all data rights should be owned by the person(s) who collected the data. In the “open data” concept, data are shared and anyone, anywhere, can work with them for any purpose (Open Knowledge International; https://okfn.org; accessed November 2, 2019); it is a powerful idea for resolving intellectual property barriers (Osawa & Iwasaki, 2016; Osawa, Jinbo, & Iwasaki, 2014). By using the open data concept, data reuse and sharing can be easily promoted because open data minimize intellectual property regulations. The open data concept does not do away with intellectual property rights but minimizes barriers to reuse and sharing, such that the data owners can maintain their fundamental rights. The easiest way to apply open data is to use the standardized and well‐known Creative Commons licenses CC BY 4.0 and CC BY‐SA 4.0 (https://creativecommons.org; accessed November 2, 2019) (Osawa & Iwasaki, 2016). Table 1 summarizes the Creative Commons 4.0 licenses. Indeed, recent data papers published by Ecological Research have applied this type of license, which is important for promoting the publication of data papers.
| License | Under the terms | Free to share | Free to adapt |
|---|---|---|---|
| CC BY 4.0 (a) | Attribution, no additional restrictions | ○ | ○ |
| CC BY‐SA 4.0 (b) | Attribution, ShareAlike, no additional restrictions | ○ | ○ |
| CC BY‐ND 4.0 (c) | Attribution, no derivatives, no additional restrictions | ○ | |
| CC BY‐NC 4.0 (d) | Attribution, noncommercial, no additional restrictions | ○ | ○ |
| CC BY‐NC‐SA 4.0 (e) | Attribution, noncommercial, ShareAlike, no additional restrictions | ○ | ○ |
| CC BY‐NC‐ND 4.0 (f) | Attribution, noncommercial, no derivatives, no additional restrictions | ○ | |

- a https://creativecommons.org/licenses/by/4.0/
- b https://creativecommons.org/licenses/by-sa/4.0/
- c https://creativecommons.org/licenses/by-nd/4.0/
- d https://creativecommons.org/licenses/by-nc/4.0/
- e https://creativecommons.org/licenses/by-nc-sa/4.0/
- f https://creativecommons.org/licenses/by-nc-nd/4.0/
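The contents of Table 1 can also be expressed as a small lookup structure, which may be convenient when checking the license of a dataset before reuse. The sketch below is only an illustration: the names `CCLicense`, `CC_4_0`, `allows_adaptation` and `without_use_restrictions` are hypothetical, and the entries simply restate the table rather than the full legal text of the licenses.

```python
from typing import NamedTuple, Tuple


class CCLicense(NamedTuple):
    terms: Tuple[str, ...]  # conditions listed in Table 1
    share: bool             # free to share
    adapt: bool             # free to adapt


# Restatement of Table 1 (ASCII hyphens are used in the license names).
CC_4_0 = {
    "CC BY 4.0": CCLicense(("attribution",), True, True),
    "CC BY-SA 4.0": CCLicense(("attribution", "share-alike"), True, True),
    "CC BY-ND 4.0": CCLicense(("attribution", "no-derivatives"), True, False),
    "CC BY-NC 4.0": CCLicense(("attribution", "noncommercial"), True, True),
    "CC BY-NC-SA 4.0": CCLicense(
        ("attribution", "noncommercial", "share-alike"), True, True),
    "CC BY-NC-ND 4.0": CCLicense(
        ("attribution", "noncommercial", "no-derivatives"), True, False),
}


def allows_adaptation(name):
    """True if the named license permits adapted (derived) datasets."""
    return CC_4_0[name].adapt


def without_use_restrictions():
    """Licenses with neither a noncommercial nor a no-derivatives condition."""
    restricted = {"noncommercial", "no-derivatives"}
    return [name for name, lic in CC_4_0.items()
            if not restricted & set(lic.terms)]


print(without_use_restrictions())
```

The last call selects the two licenses, CC BY 4.0 and CC BY‐SA 4.0, that the text above names as the easiest way to apply open data.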
4 LESSONS FROM MARINE BIOLOGICAL DATABASES
Regarding the types of data shared in public databases, some geographic information system companies have been promoting the standardization of the format of terrestrial data. In marine research, however, a unique format for each observation type and multidimensional formats are often used. This situation has arisen due to the nature of oceanography (i.e., the necessity for three‐ or four‐dimensional data) and the different developmental trajectory of software such as the Generic Mapping Tools (GMT; https://www.soest.hawaii.edu/gmt/; accessed November 2, 2019), a popular software package that is used from the command line to produce maps quickly, even aboard ship.
In recent years, international biological databases, such as the Ocean Biogeographic Information System (OBIS; https://obis.org; accessed November 2, 2019), have been developed and use a format common to databases of terrestrial data. This effort has also been led by the top‐down approach of the Intergovernmental Oceanographic Commission of UNESCO (http://www.ioc-unesco.org; accessed November 2, 2019). Maintaining data in such a simple and common format enables easy reuse of the data. For instance, Yamakita, Sudo, Jintsu‐Uchifune, Yamamoto, and Shirayama (2017) identified important marine areas for biodiversity in East and South East Asia by using 1,120,974 records from OBIS and 751,511 records from GBIF. The authors also manually input another 24,254 records from the literature (currently uploaded in the Japan node of OBIS, BISMaL; http://www.godac.jamstec.go.jp/bismal/e/index.html; accessed November 2, 2019).
How can we achieve a common data format through a bottom‐up approach? In recent years, the improvement of information technology has enabled scientists to develop individual databases even for project‐based research. For example, Tohoku Ecosystem‐Associated Marine Sciences (TEAMS; http://www.i-teams.jp/e/index.html; accessed November 2, 2019), a project initiated after the Great East Japan Earthquake in 2011, maintains a database for storing survey data. With regard to the data policy, the funding agency requested that TEAMS make the data public from the start. Thus, the survey plan, reports, collected data and metadata of those activities are available. In the case of deep‐sea data obtained during cruises run by the Japan Agency for Marine‐Earth Science and Technology (JAMSTEC), a special data policy is also applied. These data are managed by JAMSTEC (http://www.jamstec.go.jp/e/; accessed November 2, 2019) as a common heritage of mankind, and the data must be stored for a long time and be easy to use. In the TEAMS project, sensor data are the most advanced, with several sources collected and integrated. The database (Research Information and Data Access Site of TEAMS: http://www.i-teams.jp/catalog/rias/e/index.html; accessed November 2, 2019) once stored several different types of sensor data, but the technicians in the database maintenance division inspected all types of data and updated the system so that all the data are converted into a common format (JAMSTEC, 2014). Users can now view and download most of the compiled sensor data in the same format from that database (TEAMS Environment and Biogeochemistry data Information System: http://www.i-teams.jp/ebis/search.jsf?lang=en; accessed November 2, 2019). Maintaining large databases and ensuring a common data format on a system will not be easy, especially for ecological field data. However, this difficulty should not prevent us from collecting and sharing a variety of ecological data among global scientific communities.
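The kind of conversion described above, in which sensor data stored in different formats are transferred into one common format, can be sketched as follows. This is a minimal illustration under stated assumptions and not the procedure used by TEAMS or JAMSTEC: the two logger layouts, all field names, the coordinates and the target schema are hypothetical.

```python
from datetime import datetime, timezone


def _record(when, lat, lon, depth_m, value_degc):
    """Build one record in the (hypothetical) common format."""
    return {
        "timestamp": when.astimezone(timezone.utc).isoformat(),
        "latitude": lat,
        "longitude": lon,
        "depth_m": depth_m,
        "variable": "water_temperature",
        "value": round(value_degc, 2),
        "unit": "degC",
    }


def from_logger_a(row):
    """Hypothetical logger A: ISO 8601 time string, temperature in Celsius."""
    when = datetime.fromisoformat(row["time"])
    return _record(when, row["lat"], row["lon"], row["depth_m"], row["temp_c"])


def from_logger_b(row):
    """Hypothetical logger B: Unix epoch seconds, temperature in kelvin."""
    when = datetime.fromtimestamp(row["unix_time"], tz=timezone.utc)
    return _record(when, row["y"], row["x"], row["z"], row["temp_k"] - 273.15)


# Two made-up readings of the same measurement in the two source layouts.
row_a = {"time": "2013-07-01T09:00:00+00:00", "lat": 38.3, "lon": 141.1,
         "depth_m": 10.0, "temp_c": 18.2}
row_b = {"unix_time": 1372669200, "y": 38.3, "x": 141.1, "z": 10.0,
         "temp_k": 291.35}

common = [from_logger_a(row_a), from_logger_b(row_b)]
for rec in common:
    print(rec)
```

Keeping one small converter per source format leaves the original files untouched and makes it explicit which unit and time conventions were assumed for each source, which is part of the metadata that later users need.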
Unfortunately, except for presence/absence data and sensor data, large amounts of which can be collected in the same format, many current marine databases are not always useful for more than data storage. In fact, to find important data within these databases, it is necessary to know the search terms or the tree structure of the data in advance. For beginners, it is easier and faster to ask people who already know, or those who gathered the data, where to find them. This is one difficulty of building a database. A classical paper‐based database, by contrast, has the advantage of enhanced clarity through curation because it can be read as reading material. Therefore, a data paper published in a peer‐reviewed journal offers the benefits of curation and better readability than the sea of data out in the world.
Some data published in a data paper may be integrated into a large database by transforming them to a common data type, or the methods by which the data were gathered may become standardized data acquisition methods in the near future. At the same time, it is important to have a diversity of data types. The data format, acceptable quality, acquisition method and the way a researcher visualizes the world are not always constrained by standard formats, as noted in regard to the uniqueness of marine spatial data. In addition, research on linguistic diversity has revealed that language defines how users perceive the world (e.g., Thierry, Athanasopoulos, Wiggett, Dering, & Kuipers, 2009). Thus, in any database, the quality of the data, as well as the format, acquisition methods, perspective from which they were gathered, the connections among the data, and so on will reflect how the data acquirer views the world. We would like to see the various ways of thinking about the world's oceans, land and sky reflected in the observational data without forcing them into a perfectly uniform format at this stage.
5 ENCOURAGING THE SUBMISSION OF DATA PAPERS TO ECOLOGICAL RESEARCH
The journal Ecological Research established data papers in 2011 as a new category of peer‐reviewed papers (http://www.esj.ne.jp/er/datapaper.html; accessed November 2, 2019). The purpose of publishing data papers is to facilitate the sharing of long‐term, high‐quality observational data with detailed metadata that provide documentation of the content of the data. Because all of the observational data and metadata of data papers are peer reviewed, data papers are treated as scientific articles. In the review process, a data paper is evaluated for its ecological significance and overall quality. The data should contribute significantly to the development of the field of ecology. In particular, the data must be suitable for long‐term and/or large‐scale ecological research and able to be reused.
Since the establishment of this category, the number of data papers published in Ecological Research has gradually increased (Figure 2). As of August 2019, 34 data papers had been published. The number of data papers in Ecology, one of the major journals in the field of ecology, has also increased in recent years (Figure 2). In the last 8 years, the number in Ecology was about four times that in Ecological Research. The increased numbers in both journals suggest that researchers have begun to realize the importance of archiving observational data for future ecological research. Valuable observational data from a wide variety of natural ecosystems (e.g., forests, lakes, oceans, rivers) and agro‐ecosystems have been archived. For example, observational data from a tree census and studies of litter fall and ground‐dwelling beetle communities in forests from one of the largest monitoring network projects in Japan were published as data papers (Ishihara et al., 2011; Niwa et al., 2016; Suzuki et al., 2012); the project, Monitoring Sites 1,000, was launched by the Ministry of the Environment, Japan, and covers subarctic to subtropical climate zones. Other papers report data relevant to the evacuation zone of the 2011 Fukushima Daiichi Nuclear Power Plant accident, which received high levels of radioactive contamination. Mammalian and avian species assemblages are being monitored inside and outside the evacuation zone, and these observational data were also reported as data papers (Fukasawa et al., 2017; Keita et al., 2016). Many of the data papers published in Ecological Research refer to sites in East Asia, with more than 80% being in Japan.

Thus, data papers published in Ecological Research contribute to the development of the field of ecology by providing open observational data collected in East Asia. The continuation of this activity is expected to lead to the retrieval of more unpublished observational data from East Asian countries. The following limitations and challenges of data papers nevertheless remain. How can we publish the know‐how needed to perform field studies? How can we continuously maintain the observation organization, facilities and environments? How can we continuously receive financial and practical support from funding and public agencies? As a first step toward solving these issues, we encourage readers to submit their valuable ecological observations as data papers to Ecological Research.
ACKNOWLEDGEMENTS
We thank all the participants of the workshop organized by the authors' group at the 66th annual meeting of the Ecological Society of Japan in March 2019 for their valuable comments and discussion to develop this manuscript. We also thank all the contributors for this special issue. We thank the editor and the three anonymous reviewers for their kind and constructive comments. This paper contributes to the activities of the Japan Long‐Term Ecological Network (JaLTER) and the International Long‐Term Ecological Research Network (ILTER).




