Lessons learned from the data management of large scientific projects
Large scientific projects have data management plans, but these mostly do not work satisfactorily, even though technology now allows data from observational programs to be archived and distributed with virtually no limits on quantity or access. This paper describes the lessons learned from the IGBP project JGOFS (Joint Global Ocean Flux Study). At the end of JGOFS, the ICSU World Data Center for Marine Environmental Sciences (WDC-MARE) was asked to take over the responsibility for harmonizing the data produced by the member countries that had participated in JGOFS. The conversion process lasted three years and resulted in a coherent, organized compilation of 67,418 data sets comprising 4,144 different parameters. Each data set, accompanied by its metadata, is available via the various access clients of the information system PANGAEA (http://www.pangaea.de), i.e. through its search engine PangaVista, through an advanced retrieval tool, or by harvesting following the OAI-PMH standard (sketched below). The collection was also published as a static database in the WDC-MARE Reports on CD-ROM, accessible through a local search engine.

Through the efforts of the JGOFS data management task team, many data sets had at least been made available in various formats on a number of CDs, so as a first step the work of this group was very successful. However, the challenge in archiving a data collection of the size and complexity described for JGOFS is to find a suitable data model, running on performant technology, that provides access through the Internet fulfilling all the requirements of the scientific community. Data of such variety can only be useful, with added value, for future projects if the archiving and distribution process also follows the most recent technological standards and the geo-data are properly described as defined by the ISO 19115 fields.

Even more important is to increase the acceptance of data management and thus to improve the flow of data from the source to the archives. One of the most important points in this regard is to give credit to the data provider. There will only be success in the future if data are handled in the same way as publications. A data set needs a proper citation, following bibliographic rules (see the example below). Part of the citation must be a persistent identifier (e.g. a DOI, Digital Object Identifier) to assure reliable access to the object in the long term. Any user of the data is urged to include this citation in the reference list of any resulting publication. To improve the value of, and the trust in, data, a peer-review process for data publications must be examined. The data citations, including the link to the data, must be mirrored to portals, search engines, and library catalogs to improve their accessibility. Centers and systems holding data need a well-established status, policy, and mission comparable to those of libraries.
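The OAI-PMH interface mentioned above is a simple request/response protocol over HTTP. The following is a minimal harvesting sketch in Python; the base URL is a hypothetical placeholder rather than PANGAEA's actual endpoint, and only the generic OAI-PMH verbs and the Dublin Core namespace defined by the protocol itself are assumed.

```python
# Minimal OAI-PMH harvesting sketch. BASE_URL is a hypothetical placeholder;
# substitute the data provider's real OAI-PMH base URL.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

BASE_URL = "https://example.org/oai/provider"  # hypothetical endpoint

def list_records(base_url, metadata_prefix="oai_dc"):
    """Yield (identifier, title) pairs, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            root = ET.fromstring(response.read())
        for record in root.iter(OAI + "record"):
            header = record.find(OAI + "header")
            identifier = header.findtext(OAI + "identifier")
            title = record.findtext(".//" + DC + "title")
            yield identifier, title
        # An empty or missing resumptionToken ends the harvest.
        token = root.findtext(".//" + OAI + "resumptionToken")
        if not token:
            break
        params = {"verb": "ListRecords", "resumptionToken": token}

for identifier, title in list_records(BASE_URL):
    print(identifier, title)
```

The resumptionToken loop is the protocol's standard paging mechanism, which matters when harvesting a collection of tens of thousands of data sets such as the one described here.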
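To illustrate the citation rule proposed above: a data set citation can follow the usual bibliographic pattern of author, year, title, and publisher, extended by the persistent identifier. The author, title, and DOI suffix below are invented for illustration only:

    Miller, M (2004): Nutrient concentrations during a hypothetical cruise. PANGAEA, doi:10.1594/PANGAEA.000000

Because the DOI resolves to the data set itself, such a citation can be placed in the reference list of a journal article exactly like a citation of a printed publication.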