Thursday, December 08, 2016

Preserving Scientific Software . . . in a Usable Form? | EDUCAUSE Review

Photo: Craig Stewart

"Reusability in the form of virtual machine images provides a way for scientific software to be preserved as it is used in a particular research project and thus enables reproducibility of scientific research," according to Craig Stewart, Executive Director of the Indiana University Pervasive Technology Institute and Associate Dean for Research Technologies.


The scientific method is based on scientific experiments being verifiable through reproduction. Today, however, some in the scientific community are concerned about the replicability of scientific research. Part of the problem is that given page limits imposed by most publishers, it is generally difficult, if not impossible, to describe scientific methods in sufficient detail to enable the research to be replicated. Furthermore, scientific research is today particularly likely to be viewed as a subject of political debate because there are so many topics of intense concern and interest to the public, from global climate change to genetically enhanced food sources. Those of us who are scientific researchers owe it to each other to make our research as easily replicable as possible. And we owe it to the taxpayers, who fund our research, to minimize the ability of people who critique research to distort its meaning or question the results of research on the basis of politically or financially motivated concerns.

Recently, interesting work has been published related to the replicability of experimental findings, such as an attempt to replicate studies in experimental economic research.1 In this article, I follow earlier distinctions between replicability and reproducibility. By replicability, I mean the ability to replicate an experiment as closely as possible and get generally the same result. The words "as closely as possible" are important: an experiment about an animal population in the wild in a particular area cannot be exactly replicated if the species' range has shifted and the species is no longer present in the area where the experiment was conducted. Reproducibility, on the other hand, concerns the analysis: if one has the data from a research project, then it should be possible to precisely reproduce the analyses of the data in every detail, extend the analysis if appropriate, and/or correct errors if they exist.

In theory, one should be able to exactly reproduce the data analysis done as part of any research publication. In practice, doing so is hard. Part of the reason this is difficult is that so much of the data analysis is simply not included in published papers, as depicted in figure 1 from James Taylor, one of the creators of the Galaxy bioinformatics software environment.

Another reason it is hard to reproduce scientific data analyses and simulations is the number of things one must specify to make reproducibility possible. Even if you use open-source software, "I used version x of the commonly used open-source software package X" is often all the space one gets. Important details such as which patches, from where, running in what operating system, compiled with what compilers, and using which mathematical libraries are usually omitted. Describing the software environment used for a particular analysis gets very tricky very quickly.
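
One way to make some of these details easier to report is to record them automatically alongside each analysis run. The following is a minimal sketch in Python, offered only as an illustration and not as anything the article prescribes. The function name `capture_environment` and the package names `numpy` and `scipy` are assumptions chosen for the example. It records the operating system, the interpreter and the compiler it was built with, and the installed versions of the named packages. It does not capture patches or which mathematical libraries a package was linked against, so those would still need to be documented separately.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment(packages):
    """Return a dictionary describing the software environment of this run.

    `packages` is a list of distribution names whose installed versions
    should be recorded. Packages that are not installed map to None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "os": platform.platform(),                        # OS name and release
        "machine": platform.machine(),                    # hardware architecture
        "python_version": platform.python_version(),
        "python_compiler": platform.python_compiler(),    # compiler used to build Python
        "executable": sys.executable,                     # path to the interpreter
        "packages": versions,
    }


if __name__ == "__main__":
    # "numpy" and "scipy" are placeholder package names for illustration.
    report = capture_environment(["numpy", "scipy"])
    print(json.dumps(report, indent=2, sort_keys=True))
```

Saving this output next to the results of an analysis gives a later reader a starting point for rebuilding a comparable environment, although a virtual machine image of the kind Stewart describes preserves far more than such a summary can.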

Source: James Taylor, "Analysis Reproducibility," Speaker Deck, June 4, 2015 (CC-BY), adapted from J. T. Leek and R. D. Peng, "P Values Are Just the Tip of the Iceberg," Nature 520, no. 7549 (2015), 612.