COMPUTATIONAL RESEARCH AND APPLICATIONS SEMINAR – Reproducible Research & Best Practices for Computational Science – Glasbrenner

December 4, 2017 @ 4:30 pm – 6:00 pm
Exploratory Hall, Room 3301, Fairfax Campus
Joseph Marr


James Glasbrenner, PhD
Assistant Professor
George Mason University

Reproducible Research & Best Practices for Computational Science

Monday, December 4, 4:30-5:45
Exploratory Hall, Room 3301

ABSTRACT:  Have you ever had one of the following thoughts while working on your research?

  • I can’t remember where I put that data file.
  • I knew what these variables meant when I wrote them last year.
  • Did I accidentally delete that email with the final version of our research paper attached?
  • Why does my collaborator’s program delete the last row and column of this array before entering the main loop?

If so, then you’re not alone, because “most researchers are never taught the equivalent of basic lab skills for research computing” [1]. This situation persists even as the average scientific researcher devotes as much as 30% of their time to developing and 40% of their time to using scientific software [2]. Underdeveloped skills in programming, project organization, and documentation can lead to general frustration, productivity losses, and an increased risk that researchers won’t be able to reproduce their own work, and can even result in serious computational errors that invalidate a study’s general conclusions [3]. At the same time, the number of scientific research groups that are integrating data science topics and methods into their programs is increasing at a rapid pace¹, further increasing the overall need to address this disparity. In response, a growing movement of researchers interested in tackling this problem has emerged, leading to the creation of organizations like the Software Carpentry Foundation [4], guidelines for reproducible research [5], and suggestions of “best practices” for scientific computing [1, 6, 7]. However, although there is more awareness of these potential solutions than in past years, these ideas are still not common knowledge. In this seminar, I will review the general background behind these ideas and what computational researchers can learn from other fields such as the software industry. Drawing on my own experience with implementing these ideas, I will provide examples of how you can integrate reproducible research ideas into your work using open source tools. Using the “best practices” suggestions as a guide, I will also show ways in which you can better organize your projects and make your code more readable, and then explain how this can help streamline scientific collaboration. Finally, I will close by reflecting on the role that automation can play in achieving these principles and goals.

[1] G. Wilson, J. Bryan, K. Cranston, J. Kitzes, L. Nederbragt, and T. K. Teal, PLoS Comput. Biol. 13, e1005510 (2017).
[2] J. E. Hannay, C. MacLeod, J. Singer, H. P. Langtangen, D. Pfahl, and G. Wilson, in Proc. 2009 31st Int. Conf. Softw. Eng. ICSE Workshops (2009) pp. 1–8.
[3] Z. Merali, Nature 467, 775 (2010).
[4] “Software Carpentry,” .
[5] R. D. Peng, Science 334, 1226 (2011).
[6] G. Wilson, D. A. Aruliah, C. T. Brown, N. P. C. Hong, M. Davis, R. T. Guy, S. H. D. Haddock, K. D. Huff, I. M. Mitchell, M. D. Plumbley, B. Waugh, E. P. White, and P. Wilson, PLoS Biol. 12, e1001745 (2014).
[7] V. Stodden and S. Miguez, J. Open Res. Softw. 2, e21 (2014).

¹ An arXiv query for all pre-prints with metadata containing the term “data science” reveals exponential growth, with the number of submissions approximately doubling every year since 2007.