The aim of this course is to verify and extend published computational statistical results, with the goal that students produce publishable findings. Topics will include technical issues from the papers covered; tools that facilitate reproducible computational science, such as version control, data structuring, and scripting; and legal and policy topics relating to research sharing. Students will be expected to present and discuss their work in class as well as in written submissions.
Reproducibility is at the core of scientific progress: it is not enough to publish a finding; we must also publish a description sufficient to allow other researchers to duplicate and verify it. The rapid growth in the complexity of empirical computational research, however, is eroding this principle: many papers in engineering and other fields describe the results of computational experiments without the slightest chance that other researchers will be able to reproduce those results, except approximately or by extreme good luck. This is particularly ironic because commodity hardware and the negligible cost of duplicating software and data mean that in this area, more than in any other scientific domain, exact reproduction of results is attainable.
The key objective of this course is to give students an in-depth understanding of the importance and difficulties of reproducing computational research. It does so through a series of lectures and discussions, combined with an individual project in which each student reproduces the results of a computational research paper of their choice and then releases a "reproduction package" that makes it as easy as possible for other researchers to repeat the reproduction. Our past experience is that this project turns out to be much harder than initially expected, and we expect that the process of identifying and overcoming the barriers to reproduction will encourage students to support reproducibility in their own future research.
The weekly meetings will be a mix of presentations and discussions of the history, issues, and tools relating to reproducibility and code/data sharing, reports of ongoing progress from the students, and some guest speakers with varied perspectives on reproducibility and sharing.
Note: This course was initiated at Columbia by Victoria Stodden of the Statistics Department, and this course description is adapted from her earlier offerings of the course.
We will adhere to the GSAS Academic Integrity standards.
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
Dan Ellis <email@example.com>
Last updated: Thu Dec 12 10:13:45 AM EST 2013