Department of Electrical Engineering - Columbia University

EECS E6891 - Spring 2014

REPRODUCING COMPUTATIONAL RESULTS

Projects

The main focus of the course is the final project, whose goal is to reproduce and extend previously published results, and deliver the work in such a way that others may use the code and data to regenerate the work you have done for the class. The final project will result in a paper to be turned in for grading. Subsequently, you may wish to submit this paper for publication.

Begin by identifying an article in your field whose techniques interest you, and whose results you would like to reproduce. Your goal is to reproduce the specific numerical results in the tables and/or figures in that analysis, ideally based on the data used in the article. This article should have been published in a peer-reviewed top-tier journal or conference, preferably within the last 4-5 years; the more recent and prominent the better. The better the article, journal, and author you choose, and the more often the article has been cited, the more likely your paper will be publishable.

Please show us a copy of the article you choose before proceeding. We will advise you on what is unlikely to work, although unfortunately our approval is no guarantee that you will be able to replicate the work chosen and successfully complete the assignment. Your assignment is to pick an article according to the criteria above and to reproduce it. The choice of the article is part of the assignment and so, just as happens to faculty researchers, you may find that you need to change your choice of topic along the way, depending on what you find or on difficulties in reproduction, and then do it all again. (If you change articles, please show us the new article as well.)

Beware: Reproducing an article, even if you obtain access to the original data, is normally a highly uncertain and difficult process. Analyses that look neat and clean in published articles often prove to be far from that in reality. Most students find that prominent articles by leading scholars in the field contain errors and confusions, lack essential information about how the analysis was conducted, and have other problems. Some of these issues do not matter to substantive conclusions, and some do, but all make reproduction more difficult. As such, completing the reproduction will likely be more troublesome and time-consuming than you anticipate -- even after you adjust for the information in this sentence!

After you have done everything you can do on your own, you may need to contact the author of the article; please consult with us about how to do this respectfully and diplomatically. By the same token, you should respectfully ask the original author(s) for comments on your paper before you share it with anyone outside of class, or submit it for publication; this can be a sensitive topic.

Around the middle of the semester, you will prepare a "reproduction package" consisting of the data you are using, the code you have produced, and explicit instructions on how to reproduce your results using these components. We will then give this to another student, who will try to replicate your results (without talking with you). That student will then write a memo to you about your paper, with a copy to us. In science, we compete to advance knowledge about the world, not to tear each other down. Thus, the purpose of the memo is to improve the student's work. The remarkable difficulties students have in replicating published articles teach more about the state of the literature, and convey more about the sometimes shaky foundations of academic knowledge, than reading all the published literature one person could possibly consume on his or her own.
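One way to make a reproduction package easier to check is to ship a checksum manifest with it, so the student replicating your work can confirm that the data and code arrived unchanged. The following is a minimal sketch under stated assumptions, not a course requirement: the function names and the idea of a manifest file are illustrative and do not come from the assignment text.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> dict:
    """Map each file's path (relative to package_dir) to its checksum."""
    files = sorted(p for p in package_dir.rglob("*") if p.is_file())
    return {p.relative_to(package_dir).as_posix(): sha256_of(p) for p in files}


def verify_manifest(package_dir: Path, manifest: dict) -> list:
    """Return the listed paths that are now missing or have changed."""
    current = build_manifest(package_dir)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

The dictionary returned by `build_manifest` can be saved next to the package (for example with `json.dump`) and passed to `verify_manifest` by the recipient; an empty list means every listed file matches.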

General specifications and requirements for the final report:

  • Papers should be no longer than about 15 pages (single-spaced, one-inch margins, 12pt, including figures, tables, and references). Think in terms of a short research note, not a full-length article. Journal space is scarce and so the longer the paper you write, the harder it will be to publish. If you can do it in 10 pages, so much the better.
  • You should adhere to the guidelines for reproducibility in the ICERM Workshop Report to ensure your own work is reproducible, and to describe and quantify the difficulties you encountered during replication. The Report provides a set of Best Practices for publishing research (Appendix D), which encompass and extend the "reproduction package" aspect of your project. They specify that each work should include:
    1. A precise statement of assertions to be made in the paper.
    2. A statement of the computational approach, and why it constitutes a rigorous test of the hypothesized assertions.
    3. Complete statements of, or references to, every algorithm employed.
    4. Salient details of auxiliary software (both research and commercial) used in the computation.
    5. Salient details of the test environment, including hardware, system software and the number of processors utilized.
    6. Salient details of data reduction and statistical analysis methods.
    7. Discussion of the adequacy of parameters such as precision level and grid resolution.
    8. Full statement (or at least a valid summary) of experimental results.
    9. Verification and validation tests performed by the author(s).
    10. Availability of computer code, input data, and output data, with some reasonable level of documentation.
    11. Curation: where are code and data available? With what expected persistence and longevity? Is there a site for future updates, e.g. a version control repository of the code base?
    12. Instructions for repeating computational experiments described in the paper.
    13. Terms of use and licensing. Ideally code and data "default to open", i.e. a permissive re-use license, if nothing opposes it.
    14. Avenues of exploration examined throughout development, including information about negative findings.
    15. Proper citation of all code and data used, including that generated by the authors.
  • In addition to ensuring your paper observes these criteria, you should evaluate how well the original article met them, assessing each one individually and in as much detail as you think is necessary, and where possible quantifying the impact, for example "I was able to determine the curation details but to do that I had to ... which took two full days." Suggest, in a detailed way, how the original article should have met each of these criteria, if it did not meet them all. Finally, evaluate how your own replication study met each of these criteria; you can discuss how you met them or why it was impossible to do so.
  • For instructions on the style of the paper, see the Style section of Gary King's "Publication, Publication" paper. We will follow his description in this class.
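Items 4 and 5 of the Best Practices list above (details of auxiliary software and of the test environment) are easy to forget and tedious to reconstruct later. As a hedged illustration only, the sketch below gathers a few of these details from within Python; the function name and the choice of fields are assumptions, and the result should be extended by hand with anything it cannot detect.

```python
import json
import os
import platform
from importlib import metadata


def describe_environment(packages=()):
    """Collect basic hardware, OS, interpreter, and package details."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "machine": platform.machine(),
        "system": platform.system(),
        "release": platform.release(),
        "python": platform.python_version(),
        # Logical CPUs visible to the OS, not the number your job actually used.
        "logical_cpus": os.cpu_count(),
        "packages": versions,
    }


def environment_as_json(packages=()):
    """Serialize the environment record so it can be saved with the results."""
    return json.dumps(describe_environment(packages), indent=2, sort_keys=True)
```

Calling `environment_as_json(["numpy"])` (the package name is only an example) gives a text record that can be stored alongside the code and data. The number of processors actually utilized still has to be stated separately.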

Advice on Academic Articles

Below are a number of tips for your final project report, adapted in part from Gary King's "Publication, Publication". Many of these points attempt to anticipate the comments of journal reviewers.
  1. To justify publication, your paper should address a substantive problem in your field of interest and contain one or a few clear points; one point with several supporting points is better than a lot of unrelated points. You should be very clear about whose mind you are aiming to change and about what. If you're not sure of the answer to those questions, then it's doubtful you're making a contribution and there's little reason for the paper to be published.
  2. If you decide that the conclusions of the original article are incorrect, then show why you think that but also what led the authors of the original article to think otherwise. You should never discuss the authors' intentions or competence in the paper -- directly or indirectly -- but you should assume, unless you have overwhelming evidence to the contrary (and maybe even then), that the authors were well-intentioned, smart, honest, and hard-working. Your article is about the authors' findings, not about the authors.
  3. Clarify with precision the extent to which you were able to reproduce the author's results. If you can't replicate the author's results even with the help of the author, that is important information that needs to be on the public record, but it also means you can't build on this work to make further progress. And if you can't find out what the problem is, it might mean that you do not have a publishable paper and so might need to start with a different article. So try hard, and you may have to try very hard, to replicate.
  4. Unlike previous papers you may have written, do not allocate space in your paper in proportion to how much work you put into accomplishing each task. The point of this paper is to make your scholarly point, not to show how smart (or hard-working) you are. This paper should not be about you, or a report of what you did; it should be about what you contribute to our collective knowledge about the world. For example, a large fraction of your effort will probably go into reproducing a prior result -- and thus getting up to the cutting edge of the field -- but only in rare cases will that take more than a page or two of your paper. Space in your paper should be allocated in proportion to how much of a contribution it makes to changing the mind of someone in the literature about something important.
  5. After reproducing the article, follow the logic of King, Tomz, and Wittenberg (2000) and try to improve the presentation of the original results. See whether you can find useful, additional, or even contradictory information not discussed in the article without changing any assumptions in the original paper. If you are able to do this, then you need not defend anything other than your method of presentation, which would put you on very strong grounds in your claim for journal space.
  6. Where possible, you should run some controlled experiments designed to advance the state of knowledge. That is, make one improvement, or the smallest number of improvements possible to produce new results, and show the results so that we can attribute specific changes in substantive conclusions to particular methodological changes. If you are able to produce an interesting substantive result that is different from the original article, with only one completely justifiable methodological change, then you only need to defend this change fully and carefully.
  7. If you are able to improve or change the author's results in some important way with a minimal change that is maximally justifiable, write that up separately. Then, in a separate section, go ahead and make all the changes you think are desirable and see what difference that makes to your results. But make sure the minimal changes necessary to produce the new conclusions are described and justified first, with results fully presented. Once you've done that, then you're home free in your quest for journal space.
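
Points 6 and 7 above ask for controlled experiments in which each new result can be attributed to a single methodological change. One possible way to organize such runs is sketched below; it is an illustration under assumptions rather than a prescribed method, and the setting names in the usage note are hypothetical.

```python
def single_change_variants(baseline, alternatives):
    """Yield (label, settings) pairs that differ from baseline in one key.

    baseline:     dict of settings that reproduces the original article.
    alternatives: dict mapping a setting name to candidate replacement values.
    """
    yield "baseline", dict(baseline)
    for key, values in alternatives.items():
        if key not in baseline:
            raise KeyError(f"unknown setting: {key}")
        for value in values:
            if value == baseline[key]:
                continue  # same as the baseline, so there is nothing to compare
            variant = dict(baseline)  # copy, so only one key ever changes
            variant[key] = value
            yield f"{key}={value}", variant
```

For example, with the hypothetical settings `baseline = {"window": 256, "normalize": False}` and `alternatives = {"window": [512], "normalize": [True]}`, the generator yields the baseline plus two variants, each of which changes exactly one setting. Running your analysis once per variant lets any difference in conclusions be traced to that one change.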

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
Dan Ellis <[email protected]>
Last updated: Thu Dec 12 10:15:28 AM EST 2013