Discussion Paper

No. 2017-74 | September 28, 2017
A replication recipe: list your ingredients before you start cooking
(Published in Special Issue The practice of replication)


The author argues that researchers should do replications using preanalysis plans. These plans should specify at least three characteristics: (1) how much flowtime the researchers will spend, (2) how much money and effort (working hours) the researchers will spend, and (3) the intended results and the precision of the replication necessary for “success”. A researcher’s replication will be “successful” according to context-specific criteria in the preanalysis plan. The author also argues that the two biggest drawbacks of preanalysis plans—(1) that they discount unexpected but extraordinary findings and (2) that they make it difficult for researchers to prespecify all possible actions in their decision trees—are less relevant for replications than for new research. The author concludes by describing a preanalysis plan for replicating a paper on housing demand and household formation.

JEL Classification:

B41, C80, C81, R21




Cite As

Andrew C. Chang (2017). A replication recipe: list your ingredients before you start cooking. Economics Discussion Papers, No 2017-74, Kiel Institute for the World Economy. http://www.economics-ejournal.org/economics/discussionpapers/2017-74

Comments and Questions

Anonymous - Referee report 1
October 18, 2017 - 09:42

see attached file

Andrew C. Chang - Referee#1 Reply
November 22, 2017 - 04:25

See attached.

Anonymous - Referee report 2
November 21, 2017 - 11:50

Summary of the Paper’s Findings/Contributions:

The paper does not have empirical or theoretical findings, per se. It is a think piece, arguing the general merits of preanalysis plans for the case of replication studies and commenting on how to construct them. It raises several provocative issues and, in my opinion, has several thoughtful insights on the subject. However, the author has crafted a preanalysis plan that tailors itself very closely to his own current situation, dramatically limiting the usefulness of this discussion to the field at large. I had hoped to find some sort of scientific approach to how one might select and execute a replication study – and the current write-up reflects something far different from that.

If the currently submitted paper is not useful for laying out a defendable scientific scheme for others to follow, again because of the way situational information is called upon so heavily, then this manuscript becomes a simple addendum to the actual replication study of interest (Haurin and Rosenthal, 2007) that was conducted by the author, such that an abbreviated version of this discussion belongs paired with that work, rather than independently published in Economics: The Open-Access, Open-Assessment E-Journal. This becomes even more obvious once one considers the way the selection criteria were specifically crafted around the mentioned special issue’s call for replications. [As a side note, the special issue’s call was referred to multiple times but never actually pinned down. What journal? Perhaps a link to that call for papers could be included. It is hard to evaluate the author’s choices in the preanalysis plan without seeing the specific constraints of the call.] Most of my comments fall along these lines.


1. I fail to see how the statements made about replication attempts do not apply more generally to essentially all scholarly research projects. Consider the use of statements like “Without prespecification, the amount of flowtime and budget that you could invest in a replication could grow uncontrollably” and “Your budget for a ‘successful’ replication is, most likely, less than that of the Bill & Melinda Gates Foundation”. Replace the word ‘replication’ with ‘study’ and you still have equally true statements.

2. Several arbitrary and/or unnecessary criteria are included in the selection method. For example:

a. The statement about not selecting from the author’s own previous replication work is purely redundant with the second (more general) criterion, which stated that previously replicated papers were not to be selected. By definition, if the author had previously replicated a study, then someone had previously replicated it.
b. To exclude work by those with a connection to the current place of employment and those with a personal correspondence history seems completely arbitrary. When I started reading the description of how the paper to target was “selected”, I expected some sort of broad/general criteria that would produce a large population of potential choices not subject to obvious biases, with some random selection procedure then applied to that pool. [For example, alphabetize the entire list of articles, then use a random number generator to select one by its position on the list.]
c. The criterion “A paper that I read within a year prior to the special issue’s call” illustrates the arbitrary nature of this discussion even more clearly than points a and b above. I do not actually know how to interpret this. Is the author really suggesting others follow this approach? How does this ‘selection criterion’ not boil down, in the end, to picking a paper the individual author is comfortable with and interested in for independent reasons? Again, I am not challenging the idea that this ‘comfortable’ method is a fine way to pick a study to replicate, but my point is that any ‘comfortable’ and/or ‘clearly non-scientific’ selection method does not merit independent publication as a stand-alone academic contribution.
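
The broad-criteria-plus-randomization procedure the referee suggests in point 2b could be sketched as follows; this is a minimal illustration, and the paper titles and the seed are purely hypothetical placeholders, not anything from the manuscript under review:

```python
import random

# Hypothetical pool produced by broad, bias-free selection criteria
candidate_papers = [
    "Housing demand and household formation",
    "Monetary policy and asset prices",
    "Trade liberalization and wage inequality",
]

# Alphabetize the list, then draw one paper by position using a
# random number generator seeded in the preanalysis plan itself,
# so the draw is reproducible and cannot be quietly re-rolled.
pool = sorted(candidate_papers)
rng = random.Random(2017)  # seed prespecified in the plan
selected = pool[rng.randrange(len(pool))]
print(selected)
```

Committing to the seed in advance is what makes the selection defendable: anyone can re-run the draw and confirm that the chosen paper was not cherry-picked after the fact.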

3. Several arbitrary and/or unnecessary criteria are included in the definition of “success”. For example:

a. Point 4 is quite vague. The author would wait a “prespecified amount of time” (later quantified as a ‘few’ weeks) and would engage in a “prespecified number of attempts” (never quantified).
b. Similarly, point 5 mentions a “flowtime of around two months” to do the replication. How is this different from a normal researcher simply thinking through their research goals? Would the author’s own life/circumstances during the time in question not be accounted for in a reasonable way? Again, I just don’t see a rigorous scientific contribution here that others could benefit from.
c. Most importantly, point 9 indicates that “If the data that I downloaded was obviously flawed, then I would give up and work on another research paper.” The mind (ok, MY mind) reels upon reading this. I trust the authors of the original Haurin and Rosenthal paper because I have no reason at all not to. I trust the author of this paper for the same reason. Setting that aside, it seems that in the case of a purely doctored research endeavor, this step in the process would actually cause the researcher conducting a replication to give up when they should not. Assume the original study misrepresented the steps it took to include/exclude observations. When the replicator checked simple things like raw observation counts, they would get something quite different and ‘give up’ on that replication.

4. I understand that the researcher, and in turn the journal to which the paper is submitted, carries a direct interest in the field of Economics. However, Economics has direct connections to other disciplines, including Finance, Accounting, Marketing, Political Science, Geography, Sociology, Urban Planning, and many others. [A simple review of the 750+ journals indexed by EconLit substantiates this point.] In every way, the paper is written as if other fields do not exist. For some topics, I could see that as an acceptable choice, but in this case it is difficult to defend, given that the hard sciences, medicine, and psychology are so dramatically ahead of Economics in terms of replication protocol. Since the paper does not shy away from making normative assertions, I’ll make one of my own: academic research should seek to reflect the current state of knowledge on the investigated topic, regardless of which field has produced the key insights that determine that knowledge. For example, relevant classic research from psychology that considers the key differences between replications and the scientific method in the physical sciences relative to the social/behavioral sciences is ignored.

Anonymous - Referee#2 Reply
November 25, 2017 - 22:17

See attached.

Anonymous - Referee report 3
November 28, 2017 - 11:48

see attached file

Andrew C. Chang - Referee#3 Reply
December 04, 2017 - 22:58

See attached.

Anonymous - Referee report 4
November 28, 2017 - 11:51

The submitted manuscript both provides a general discussion on the importance of replication plans and how they should be conducted as well as a brief "own" replication plan. I enjoyed reading the manuscript and fully agree with the importance of replication plans.

I will evaluate the manuscript in light of the four issues the authors were asked to touch on in their replication plans.

"(i) a general discussion of principles about how one should do a replication":
I particularly liked the argument that a results-free defense is important to defend replications against (some) original authors. This is, in my view, one of the largest problems in publishing replications: original authors who become referees of their (maybe only partly) replicated papers and do not like to see that they cannot be replicated. I have no remarks on this part.

"(ii) an explanation of why the "candidate" paper was selected for replication":
This is not done terribly convincingly. The choice appears quite arbitrary where, eventually, it turns out that the author just chose a paper he is familiar with anyway in order to demonstrate his ideas, without a real interest in its replication. One could also debate whether one (!) citation in Google Scholar makes a paper “influential”. But then, this is not the focus of this manuscript and is, therefore, fine. (Also, as of today, the paper has 58 citations, so the result holds even though the criterion is a bit lax.) In particular, as it is not required that the replication actually be performed, this whole point seems to be of minor importance.

"(iii) a replication plan that applies these principles to the "candidate" article,":
This has been done convincingly.

"(iv) a discussion of how to interpret the results of the replication (e.g., how does one know when the replication study "replicates" the original study).":
In my view, this is the only weak part. One of the author’s three criteria for a replication plan is to "set (...) the set of estimates and the degree of precision that will define a "successful" replication." (lines 3 and 4 of the article). In his proposed replication plan, step 10 on page 8 reads: "I would be "successful" if I was able to replicate the Figures (...) to a reasonable degree of accuracy." This is quite vague and, in my view, does not meet the author’s own criterion.

I understand that it is much easier, when it comes to regression coefficients (or sample means, or numbers of observations), to prespecify up to what difference between the original and replicated coefficient we can speak of a successful replication. Here, the plan is about replicating figures. Nevertheless, this is supposed to be part of the plan, and the author might want to try laying out what a “reasonable degree of accuracy” could be and what the figures would have to look like in order not to meet that standard. For sure, there will not be an objective metric for what a successful replication of a figure is. Yet this would be quite an interesting and potentially helpful discussion, in particular for researchers who actually plan to replicate figures.

Andrew C. Chang - Referee#4 Reply
December 04, 2017 - 23:47

See attached.