s-1 Replication in Second Language Research:
s-2 Narrative and Systematic Reviews and Recommendations for the Field
s-3 Replication studies are considered by many to play a fundamental role in any scientific endeavor.
s-4 When using the same materials and procedures as a previous study, replication studies serve to test the reliability of the previous study’s findings.
s-5 When altering specific methodological or participant characteristics of a previous study, they serve to test generalizability of the earlier findings under different conditions.
s-6 One indication of the importance of replication is found in the 50 or more calls for replication research in the field of second language (L2) research alone (see references for 50 calls and commentaries in Appendix S1 in the Supporting Information online): from Santos (1989) through Polio and Gass (1997) to very recent proposals for specific replication studies, such as Vandergrift and Cross (2017), and even a book-length treatment (Porte, 2012).
s-7 Beyond these calls, efforts to actively promote and facilitate replication studies have also emerged.
s-8 For example, the Instruments for Research into Second Languages (IRIS) repository was established in 2011 and holds, at the time of writing, over 3,800 materials that can be used for replication, among other purposes, in L2 research (Marsden & Mackey, 2014; Marsden, Mackey, & Plonsky, 2016).
s-9 The Open Science Framework, also established in 2011, provides a web infrastructure to facilitate collaboration and has been used for large replication efforts in psychology (e.g., Open Science Collaboration, 2015), which continue to make waves in academia (Laws, 2016; Lindsay, 2015; Martin & Clarke, 2017) and the general media (Baker, 2015; Devlin, 2016).
s-10 In some fields, a flourishing metascience, that is, the scientific study of science (see Munafò et al., 2017), has included syntheses assessing the quantity and nature of replication efforts, for example, in education (Makel & Plucker, 2014) and in psychology (Makel et al., 2012).
s-11 The driving force behind this battery of calls, commentaries, infrastructure, and metascience is a perceived crisis in the state of replication research.
s-12 The severe concerns underpinning the alleged crisis have several dimensions relating to: (a) the (small) amount of published replication research; (b) the (poor) quality of replication research; and (c) the (lack of) reproducibility, which refers to the extent to which findings can (not) be reproduced in replication attempts that have been undertaken.
s-13 These concerns speak to the very core of science, raising fundamental questions about the validity and reliability of our work.
s-14 Indeed, some commentators have called replication the gold standard of research evidence (Jasny, Chin, Chong, & Vignieri, 2011, p. 1225) and a linchpin of the scientific process (Let’s replicate, 2006, p. 330).
s-15 In the field of L2 research, given the importance of replication and the 50 calls for replication in L2 research that we identified, one might expect a substantial number of published replication studies by now.
s-16 However, a perceived lack of prestige, excitement, and originality of replication plagues L2 research (Porte, 2012), as it does other disciplines (Berez-Kroeker et al., 2017; Branco, Cohen, Vossen, Ide, & Calzolari, 2017; Chambers, 2017; Schmidt, 2009), and these perceptions are thought to have caused, at least in part (directly or indirectly), alleged low rates and a poor quality of published replication studies.
s-17 Yet a systematic metascience on replication research has not been established in the field of L2 research, leaving a poor understanding of the actual number and nature of replication studies that have been published.
s-18 The current study begins to address this gap through narrative and systematic reviews.
s-19 The narrative review considers challenges in replication research and is largely informed by commentaries and metascience from psychology, given that the cognitive and social subdomains of psychology are highly influential in L2 research, and also from education, another key sister discipline.
s-20 The narrative review is organized around four broad themes: (a) the quantity of replication research, (b) the nature of replication research, (c) the relationship between initial and replication studies, and (d) the interpretation and extent of reproducibility of the findings of initial studies.
s-21 To gain insight into these issues in the context of L2 research, the systematic review provides a synthesis of L2 studies in journal articles that self-labeled as replications.
s-22 The research questions and methods of the systematic review were largely determined by the narrative review but also emerged through the design and piloting of the coding instrument.
s-23 Finally, we offer further discussion and 16 recommendations for future replication work that draw on our narrative and systematic reviews and on our experience of carrying out multisite (Morgan-Short et al., 2018) and single-site (Faretta-Stutenberg & Morgan-Short, 2011; Marsden, Williams, & Liu, 2013; McManus & Marsden, 2017; Morgan-Short, Heil, Botero-Moriaty, & Ebert, 2012) replications.
s-24 We start from the widely agreed premise that testing the reproducibility of findings should have an essential role in the testing and refinement of theory, at least for hypothesis-testing epistemologies that seek to ascertain generalizability and for other epistemologies in which constructs are deemed to be definable and observable.
s-25 Thus, our overall aim is to provide conceptual clarification and an empirical base for future discussion and production of replication studies, with a view to improving the amount and quality of L2 replication research.

