Barbara Jasny, Deputy Editor, Science
[email protected]

A Brief History of Science
• Science was founded in 1880 with $10,000 of seed money from Thomas Edison.

What is Different Now
• Big data
• Computer modeling
• Team and interdisciplinary science
• Increased ability to study questions of societal significance
• Increased ability to make predictions/approach causality
• Increased pressures for funding, tenure, etc.
• Increased volume of scientific information published

Where are we now?
• >100,000 individual subscribers
• ~1 million online readers
• >13,000 submissions, ~1,000 published

Spectrum of Reproducibility*
• Low end (minimum standard). Repeatability: another group can access the data, analyze it using the same methodology, and obtain the same result.
• High end (gold standard). Replication: the study is repeated start to finish, including new data collection and analysis using fresh materials and reagents, and the same result is obtained.
*Ioannidis and Khoury, Science, Special Issue on Data Replication & Reproducibility, vol. 334, December 2011.

What Can Go Wrong?
• The system under investigation may be more complex than previously thought, so that the experimenter is not actually controlling all independent variables. THIS IS HOW SCIENCE EVOLVES.
• Authors may not have divulged all of the details of a complicated experiment, making it irreproducible by another lab.
• Through random chance, a certain number of studies will produce false positives; authors need to set appropriately stringent significance thresholds for their results (see the simulation sketch after this list).
• Publication bias.
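To make the false-positive point concrete, here is a minimal simulation sketch (an illustration, not material from the talk): when every study tests a true null hypothesis, a conventional p < 0.05 threshold still flags roughly 5% of them as "significant" by chance alone.

```python
# Minimal sketch: false positives arise by chance even when nothing is real.
# Every "study" below compares two groups drawn from the SAME distribution.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_studies, n_per_group, alpha = 10_000, 30, 0.05

false_positives = 0
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(0.0, 1.0, n_per_group)  # no true effect anywhere
    _, p_value = ttest_ind(control, treatment)
    if p_value < alpha:
        false_positives += 1

# Prints a rate close to alpha (~0.05), i.e., ~500 spurious "discoveries".
print(f"False-positive rate: {false_positives / n_studies:.3f}")
```

A stricter threshold, or a correction for multiple comparisons, shrinks this rate; that is the sense in which authors must match their significance tests to how many comparisons they run.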
Where are the problems coming from?
• Insufficient student training, e.g., in experimental design and statistics.
• Pressure to publish, renew grants, and get promoted (difficulties in sharing).
• Industry researchers must promote the goals of their company (difficulties in sharing).
• Editors and reviewers for journals and grants, working under space and time pressure while community standards are still evolving, may not be maintaining sufficiently high standards.
• Studies that were underfunded from the start.

2014 Workshop at the Center for Open Science
• TOP (Transparency and Openness Promotion) standards published in Science.
• 753 journals representing 63 organizations have now signed on.
• TOP Guidelines: B. A. Nosek et al., Science 2015;348:1422-1425.

What if Reproducibility/Replicability Aren't Options?
Michael Tomasello and Josep Call, Science 2011;334:1227-1228
www.nasa.gov

Second Arnold Workshop: Data Sharing in the Field Sciences (2015)
• Establish metadata standards.
– Fund data repositories and support data professionals.
• Education: the importance of quality control.
• Culture changes:
– Relinquish ownership of data.
– Treat data as citable objects.
– Liberate field-science samples and data.
M. McNutt et al., Science, 04 Mar 2016

Third Arnold Workshop: Code and Computational Methods (2016)
• Access to data is of little use without the code used to process the data and derive the results.
• Standards are needed for accessibility, interoperability, and attribution.
• Ideally: share data, software, workflows, and details of the computational environment in open repositories. Persistent links should appear in the published article and include permanent identifiers for data, code, and digital artifacts (a sketch of one such record follows).
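One lightweight way to capture "details of the computational environment" alongside permanent identifiers is a machine-readable manifest. The sketch below assumes a Python workflow and is only an illustration; the workshop prescribed no particular format, and the function name, file name, and DOI values are hypothetical placeholders.

```python
# Sketch: record interpreter, platform, and package versions together with
# persistent identifiers for the archived data and code. Illustrative only.
import json
import platform
import sys
from importlib.metadata import PackageNotFoundError, version

def environment_manifest(packages, artifacts):
    """Return a dict describing the computational environment and artifacts."""
    pkg_versions = {}
    for name in packages:
        try:
            pkg_versions[name] = version(name)
        except PackageNotFoundError:
            pkg_versions[name] = "not installed"
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": pkg_versions,
        "artifacts": artifacts,  # permanent identifiers (e.g., DOIs)
    }

manifest = environment_manifest(
    packages=["numpy", "scipy"],
    artifacts={
        "data": "doi:10.xxxx/placeholder-data",  # hypothetical identifiers
        "code": "doi:10.xxxx/placeholder-code",
    },
)
with open("environment.json", "w") as fh:
    json.dump(manifest, fh, indent=2)
```

Archiving such a file with the article gives readers one persistent, citable description of exactly what produced the results.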
Science Policy: Data Must Be Available (in Supplementary Materials or Archived)
"Data and materials availability: All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science. After publication, all reasonable requests for materials must be fulfilled."
There are still some exceptions: "Any restrictions on the availability of data or materials, including fees and original data obtained from other sources (Materials Transfer Agreements), must be disclosed to the editors upon submission."
It's still complicated.

What Needs to be Shared?
Exposure to ideologically diverse news and opinion on Facebook
Eytan Bakshy, Solomon Messing, Lada Adamic
Science 05 Jun 2015: Vol. 348, Issue 6239, pp. 1130-1132. DOI: 10.1126/science.aaa1160
From Bakshy et al.: The following code and data are archived in the Harvard Dataverse Network, http://dx.doi.org/10.7910/DVN/LDJ7MS, "Replication Data for: Exposure to Ideologically Diverse News and Opinion on Facebook":
• R analysis code and aggregate data for deriving the main results (e.g., Tables S5 and S6)
• Python code and dictionaries for training and testing the hard-soft news classifier
• Aggregate summary statistics of the distribution of ideological homophily in networks
• Aggregate summary statistics of the distribution of ideological alignment for hard content shared by the top 500 most-shared websites

What Needs to be Shared
Unique in the shopping mall: On the reidentifiability of credit card metadata
Yves-Alexandre de Montjoye, Laura Radaelli, Vivek Kumar Singh, Alex "Sandy" Pentland
Science 30 Jan 2015: Vol. 347, Issue 6221, pp. 536-539. DOI: 10.1126/science.1256297
From de Montjoye et al.: For contractual and privacy reasons, we unfortunately cannot make the raw data available. Upon request we can, however, make individual-level data of gender, income level, resolution (h, v, a), and unicity (true, false), along with the appropriate documentation, available for replication. This allows the re-creation of Figs. 2 to 4, as well as the GLM model and all of the unicity statistics. A randomly subsampled data set for the four-points case can be found at http://web.media.mit.edu/~yva/uniqueintheshoppingmall/ (A toy illustration of the unicity measure follows.)
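"Unicity" here is the fraction of individuals whose record is singled out by a handful of known points about them. A toy sketch on synthetic traces (our illustration on made-up data, not the paper's data or code) shows how few points that can take:

```python
# Toy unicity estimate: how often do p known (time, place) points match
# exactly one user? Synthetic data; numbers are illustrative only.
import random

random.seed(0)
n_users, trace_len, n_places, p = 1_000, 20, 10, 4

# One (time slot, place) pair per time slot for each user.
traces = [
    [(t, random.randrange(n_places)) for t in range(trace_len)]
    for _ in range(n_users)
]
trace_sets = [set(trace) for trace in traces]

unique = 0
for trace in traces:
    known = set(random.sample(trace, p))  # p points an adversary knows
    # The user is "unique" if only their own trace contains all p points.
    matches = sum(known <= other for other in trace_sets)
    unique += (matches == 1)

print(f"Estimated unicity with p={p} points: {unique / n_users:.2f}")
```

Even in this tiny simulated world, four points typically pin down most users, which is why releasing only aggregate or coarsened data, as the authors did, matters for privacy.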
When Release is Against Public Safety/Interest
Heads-up limit hold'em poker is solved
Michael Bowling, Neil Burch, Michael Johanson, Oskari Tammelin
Science 09 Jan 2015: Vol. 347, Issue 6218, pp. 145-149. DOI: 10.1126/science.1259433
From Bowling et al.: As heads-up no-limit Texas hold'em is commonly played online for high stakes, the scientific benefit of releasing source code must be balanced with the potential for it to be used for gambling purposes. As a compromise, an implementation of the DeepStack algorithm for the toy game of no-limit Leduc hold'em is available at https://github.com/lifrordi/DeepStack-Leduc.

Reproducibility as it Affects Industry/Academia Partnerships
Fourth Arnold Workshop (2016): Current Problems
• It is widely accepted that academic standards are significantly lower than industry's.
• Venture firms and biopharma companies replicate results before investing: 2-6 researchers, 1-2 years, $500K-$2 million.
• Proprietary/IP concerns.
• Agreements can restrict the rights of academics and/or delay publication.
• Privacy issues: data can be linked to provide unexpected information about you and your network.

Recommendations
• Michael Rosenblatt (Chief Medical Officer, Flagship Ventures; published in Science Translational Medicine in April) proposed an incentives-based approach, in which industry provides incentives if universities guarantee their research.
• Universities could do random quality-assurance checks of faculty, to test whether auditing makes a difference in practice and whether journals see improvements.
• Crowdsourcing to check data deposition. This must be seen as helpful, not punitive.
• Form a working group to develop a toolkit for establishing standards for partnerships.
• Organize a high-level meeting to examine ways to sustain existing databases and build new ones.
• Promote better education of students and faculty regarding reproducibility, data sharing, and experimental design.

Industry-Academia Agreements
• What data are needed?
• What will be published?
• What happens to data an academic produces with industry data?
• What approvals are needed for publication/speaking?
• What will be proprietary/trade secrets?
• When will data be released?
• Can released data be used to extend research findings, or only to reproduce them?

If Replication is a Public Good
• How should replication projects be rewarded?
• Do we have a journal of replication?
• Do people post replications to their blogs?
• Who publishes negative results?

Data Policies of Elsevier
https://www.elsevier.com/about/ourbusiness/policies/researchdata
• Is it compulsory to share my research data?
• "No. Our policy is clear in that we encourage and support authors to share their research data rather than mandating them to do so. ... Where there is community support for (often discipline-specific) mandates regarding data deposit, submission and sharing, some of our journals may reflect this with their own mandatory data sharing policies."

Quarterly Journal of Political Science
• Authors of empirical papers may be asked to supply a replication data set for editors or referees. Upon acceptance of a manuscript, authors will be required to submit a replication dataset/archive prior to publication. The dataset, documentation, command files, etc., will be reviewed in-house and made available at this site coincident with publication. Online appendices, if any, will be handled similarly.