Software Writing Skills for Your Research – Develop, Publish, Cite, and Get Credit for Scientific Code

Event location: GFZ German Research Centre for Geosciences, Potsdam, Germany

Event date(s): Three workshops, each lasting three days, to be scheduled in 2015

Training description:

Software has become an integral part of science, yet it is not properly integrated into the scientific discourse. Findings presented in papers are based on data, and occasionally the data are published along with them, but the software rarely is. Yet the software used to obtain the findings plays a crucial role in the scientific work. Nevertheless, software is rarely regarded as publishable. Without the software, researchers may be unable to reproduce the findings, which conflicts with the principle of reproducibility in science. For both the writing of publishable software and the reproducibility issue, the quality of the software is of utmost importance.

For many programming scientists, treating source code with care (code design, version control, documentation, and testing) means additional work that is not covered by the primary research task. This includes adopting processes that follow the software development life cycle. However, the adoption of software engineering rules and best practices has to be recognized and accepted as part of scientific performance. Most scientists have little incentive to improve their code and do not publish it, either with their papers or on its own, because software engineering habits are rarely practised by researchers or students. Software engineering skills are not passed on to successors the way paper-writing skills are. As a result, the software or code produced is often felt to be unpublishable. Yet the quality of software and its source code has a decisive influence on the quality of the research results obtained and on their traceability.
Establishing software engineering best practices that are not only adopted but also adapted to scientific needs is therefore crucial for the success of software publications. Moreover, software cannot be released directly in the way a scientific paper is published. As a result, the scientific achievements embodied in software and its contributions to science are poorly perceived and hard to measure. The resulting gap in interdisciplinary communication about scientific software might be closed by a common understanding of how to handle scientific software, with defined processes and appropriate software publications in media other than journals.

Even though scientists use existing software and code, e.g. from open source software repositories, only a few contribute their own code back to those repositories. Writing and opening code for Open Science therefore means enabling subsequent users to run the code, e.g. by providing sufficient documentation, sample data sets, tests and comments, whose adequacy can in turn be confirmed by qualified reviews. This assumes that scientists learn to write and release code and software just as they learn to write and publish papers. With this in mind, software, which occupies an increasingly prominent place in research and has become an indispensable part of science, could be valued and assessed as a contribution to science. But this requires relevant skills that can be passed on to colleagues and successors.

The three planned workshops address passing these skills on to young scientists, the next generation of researchers in the Earth, planetary and space sciences. Writing code in science according to minimal but vital software engineering rules, best practices and processes shall be taught as a fundamental skill.
The three workshops therefore address young scientists with different levels of experience in writing software for their research work:

- The first workshop addresses novices, who use spreadsheets or similar tools for data analysis rather than writing code of their own
- The second workshop addresses intermediate users, who write a few lines of code for personal use only
- The third workshop addresses proficient users, who write multi-page programs that may be shared with others

For each workshop, participants are expected to register with a description of their research and, specifically, of their need to write software in their own work, which helps in finding solutions.

Each workshop is held on site, lasts three days and covers the core skills needed to be productive in a small research team, e.g.:

- Unix shell – How to automate repetitive tasks
- Python or R – How to grow a program in a modular, testable way
- Git and GitHub, Mercurial – How to track and share work efficiently
- SQL, NoSQL – The difference between structured and unstructured data
- Hands-on – Write code, using either provided examples or participants’ own software
- Release and Publish Scientific Software – Prepare a release, publish, and mint a DOI
- EC Policies and Pilots – Why bother with Open Access, Open Data, and Open Science in research
- The shift from Open Access to Open Science – The broader sense of ‘Open’ and implications for workshop participants’ future activities and responsibilities in science

These topics mainly apply to the novice and intermediate workshops. The workshop for proficient users covers individualised topics that depend on the information participants provide during registration.

Each workshop is run by two instructors and supported by several helpers. Depending on the level and on the individual components of each workshop, the number of participants ranges from 15 to 40: the workshop for proficient users might have 15 to 25 participants, whereas the novice and intermediate workshops might have up to 40.
Regardless of level, each workshop closes with a recap session emphasizing why scientists have to care about their software writing skills along with their data wrangling and paper writing skills, and why it is important to pass these skills on to colleagues and successors. The role of these skills will be covered in the context of reproducibility in science and in the context of publications and what they mean for software, especially with regard to the European Commission’s policies and pilots for openness.

It is worth mentioning that the Earth, planetary and space sciences integrate proper data management plans into research projects on an optional basis, as foreseen by H2020’s Data Pilot. Addressing young and early-career scientists in these disciplines with the planned workshops therefore creates awareness of future mandatory measures beyond Open Access. It also places responsibility on the next generation of researchers for making science reproducible, not only by opening research results in papers and by opening research data but also by opening the software, and it creates awareness of the resources this requires.