proposal - sciforge

Software Writing Skills for Your Research –
Develop, Publish, Cite, and Get Credit for Scientific Code
Event location: GFZ German Research Centre for Geosciences, Potsdam, Germany
Event date(s): Three workshops lasting for three days to be scheduled in 2015
Training description:
Software has become an integral part of science, yet software is not properly integrated into
the scientific discourse. Findings presented in papers are based on data and once in a while
they come along with data – but not commonly with software. However, the software used
to gain findings plays a crucial role in the scientific work. Nevertheless, software is rarely
seen publishable. Thus researchers may not reproduce the findings without the software
which is in conflict with the principle of reproducibility in sciences. For both, the writing of
publishable software and the reproducibility issue, the quality of software is of utmost
importance. For many programming scientists the treatment of source code, e.g. with code
design, version control, documentation, and testing is associated with additional work that is
not covered in the primary research task. This includes the adoption of processes following
the software development life cycle. However, the adoption of software engineering rules
and best practices has to be recognized and accepted as part of the scientific performance.
Most scientists have little incentive to improve code and do not publish code either with
their papers or self-contained because software engineering habits are rarely practised by
researchers or students. Software engineering skills are not passed on to followers as for
paper writing skill. Thus it is often felt that the software or code produced is not publishable.
The quality of software and its source code has a decisive influence on the quality of
research results obtained and their traceability. So establishing best practices from software
engineering not only adopted but also adapted to serve scientific needs is crucial for the
success of software publications.
Moreover, a direct release of software like a scientific publication is not possible. Thus the
scientific achievements of software and its contributions to sciences are poorly perceived
and hardly measurable. The resulting gap in interdisciplinary communication regarding
scientific software might be closed by a common understanding of how to handle scientific
software with defined processes and appropriate software publications in media other than
journals.
Even though scientists use existing software and code, i.e., from open source software
repositories, only few contribute their code back into the repositories. So writing and
opening code for Open Science means that subsequent users are able to run the code, e.g. by
the provision of sufficient documentation, sample data sets, tests and comments which in
turn can be proven by adequate and qualified reviews. This assumes that scientist learn to
write and release code and software as they learn to write and publish papers.
Having this in mind, software, which accounts for an increasingly prominent space in
research and which has become an indispensable part of science, could be valued and
assessed as a contribution to science. But this requires the relevant skills that can be passed
to colleagues and followers.
The three workshops planned address the passing of these skills to young scientists, the next
generation of researchers in the Earth, planetary and space sciences. The writing of code in
science following minimal but vital software engineering rules, best practices and processes
shall be imparted as fundamental skills. So the three workshops address young scientist with
different levels of writing software for their research work:
1
 The first workshop addresses novices using spreadsheets or similar tools for data
analysis rather than writing code on their own
 The second workshop addresses intermediate writing a few lines of code for personal
use only
 The third workshop addresses proficient writing multi-page programs which may be
shared with others
For each workshop it is expected that participants register with describing their research and
specifically their need of writing software in their own work thus helping to find solutions.
A workshop is an on-site workshop, which lasts for three days and covers the core skills
needed to be productive in a small research team, e.g.:
 Unix shell – How to automate repetitive tasks;
 Python or R – How to grow a program in a modular, testable way;
 Git and GitHub, Mercurial – How to track and share work efficiently; and
 SQL, NoSQL – The difference between structured and unstructured data.
 Hands-on – Write code, either provided examples or own software
 Release and Publish Scientific Software – Prepare a release, publish, and mint a DOI
 EC Policies and Pilots – Why bother Open Access, Open Data, and Open Science in
research
 The shift from Open Access to Open Science – The broader sense of ‘Open’ and
implications for workshop participants’ future activities and responsibilities in
sciences
These topics mainly apply for the novice and intermediate workshops. The workshop for
proficient covers individualised topics dependent on the participant information provided in
the registration process. Each workshop is run by two instructors and supported by several
helpers. Dependent on the level and dependent on the individual components of the different
workshops the participants’ number ranges from 15-40. Whereas the workshop for proficient
might range from 15-25 participants, the workshop for novice and intermediate might range
to 40 participants.
However, each workshop closes with a recap session emphasizing why scientist have to care
about their software writing skills along with their data wrangling and paper writing skills
and why it’s important to pass the relevant skills to colleagues and followers. The role of
these skills will be covered in the context of reproducibility in sciences and in the context of
publications and what that means for software, especially in the context of policies and pilots
for Openness of the European Commission.
It is worth to mention, that Earth, planetary and space sciences integrate optionally proper
data management plans in research projects as foreseen by H2020’s Data Pilot. So
addressing the young and early career scientists with the intended workshops in these
disciplines creates awareness for future mandatory measures beyond Open Access and puts
responsibility on the next generation of researchers for making science reproducible not only
by opening research results in papers and by opening research data but also by opening the
software – and it creates awareness for considering the required resources.
2