ADVANCED PROGRAMMING SKILLS
Sascha Meiers, 25th Nov 15, EMBL-EBI
Overview
• Choice of programming language
• Good programming practice
• A few “advanced” programming concepts
  • List comprehension
  • Generators
Choice of a programming language
[Figure: typical tasks in genomics and the languages that could be chosen for them]
Tasks:
• Number crunching, statistics, nice plots?
• Reusable, robust and fast programs
• Parsing information from text files
• Anything in a browser
• Run a series of programs one after another?
Languages: R, Perl, Python, Java, C++, PHP, Bash, Javascript
Choice of a programming language
[Figure: languages arranged by programming time (slow to fast) and run time of the program (fast to slow)]
Compiled languages: C, C++, Java
Interpreted languages: Perl, Javascript, Python, Bash
What I use in my daily work: 50% Python, 20% Bash, 20% R, 10% C++
Good programming practice
• Code conventions
• Documentation
• Version control
• Tests
• Modularization (and avoiding repeat code)
Code conventions
• Max. 80 characters per line
• Indentation (partly forced in Python)
• Consistent use of tabs or spaces
• Consistent variable/function names, e.g. lower_case_with_underscores or camelCaseNames
• Break long text over multiple lines
• Self-explanatory variable names
• For Python: “Pythonic” programming
For a Python guideline, see https://www.python.org/dev/peps/pep-0008/
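Several of these conventions can be seen together in a few lines of Python. This is only a minimal sketch: the function name `gc_content` and the example sequence are hypothetical and chosen purely for illustration.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    if not sequence:
        return 0.0
    gc_count = sum(1 for base in sequence.upper() if base in "GC")
    return gc_count / len(sequence)


# Long text is broken over several lines inside parentheses, so that
# no line exceeds 80 characters.
report = (
    "GC content of the example read: "
    "{:.2f}".format(gc_content("ACGTGGCC"))
)
print(report)
```

Names are written in lower_case_with_underscores throughout, and indentation uses four spaces only.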
Bad example
• What do these variables mean?
• What is going on here? (no comments)
• Ugly indentation
• Long lines
Better
• “Pythonic” equivalent to try…except for opening a file
• “Demultiplexing”: read a file containing sequencing reads from multiple experiments that can be distinguished by their first few bases (barcode)
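The code from the slide is not preserved here, so the following is only a sketch of what a readable demultiplexing step might look like. It assumes that the “pythonic” way of opening a file refers to the `with` statement, that the input is in 4-line FASTQ format, and that barcodes are 4 bases long; the function name and the constant are hypothetical.

```python
from collections import Counter

BARCODE_LENGTH = 4  # assumed barcode length, for illustration only


def count_reads_per_barcode(fastq_path, barcode_length=BARCODE_LENGTH):
    """Count reads by the first bases (barcode) of each sequence."""
    reads_per_barcode = Counter()
    # "with" closes the file automatically, even if an error occurs
    with open(fastq_path) as fastq_file:
        for line_number, line in enumerate(fastq_file):
            # In a 4-line FASTQ record the sequence is the second line
            if line_number % 4 == 1:
                barcode = line.strip()[:barcode_length]
                reads_per_barcode[barcode] += 1
    return reads_per_barcode
```

Compared with the points criticised in the bad example, the variable names say what they hold, each step has a comment, and no line is longer than 80 characters.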
Documentation
• Inline comments
• “Notebooks” that are a mix of code and documentation (e.g. IPython or Jupyter)
• Functions/classes etc. can be nicely documented with a block comment
  • This can even be read by Python’s help function
• Argparse to “document” input and parameters of your programs
Function documentation
…
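As an illustration of a documented function, here is a minimal sketch with a block comment (docstring). The function `reverse_complement` is a hypothetical example and not the code shown on the original slide.

```python
def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence.

    Arguments:
        sequence: string of the bases A, C, G and T (upper case).

    Returns:
        The reverse-complemented sequence as a string.
    """
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(sequence))


# The docstring is stored on the function itself. In an interactive
# session, help(reverse_complement) displays the same text.
print(reverse_complement.__doc__)
```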
Program description in argparse
• ArgumentParser containing a description of the program
• Require 1 argument, the name of the fastq file, and document what kind of input is expected
• Parse the given arguments. If they don’t match the requirements, an error message will be shown and the program stops here
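The three steps could look roughly like this. It is a sketch only: the description text, the argument name `fastq` and the helper `build_parser` are assumptions made for illustration.

```python
import argparse


def build_parser():
    # ArgumentParser containing a description of the program
    parser = argparse.ArgumentParser(
        description="Split sequencing reads by their barcode."
    )
    # Require one argument and document what kind of input is expected
    parser.add_argument(
        "fastq",
        help="FASTQ file with reads from multiple experiments",
    )
    return parser


# An explicit list stands in for the command line here. In a real
# script, call parse_args() without arguments to read sys.argv. If the
# arguments do not match the requirements, argparse prints an error
# message and the program stops at this point.
args = build_parser().parse_args(["reads.fastq"])
print("Input file:", args.fastq)
```

Running such a script with `-h` prints the description and the help text of each argument, which is how argparse doubles as documentation.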
Version control
• Tools like SVN or Git to keep track of changes
• Repositories like github.com or git.embl.de as external backup and means of publication
• At least: version info in your script (e.g. in argparse)
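One way to put version information into a script is the `version` action of argparse. This is a minimal sketch; the version number and the description are hypothetical.

```python
import argparse

__version__ = "0.1.0"  # hypothetical version number

parser = argparse.ArgumentParser(description="Example program.")
# action="version" prints the version string and exits
parser.add_argument(
    "--version",
    action="version",
    version="%(prog)s " + __version__,
)
```

Calling the script with `--version` then prints the program name followed by the version number.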
Tests
• Be sure your program does the right thing by writing small test cases
  • e.g. is there an alternative method (e.g. for intermediate steps) that you can compare to?
• Deal with exceptions and border cases?
  • File missing? (try…except)
  • What if file content is not as expected?
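These ideas can be sketched with plain assert statements. All function names below are hypothetical: one test compares against an alternative method, one checks a border case, and one helper handles a missing file.

```python
def mean_running(values):
    """Mean computed in a single pass with a running total."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    if count == 0:
        raise ValueError("mean of empty input is undefined")
    return total / count


def test_against_alternative_method():
    # Compare with an independent way of computing the same result
    data = [2, 4, 6, 8]
    assert mean_running(data) == sum(data) / len(data)


def test_border_case_empty_input():
    try:
        mean_running([])
    except ValueError:
        return
    raise AssertionError("empty input should raise ValueError")


def read_lines_or_empty(path):
    """Return the lines of a file, or an empty list if it is missing."""
    try:
        with open(path) as handle:
            return handle.readlines()
    except FileNotFoundError:
        print("File not found:", path)
        return []


test_against_alternative_method()
test_border_case_empty_input()
```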
Make code re-usable
• When a piece of code turns out to be useful, invest the time to make it re-usable
  • Generalize, document, modularize
  • Opposite scenario: copy code
Generalized function…
• File and list of barcodes as arguments
• Results will be returned instead of printed
…and how to call it
• Read fastq file from command line
• Read barcodes from command line
• Give them to the function
• Output results
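A generalized demultiplexing function and its caller might look as follows. This is a sketch under assumptions, not the code from the slide: the names `demultiplex` and `main` are hypothetical, the input is assumed to be 4-line FASTQ, and reads matching no barcode are simply ignored.

```python
import sys


def demultiplex(fastq_file, barcodes):
    """Group read sequences by barcode.

    fastq_file: open file (or any iterable of lines) in FASTQ format.
    barcodes: list of barcode strings to look for.
    Returns a dict mapping each barcode to its list of sequences.
    """
    reads_by_barcode = {barcode: [] for barcode in barcodes}
    for line_number, line in enumerate(fastq_file):
        if line_number % 4 != 1:  # only the sequence line of a record
            continue
        sequence = line.strip()
        for barcode in barcodes:
            if sequence.startswith(barcode):
                reads_by_barcode[barcode].append(sequence)
                break
    return reads_by_barcode


def main(argv):
    # Read the FASTQ file name and the barcodes from the command line
    fastq_path = argv[1]
    barcodes = argv[2:]
    # Give them to the function
    with open(fastq_path) as fastq_file:
        reads_by_barcode = demultiplex(fastq_file, barcodes)
    # Output the results
    for barcode, sequences in reads_by_barcode.items():
        print(barcode, len(sequences))


# Only run when a file name and at least one barcode are given
if __name__ == "__main__" and len(sys.argv) > 2:
    main(sys.argv)
```

Because the function returns its results instead of printing them, the same code can be reused from another script or checked in a test.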
Good programming practice
Do you always follow these rules?
[Figure: importance of good programming practice plotted against “the more people (should) see your code”. Labels: just trying something quickly; tests for your actual code; “negative controls”; reproducible analysis for a paper; publish a method]
“Advanced” programming concepts
• Recursion
• Dynamic programming
• Functional programming
• Regular expressions
• Object-orientation
• Data structures
• Streaming algorithms
• …
Functional programming
• Realizes the mathematical concept of functions
• You declare the combination of functions to be applied to the input
  • In contrast, in the imperative paradigm you describe the order of statements to be executed
• Functions depend on their arguments only
  • … not on the external state. There are no side effects
• You don’t care about how it gets computed
  • Focus only on what you want to find out
• There are purely functional languages (e.g. Haskell)
  • But most languages adopt only a few concepts from it
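Python is one of the languages that adopts a few functional concepts. The sketch below contrasts a declarative combination of functions with the imperative equivalent; the data and the function `is_long` are hypothetical.

```python
from functools import reduce

read_lengths = [36, 50, 75, 100]


# Pure function: the result depends on the arguments only
def is_long(length, threshold=50):
    return length > threshold


# Declarative: a combination of functions applied to the input
total_long = reduce(
    lambda total, length: total + length,
    filter(is_long, read_lengths),
    0,
)

# Imperative: the order of statements is spelled out
total_long_imperative = 0
for length in read_lengths:
    if length > 50:
        total_long_imperative += length

print(total_long, total_long_imperative)
```

Neither version modifies `read_lengths`, so the input is left without side effects.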
List comprehension
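A list comprehension builds a new list from an expression, a loop and an optional condition, all in one statement. A minimal sketch with hypothetical example data, next to the equivalent loop:

```python
sequences = ["ACGT", "GGCC", "AT", "ACGTACGT"]

# Loop version
lengths_loop = []
for sequence in sequences:
    if len(sequence) >= 4:
        lengths_loop.append(len(sequence))

# Equivalent list comprehension
lengths = [len(sequence) for sequence in sequences if len(sequence) >= 4]

print(lengths)
```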
Streaming algorithms
• Idea: data is not present all at once, but is streamed through
• Needed when available memory is limited
  • E.g. data too big to fit into memory
• You might know the concepts from Linux pipes
• Depends on the problem
  • E.g. to sort data, all data must have been read*
  • Filtering data can be done online (while streaming)
  • Stream large file instead of reading all at once
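Filtering while streaming can be sketched in Python by iterating over a file object, which yields one line at a time instead of loading everything as `readlines()` would. The function name and the example data are hypothetical, and `io.StringIO` stands in for a large file.

```python
import io


def count_matching_lines(lines, keyword):
    """Count lines containing keyword, holding one line at a time."""
    matches = 0
    for line in lines:  # a file object yields its lines one by one
        if keyword in line:
            matches += 1
    return matches


# io.StringIO stands in for a large file opened with open(...)
example_file = io.StringIO("chr1\t100\nchr2\t200\nchr1\t300\n")
print(count_matching_lines(example_file, "chr1"))
```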
Python generators
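A generator function uses `yield` to hand out one item at a time, so only the current record needs to be in memory. The following is a sketch assuming 4-line FASTQ records; the name `read_fastq` is hypothetical and `io.StringIO` again stands in for a real file.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) for each 4-line FASTQ record."""
    while True:
        name = handle.readline().strip()
        if not name:  # end of file (or a blank line)
            return
        sequence = handle.readline().strip()
        handle.readline()  # separator line starting with "+"
        quality = handle.readline().strip()
        yield name, sequence, quality


example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
for name, sequence, quality in read_fastq(example):
    print(name, sequence)
```

Records are produced lazily: the next one is read from the file only when the loop asks for it.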
Thank you for your attention