ADVANCED PROGRAMMING SKILLS
Sascha Meiers, 25th Nov 15, EMBL-EBI

Overview
- Choice of programming language
- Good programming practice
- A few “advanced” programming concepts
- List comprehension
- Generators

Choice of a programming language
[Diagram: typical tasks in genomics and the languages used for them]
- Tasks: number crunching, statistics, nice plots; reusable, robust and fast programs; parsing information from text files; anything in a browser; running a series of programs one after another
- Languages shown: R, Perl, Python, Java, C++, PHP, Bash, Javascript

Choice of a programming language
[Diagram: languages placed by programming time (fast to slow) against run time of the program (fast to slow)]
- Groups: compiled languages and interpreted languages
- Languages shown: C, C++, Java, Perl, Javascript, Python, Bash
- What I use in my daily work: 50% Python, 20% Bash, 20% R, 10% C++

Good programming practice
- Code conventions
- Documentation
- Version control
- Tests
- Modularization (and avoiding repeated code)

Code conventions
- Max. 80 characters per line
- Indentation (partly forced in Python)
- Consistent use of tabs or spaces
- Consistent variable/function names, e.g. lower_case_with_underscores or camelCaseNames
- Break long text over multiple lines
- Self-explanatory variable names
- For Python: “Pythonic” programming
- For a Python guideline, see https://www.python.org/dev/peps/pep-0008/

Bad example
[Code example shown on slide]
- What do these variables mean?
- What is going on here? (no comments)
- Ugly indentation
- Long lines

Better
[Code example shown on slide]
- “Pythonic” equivalent to try…except for opening a file
- “Demultiplexing”: read a file containing sequencing reads from multiple experiments that can be distinguished by their first few bases (barcode)

Documentation
- Inline comments
- “Notebooks” that are a mix of code and documentation (e.g. IPython or Jupyter)
- Functions, classes etc. can be nicely documented with a block comment
- This can even be read by Python’s help function
- Argparse to “document” the input and parameters of your programs

Function documentation
…

Program description in argparse
- Parse the given arguments.
If they don’t match the requirements, an error message is shown and the program stops here.
- Require 1 argument, the name of the fastq file, and document what kind of input is expected
- ArgumentParser containing a description of the program

Version control
- Tools like SVN or Git to keep track of changes
- Repositories like github.com or git.embl.de as external backup and means of publication
- At least: version info in your script (e.g. in argparse)

Tests
- Be sure your program does the right thing by writing small test cases
- E.g. is there an alternative method (e.g. for intermediate steps) that you can compare to?
- Deal with exceptions and border cases
  - File missing? (try…except)
  - What if the file content is not as expected?

Make code re-usable
- When a piece of code turns out to be useful, invest the time to make it re-usable
- Generalize, document, modularize
- Opposite scenario: copy code

Generalized function…
- File and list of barcodes as arguments
- Results will be returned instead of printed

…and how to call it
- Read fastq file from command line
- Read barcodes from command line
- Give them to the function
- Output results

Good programming practice
- Do you always follow these rules?

Importance of good programming practice
[Diagram; labels as shown on the slide]
- Tests for your actual code
- Just trying something quickly
- Reproducible analysis for a paper
- “Negative controls”
- The more people (should) see your code
- Publish a method

“Advanced” programming concepts
- Recursion
- Dynamic programming
- Functional programming
- Regular expressions
- Object-orientation
- Data structures
- Streaming algorithms
- …

Functional programming
- Realizes the mathematical concept of functions
- You declare the combination of functions to be applied to the input
- In contrast, in the imperative paradigm you describe the order of statements to be executed
- Functions depend on their arguments only … not on the external state.
- There are no side effects
- You don’t care about how it gets computed
- Focus only on what you want to find out
- There are purely functional languages (e.g. Haskell)
- But most languages adopt only a few concepts from it

List comprehension

Streaming algorithms
- Idea: data is not present all at once, but is streamed through
- Needed when available memory is limited
  - E.g. data too big to fit into memory
- You might know the concept from Linux pipes
- Depends on the problem
  - E.g. to sort data, all data must have been read
  - Filtering data can be done online (while streaming)

Stream large file instead of reading all at once

Python generators

Thank you for your attention
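
The streaming idea and the demultiplexing task from the talk can be sketched together as below. This is a minimal illustration, not the code from the slides: the function names read_fastq and count_barcodes, the toy reads and the two barcodes are all hypothetical, and the parser assumes a simple four-line-per-record fastq layout. The generator yields one read at a time, so the whole file never has to sit in memory, and the list comprehension picks out the barcodes that match the start of a read.

```python
import io
from collections import Counter


def read_fastq(handle):
    """Yield (name, sequence, quality) one record at a time.

    Assumes four lines per record (hypothetical simplification).
    """
    while True:
        name = handle.readline().rstrip()
        if not name:
            # Empty string: end of input (a blank line also stops this sketch)
            return
        sequence = handle.readline().rstrip()
        handle.readline()  # separator line starting with '+'
        quality = handle.readline().rstrip()
        yield name, sequence, quality


def count_barcodes(records, barcodes):
    """Count reads per barcode; reads without a known barcode count as None."""
    counts = Counter()
    for _name, sequence, _quality in records:
        # List comprehension: all barcodes that the read starts with
        matches = [b for b in barcodes if sequence.startswith(b)]
        counts[matches[0] if matches else None] += 1
    return counts


# Hypothetical toy input; a real run would pass an open file handle instead
fastq_text = (
    "@read1\nACGTTTGA\n+\nIIIIIIII\n"
    "@read2\nTTAGCCGA\n+\nIIIIIIII\n"
    "@read3\nACGTGGCA\n+\nIIIIIIII\n"
    "@read4\nGGGGCCGA\n+\nIIIIIIII\n"
)
barcodes = ["ACGT", "TTAG"]

counts = count_barcodes(read_fastq(io.StringIO(fastq_text)), barcodes)
for barcode, n in counts.items():
    print(barcode, n)
```

Because count_barcodes returns its result instead of printing it, the same function can be reused with any iterable of records, for example a generator expression that filters reads by length before counting.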