A few BI libraries

The ArgParse module
Henrike Zschach
About argparse
What’s wrong with sys.argv?
Nothing, but argparse can make your life a lot easier. It is the
recommended command-line parsing module in the Python standard
library.
From the documentation:
“The argparse module makes it easy to write user-friendly command-line interfaces. The program defines what arguments it requires, and
argparse will figure out how to parse those out of sys.argv. The
argparse module also automatically generates help and usage
messages and issues errors when users give the program invalid
arguments.”
https://docs.python.org/3.6/library/argparse.html
DTU Systems Biology, Technical University of Denmark
About argparse 2
With argparse you can:
- define different types of argument (positional, flag)
- define the internal variable type (string, int, float, etc.)
- set a default value
- have required and optional arguments
- have a nice looking help message automatically
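A minimal sketch of how these features can look together. The argument names (infile, --threshold, --outfile, --quiet) and the file names are made up for illustration, and the argument list is passed to parse_args() explicitly so the snippet runs without a real command line:

```python
import argparse

parser = argparse.ArgumentParser(description="Hypothetical example program")
# Positional argument, stored as a string (the default type)
parser.add_argument("infile", help="input file name")
# Optional argument converted to float, with a default value
parser.add_argument("--threshold", type=float, default=0.5,
                    help="cutoff value (default: 0.5)")
# Optional-style argument that the user must nevertheless supply
parser.add_argument("--outfile", required=True, help="output file name")
# Flag: True if present on the command line, False otherwise
parser.add_argument("--quiet", action="store_true", help="suppress output")

# Passing a list instead of reading sys.argv, for demonstration only
args = parser.parse_args(["data.txt", "--outfile", "result.txt"])
print(args.infile, args.threshold, args.outfile, args.quiet)

# The help message is generated automatically from the definitions above
help_text = parser.format_help()
print(help_text)
```

In a real script you would call parser.parse_args() without arguments, so that the values come from sys.argv.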
The basic construct
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("whatever_you_want_to_call_it")
args = parser.parse_args()
The first, second and last lines are required. You technically don’t
need to add arguments to the parser and it will still run, but it
obviously doesn’t make a lot of sense to do that.
Parsing positional arguments
Positional arguments are usually required for the program to run.
Those could be input file names, accession numbers, numbers
you want to multiply, etc.
import argparse
parser = argparse.ArgumentParser()
# type=int converts the input; without it args.square would be a string
parser.add_argument("square", help="display a square of a given number", type=int)
args = parser.parse_args()
print(args.square**2)
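Positional arguments can have a type and a default value. The default is only used when the argument may be omitted, which for a positional argument requires nargs="?":
parser.add_argument("square", help="display a square of a given number", type=int, nargs="?", default=4)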
Multiple arguments
What if you expect several arguments of the same type, e.g. a list
of accession numbers to extract or a list of integers to sum up?
argparse allows you to specify the number of expected
arguments with nargs.
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("int_list", help="A list of integers to sum", nargs="+", type=int)
args = parser.parse_args()
print(args.int_list)
print(sum(args.int_list))
Nargs options
nargs can interpret the following values:
Value   Meaning
N       Exactly N arguments
?       0 or 1 arguments
*       0 or more arguments (i.e. optional argument)
+       1 or more arguments
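As a rough illustration of the four nargs values, here is a sketch with one argument per value. The argument names (pair, --label, --extra, --ids) are invented for this example, and the command line is again passed as an explicit list:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("pair", nargs=2, type=int)              # exactly 2 values
parser.add_argument("--label", nargs="?", default="none")   # 0 or 1 value
parser.add_argument("--extra", nargs="*", default=[])       # 0 or more values
parser.add_argument("--ids", nargs="+")                     # 1 or more values

# --extra is given without values, --label is not given at all
args = parser.parse_args(["3", "4", "--ids", "A1", "B2", "--extra"])
print(args.pair)   # [3, 4]
print(args.label)  # none
print(args.extra)  # []
print(args.ids)    # ['A1', 'B2']
```

Note that with an integer N, "*" or "+" the values are collected in a list, even if the user supplies only one.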
Parsing flags
An optional argument, sometimes called a flag, can be used to
switch certain behaviors on or off. For example, you may want
the user to be able to get more detailed output if they ask for it.
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("int_list", help="A list of integers to sum", nargs="+", type=int)
parser.add_argument("--verbose", help="use verbose output", action="store_true")
args = parser.parse_args()
if args.verbose:
    print("You supplied the following integers:", ' '.join([str(x) for x in args.int_list]))
print(sum(args.int_list))
Parsing flags 2
Flags can have a long and a short version of their name.
parser.add_argument("-v", "--verbose", help="use verbose output", action="store_true")
Example call using the short version:
prog.py 1 2 3 4 -v
The internal name of the flag does not have to be the same as
the external name (the name the user will be using):
parser.add_argument("-v", "--verbose", dest="long_output", help="use verbose output", action="store_true")
# every time we want to refer to the verbosity in our
# program, we will now use args.long_output
if args.long_output:
    print("You supplied the following integers:", ' '.join([str(x) for x in args.int_list]))
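To check how the short name, the long name and dest interact, a small sketch can parse three explicit argument lists with the parser described above. The variable names short_call, long_call and plain_call are only for this illustration:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("int_list", help="A list of integers to sum",
                    nargs="+", type=int)
parser.add_argument("-v", "--verbose", dest="long_output",
                    help="use verbose output", action="store_true")

# Simulate three different command lines
short_call = parser.parse_args(["1", "2", "3", "4", "-v"])
long_call = parser.parse_args(["1", "2", "3", "4", "--verbose"])
plain_call = parser.parse_args(["1", "2", "3", "4"])

# Both spellings set the same internal attribute
print(short_call.long_output, long_call.long_output, plain_call.long_output)
# Because dest was given, there is no args.verbose attribute
print(hasattr(short_call, "verbose"))
```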
Further reading
https://pymotw.com/2/argparse/
https://docs.python.org/3/howto/argparse.html#id1
https://docs.python.org/3/library/argparse.html#module-argparse
The Numpy library
Henrike Zschach
About numpy
Why numpy?
Numpy is a powerful library that features many kinds of
mathematical manipulations and is useful for working with data in
tables or matrices. The big advantage of numpy is that many
calculations people commonly use are already implemented in a very
efficient way, for example matrix multiplication.
All your numpy needs:
http://www.numpy.org/
The n-dimensional array
The primary data structure of numpy is the n-dimensional array
(ndarray).
“It is a table of elements (usually numbers), all of the same type,
indexed by a tuple of positive integers. In NumPy dimensions are
called axes. The number of axes is rank.”
https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
ndarrays have a shape attribute that describes how many dimensions
they have and how long each of them is.
You can construct empty array structures and then fill them with data,
by using either 'zeros' or 'empty'. You can also read data directly into
the array (see next slide)!
import numpy as np
new_array = np.zeros((2, 2))
# what is the shape of my new ndarray?
print(new_array.shape)
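A small sketch of the difference between the two constructors and of filling an array afterwards. The array names and the shape (2, 3) are arbitrary choices for illustration:

```python
import numpy as np

zeros_array = np.zeros((2, 3))   # 2 rows, 3 columns, every element is 0.0
empty_array = np.empty((2, 3))   # same shape, but the contents are arbitrary

print(zeros_array.shape)  # (2, 3)
print(zeros_array.ndim)   # 2 (number of dimensions)

# Fill in data by assigning to an index
zeros_array[0, 1] = 7.5
print(zeros_array)
```

np.empty skips the initialisation step, so it only makes sense if every element is overwritten before the array is used.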
Reading in data
There are several ways to read in data in numpy, but the most
common is from a text file into an ndarray. The text file can have any
kind of separator: tab, comma, semicolon, etc.
import numpy as np
data = np.loadtxt(infile, delimiter=',')
infile can be a filename, an open file handle or even a generator.
loadtxt will return an ndarray object.
The dimensions of your original data are preserved but the source
file needs to have the same number of items in each row.
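To try this without a file on disk, the sketch below uses io.StringIO objects as stand-ins for open file handles. The data values are made up, and the second part shows what happens, as far as I know, when the rows have different lengths:

```python
import io
import numpy as np

# Stand-in for an open file handle with comma-separated values
infile = io.StringIO("1,2,3\n4,5,6\n")
data = np.loadtxt(infile, delimiter=",")
print(data.shape)  # (2, 3): 2 rows, 3 columns
print(data)        # values are read as floats by default

# Rows of different length cannot be loaded into a regular array
ragged = io.StringIO("1,2,3\n4,5\n")
ragged_failed = False
try:
    np.loadtxt(ragged, delimiter=",")
except ValueError as err:
    ragged_failed = True
    print("Could not load:", err)
```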
Multi-dimensional slicing
ndarrays can be sliced like lists in regular python, but it is a bit
more complicated since they have more dimensions. Here are
some pointers from the documentation:
- The basic slice syntax is i:j:k where i is the starting index, j is
the stopping index, and k is the step.
- All of the above can be negative (starting counting from the
back). If k is negative, we move through the sequence from
the end to the start.
- An Ellipsis (...) expands to the number of : objects needed to make a
selection tuple of the same length as x.ndim. There may only
be a single ellipsis present.
- The dimensions are separated by commas in the index.
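The pointers above can be tried out on small arrays. This is only a sketch; the arrays x and cube are invented for the illustration:

```python
import numpy as np

x = np.arange(10)            # [0 1 2 3 4 5 6 7 8 9]
print(x[1:8:2])              # start 1, stop 8, step 2 -> [1 3 5 7]
print(x[-3:])                # negative start counts from the back -> [7 8 9]
print(x[::-1])               # negative step walks from the end to the start

cube = np.arange(24).reshape((2, 3, 4))   # 3 dimensions
# The ellipsis stands for as many ':' as are needed, here two
print(cube[..., 0].shape)                 # (2, 3)
print(np.array_equal(cube[..., 0], cube[:, :, 0]))  # True

# One index or slice per dimension, separated by commas
print(cube[1, :, ::2])
```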
Multi-dimensional slicing Examples
Here are some useful examples of array slicing, taken from:
https://www.tutorialspoint.com/numpy/numpy_indexing_and_slicing.htm
#init our array
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
#extract second column:
print(a[...,1])
#in a 2d array, the above is equivalent to:
print(a[:,1])
#extract second row:
print(a[1,:])
#extract from the second column onwards:
print(a[:,1:])
Array operations
Arithmetic operators on arrays apply elementwise. A new array is
created and filled with the result. If one of the involved objects is a
constant, the desired operation will be performed between the
constant and each array element.
a = np.array( [20,30,40,50] )
c = 3
new_array = a * c
#What is the shape and content of new_array?
b = np.array( [0,1,2,3] )
my_diff = a-b
my_sq = b**2 #b squared
#what are my_diff and my_sq?
Array operations 2
Comparison operators can also be applied to ndarrays; the result is a Boolean array:
a = np.array( [20,30,40,50] )
big_nums = a>35
#What is the shape and content of big_nums?
#How do I get only the elements of a that are greater than 35 in a new array?
Dot product (matrix multiplication):
np.dot(a, b) #dimensions of a and b have to be suitable of course
Transposing:
np.transpose(a)
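One possible way to put these operations together is sketched below. The matrix m is an invented example, chosen so that the shapes fit for the dot product:

```python
import numpy as np

a = np.array([20, 30, 40, 50])
mask = a > 35            # Boolean array with the same shape as a
selected = a[mask]       # indexing with the mask keeps only the True positions
print(mask)
print(selected)

m = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
m_t = np.transpose(m)            # shape (3, 2)
product = np.dot(m, m_t)         # (2, 3) times (3, 2) gives shape (2, 2)
print(product)
```

For np.dot on two 2-dimensional arrays, the number of columns of the first must equal the number of rows of the second.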
Reshape
The reshape operation can be used to manipulate the ndarray’s
dimensions, but without changing the overall size. The ndarray must
still be a regular matrix afterwards.
d = np.array([[1,2,3,4],[5,6,7,8], [9,10,11,12]])
print(d)
d_new = d.reshape((4,3))
print(d_new)
#What has changed?
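A short sketch of what "without changing the overall size" means in practice. The target shapes other than (4, 3) are chosen only for illustration:

```python
import numpy as np

d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
d_new = d.reshape((4, 3))
print(d.shape, d_new.shape)   # 12 elements before and after

# -1 asks numpy to work out that dimension from the total size
flat = d.reshape(-1)
print(flat.shape)

# A shape that does not hold exactly 12 elements is rejected
reshape_failed = False
try:
    d.reshape((5, 3))
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```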
