Python3 for Data Analysis

Running
python                          standard python shell
ipython [qtconsole|notebook]    improved interactive shell
python file.py                  run file.py in batch mode
%run file.py                    ipython: run file.py, stay in interactive mode
To quit use exit() or Ctrl+D.

Getting Help
help()          interactive help
help(object)    help for object
object?         ipython: help for object
%magic          ipython: help on magic commands
Official Python documentation: docs.python.org.

Import Syntax, e.g. for π
import math            use: math.pi
import math as m       use: m.pi
from math import pi    use: pi
from math import *     use: pi (apply sparingly)

Basic Types
1               integer
1.0             float
1+2j            complex
None            null value
True, False     boolean
'abc', "abc"    string (immutable)

Basic Syntax
if cond: ... else: ...        branching
while cond: ...               while loop
for item in iterable: ...     for loop
class Foo: ...                class definition
def bar(args): ...            function/method definition
try: ... except Error: ...    exception handling
"foo" + "bar"                 string concatenation
[1, 2] + [3, 4]               list concatenation
l.append(10)                  append item to list
s.add(10)                     add item to set
d[10] = 20                    add key-value pair to dictionary
x, (y, z) = 1, (2, 3)         unpacking
len(container)                get number of elements
sorted(iterable)              create sorted list

Standard Input & Output
input("foo")                 read string from stdin
print(1, 2.0, "bar")         write to stdout (options: sep, end, file, flush)
print("foo%d" % 42)          old style formatted output
print("foo{}".format(42))    new style formatted output

File Handling
# read from file to string
data = open("myfile.txt").read()
# read from file to string list
lines = open("myfile.txt").readlines()
# example: processing a CSV file
for line in open("myfile.txt"):
    tok = line.strip().split(",")
    ...
# example: writing to a file
f = open("myfile.txt", "w")
f.write("foo\n")
f.close()

Container Types
(1, "x"), ()             tuple (immutable)
[1, "x"], []             list
{1, "x"}, set()          set (not ordered)
{1: "y", "x": 20}, {}    dictionary (keeps insertion order since Python 3.7)

Conversion
int(2.3)
int("10")
float(2)
float("4.5")
str(20)
str(6.7)
list((1, 2))
tuple([1, 2])
set([1, 2])
dict([(1, 5), (2, 6)])
list({1: 5, 2: 6}.items())

Operators
+     addition
-     subtraction
*     multiplication
/     float division
//    integer division
**    power
%     modulo
=     assignment
==    equal
!=    unequal
<     less
<=    less-equal
>     greater
>=    greater-equal
and, or, not    boolean operators
in, not in      membership tests

Indexing
seq[0]                  select first element
seq[-1]                 select last element
seq[:-1]                select elements except the last one
seq[start:stop:step]    general slicing notation

Comprehensions
[expr for item in iterable]        list comprehension
{expr for item in iterable}        set comprehension
{key: val for item in iterable}    dict comprehension

Numpy Basics
import numpy as np
np.array([1, 2, 3])           create vector
np.array([[1, 2], [3, 4]])    create matrix
np.arange(min, max, step)     integer sequence from min to max (max excluded)
np.zeros((2, 3))              create all-zeros matrix
np.ones((4, 5))               create all-ones matrix
np.eye(6)                     create identity matrix
A + B, A - B, A * B, A / B    elementwise operations
np.dot(A, B) or A.dot(B)      matrix multiplication
np.linalg.solve(A, b)         solve linear system
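
A minimal sketch of the Numpy entries above, showing how elementwise and matrix multiplication differ and how a small linear system is solved. The matrix A and vector b are made-up illustration values, not part of the original sheet.

```python
import numpy as np

# Illustrative values only: a 2x2 system A x = b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

elementwise = A * A          # multiplies entry by entry
matrix_product = A.dot(A)    # true matrix multiplication
x = np.linalg.solve(A, b)    # solves 2x + y = 3, x + 3y = 5
identity = np.eye(2)         # 2x2 identity matrix

print(elementwise)
print(matrix_product)
print(x)
```

Multiplying A by the solution again (A.dot(x)) should reproduce b up to rounding, which is a quick sanity check after np.linalg.solve.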
Pandas Basics
import pandas as pd
pd.Series       1D data column with index
pd.DataFrame    2D data table with index
# create series
se = pd.Series([1, 2, 3])
se.loc[0]    # <= select row by index label (.ix was removed in pandas 1.0)
se.values    # <= get raw data as np.array
# create data frame
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df["a"]      # <= select column by name
df.values    # <= get raw data as np.array
# example: load CSV file & print stats
df = pd.read_csv("myfile.txt", sep="\t")
df.info()
print(df.describe())
for column in df:
    print(column, df[column].nunique())
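
The CSV example above can be tried without a file on disk. The sketch below assumes a small tab-separated table held in a string; the column names and values are hypothetical stand-ins for the contents of myfile.txt.

```python
import io
import pandas as pd

# Hypothetical tab-separated data standing in for "myfile.txt"
raw = "city\ttemp\nOslo\t3.5\nRome\t15.0\nOslo\t4.5\n"
df = pd.read_csv(io.StringIO(raw), sep="\t")

temps = df["temp"]        # select column by name (a pd.Series)
first_row = df.iloc[0]    # select first row by position
mean_temp = temps.mean()  # average of the "temp" column

# number of distinct values per column
unique_counts = {column: df[column].nunique() for column in df}
print(unique_counts)
```

Iterating over a DataFrame yields its column names, which is why the dictionary comprehension can index df[column] directly.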
Scikit-Learn Basics
Some important classes and functions:
– sklearn.cluster.KMeans
– sklearn.model_selection.{ShuffleSplit, KFold} (formerly sklearn.cross_validation)
– sklearn.ensemble.{GradientBoosting*, RandomForest*}
– sklearn.linear_model.{LogisticRegression, Ridge}
– sklearn.neighbors.KNeighbors*
– sklearn.metrics.{mean_squared_error, log_loss}
– sklearn.preprocessing.StandardScaler
# example: ridge regression with cross-validation
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold  # replaces the removed sklearn.cross_validation
X = ...  # feature matrix (type: np.array)
y = ...  # target vector (type: np.array)
cv = KFold(n_splits=3, shuffle=True, random_state=42)
scores = []
for tr, te in cv.split(X):
    cl = Ridge(alpha=0.1)
    cl.fit(X[tr], y[tr])
    yhat = cl.predict(X[te])
    score = mean_squared_error(y[te], yhat)
    scores.append(score)
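
The manual loop above can likely be shortened with cross_val_score from sklearn.model_selection. The sketch below runs it on synthetic data; the feature matrix, true weights, and noise level are assumptions made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (illustrative assumption, not from the sheet)
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 3))
y = X.dot(np.array([1.0, -2.0, 0.5])) + 0.1 * rng.normal(size=60)

cv = KFold(n_splits=3, shuffle=True, random_state=42)
# scikit-learn reports errors as negated scores, so flip the sign
neg_mse = cross_val_score(Ridge(alpha=0.1), X, y, cv=cv,
                          scoring="neg_mean_squared_error")
scores = -neg_mse
print(scores.mean())
```

The explicit loop stays useful when per-fold predictions or fitted models are needed; cross_val_score only returns one score per fold.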
