Data Analysis for Physics and Astronomy with Python
Prof. Joanna Kiryluk
Stony Brook University
Spring 2017 Semester
Course web site: http://skipper.physics.sunysb.edu/~joanna/Lectures/PHY390/
Target audience: Freshman/Sophomore students (Physics & Astronomy majors)

Meeting Schedule: The class will meet twice a week. There will be one 80-minute session on Data Analysis every week (Tuesdays 8:30am, Math SINC site S-235, TBD), and one 80-minute session on Python programming every week (Thursdays 8:30am, Math SINC site S-235).
Office hours: (by Joanna Kiryluk and by TA Anthony Catanese) We’ll find a time that works for everybody.
Homework: will be posted weekly. The 1st homework will be posted on Thursday, February 2nd.
Exams: There will be two midterm exams, one on Data Analysis methods and one on Python programming. There will be one (final) take-home project. Students will analyze a data set provided to them by applying data analysis techniques, writing a Python program, and writing a report (in LaTeX).

Motivation a) Data Analysis
• Expand on the data analysis methods that students learn during introductory labs
• Prepare Physics and Astronomy majors for more advanced labs such as PHY252
• Teach students modern tools that are useful for data analysis in physics labs & research

Motivation b) Python programming
• Teaching a programming language has educational value
• Important to expose Physics and Astronomy students to programming as early as possible
• Students can work on their own laptops (free & fast installation): Windows (freshman students' preference), Mac and Linux
• Teach a modern programming language at a basic level with direct data analysis applications
• Object-oriented language, but easier than C/C++. “Python reads like kindergarten math and is easy on the layman’s eye.
It requires less code to complete basic tasks, making it an economical language to learn.”
• Tools exist for integrating C/C++ and Fortran code
This course will have essentially no overlap with PHY277 and will benefit students who take PHY277.
Source: https://xkcd.com/353/

In-demand programming language
Data Scientist: average
1. Easy-to-Learn: Python was designed with the newcomer in mind. Python reads like kindergarten math and is easy on the layman’s eye. Python also requires less code to complete basic tasks, making it an economical language to learn.
2. Your Stepping Stone: Python can be your stepping stone into the programming universe. Employers are looking for fully stacked programmers and Python will help you get there. Python is an object-oriented language, just like JavaScript, C++, C#, Perl, Ruby, and other key programming languages.
3. How About Some Raspberry Pi? It is a card-sized, inexpensive microcomputer that is being used for a surprising range of exciting do-it-yourself stuff such as robots, remote-controlled cars, and video game consoles. With Python as its main programming language, the Raspberry Pi is being used even by kids to build radios, cameras, arcade machines, and pet feeders!

Lecture Plan
• Part A (Tuesdays) Data Analysis
• Part B (Thursdays) Python Computing

Part A: Introduction to Data Analysis
1. Introduction: what is a measurement, random and systematic uncertainties
2. Data characteristics: distribution, mean and variance
3. Graphic representation of data: histograms, plots, linear and logarithmic scales
4. Statistics: binomial, Poisson and Gaussian probability distributions
5. Central Limit Theorem
6. The meaning of sigma
7. Partial differentiation, propagation of small uncertainties
8. Covariance and correlation
9. Least squares method
10. Combining results of different experiments, weighted averages
11. Straight line fit
12.
Parameter and distribution testing and comparing results: 3-sigma test, chi-squared test, p-values, confidence levels
Textbooks: We’ll use examples and problems from both books. My preference is the book by Lyons (shorter).

Part B: Python Programming for Data Analysis
1. Python from scratch
a. Installation and setup
b. IPython: An Interactive Computing and Development Environment
c. Variables, basic math, types of data, input, print formatting and strings
d. Decisions, loops, lists, functions, objects, modules
e. Pandas, data structures
f. Data files: input and output, file formats
g. Data wrangling: clean, transform, merge, reshape
h. Plotting and visualization
2. Data analysis modules
a. SciPy Basics: http://www.scipy.org/index.html
b. NumPy Basics: http://www.numpy.org
3. Data analysis report
a. LaTeX
Textbook: + Python textbook or online tutorial (covered in lectures), e.g.
https://docs.python.org/2/tutorial/index.html
http://www.greenteapress.com/thinkpython/thinkCSpy/thinkCSpy.pdf

S-235
https://it.stonybrook.edu/help/kb/sinc-site-general-policies

Learning Python for Data Analysis and Visualisation
1. Anaconda (open source): high-performance Python distribution
• recommended installer for IPython/Jupyter, Pandas, SciPy, ..
• installation using conda (package manager)
Virtual SINC Site:
• Anaconda has been installed & is ready for use
Your laptop/computer:
• Download Anaconda2 (Windows/OSX/Linux) from https://www.continuum.io/downloads and install it (YOU CAN DO IT WITH THE HELP OF YOUR TA during office hours)
DO NOT INSTALL THE LATEST VERSION (Anaconda3). We’ll use the Anaconda2 version with Python 2.7 (the textbook requires it).
• Anaconda2 includes Python 2.7
• Anaconda3 includes Python 3.5

Python 2 versus Python 3
This class uses Python 2.7.
Examples of differences: http://ptgmedia.pearsoncmg.com/imprint_downloads/informit/promotions/python/python2python3.pdf

WINDOWS (this class): the Virtual SINC Site
You can access it off-site, e.g. from your laptop, independently of your operating system. Just use your web browser and go to:
https://it.stonybrook.edu/services/virtual-sinc-site
Needed: NetID login and Citrix (if you access it from outside the SINC classrooms, e.g. on your laptop, Citrix needs to be installed there)
Launch the Virtual SINC Site Desktop & start Python IDLE (Programming & Development -> Python 2.7 -> Python IDLE)
IDLE is Python’s Integrated Development and Learning Environment.
IDLE:
• cross-platform: works mostly the same on Windows, Unix, and Mac OS X
• Python shell window (interactive interpreter) with colorizing of code input, output, and error messages

Lecture Plan
• Python Computing (Thursdays)
• Data Analysis (Tuesdays)

Lecture DA1
Why do we do experiments?
Introduction to Data Analysis
Data samples, histograms, means, RMS, standard deviations
L. Lyons, “A Practical Guide to Data Analysis for Physical Science Students”, chapter 1: Sections 1.1, 1.2, 1.3 (compact)
OR
J.R. Taylor, “An Introduction to Error Analysis”, chapter 1, and chapter 2 Sections 2.1 and 2.2

Why do we do experiments?
Two types of experiments to learn about the physical world:
• parameter determination, e.g. measure body temperature
• hypothesis testing, e.g. testing whether body temperature increased since this morning
R. Muller, “The Instant Physicist”
The numerical value of the quantity we want to measure is not enough. Our conclusion, e.g. “We have made a world-shattering discovery!”, depends on the accuracy of our measurement.

Acceleration due to gravity
Measurement 1:
Measurement 2:
“True” value = ?
Are measurements 1 and 2 consistent with the “true” value?
https://en.wikipedia.org/wiki/Gravitational_acceleration

Measurement of the same quantity
Experimental Data / Results

Histograms
Axes: the measured quantity (e.g. exam score) against the number of entries (e.g. number of students)
One entry (x) in this histogram means one measurement (e.g. one score for every student).
“Binning” is important.
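
To make the histogram and binning remarks concrete, here is a minimal Python sketch. It is not taken from the lecture: the exam scores are made up for illustration, the helper names summarize and histogram_counts are hypothetical, and "RMS" is assumed here to mean the root-mean-square deviation about the mean. It uses NumPy, which ships with Anaconda, and is written to run under both Python 2.7 and Python 3.

```python
import numpy as np

# Hypothetical exam scores, one entry per student (made-up data).
scores = np.array([55, 62, 68, 71, 73, 75, 78, 80, 82, 85, 88, 91, 94])


def summarize(values):
    """Return (mean, RMS deviation about the mean, sample standard deviation)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    # RMS deviation about the mean: divide the summed squares by N.
    rms_dev = np.sqrt(np.mean((values - mean) ** 2))
    # Sample standard deviation: divide by N - 1 instead of N.
    std = values.std(ddof=1)
    return mean, rms_dev, std


def histogram_counts(values, bin_width, low=50, high=100):
    """Count entries in equal-width bins covering the range [low, high]."""
    edges = np.arange(low, high + bin_width, bin_width)
    counts, edges = np.histogram(values, bins=edges)
    return counts, edges


mean, rms_dev, std = summarize(scores)
print("mean = %.2f, RMS deviation = %.2f, sample std = %.2f"
      % (mean, rms_dev, std))

# The same data with three different bin widths: the apparent shape changes.
for width in (5, 10, 25):
    counts, edges = histogram_counts(scores, width)
    print("bin width %2d: counts = %s" % (width, counts.tolist()))
```

Very wide bins hide the shape of the distribution, while very narrow bins leave only a few entries per bin, so the counts fluctuate strongly. This is one way to see why the choice of binning matters.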