Report - University of Cincinnati

Skills Workshop #4: Statistical & Uncertainty Analysis
Speaker: Dr. Lilit Yeghiazarian, Assistant Professor of Environmental Engineering,
University of Cincinnati
Date: June 20, 2014
Time: 10:00 AM - 12:00 PM
Venue: Baldwin 749
Prepared by:
Victoria Sumner - Junior, Chemical Engineering, University of Cincinnati
Stephanie Palmer - Pre-Junior, Chemical Engineering, University of Cincinnati
Dorien Clark - Sophomore, Chemical Engineering, University of Cincinnati
REU Participants for Project 1: Interaction of Nanoparticles with Microbial Biofilm
in Water Treatment Facility Processes
Dr. Yeghiazarian presents on Statistical and Uncertainty Analysis to the students
participating in the REU Program.
Dr. Lilit Yeghiazarian gave a presentation on statistical and uncertainty analysis
to the students participating in the 2014 Summer REU Program. Dr. Yeghiazarian holds
degrees in electrical engineering, operations research, and biological engineering. She
completed her doctoral research at Cornell University on stochastic modeling of
waterborne pathogen transport and risk assessment in complex environmental systems.
Later, at UCLA, Dr. Yeghiazarian conducted research in epidemiology, first as a
Postdoctoral Fellow in the Department of Biostatistics and then as a Research Assistant in
the Department of Epidemiology. Her research involved the development of multi-scale
models of HIV transmission and computationally intensive evolutionary models of
influenza based on large international datasets.
Dr. Yeghiazarian is currently an Assistant Professor of Environmental Engineering
at the University of Cincinnati. Her research interests range from water quality modeling,
environmental surveillance, and situational awareness for biosecurity to rapid
decision-making and policy development. They also cover the uncertainty and multi-scale
nature of complex environmental systems and processes.
The main purpose of the presentation was to teach the REU participants how to perform
statistical analysis on experimental data. Collecting accurate and precise data is
important for any research project or lab work, including the REU projects. Knowing how
to estimate errors and identify their sources helps researchers interpret their data
appropriately and draw valid conclusions. Without statistical analysis, the data remain
limited and inconclusive. In her presentation, Dr. Yeghiazarian outlined the different
types of errors, uncertainty in measurements, and statistical methods for curve fitting.
Dr. Yeghiazarian first explained the different types of errors. She grouped them
into two broad categories: general errors and numerical errors. General errors have three
sources: human errors, which are called blunders; formulation or model errors, which
result from the use of incomplete mathematical models; and data uncertainty, which arises
from the limited number of significant figures recorded in physical measurements.
Numerical errors have two sources: round-off errors, which are due to the computer's
limited ability to represent numbers, and truncation errors, which are due to mathematical
approximations.
Dr. Yeghiazarian then explained round-off errors in more detail as they apply to
computer systems. Such errors occur in digital computers because of their limited ability
to represent numbers. All computers store numbers using binary digits called bits: a
number is stored as a word, and a word consists of a fixed number of bits. The limitation
comes from the size of the computer's data path. A number whose magnitude is too large or
too small for the word causes overflow or underflow. For example, computers recognize real
numbers in two ranges: from -1.797693134862316 × 10^308 to -2.225073858507201 × 10^-308,
and from 2.225073858507201 × 10^-308 to 1.797693134862316 × 10^308. A number whose
magnitude exceeds 1.797693134862316 × 10^308 results in an overflow error. A nonzero
number that falls in the gap between the two ranges, closer to zero than
2.225073858507201 × 10^-308, results in an underflow error and is computed as zero. To
test a computer for its range, Dr. Yeghiazarian suggested trying the following commands in
MATLAB: format long, realmax, and realmin.
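The range check described above can also be tried outside MATLAB. The following is a minimal sketch in Python, which was not part of the workshop; it assumes that sys.float_info.max and sys.float_info.min play the roles of MATLAB's realmax and realmin, since both environments use IEEE 754 double-precision numbers by default.

```python
import sys

# Python floats are IEEE 754 doubles, the same default format MATLAB uses.
largest = sys.float_info.max    # counterpart of MATLAB's realmax
smallest = sys.float_info.min   # counterpart of MATLAB's realmin

print(f"largest  = {largest!r}")
print(f"smallest = {smallest!r}")

# Overflow: multiplying past the largest value gives infinity,
# not a finite number.
overflowed = largest * 10.0
print("largest * 10      =", overflowed)

# Underflow: a result far below the smallest value is stored as zero.
underflowed = smallest * 1e-20
print("smallest * 1e-20  =", underflowed)
```

The two printed limits should agree with the values quoted in the report to the digits shown there.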
An important point Dr. Yeghiazarian made about precision is that computers cannot
represent certain numbers exactly, because their digits form an infinite sequence and only
a finite number of significant digits can be stored. Examples are e, π, and √7. Round-off
errors can also build up in ordinary arithmetic when significant figures are not handled
carefully. Adding a very small number to a very large one contributes to the problem, as
does subtracting two numbers that are nearly equal in value.
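These round-off effects are easy to reproduce. The sketch below is illustrative only and uses Python rather than the MATLAB shown in the workshop; the specific numbers (0.1, 0.2, 1.0e16, 1.0e-15) are assumptions chosen to make the effect visible, not values from the presentation.

```python
import math

# 0.1 and 0.2 have no exact binary representation, so their sum
# is not exactly equal to 0.3.
sum_is_exact = (0.1 + 0.2 == 0.3)
print("0.1 + 0.2 == 0.3 ?", sum_is_exact)
print("close enough?     ", math.isclose(0.1 + 0.2, 0.3))

# Adding a small number to a large one: the small number is lost,
# because 1.0 is below the spacing between doubles near 1.0e16.
big = 1.0e16
lost = (big + 1.0) - big
print("(1e16 + 1) - 1e16 =", lost)

# Subtracting two nearly equal numbers: the true difference is 1e-15,
# but only a few significant bits of it survive the subtraction.
diff = (1.0 + 1.0e-15) - 1.0
rel_error = abs(diff - 1.0e-15) / 1.0e-15
print("difference        =", diff)
print("relative error    =", rel_error)
```

In the last case the computed difference is off by roughly ten percent, even though each input is stored with about sixteen significant digits.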
Dr. Yeghiazarian next talked about truncation errors. These errors occur when exact
mathematical operations are represented by approximations. For example, in a Taylor
series, the more terms are included, the more accurate the result will be. Beyond a
certain number of terms, however, additional terms no longer change the significant digits
retained in the result.
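The behavior of the truncation error can be seen by summing a Taylor series with a growing number of terms. The following Python sketch uses the Maclaurin series of e^x as an example; the choice of function, the helper name exp_taylor, and the term counts are assumptions made for illustration.

```python
import math

def exp_taylor(x, n_terms):
    """Approximate e**x with the first n_terms of its Maclaurin series."""
    total = 0.0
    term = 1.0  # first term: x**0 / 0!
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)  # next term: x**(k+1) / (k+1)!
    return total

x = 1.0
for n in (1, 2, 4, 8, 16, 32):
    approx = exp_taylor(x, n)
    error = abs(approx - math.exp(x))
    print(f"{n:2d} terms: {approx:.16f}   error = {error:.2e}")
```

The error shrinks quickly at first. Once the remaining terms are smaller than the spacing between representable numbers, adding more of them leaves the result unchanged.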
Dr. Yeghiazarian next described errors that result from uncertainties in
measurements, which are bound to occur. In this context she treated the terms error and
uncertainty as interchangeable. An error in scientific measurement means the inevitable
uncertainty that accompanies all measurement, which must be taken into account at all
times. These errors are thus not mistakes, for they cannot be eliminated just by being
careful. She gave some rules for reporting and using uncertainties. Always report a result
as best estimate ± uncertainty (Δx), where Δx represents the uncertainty, also called the
error or margin of error. Note that Δx is always positive. Also, Δx cannot be known with
too much precision, so it should not be stated with too many significant figures, and the
last significant figure in any stated answer should usually be of the same order of
magnitude as the uncertainty.
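The reporting rules above can be sketched as a small helper. This is an illustrative Python example, not something shown in the workshop. It assumes the common convention of rounding the uncertainty to one significant figure, which the report does not state explicitly, and the function name format_measurement and the sample values are hypothetical.

```python
import math

def format_measurement(best, uncertainty):
    """Format 'best ± uncertainty' with matching decimal places.

    Assumption: the uncertainty is rounded to one significant figure,
    and the best estimate is rounded to the same decimal position.
    """
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    # Decimal position of the leading digit of the uncertainty.
    exponent = math.floor(math.log10(uncertainty))
    dx = round(uncertainty, -exponent)
    # Rounding can carry into the next decade (e.g. 0.096 -> 0.1),
    # so the position is recomputed from the rounded value.
    exponent = math.floor(math.log10(dx))
    x = round(best, -exponent)
    decimals = max(-exponent, 0)
    return f"{x:.{decimals}f} ± {dx:.{decimals}f}"

print(format_measurement(9.8234, 0.0231))
print(format_measurement(1234.5, 32.0))
```

With these hypothetical inputs the last digit kept in the best estimate sits in the same decimal position as the uncertainty, as the rule requires.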
Dr. Yeghiazarian next spoke about modeling experimental data using regression
analysis, or curve fitting. She mentioned three curve-fitting techniques: least-squares
regression for data with scatter; linear interpolation for precise data; and curvilinear
interpolation for precise data. Curve fitting is used for trend analysis, to see which
technique best fits the experimental data, and to compare existing mathematical models
with measured data. Before discussing regression techniques, Dr. Yeghiazarian went over
basic terminology and descriptive statistics, which included the arithmetic mean, standard
deviation, variance, coefficient of variation, histograms of data, and confidence
intervals. For some students it was review, but for others it was new information.
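Most of the descriptive statistics listed above can be computed with a few standard-library calls. The following Python sketch is illustrative only: the data values are hypothetical, and the 95% confidence interval uses a normal approximation, which is an assumption; for a sample this small a critical value from Student's t distribution would give a wider interval.

```python
import math
import statistics

# Hypothetical repeated measurements (illustrative values only).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(data)
mean = statistics.mean(data)
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # sample standard deviation
coeff_var = std_dev / mean * 100.0    # coefficient of variation, in percent

# Approximate 95% confidence interval for the mean.
# Assumption: normal critical value (about 1.96) instead of Student's t.
z = statistics.NormalDist().inv_cdf(0.975)
half_width = z * std_dev / math.sqrt(n)
ci_low, ci_high = mean - half_width, mean + half_width

print(f"mean                     = {mean:.4f}")
print(f"variance                 = {variance:.4f}")
print(f"standard deviation       = {std_dev:.4f}")
print(f"coefficient of variation = {coeff_var:.2f}%")
print(f"approx. 95% CI for mean  = ({ci_low:.4f}, {ci_high:.4f})")
```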
Linear least-squares regression is the simplest example of a least-squares
approximation because it fits the data with a straight line, which has a very simple
equation. Dr. Yeghiazarian stated that the method assumes that each x has a fixed value
and that the y values are independent, normally distributed random variables with some
variance. Minimizing the sum of the squares of the residuals (each residual is the
vertical distance between a data point and the regression line) gives the best-fit line
for a set of data. This can be done with equations, by deriving the normal equations, as
well as graphically after a linearizing transformation of the data. Dr. Yeghiazarian also
emphasized the importance of using computer programs such as MATLAB. She said that MATLAB
is a useful tool that uses simple commands to perform a variety of tasks, and she gave
some examples showing how to use its built-in functions. She also mentioned that MATLAB
offers linear, polynomial, and higher-order regression analysis features.
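A straight-line fit from the normal equations takes only a few lines. The sketch below is a minimal Python illustration, not the MATLAB examples given in the workshop; the function name linear_fit and the data points are hypothetical.

```python
def linear_fit(x, y):
    """Fit y = intercept + slope * x by linear least squares.

    Solves the two normal equations for a straight line and also
    returns the coefficient of determination r2.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    # Sum of squared residuals (vertical distances to the line).
    sr = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total sum of squares about the mean of y.
    st = sum((yi - sy / n) ** 2 for yi in y)
    r2 = 1.0 - sr / st if st > 0 else float("nan")
    return slope, intercept, r2

# Hypothetical data with a little scatter around y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r2 = linear_fit(x, y)
print(f"y = {intercept:.3f} + {slope:.3f} x   (r2 = {r2:.4f})")
```

A value of r2 close to 1 indicates that the straight line explains most of the variation in y.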
In addition to descriptive statistics, errors, and linear regression, Dr. Yeghiazarian
discussed comparing the means of two data sets, which can be done mathematically using a
t-test. For cases in which more than two data sets are to be compared, she described a
tool called ANOVA (Analysis of Variance), which computes the F ratio statistic for an
analysis similar to the t-test.
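The two test statistics can be computed directly from their definitions. The following Python sketch is illustrative and was not part of the workshop. It assumes the equal-variance (pooled) form of the two-sample t-test and a one-way ANOVA; the function names and the data are hypothetical. It computes only the statistics, which would still need to be compared against critical values from the t and F distributions.

```python
import statistics

def pooled_t_statistic(a, b):
    """Two-sample t statistic, assuming equal variances (pooled)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    std_err = (pooled * (1.0 / na + 1.0 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / std_err

def anova_f_ratio(*groups):
    """One-way ANOVA F ratio: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum(
        sum((v - statistics.mean(g)) ** 2 for v in g) for g in groups
    )
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical measurements from two experimental conditions.
a = [5.1, 4.9, 5.3, 5.0]
b = [5.6, 5.8, 5.5, 5.9]
t = pooled_t_statistic(a, b)
f = anova_f_ratio(a, b)
print(f"t = {t:.4f}, F = {f:.4f}, t squared = {t * t:.4f}")
```

For exactly two groups the F ratio equals the square of the pooled t statistic, which is one way to see that ANOVA generalizes the t-test to more than two data sets.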
Dr. Yeghiazarian covered a great deal of material on topics that are usually
covered in a semester-long class. The REU participants will now be able to collect,
analyze, and present data using the guidelines and information that Dr. Yeghiazarian gave
in this presentation. The REU students now have multiple ways to analyze their data and
can use this information when presenting their REU projects and in future endeavors such
as pursuing graduate school, working on co-op jobs as students, or working for an
engineering company after graduation.