A Deep Dive with Data… And Sharks!
An example of real-world data analysis

Introductions
David Pilkington
Director of Software Engineering at Ontario Systems, where he has worked for the past 17 years. Former Business Intelligence Product Manager, currently working on his MSPA degree at Northwestern University.
CJ Clingler
Graduated from Ball State University with a B.S. in Mathematical Economics and has spent the last 2.5 years at Ontario Systems conducting complaint and financial analyses. Currently working on her MS in Statistics at Texas A&M.

Installing R and RStudio
R is open-source software and may be downloaded freely at http://cran.r-project.org/. RStudio is an integrated development environment (IDE) for R. Install the most current versions of R and RStudio.
Step 1: Download and install R from http://cran.r-project.org/. Documentation is readily available on CRAN and on many R community and resource sites.
Step 2: Download and install RStudio: http://www.rstudio.com/products/rstudio/download/

Quick Review
Step #1 - Prepare
1. What are the Context and Goals?
2. Source of the Data (gathered or not)
3. Methods of Collection
4. Relevant Data
Step #2 - Analyze
1. Assess Data Quality
2. Explore the Data: Fully understand what type of data is within your dataset.
3. Apply Statistical Methods: This is the math part.
Step #3 - Conclude
1. Determine Results
  a. Statistical Significance
  b. Practical Significance
2. Communicate Results: To whom are you communicating?
3. Presentation of the Data

Key Terms
• Data set & subset
• Numeric variable
• Categorical variable
• Summary
• Minimum
• Maximum
• Mean
• Binary data
• Frequency table
• Bar graph
• Box plot
• Histogram
• Correlation

Know Thy Data
• "Read in" the data set
• Data summary
• Data subsets

Read in the dataset

Data summary and subsets

Data Summary
• Describes each variable
• Provides the minimum, maximum, and mean for numerical variables (numbers)
• Provides frequencies (counts) for categorical variables (words)

Data Subsets
• Extract variables to make smaller, more manageable datasets
• Group by numbers, concepts, etc.

When and Where Do Shark Attacks Happen?

Table and Plot
• The table() function creates a frequency table: how many attacks happen in each country?
• The plot() function, given a categorical variable stored as a factor, creates a bar graph showing the frequency of attacks in each country
• Bar graphs are used to show the frequency of each category

Barplot
• The barplot() function draws a graph similar to plot(), but it takes counts directly (such as the output of table()), so it can create frequency bar graphs from numerical data

What do you notice about the frequency of shark attacks by time of day? Why do you think this is?

Binary Data
• Binary data captures True/False information
• Our data set has two binary variables:
  • Was a warning sign posted?
  • Was the victim in a group during the attack?
• What conclusions might we draw about how this information relates to the time of shark attacks?

How Were Victims Attacked?
Bar graph - Victims’ activity
Bar graph - Types of attacks
Bar graph - Victims’ injury
Breakdown of Injury by Attack Type

What about water conditions?
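The frequency-table and bar-graph steps from the "When and Where" and "Binary Data" slides can be sketched in base R as below. This is a minimal sketch under stated assumptions: the `attacks` data frame, its column names (`country`, `hour`, `sign_posted`, `in_group`), and every value in it are hypothetical toy data made up for illustration, not the presenters' shark data set.

```r
# Minimal sketch: frequency tables and bar graphs in base R.
# The data frame below is HYPOTHETICAL toy data (made-up values and
# column names), not the presenters' shark-attack data set.
attacks <- data.frame(
  country     = c("USA", "Australia", "USA", "South Africa", "USA", "Australia"),
  hour        = c(14, 10, 14, 16, 10, 14),                  # hour of day (0-23)
  sign_posted = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),  # binary variable
  in_group    = c(FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)    # binary variable
)

# table(): frequency table of a categorical variable (attacks per country)
country_counts <- table(attacks$country)
print(country_counts)

# plot() draws a bar graph when given a factor; since R 4.0, data.frame()
# and read.csv() keep text as character, so convert with factor() first
plot(factor(attacks$country), main = "Attacks by country")

# barplot() draws bars from counts, e.g. a table of a numeric variable
hour_counts <- table(attacks$hour)
barplot(hour_counts, xlab = "Hour of day", ylab = "Number of attacks")

# Binary (TRUE/FALSE) data: count each value, then cross-tabulate
# against time of day to look for a relationship
sign_counts <- table(attacks$sign_posted)
print(sign_counts)
group_by_hour <- table(in_group = attacks$in_group, hour = attacks$hour)
print(group_by_hour)
```

To try this on real data, read the file with read.csv() and substitute its actual column names for the hypothetical ones used here.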
Boxplots and Histograms
Boxplots
• Standardized way to show the distribution of data:
  • Minimum (excluding outliers)
  • First quartile
  • Median
  • Third quartile
  • Maximum (excluding outliers)
• Easy way to identify outliers: points more than 1.5 times the interquartile range beyond the box are drawn individually
Histograms
• Similar to bar graphs
• Bar graphs show the frequency of categories
• Histograms show the frequency of values within ranges (bins)

Boxplot - Water Depth
Histogram - Water Depth
Boxplot - Water Temperature
Histogram - Water Temperature

Victim Details
Boxplot & Histogram - Victim Age

Frequency Table
          Not Fatal   Fatal
Female       483        52
Male        2645       464

What are the odds?
Given that a female was attacked, what is the probability that she dies?

          Not Fatal   Fatal   Total
Female       483        52      535
Male        2645       464     3109
Total       3128       516     3644

52/535 = 9.7% (equivalently, 483/535 = 90.3% of female victims survive)

Given that there is a victim of a fatal shark attack, what is the probability the victim is male?
464/516 = 89.9% (among non-fatal attacks, 2645/3128 = 84.6% of victims are male)

What are the odds?
1. Given someone is attacked by a shark, what is the probability they will die?
2. Given a male is attacked by a shark, what are the chances he survives?
3. What percentage of shark attack victims are female?

Describe the Sharks
Bar graph - Shark Species

How big were the sharks?
Summary
Boxplot
Histogram
Boxplot & Histogram - Shark Length

Correlations
• Correlation does NOT equal causation; it indicates a connection or relationship
• Variables can be positively or negatively correlated
• Positive
  • Both variables increase or decrease together
  • Correlation is between 0 and 1; values close to 1 indicate a strong relationship
• Negative
  • One variable increases as the other decreases
  • Correlation is between -1 and 0; values close to -1 indicate a strong relationship
• Correlations close to zero indicate little or no linear relationship

What might be connected?
• Shark length and bite size
• Others?
  • Water temp and time of year
  • Length & depth
  • Species & temp

Questions?

Hands-on Activity
https://statsguys.wordpress.com/2014/01/03/first-post/
• Step-by-step instructions with R code to copy and paste
• Predicting whether Titanic passengers survived or died
• Kaggle data set

Thank you!
David Pilkington
Senior Director, Technology
Ontario Systems, LLC
CJ Clingler
Compliance Analyst
Ontario Systems