Data Visualization in R Solutions to Hands On Exercises L. Torgo [email protected] Faculdade de Ciências / LIAAD-INESC TEC, LA Universidade do Porto Oct, 2016 Hands on Data Visualization - the Algae data set Using the Algae data set from package DMwR answer to the following questions: 1 Create a graph that you find adequate to show the distribution of the values of algae a6 solution 2 Show the distribution of the values of size 3 Check visually if it is plausible to consider that oPO4 follows a normal distribution solution © L.Torgo (FCUP - LIAAD / UP) Data Visualization solution Oct, 2016 2 / 10 Solution of Exercise 1 Create a graph that you find adequate to show the distribution of the values of algae a6 hist(algae$a6,main="Distribution of Algae a6",xlab="Frequency of a6", ylab="Nr. of Occurrences") 100 50 0 Nr. of Occurrences 150 Distribution of Algae a6 0 20 40 60 80 Frequency of a6 © L.Torgo (FCUP - LIAAD / UP) Data Visualization Oct, 2016 3 / 10 Solution of Exercise 1 with ggplot2 Create a graph that you find adequate to show the distribution of the values of algae a6 ggplot(algae,aes(x=a6)) + geom_histogram(binwidth=10) + ggtitle("Distribution of Algae a6") Distribution of Algae a6 150 count 100 50 0 0 25 50 75 a6 Go Back © L.Torgo (FCUP - LIAAD / UP) Data Visualization Oct, 2016 4 / 10 Solution to Exercise 2 Show the distribution of the values of size barplot(table(algae$size), main = "Distribution of River Size") 0 20 40 60 80 Distribution of River Size large © L.Torgo (FCUP - LIAAD / UP) medium Data Visualization small Oct, 2016 5 / 10 Solution to Exercise 2 with ggplot2 Show the distribution of the values of size ggplot(algae,aes(x=size)) + geom_bar() + ggtitle("Distribution of River Size") Distribution of River Size 80 count 60 40 20 0 large medium small size Go Back © L.Torgo (FCUP - LIAAD / UP) Data Visualization Oct, 2016 6 / 10 Solutions to Exercise 3 Check visually if it is plausible to consider that oPO4 follows a normal distribution library(car) qqPlot(algae$oPO4,main = "QQ Plot of oPO4") QQ Plot of oPO4 500 ● ● 200 300 ● 0 100 algae$oPO4 400 ● ● ● −3 ●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●●●●●● −2 −1 0 1 2 3 norm quantiles Go Back © L.Torgo (FCUP - LIAAD / UP) Data Visualization Oct, 2016 7 / 10 Hands on Data Visualization - Algae data set Using the Algae data set from package DMwR answer to the following questions: 1 Produce a graph that allows you to understand how the values of NO3 are distributed across the sizes of river solution 2 Try to understand (using a graph) if the distribution of algae a1 varies with the speed of the river solution © L.Torgo (FCUP - LIAAD / UP) Data Visualization Oct, 2016 8 / 10 Solutions to Exercise 1 Produce a graph that allows you to understand how the values of NO3 are distributed across the sizes of river. ggplot(algae,aes(x=NO3)) + geom_histogram(binwidth=5) + facet_wrap(~size) ggtitle("Distribution of NO3 for different river sizes") Distribution of NO3 for different river sizes large medium small 60 count 40 20 0 0 20 40 0 20 40 0 20 40 NO3 Go Back © L.Torgo (FCUP - LIAAD / UP) Data Visualization Oct, 2016 9 / 10 Solutions to Exercise 2 Try to understand (using a graph) if the distribution of algae a1 varies with the speed of the river ggplot(algae,aes(x=speed,y=a1)) + geom_boxplot() + ggtitle("Distribution of a1 for different River Speeds") Distribution of a1 for different River Speeds ● ● ● 75 ● ● ● ● ● ● ● a1 50 ● ● 25 0 high low medium speed Go Back © L.Torgo (FCUP - LIAAD / UP) Data Visualization Oct, 2016 10 / 10
© Copyright 2025 Paperzz