Transcript

R-Script 2 – Part E
01. 00:01 / 00:08 - Using cbind can help you combine things together, cbind and rbind. What I'm going to do is create
02. 00:11 / 00:18 - three new vectors that are called mpg, cylinder, and weight and if we look at those we see
03. 00:18 / 00:25 - that we get the results for just that particular variable. So we're going to make a new thing
04. 00:27 / 00:34 - in this case it's going to be a matrix, by column binding these three variables together.
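
The column-binding step described so far can be sketched roughly as below. This is only an illustration: it assumes the built-in mtcars data stands in for the data frame used in the video, and the name car.matrix is hypothetical (the video does not name the object).

```r
# Assumption: the built-in mtcars data stands in for the lecture's data frame.
mpg      <- mtcars$mpg
cylinder <- mtcars$cyl
weight   <- mtcars$wt

# Column-bind the three numeric vectors; binding plain vectors yields a matrix.
# car.matrix is a hypothetical name chosen for this sketch.
car.matrix <- cbind(mpg, cylinder, weight)

attributes(car.matrix)  # shows dim and dimnames (column names only)
class(car.matrix)       # reports a matrix class
```

The column names come from the vector names passed to cbind, which is why the row names are empty while the columns are labeled.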
05. 00:36 / 00:43 - So what this gives me is indeed a matrix if we look at the attributes and the class we
06. 00:45 / 00:50 - will see that the attributes tell me that the dimensions are 32 by 3, the dim names
07. 00:50 / 00:56 - don't exist for the rows but for the columns it went ahead and gave the variable names
08. 00:56 / 01:03 - and the class is a matrix. We could do the same thing by row binding information. So
09. 01:03 / 01:10 - I'm going to have cars 1, 5, 10, and 15. So car 1, we have the information just for that
10. 01:11 / 01:18 - particular car and I can bind that together using rbind to bind the rows together, car
11. 01:19 / 01:26 - 1, car 5, 10 and 15. And we can see here's the dataset. We have only four cars and we
12. 01:31 / 01:38 - can see the attributes. In this case the attributes, it gives me names, row names and the class
13. 01:39 / 01:46 - is a data frame. So rbind and cbind depend upon the object that you start with. If you
14. 01:46 / 01:53 - start with vectors, it will make a matrix, but because each of these individual cars
15. 01:55 / 02:01 - was also a data frame if we look at the attributes of car1, it's a data frame, then when it binds
16. 02:01 / 02:08 - those together in rows it keeps the type of objects that it starts with. You can get yourself
17. 02:09 / 02:15 - into trouble if you're trying to bind vectors and the vectors are a mixture of character
18. 02:15 / 02:22 - and numeric but if you have those vectors defined as individual data frames then it
19. 02:22 / 02:29 - will bind them correctly. So let's look at a little bit of fun with data now that we've
20. 02:29 / 02:35 - talked about importing data. We can create a table of a particular, in this case categorical,
21. 02:35 / 02:42 - variable, the number of cylinders, or at least a discrete variable. And here I have created
22. 02:42 / 02:49 - cylinder.table by using the table function and then I can print that so we end up with
23. 02:49 / 02:55 - 4, 6, and 8. There were 11 four-cylinder cars, 7 six-cylinder cars, and 14 eight-cylinder
24. 02:55 / 03:02 - cars. If I want to get the percentages I could divide the cylinder.table numbers by the length
25. 03:03 / 03:08 - of the cylinder vector so what I'm saying is how many observations were in the cylinder
26. 03:08 / 03:15 - vector and then divide each of those by the total. If you look at the attributes of cylinder.table,
27. 03:17 / 03:23 - it's a thing unto itself: its class is table, the dim names are 4, 6, and 8, so four, six
28. 03:23 / 03:30 - and eight are not data, they are the labels for particular columns. There's only one piece
29. 03:31 / 03:36 - of information in this table which is the 11, 7, and 14 and now what we're going to
30. 03:36 / 03:43 - do is ask for it to present that divided into proportions, not percentages, if we wanted
31. 03:43 / 03:48 - actually to get it as percentages well then I have to multiply that by 100 then I could
32. 03:48 / 03:54 - get the percentages. We can do contingency tables where we have two variables so let's
33. 03:54 / 04:00 - look at cylinders by automatic or manual. Here cylinders were the rows and automatic/manual
34. 04:00 / 04:05 - was the columns. There are ways to translate, but it's not quite as easy as other packages
35. 04:05 / 04:11 - and I'll be letting you play around a little bit with how you want to use R to present
36. 04:11 / 04:18 - information throughout the semester. So for a particular variable we can use a cut feature
37. 04:18 / 04:24 - to cut that variable into specific groups, this is very useful for categorizing a quantitative
38. 04:24 / 04:30 - variable. Here what we're going to do is categorize this quantitative variable using quantiles
39. 04:30 / 04:36 - and so I'm going to first just remind ourselves we have the miles per gallon variable and
40. 04:36 / 04:41 - I'm going to ask R to calculate the quantiles so it gives us the minimum, the twenty-fifth
41. 04:41 / 04:48 - percentile, the 50th percentile, the 75th percentile, and the maximum. So I'm using
42. 04:48 / 04:54 - these numbers to decide how I want to break the data up. I could do it equal intervals
43. 04:54 / 05:01 - 10, 20, etc., but here we're going to say go ahead and split into quartiles, so the
44. 05:01 / 05:08 - lower 25 percent, the next, the upper, and maximum 25 percent. So we are going to use
45. 05:09 / 05:16 - ten as the lower bound, about 15.4 for the next, 19.2, 22.8, and 34. So let's go ahead
46. 05:21 / 05:27 - and create this mycars1$new. By doing it this way, what I'm saying is create a new variable
47. 05:27 / 05:34 - inside the data frame mycars1 by cutting up the miles per gallon. If I just said create
48. 05:35 / 05:40 - new, it's not going to associate it with mycars1 so if you want to add a variable to a data
49. 05:40 / 05:45 - frame this is the best way to do it, assign it as a new variable inside the data frame
50. 05:45 / 05:52 - that you already have. So when I run these two pieces of code what I see is that it runs
51. 05:54 / 06:01 - the first thing and then in mycars1$new what I get are intervals. And so this variable
52. 06:01 / 06:08 - let's look at the attributes of this variable. So the attributes are that it is a factor
53. 06:11 / 06:18 - variable, and those are just the levels; the raw data underneath is actually numeric. And
54. 06:20 / 06:26 - we can put that into a table and here we see the counts for how many in each group. Should
55. 06:26 / 06:33 - be about equally distributed since we made it by the quartiles and we could do a breakdown
56. 06:34 / 06:36 - of that grouping by cylinders.
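
The quantile-and-cut steps described above might look roughly like the following. This is a sketch under stated assumptions: mycars1 is taken to be a copy of the built-in mtcars data, and the vector name breaks is hypothetical. The break points are the rounded values read out in the video.

```r
# Assumption: mycars1 is a copy of the built-in mtcars data.
mycars1 <- mtcars

# Minimum, 25th, 50th, and 75th percentiles, and maximum of miles per gallon.
quantile(mycars1$mpg)

# Break points as read out in the video: rounded quartiles, with 10 and 34
# as the outer bounds. The name `breaks` is hypothetical.
breaks <- c(10, 15.4, 19.2, 22.8, 34)

# Assigning with $ stores the result as a new column inside the data frame.
mycars1$new <- cut(mycars1$mpg, breaks = breaks)

attributes(mycars1$new)          # levels (the intervals) and the factor class
table(mycars1$new)               # counts of cars in each mpg group
table(mycars1$new, mycars1$cyl)  # mpg group broken down by cylinders
```

By default cut builds intervals that are open on the left and closed on the right, so using 10 rather than the observed minimum as the lowest break keeps the lowest-mpg cars inside the first group instead of turning them into missing values.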