Part 1 - JustAnswer

Part 1: RBI
[Figure: histogram of RBI. x-axis: RBI (35 to 115); y-axis: Frequency (0 to 8).]
The solution is unique, since the value of the mean is affected by every
individual value in the sample. The team with the highest mean RBI is 5, 7,
11, 13, 14, 16, 17, 18, 22, 24. They have a team mean RBI of 94.2.
Part 2: Strikeouts
[Figure: stem-and-leaf display of strikeouts.]
This solution is also unique, for the same reason as in Part 1. The team
is 1, 4, 8, 9, 12, 15, 19, 21, 23, 25, with a team mean of 55.6 strikeouts.
Part 3: Homeruns
In this case, the solution is not unique. In fact, we can replace, for
example, the best or worst member of the team with anyone else on the same
side of the median, and the median is not affected. That’s because the
median only considers the relative order of the observations. Hence, one
team with the best team median home runs is 24, 17, 5, 14, 13, 7, 2, 22,
18, 23. They have a median of 32.
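The point about the median can be checked with a small sketch. It uses the home run counts of the ten players in the Part 10 table of this document (not the Part 3 team, whose individual counts are not all listed), and the substitute player with 33 home runs is a hypothetical value chosen only for illustration.

```python
from statistics import median

# Home run counts for the ten players in the Part 10 table of this document.
team_hr = [38, 34, 34, 21, 32, 32, 22, 13, 18, 12]


def replace_best(values, new_value):
    """Return a copy of values with the largest entry replaced by new_value."""
    swapped = list(values)
    swapped[swapped.index(max(swapped))] = new_value
    return swapped


original_median = median(team_hr)
# Hypothetical substitute with 33 home runs: still above the median,
# so the middle of the ordering does not move.
swapped_median = median(replace_best(team_hr, 33))

print(original_median, swapped_median)
```

Both medians come out the same, because the two middle values of the sorted list are untouched by the swap.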
Part 4: Batting Average
The calculated batting averages are shown below:
Number  PLAYER            Batting Average
23      Ian Kinsler       0.319
12      Ryan Theriot      0.307
4       Brian Giles       0.306
7       Aubrey Huff       0.304
6       Shane Victorino   0.293
5       Jermaine Dye      0.292
18      Matt Kemp         0.290
11      James Loney       0.289
22      Garrett Atkins    0.286
8       Ivan Rodriguez    0.276
17      Prince Fielder    0.276
16      David Murphy      0.275
1       Carl Crawford     0.273
10      Aaron Rowand      0.271
24      Carlos Delgado    0.271
15      Edgar Renteria    0.270
25      Jeff Keppinger    0.266
2       Cody Ross         0.260
20      Kosuke Fukudome   0.257
14      Pat Burrell       0.250
9       Pedro Feliz       0.249
13      Jason Giambi      0.247
21      Jason Kendall     0.246
19      Emil Brown        0.244
3       Jeff Francoeur    0.239
Again, the team with the highest team median is not unique. We choose
this team with the highest median: 23, 12, 4, 7, 6, 5, 18, 11, 22, 8. The
team median is 0.292.
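As a rough check, the team median can be recomputed from the batting averages listed above, pairing player numbers with averages in the order they are listed. This is only a sketch: with the three-decimal averages the midpoint of the fifth and sixth values is 0.2925, which agrees with the reported 0.292 up to rounding (the original calculation presumably used unrounded averages). The swap of player 8 for player 16 is one example of why the team is not unique.

```python
from statistics import median

# Player number -> batting average for the ten highest averages listed above.
top_ten = {
    23: 0.319, 12: 0.307, 4: 0.306, 7: 0.304, 6: 0.293,
    5: 0.292, 18: 0.290, 11: 0.289, 22: 0.286, 8: 0.276,
}
team_median = median(top_ten.values())

# Replace the weakest member (player 8) with player 16 (0.275):
# both sit below the median, so the median does not change.
alternative = dict(top_ten)
del alternative[8]
alternative[16] = 0.275
alt_median = median(alternative.values())

print(round(team_median, 4), round(alt_median, 4))
```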
Part 5: Stolen Bases.
These data are right-skewed, with large outliers.
To find the team with the least variation, we observe that most of the
data in this distribution are located at the lower values. Hence we
choose the lowest 10 stolen bases to make up the team with the least
variability in their number of bases stolen. The team is
Part 6: Probability of Base Stealing.
Players 6, 18, 23, 1, 12, 20, and 8 are the ones who have stolen more than
10 bases. Hence, we choose these 7 players to be on the team, and the other
three players don’t matter. The probability of choosing a player with more
than 10 stolen bases is then 0.7. For the remaining three players (though
this choice doesn’t matter) we choose 14, 3, and 9.
Part 7: Confidence interval
We choose numbers from 1-25 out of a hat (without replacement, obviously)
in order to select our team. The following team results: 4, 7, 8, 10, 15,
19, 21, 22, 24, 25.
Sample average batting average is: 0.274
Sample standard deviation is 0.0208.
Critical value for the CI: 2.262. Hence:
0.274 +/- 2.262*0.021/sqrt(10) = (0.258, 0.289)
Since we noted earlier that the distribution of batting averages is skewed
to the right, we choose the players with the lowest batting averages to
make the narrowest confidence interval. That is because in right-skewed
data the lowest values are the most tightly grouped. Hence, they will
have a lower standard deviation, and the confidence interval will be
narrower. The team is: 15, 25, 2, 20, 14, 9, 13, 21, 19, 3.
The sample mean is 0.253. The sample standard deviation is 0.010. So,
the confidence interval in this case is:
0.253 +/- 2.262*0.010/sqrt(10) = (0.246, 0.260)
This interval, though, is not useful or interpretable. That is because
the sample is not random. So we can’t interpret it in the same way as a
regular confidence interval.
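Both intervals can be recomputed with a short sketch. It assumes the three-decimal batting averages listed in Part 4 for the two teams, and uses the critical value 2.262 from the text, which corresponds to 95% confidence with 9 degrees of freedom. The function name t_interval is illustrative. Because the inputs are rounded, the endpoints may differ from the figures above in the third decimal (the random team's lower bound comes out near 0.259).

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT = 2.262  # critical value quoted in the text (95%, 9 degrees of freedom)


def t_interval(values, t_crit=T_CRIT):
    """Return (lower, upper) for mean +/- t_crit * s / sqrt(n)."""
    n = len(values)
    centre = mean(values)
    margin = t_crit * stdev(values) / sqrt(n)  # stdev is the sample (n - 1) form
    return centre - margin, centre + margin


# Batting averages for players 4, 7, 8, 10, 15, 19, 21, 22, 24, 25.
random_team = [0.306, 0.304, 0.276, 0.271, 0.270,
               0.244, 0.246, 0.286, 0.271, 0.266]
# Batting averages for players 15, 25, 2, 20, 14, 9, 13, 21, 19, 3.
lowest_team = [0.270, 0.266, 0.260, 0.257, 0.250,
               0.249, 0.247, 0.246, 0.244, 0.239]

random_ci = t_interval(random_team)
lowest_ci = t_interval(lowest_team)

for name, (low, high) in (("random", random_ci), ("lowest", lowest_ci)):
    print(f"{name}: ({low:.3f}, {high:.3f}), width {high - low:.3f}")
```

The second interval is roughly half as wide as the first, which matches the argument that the tightly grouped low values give a smaller standard deviation.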
Part 8: p-value.
I’m assuming here that the hypotheses for this test are:
H0: p >= 0.5
Ha: p < 0.5
The sample proportion for the random team is 0.2.
The test statistic is:
z=(0.2-0.5)/sqrt(.5*.5/10)= -1.897
The p-value for this test is P(z<-1.897) = 0.0289
Hence we would reject the null hypothesis (at the 0.05 level) and conclude
that the proportion of players with a batting average of 0.300 or better
is less than 0.5.
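The test statistic and p-value can be reproduced with a minimal sketch, assuming the one-sided (lower-tail) z-test for a proportion described above with sample proportion 0.2, null value 0.5, and n = 10. The function name is illustrative; NormalDist comes from Python's standard statistics module.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_z_test(p_hat, p0, n):
    """Return (z, p_value) for H0: p >= p0 against Ha: p < p0."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = NormalDist().cdf(z)  # area to the left of z under N(0, 1)
    return z, p_value


z, p_value = lower_tail_z_test(p_hat=0.2, p0=0.5, n=10)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

With only 10 players the normal approximation is rough, so the p-value is best read as approximate.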
Part 9: Correlation.
[Figure: scatter plot of Runs v. Homeruns. x-axis: Runs (38 to 98); y-axis: Homeruns (0 to 40).]
The red highlighted squares represent the chosen players. They were
chosen because they seem to fall on the same line. The team that is
represented by these points is 21, 25, 1, 15, 20, 3, 4, 11, 18, 22. The
correlation coefficient of these points is 0.9126.
Part 10: The best team!
The best team that I can find to maximize (or minimize) the
characteristics:
Number  PLAYER           AB   H    R    SB  SO   HR  RBI  Batting Average
24      Carlos Delgado   598  162  96   1   124  38  115  0.271
5       Jermaine Dye     590  172  96   3   104  34  96   0.292
17      Prince Fielder   588  162  86   3   134  34  102  0.276
22      Garrett Atkins   611  175  86   1   100  21  99   0.286
7       Aubrey Huff      598  182  96   4   89   32  108  0.304
13      Jason Giambi     458  113  68   2   111  32  96   0.247
2       Cody Ross        461  120  59   6   116  22  73   0.260
11      James Loney      595  172  66   7   85   13  90   0.289
23      Ian Kinsler      518  165  102  26  67   18  71   0.319
4       Brian Giles      559  171  81   2   52   12  63   0.306

Their team stats are:
Average RBI                                             91.300
Average Strikeouts                                      98.200
Median HR                                               27.000
Median Batting Average                                  0.288
St. Dev. of SB                                          7.472
r of runs v. homeruns                                   0.357
P(SB<10)                                                0.900
Width of confidence interval for mean batting average   0.031
p-value for hypothesis test                             0.1188
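The descriptive team stats can be recomputed from the player table with the sketch below. It assumes batting average is H divided by AB, uses the sample (n - 1) standard deviation, and reuses the critical value 2.262 from Part 7 for the interval width. The helper pearson_r and the dictionary layout are illustrative choices. The hypothesis-test p-value is left out of the sketch.

```python
from math import sqrt
from statistics import mean, median, stdev

# Columns copied from the Part 10 player table, in the listed player order.
ab = [598, 590, 588, 611, 598, 458, 461, 595, 518, 559]
hits = [162, 172, 162, 175, 182, 113, 120, 172, 165, 171]
runs = [96, 96, 86, 86, 96, 68, 59, 66, 102, 81]
sb = [1, 3, 3, 1, 4, 2, 6, 7, 26, 2]
so = [124, 104, 134, 100, 89, 111, 116, 85, 67, 52]
hr = [38, 34, 34, 21, 32, 32, 22, 13, 18, 12]
rbi = [115, 96, 102, 99, 108, 96, 73, 90, 71, 63]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


batting_avg = [h / a for h, a in zip(hits, ab)]  # assumed definition: H / AB
n = len(batting_avg)

team_stats = {
    "Average RBI": mean(rbi),
    "Average Strikeouts": mean(so),
    "Median HR": median(hr),
    "Median Batting Average": median(batting_avg),
    "St. Dev. of SB": stdev(sb),
    "r of runs v. homeruns": pearson_r(runs, hr),
    "P(SB<10)": sum(1 for x in sb if x < 10) / n,
    "Width of CI for mean batting average": 2 * 2.262 * stdev(batting_avg) / sqrt(n),
}

for label, value in team_stats.items():
    print(f"{label}: {value:.3f}")
```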