Statistical Computing and Simulation
Spring 2013
Assignment #2, Due March 29, 2013
1. (a) Write a computer program using the Mid-Square Method with 6 digits to generate
10,000 random numbers ranging over [0, 999999]. Use the Kolmogorov-Smirnov
goodness-of-fit test to see if the random numbers you create are uniformly
distributed. (Note: You must specify the initial seed used, and you may adopt
0.05 as the significance level. You may also see warning messages when conducting the
goodness-of-fit test; comment on them.)
The following is a program for the "mid-square" generator:

midsq = function(seed, n) {
  temp = seed
  z = NULL
  for (k in 1:n) {
    temp = temp^2              # square the current 6-digit value (up to 12 digits)
    temp = temp %% (10^9)      # keep the lowest 9 digits
    temp = floor(temp / 1000)  # drop the lowest 3, leaving the middle 6 digits
    z = c(z, temp)
  }
  return(z)
}
a = midsq(123456, 100)
cor(a[-1], a[-100])            # lag-1 correlation of the first 100 numbers
[1] -0.04486407
# It looks good using 123,456 as the seed.
a = midsq(123456, 10000)
a1 = floor(a / 10^5)           # leading digit of each 6-digit number
table(a1)
# The result looks bad if 10,000 numbers are generated.
a1:       0    1    2    3    4    5     6    7    8    9
count:   10   18   15   13   11   16  9871   13   14   19
ks.test(a1, "punif")
Because the Kolmogorov-Smirnov goodness-of-fit test cannot handle data with ties, we
suggest using the chi-square goodness-of-fit test, which gives a similar conclusion.
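As a sketch of that suggestion, the chi-square test can be run directly on the leading-digit counts tabulated above (the counts are copied from the table; the huge count for digit 6 reflects the mid-square generator collapsing into a short cycle):

```r
# Leading-digit counts from the mid-square output (copied from the table above).
obs = c(10, 18, 15, 13, 11, 16, 9871, 13, 14, 19)
# Chi-square test against equal probabilities for the 10 digits.
ct = chisq.test(obs)
ct$statistic   # enormous, driven by the 9,871 sixes
ct$p.value     # effectively 0: uniformity is strongly rejected
```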
(b) Similar to the above, but consider X_{i+1} = 69,069 X_i (mod 2^32), i.e., the generator
used by VAX before 1993. Use both the chi-square and Kolmogorov-Smirnov goodness-of-fit
tests to check if the data are from the U(0,1) distribution.
modula = function(m, a, cc, seed, n) {
  temp = seed
  z = NULL
  for (k in 1:n) {
    temp = a * temp + cc
    temp = temp - floor(temp / m) * m   # temp mod m
    z = c(z, temp)
  }
  return(z / m)                         # scale to (0, 1)
}
data = modula(2^32, 69069, 0, 123456, 10000)
Separating (0,1) into 10 equal-width intervals, the numbers of observations in each
interval are 1064, 972, 960, 982, 962, 1008, 1012, 998, 1000, and 1,042. The test
statistic and p-value of the chi-square test are 10.224 and 0.3327. Using ks.test, the
p-value is 0.5437, so we do not reject that the data are from U(0,1).
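Both tests can be reproduced with a short, self-contained sketch (the generator matches the modula call above; `cut` with 10 equal-width breaks mirrors the interval counts):

```r
# Self-contained LCG equivalent to modula(2^32, 69069, 0, seed, n).
lcg = function(m, a, seed, n) {
  z = numeric(n)
  temp = seed
  for (k in 1:n) {
    temp = (a * temp) %% m   # exact in doubles: a * temp < 2^53 here
    z[k] = temp
  }
  z / m                      # scale to (0, 1)
}
u = lcg(2^32, 69069, 123456, 10000)
obs = table(cut(u, breaks = seq(0, 1, by = 0.1)))  # 10 equal-width intervals
chisq.test(obs, p = rep(0.1, 10))                  # chi-square GOF test
ks.test(u, "punif")                                # Kolmogorov-Smirnov test
```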
(c) Calculators use U_{n+1} = (U_n + pi)^5 (mod 1) to generate random numbers between
0 and 1. Compare the result with those in (a) & (b), and discuss your findings based on
the comparison.
First, we write down a program:

palm.cal = function(seed, irr, n) {
  temp = seed
  z = NULL
  for (k in 1:n) {
    temp = (temp + irr)^5       # add the irrational constant, raise to the 5th power
    temp = temp - floor(temp)   # keep the fractional part (mod 1)
    z = c(z, temp)
  }
  return(z)
}
Plugging in seed 100 and generating 10,000 numbers, the numbers of
observations in 10 equal-width intervals are 948, 1060, 1028, 964, 975, 964, 1029, 1028,
997, and 1007. The chi-square test statistic is 5.985 and the p-value is 0.741.
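Assuming pi as the irrational constant (the write-up does not state which constant was used, so the exact counts may differ), the check can be sketched as:

```r
# palm.cal as defined above, repeated so the sketch is self-contained.
palm.cal = function(seed, irr, n) {
  z = numeric(n)
  temp = seed
  for (k in 1:n) {
    temp = (temp + irr)^5       # add the irrational constant, raise to the 5th power
    temp = temp - floor(temp)   # keep the fractional part (mod 1)
    z[k] = temp
  }
  z
}
u = palm.cal(100, pi, 10000)
obs = table(cut(u, breaks = seq(0, 1, by = 0.1)))
chisq.test(obs, p = rep(0.1, 10))   # compare with the counts reported above
```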
2. (a) Verify the sequence of numbers from X_{i+1} = 397,204,094 X_i (mod 2^31 - 1). Use
graphical tools and statistical tests to explore whether this is a good random number generator.
For X_{i+1} = 397,204,094 X_i (mod 2^31 - 1), you can use the command "modula" from #1
to create random numbers, such as
data=modula(2^31-1,397204094,0,123456,10000)
I will skip the tests of uniformity ("ks.test" or "chisq.test") and independence from
problems #5 and #6. Instead, the histogram can be used to check whether the random numbers
look uniform, and they do.
For the independence check, we can use plots such as x(i) vs. x(i+1). Of
course, you can use the "acf" and "pacf" plots, but the caveat is that the data are not
from a normal distribution (although the plots look fine).
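A minimal sketch of both graphical checks (note that 397,204,094 times (2^31 - 1) exceeds 2^53, so a naive double-precision implementation of this multiplier is not exact; `runif` stands in here just to illustrate the plots):

```r
# Graphical checks: histogram for uniformity, lagged scatter for independence.
u = runif(10000)   # stand-in for the output of the LCG with multiplier 397204094
hist(u, breaks = 20, main = "Histogram of generated numbers")
plot(u[-length(u)], u[-1], pch = ".",
     xlab = "x(i)", ylab = "x(i+1)", main = "Lag-1 scatter plot")
```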
(b) In class, we often use simulation tools in R, such as “sample” or “ceiling(runif),” to
generate random numbers from 1 to k, where k is a natural number. Using graphical
tools (such as histogram) and statistical tests to check which one is a better tool in
producing uniform numbers between 1 and k. (Hint: You may check if the size of k
matters by, for example, assigning k a small and big value.)
We can try k = 15 and k = 100 and see if it makes any difference. We first check the
uniformity of 10,000 random numbers, separating them into 15 and 10 groups for k = 15
and k = 100, respectively.
t1 = sample(c(1:15), 10000, T)     # "sample" with k = 15
t2 = sample(c(1:100), 10000, T)    # "sample" with k = 100
t3 = ceiling(15 * runif(10000))    # "ceiling(runif)" with k = 15
t4 = ceiling(100 * runif(10000))   # "ceiling(runif)" with k = 100
a1 = table(t1)
a2 = table(ceiling(t2 / 10))       # group the k = 100 output into 10 classes
a3 = table(t3)
a4 = table(ceiling(t4 / 10))
The following tables show the numbers in each class:

Class:    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
a1:     677  633  648  664  668  713  671  651  659  651  696  667  664  648  710
a3:     668  626  665  682  723  672  654  662  671  676  659  638  687  686  631

Class:     1     2     3     4     5     6     7     8     9    10
a2:     1012  1061   990  1030   979   965  1020   948   997   998
a4:      990  1017   986  1011  1016   986   997  1003   954  1040
It is obvious that these numbers do not violate the assumption of uniformity (via the
chi-square goodness-of-fit test). We can also use the "acf" and "pacf" to briefly check
independence, and the results seem random.
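The claim can be made formal with chi-square tests on freshly drawn counts (no seed was fixed above, so the exact counts vary run to run):

```r
# Chi-square goodness-of-fit checks for both sampling methods, k = 15.
t1 = sample(1:15, 10000, replace = TRUE)
t3 = ceiling(15 * runif(10000))
chisq.test(table(t1))   # "sample" version; expected 10000/15 per class
chisq.test(table(t3))   # "ceiling(runif)" version
```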
3. There are several ways of checking the goodness-of-fit for empirical data. In particular,
there are many normality tests available in R. Generate a random sample of size 10, 50,
and 100 from N(0,1) and the t-distribution (with 10 and 20 degrees of freedom) in R. You
may treat testing random numbers from the t-distribution as a power study. For a level of
significance alpha = 0.05, choose at least four normality tests in R (the "nortest"
package) to check if the sample is from N(0,1). Tests used can include the
Kolmogorov-Smirnov test and the Cramer-von Mises test. Note that you need to compare the
differences among the tests you choose.
I generate 10, 50, and 100 random numbers from N(0,1), repeating 1,000 times. The
normality tests chosen are ks.test, ad.test, cvm.test, and lillie.test, where the last
three can be found in the package "nortest." The following are the numbers of p-values
smaller than 0.05 (numbers larger than 63 or smaller than 37 indicate a significant
difference):

          ks.test  ad.test  cvm.test  lillie.test
n = 10        47       48        51           51
n = 50        61       58        60           58
n = 100       56       48        52           53
It seems that all four normality tests give satisfactory results. Also, unlike the
chi-square test, the normality tests considered here can be applied to cases with few
observations.
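The size simulation can be sketched with base R alone; since "nortest" may not be installed, shapiro.test stands in for the three nortest tests in this self-contained version:

```r
# Size check: count of rejections at alpha = 0.05 under H0 (N(0,1) data).
set.seed(1)
B = 1000; n = 50
rej.ks = 0; rej.sw = 0
for (b in 1:B) {
  x = rnorm(n)
  rej.ks = rej.ks + (ks.test(x, "pnorm")$p.value < 0.05)
  rej.sw = rej.sw + (shapiro.test(x)$p.value < 0.05)
}
c(ks = rej.ks, shapiro = rej.sw)   # both should be near 50 out of 1,000
```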
Numbers of rejections (out of 1,000) when the data are from t-distributions:

              ks.test        ad.test        cvm.test       lillie.test
           df=10  df=20   df=10  df=20   df=10  df=20   df=10  df=20
n = 10        69     43      65     51      68     50      66     49
n = 50        62     66     123     70     106     61      79     56
n = 100       58     53     164     81     140     75     104     57
As expected, more observations and fewer degrees of freedom generally lead to more
rejections. It seems that ad.test is the most powerful (cvm.test comes second) in
distinguishing t-distribution random numbers from normal random numbers. Also, ks.test
is too conservative and is not a good tool for differentiating the t-distribution from
the normal distribution.
4. Write your own R programs to perform the Gap test, Permutation test, and Run test. Then
use these programs to test if the uniform random numbers generated from Minitab (or
SAS, SPSS, Excel) and R are independent.
The programs for the Gap test and Permutation test are as follows:
gap.test = function(data, a, b) {
  n = length(data)
  x = c(1:n) * (a < data & data < b)   # positions of observations falling in (a, b)
  x1 = x[x > 0]
  y = x1[-1] - x1[-length(x1)] - 1     # gap lengths between successive hits
  return(table(y))
}
I suggest looking at the counts of each possible gap length before doing the chi-square
test. I generated 6,000 random numbers from U(0,1) in R and got 3,622 numbers falling in
(0.2, 0.8) from the program above (so the gap lengths follow a geometric distribution),
with table values:
x:         0    1    2    3    4    5    6    7    8    9
count:  2179  867  357  134   52   20    9    1    2    1
The classes are 0, 1, 2, 3, 4, 5, 6, and 7+, which give expected counts of 2173.2,
869.28, 347.712, 139.0848, 55.63392, 22.253568, 8.901427, and 5.934285. The chi-square
statistic is 1.552602, indicating that we do not reject the hypothesis that the numbers
are independent.
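The reported statistic can be reproduced from the observed and expected counts (the observed counts are copied from the table above; the tail class 7+ pools the counts 1 + 2 + 1 = 4):

```r
# Chi-square statistic for the gap counts against Geometric(p = 0.6).
obs = c(2179, 867, 357, 134, 52, 20, 9, 4)   # classes 0-6 and 7+
p = c(0.6 * 0.4^(0:6), 0.4^7)                # P(gap = k), with the tail pooled
expd = sum(obs) * p                          # 2173.2, 869.28, ..., 5.934285
chisq = sum((obs - expd)^2 / expd)
chisq                                        # about 1.5526, as reported above
```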
The program for the permutation test is as follows:

permute.test = function(data, k) {
  y = rep(10, k)^c((k - 1):0)            # weights 10^(k-1), ..., 10, 1
  x = matrix(data, ncol = k, byrow = T)  # split the data into groups of k
  x1 = apply(x, 1, rank)                 # rank within each group (k rows after apply)
  yy = apply(x1 * y, 2, sum)             # encode each rank pattern as a k-digit number
  return(table(yy))
}
I choose k = 3, i.e., 6 possible orderings. Checking with 6,000 numbers from runif in
R, it seems that the random number generator performs fine.

Comb.    123  132  213  231  312  321
Counts   340  353  325  298  360  324

Using the chi-square test, the numbers above are consistent with a uniform distribution
over the six orderings.
The following is the program for the up-and-down run test:

run.test = function(data) {
  n = length(data)
  ave = (2 * n - 1) / 3                # mean number of runs under independence
  std = sqrt((16 * n - 29) / 90)       # standard deviation under independence
  x = (data[-1] > data[-n])^1          # 1 = up-step, 0 = down-step
  x1 = sum(x[-1] != x[-(n - 1)]) + 1   # number of runs up and down
  value = (x1 - ave) / std
  return(pnorm(value))
}
I generate 10,000 uniform random numbers 1,000 times and check whether the resulting
test statistics follow a uniform distribution. Theoretically, the p-values of the 1,000
replications should follow U(0,1). Out of 1,000 runs, there are 52 and 99 p-values
smaller than 0.05 and 0.10, respectively. Also, the p-value of the ks-test applied to
these 1,000 p-values against U(0,1) is 0.1396.
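The p-value check can be sketched as follows (run.test as defined above, repeated so the sketch is self-contained; 100 replications of n = 1,000 here to keep it fast, versus 1,000 replications of n = 10,000 in the text):

```r
# Up-and-down run test, as above.
run.test = function(data) {
  n = length(data)
  ave = (2 * n - 1) / 3                # mean number of runs under independence
  std = sqrt((16 * n - 29) / 90)       # standard deviation under independence
  x = (data[-1] > data[-n]) * 1        # 1 = up-step, 0 = down-step
  x1 = sum(x[-1] != x[-(n - 1)]) + 1   # number of runs up and down
  pnorm((x1 - ave) / std)
}
set.seed(1)
pv = replicate(100, run.test(runif(1000)))
mean(pv < 0.05)   # should be near 0.05 if the p-values are U(0,1)
```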
5. Generate a random sample of size 10 from Exp(lambda) in R. For a level of significance
alpha = 0.05, use the following tests to check whether this sample is from Exp(lambda):
(a) the Kolmogorov-Smirnov test and (b) the Cramer-von Mises test.
I generate 10 random numbers from an exponential distribution with mean 5 (rate 0.2),
and the following is the graph of the sample CDF and the theoretical CDF:
[Figure: "Sample CDF vs. CDF" -- the empirical CDF and the theoretical Exp CDF plotted against X over (0, 20)]
x = rexp(10, 0.2)
x1 = pexp(sort(x), 0.2)
matplot(cbind(sort(x), sort(x)), cbind((0:9)/10, x1), type = "l", lty = c(1, 1),
  col = c(1, 2), xlab = "X", ylab = "Y", main = "Sample CDF vs. CDF")
legend(12, 0.4, c("Sample CDF", "CDF"), lty = c(1, 3))
The following table summarizes the quantities for the Kolmogorov-Smirnov and
Cramer-von Mises tests:

Obs.      Xi      FN(x(i-1))   F0(xi)   |FN-F0|   (2i-1)/2N   F0(xi)-(2i-1)/2N
 1      0.4579      0.0        0.0875    0.0875     0.050          0.0375
 2      1.4688      0.1        0.2545    0.1545     0.150          0.1045
 3      2.7790      0.2        0.4264    0.2264     0.250          0.1764
 4      2.7872      0.3        0.4273    0.1273     0.350          0.0773
 5      4.6190      0.4        0.6031    0.2031     0.450          0.1531
 6      6.8775      0.5        0.7473    0.2473     0.550          0.1973
 7      9.8394      0.6        0.8602    0.2602     0.650          0.2102
 8     10.536       0.7        0.8784    0.1784     0.750          0.1284
 9     10.623       0.8        0.8805    0.0805     0.850          0.0305
10     19.733       0.9        0.9807    0.0807     0.950          0.0307
Thus, Dn = 0.2602 (compare with the alpha = 0.05 critical value of the Kolmogorov-Smirnov
test, or use ks.test(x, "pexp", 0.2) in R) and Y = 0.1164 (below the 0.461 critical value
at alpha = 0.05 for the Cramer-von Mises test), so we do not reject the null hypothesis
that the data are from the exponential distribution with mean 5.
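For reference, the built-in KS test for this setup is shown below (base R has no Cramer-von Mises test against a specified continuous distribution; the "goftest" package's cvm.test would cover that part, if installed):

```r
# KS test of 10 exponential draws against the hypothesized Exp(rate = 0.2).
set.seed(1)
x = rexp(10, 0.2)
ks.test(x, "pexp", 0.2)   # compare D with the alpha = 0.05 critical value
```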
6. Tools in time series can also be used to check whether the random numbers generated are
independent, but the normality assumption is required. Still, we can apply the "acf" and
"pacf" plots to check whether the random numbers violate the independence assumption.
Apply the time series tools to the random numbers generated by (U_1 + U_2 + ... + U_12) - 6,
i.e., the sum of twelve U(0,1) numbers minus 6, and comment on your findings.
In R, the sample acf and pacf values can be found using "acf(data)[*]", and the
standard error approximately equals 1/sqrt(n), where n is the sample size. The following
program can be used to check whether there are unusual acf or pacf values for random
numbers generated by (U_1 + ... + U_12) - 6. Note that, as a demonstration, I only
checked the first 24 acf and pacf values, and that the first value returned by acf is
always 1 (lag 0).
t1 = NULL
t2 = NULL
for (i in 1:1000) {
  x = matrix(runif(120000), ncol = 10000, byrow = F)  # 12 uniforms per column
  x1 = apply(x, 2, sum) - 6                           # approximately N(0,1)
  a = acf(x1, plot = F)
  a2 = as.numeric(a$acf[2:25])                        # lags 1-24 (lag 0 is always 1)
  b = pacf(x1, plot = F)
  b2 = as.numeric(b$acf[1:24])                        # lags 1-24
  t1 = cbind(t1, a2)
  t2 = cbind(t2, b2)
}
bound = 2 / sqrt(10000)                               # approximate 95% bound
a3 = (abs(t1) > bound) * 1                            # two-sided check
b3 = (abs(t2) > bound) * 1
a4 = apply(a3, 1, sum)
b4 = apply(b3, 1, sum)
The proportions of acf and pacf values exceeding the bound did not exceed 5%. The
independence check for the Fibonacci generator is similar and is omitted.