Problem 1.
Assume we have a random variable X which corresponds to n independent
Bernoulli trials, X = \sum_{i=1}^{n} k_i, such that for one trial k_i
\[ P(k_i) = \begin{cases} p, & k_i = 1 \\ 1 - p, & k_i = 0 \end{cases} \]
Then we have
\[ E[X] = E\left[\sum_{i=1}^{n} k_i\right] = \sum_{i=1}^{n} E[k_i] \]
by the linearity of the expected value.
We have E[k_i] = p for all i, so we can rewrite E[X] as \sum_{i=1}^{n} p = np.
Variance is not linear in general, but the variances of independent
variables add, so the same argument applies with Var[k_i] = p(1-p) in
place of E[k_i] = p, giving Var[X] = np(1-p).
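Written out, the variance step relies on the independence of the trials
rather than on linearity in general. A short derivation under that
assumption, using only the definition of k_i given above:
\[ \operatorname{Var}[k_i] = E[k_i^2] - E[k_i]^2 = (1^2 \cdot p + 0^2 \cdot (1-p)) - p^2 = p(1-p) \]
\[ \operatorname{Var}[X] = \operatorname{Var}\left[\sum_{i=1}^{n} k_i\right] = \sum_{i=1}^{n} \operatorname{Var}[k_i] = np(1-p) \]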
%%%% Problem 2
lower = -6;
upper = 6;
steps = 10000;
stepsize = (upper-lower)/steps;
inputs = lower:stepsize:upper; % generate 10001 evenly spaced points between -6 and 6
testpdf = gaussian(inputs,0,1)*stepsize; % use my function to make the pdf,
                                         % then normalize by stepsize
testcdf = cumsum(testpdf); % sum points
figure;
plot(inputs,testcdf); % plot the cdf against the input range
xlabel('x','FontSize',16)
ylabel('cumulative probability','FontSize',16)
title('cdf of Gaussian distribution from -6 to 6, mean zero, std 1','FontSize',16)
set(gca,'FontSize',16)
%%%% a
est_1_std = 1-(testcdf(find(inputs<1,1,'last'))-testcdf(find(inputs>-1,1,'first')));
% est_1_std = 0.3175
%%%% b
est_2_std = 1-(testcdf(find(inputs<2,1,'last'))-testcdf(find(inputs>-2,1,'first')));
% est_2_std = 0.0456
%%%% c
est_3_std = 1-(testcdf(find(inputs<3,1,'last'))-testcdf(find(inputs>-3,1,'first')));
% est_3_std = 0.0027
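% Cross-check (an added sketch, not part of the original solution): for a
% standard normal, the exact two-sided tail probability beyond k standard
% deviations is erfc(k/sqrt(2)). The variable name exact_tails is
% introduced here for illustration only.
exact_tails = erfc((1:3)/sqrt(2)); % tails beyond 1, 2 and 3 std
% exact_tails is approximately [0.3173 0.0455 0.0027], so the
% Riemann-sum estimates above agree to about three decimal places.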
function f = gaussian(xvals,mean,stdev)
% gaussian creates a pdf of the Gaussian distribution with mean mean and
% standard deviation stdev for the points xvals.
%
% INPUTS:
% xvals: vector of points to use to make the pdf
% mean: scalar center of distribution
% stdev: scalar standard deviation of distribution
%
% OUTPUTS:
% f: vector containing pdf points
%
% NH 9/30/2016
coeff = (stdev*(2*pi)^(1/2))^-1;
f = coeff*exp(-((xvals-mean).^2)/(2*stdev^2));
%%%% Problem 3
nrtests = 1000;
mycdf = [testcdf;inputs];
testdraws = cdf_random_draw(mycdf,nrtests);
%%%% a
calc_1_std = sum(abs(testdraws)>1)/nrtests;
% calc_1_std = 0.354
%%%% b
calc_2_std = sum(abs(testdraws)>2)/nrtests;
% calc_2_std = 0.048
%%%% c
calc_3_std = sum(abs(testdraws)>3)/nrtests;
% calc_3_std = 0.003
function draws = cdf_random_draw(cdf,nrpoints)
% cdf_random_draw simulates data from a given cdf by drawing nrpoints from
% the uniform distribution and mapping that cdf value to the x value that
% generated it.
%
% INPUTS:
% cdf: 2xm matrix indicating the cdf of the distribution to test and x
% values at each point
% nrpoints: a scalar indicating the number of random draws to do
%
% OUTPUTS:
% draws: a vector containing the data simulated by the random draw
%
% NH 9/30/2016
draws = NaN(1,nrpoints); % initialize output array
for i=1:nrpoints % loop over number of draws
    [~,idx] = min(abs(cdf(1,:)-rand(1))); % find the index of the matching point
    draws(i) = cdf(2,idx); % look up x-coord at that index
end
%%%% Problem 4
% p = 0.5;
n = 50;
k = 0:50;
probs = 0.05:0.01:0.95; % set probabilities to test
diffs = NaN(1,numel(probs)); % initialize storage for differences
for i = 1:numel(probs)
    f_k = binom(probs(i),n,k); % use my binomial function to find the pdf
    mu = n*probs(i); % mu = np
    sigma = ((1-probs(i))*probs(i)*n)^(1/2); % sigma = root(np(1-p))
    Y = normpdf(k,mu,sigma); % use Matlab Gaussian pdf
    diffs(i) = sum(abs(f_k-Y));
end %i
figure;
plot(probs,diffs)
xlabel('p','FontSize',16)
ylabel('distance between Gaussian and binomial','FontSize',16)
title('Distribution distance as a function of p','FontSize',16)
set(gca,'FontSize',16)
% The distances get larger as p deviates from 0.5. The normal distribution
% is symmetrical. In the case of p = 0.5, the binomial distribution is
% also symmetrical, which means the Gaussian can do the best job
% approximating it. As the distribution gets less symmetrical, the
% approximation by the Gaussian gets worse and worse. We could make it
% more symmetrical by increasing the number of trials.
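% To quantify the symmetry argument (an added note, not part of the
% original solution): the skewness of a binomial distribution is
%   (1-2p)/sqrt(n*p*(1-p)),
% which is zero at p = 0.5 and shrinks like 1/sqrt(n) for any fixed p.
% The name binom_skew is introduced here for illustration; probs and n
% are the variables defined in the script above.
binom_skew = (1-2*probs)./sqrt(n*probs.*(1-probs)); % skewness for each tested p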
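function f_k = binom(p,n,k)
% binom evaluates the binomial pmf with success probability p and n trials
% at each count in the vector k.
f_k = NaN(1,numel(k)); % initialize output array
for i = 1:numel(k)
    f_k(i) = nchoosek(n,k(i))*p^k(i)*(1-p)^(n-k(i));
end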
%%%%% Problem 5
nrdraws = 100000;
true_mean = 10; % named true_mean so the built-in mean() stays usable below
stdev = 5;
draws = normrnd(true_mean,stdev,1,nrdraws); % generate 100000 random draws from a
% Gaussian distribution with mean 10 and stdev 5
figure; % Plotting
hist(draws,50);
xlabel('value','FontSize',16)
ylabel('number of values','FontSize',16)
set(gca,'FontSize',16)
title('Does this look Gaussian? Yes it does.','FontSize',16)
% a
nrtrials = 1000; % set number of times to run
nrsamps = 5; % set number of samples per trial
samples = NaN(nrsamps,nrtrials); % initialize storage
for j = 1:nrtrials
    idx = randi(nrdraws,1,nrsamps); % find idx for draws (randi never returns 0)
    samples(:,j) = draws(1,idx); % store
end
% 1 calculate means
means = mean(samples);
% 2 & 3 standard deviations using n and n-1 formulae
s_sub_nminus1s = NaN(1,nrtrials);
s_sub_ns = NaN(1,nrtrials);
for j = 1:nrtrials
    % calculate s using n-1
    s_sub_nminus1s(j) = sum(((samples(:,j) - means(j)).^2)/(nrsamps-1))^(1/2);
    % calculate s using n
    s_sub_ns(j) = sum(((samples(:,j) - means(j)).^2)/(nrsamps))^(1/2);
end
% 4
% calculate standard errors
std_err_mean_nminus1s = s_sub_nminus1s/(nrsamps-1)^.5;
std_err_mean_n = s_sub_ns/nrsamps^.5;
% Compare to true standard deviation
nminus1diffs = abs(s_sub_nminus1s - stdev); % absolute difference
sum(nminus1diffs)/1000; % average difference
% The average difference is 1.4244.
ndiffs = abs(s_sub_ns - stdev);
sum(ndiffs)/1000;
% The average difference is 1.4549.
% The n-1 standard deviation is slightly better, but the difference between
% the two cases is small.
%%%% b
std_of_means = std(means); % find standard deviation of estimate of means
nminus1_diff = abs(std_err_mean_nminus1s - std_of_means); % find difference
sum(nminus1_diff)/1000;
% The average difference is 0.6840
n_diff = abs(std_err_mean_n - std_of_means);
sum(n_diff)/1000;
% The average difference is 0.6402
% The estimate based on n better reflects the actual variability of the
% means.
%%%% c
% Calculation using n-1
figure;
[h_nminus1, b_nminus1] = hist(std_err_mean_nminus1s,nrtrials/10); % use 100 bins
h_nminus1 = h_nminus1/nrtrials; % divide by nr trials
subplot(1,2,1)
bar(b_nminus1,h_nminus1)
xlabel('standard error of mean','FontSize',16)
ylabel('probability of estimate','FontSize',16)
title('Standard error pdf for n-1 calculation','FontSize',16)
set(gca,'FontSize',16)
subplot(1,2,2)
plot(b_nminus1,cumsum(h_nminus1))
ylim([0 1])
xlabel('standard error of mean','FontSize',16)
ylabel('cumulative probability','FontSize',16)
title('Standard error cdf for n-1 calculation','FontSize',16)
set(gca,'FontSize',16)
% Calculation using n
figure;
[h_n b_n] = hist(std_err_mean_n,nrtrials/10);
h_n = h_n/nrtrials;
subplot(1,2,1)
bar(b_n,h_n)
xlabel('standard error of mean','FontSize',16)
ylabel('probability of estimate','FontSize',16)
title('Standard error pdf for n calculation','FontSize',16)
set(gca,'FontSize',16)
subplot(1,2,2)
plot(b_n,cumsum(h_n))
ylim([0 1])
xlabel('standard error of mean','FontSize',16)
ylabel('cumulative probability','FontSize',16)
title('Standard error cdf for n calculation','FontSize',16)
set(gca,'FontSize',16)
