Book of Abstracts
2013 NC-ASA Symposium: Celebrating the
International Year of Statistics
October 12, 2013
Raleigh NC USA
Special thanks to our sponsors:
2013 NC-ASA Symposium: Celebrating the International Year of Statistics – Book of Abstracts
Social media usage by Enloe High School students [Poster]
Utshab Chakraborty
Enloe High School, Raleigh, NC
Abstract. Background: Social media has become a major influence in modern-day culture
and society. Studies (Wohna et al. 2013 and Feldman 2012) showed that the most common
use of the Internet in class was to access social media sites, followed by listening to music,
playing games, and sending text messages and photos. In recent years, social media has
played a big role in the way we communicate with each other and share information. It can
also be used for educational purposes. More emphasis has been given to the use of social
media in school and the benefits that it can bring to the classroom for instructional purposes.
Objectives: To find out how Enloe High School students use social media to interact
with classmates, organize group events, socialize, do school work, and communicate
with teachers, and to determine to what extent students use social media as a medium to
communicate with classmates and teachers about homework and whether the use of social media
enhances productivity in completing classwork.
Methods: I surveyed ninth- to twelfth-grade Enloe High School students from September
22, 2013 to October 5, 2013. The students I surveyed belong to a social media network
that I share with my friends. I used surveygizmo.com to conduct the survey.
Results: A total of 103 students participated in the survey. The largest share of participants
(38%) were 10th graders, and 53% were male. The most commonly used social media platforms were
Facebook (67%), Twitter (53.4%), and Instagram (24.3%), and 84% of students used social media to
communicate with teachers and classmates about school-related assignments. Students also used
social media to socialize (94.2%), communicate with classmates (87%), do school work (80.6%),
organize events or school-related extracurricular activities (80.6%), and communicate with
teachers (27.2%). On average, students spent 2.8 hours on social media on school days and
slept 6.1 hours per night. The study showed that only about a quarter of the students used social
media to communicate with teachers, although they used social media for other school-related
activities. Most of the students were sleep deprived and spent a significant amount of time
on social media relative to their school hours.
Leveraging auxiliary information for SNP selection in genetic association studies [Technical Session II]
Adrian Coles
Statistics
North Carolina State University
Abstract. Genetic association studies aim to find marginal or joint effects of multiple SNPs
on outcome(s) of interest, and several approaches have been proposed with varying degrees
of success. Most of these approaches assume that (a priori) each SNP has an equal chance of
being associated with an outcome. However, in some cases there exists substantial auxiliary information from different studies in the same disease area that can be incorporated
into the analyses for more refined inference. Examples of such auxiliary information can
be disease-specific alternate domain knowledge such as that obtained from transcriptomic
(expression-based), epigenetic and integrative studies. Our aim is to leverage this
information and refine selection of disease-associated SNPs. We do so in a Bayesian variable
selection framework and incorporate the auxiliary information as structural priors on the
probabilities of selection of the SNPs, thus allowing simultaneous selection and sparse modeling. We illustrate our methods by leveraging auxiliary information obtained from both
Genecard and NIH’s SNP function database to investigate the association between the epidermal growth factor receptor and head and neck cancers.
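One way to picture a structural prior on selection probabilities is a logistic link from an auxiliary score to each SNP's prior inclusion probability. The sketch below is only an illustration under that assumption. The function name `prior_inclusion_probs`, the example scores, and the hyperparameters `a` and `b` are hypothetical and are not taken from the abstract.

```python
import numpy as np

def prior_inclusion_probs(aux_scores, a=-2.0, b=1.0):
    """Map auxiliary scores to prior inclusion probabilities.

    Assumed (hypothetical) logistic link: p_j = 1 / (1 + exp(-(a + b * score_j))).
    With b = 0 every SNP gets the same prior probability, which is the
    equal-chance assumption the abstract contrasts against.
    """
    aux_scores = np.asarray(aux_scores, dtype=float)
    return 1.0 / (1.0 + np.exp(-(a + b * aux_scores)))

# Hypothetical auxiliary scores for three SNPs (larger = more prior support).
scores = np.array([0.0, 1.0, 3.0])
probs = prior_inclusion_probs(scores)
flat = prior_inclusion_probs(scores, b=0.0)
```

SNPs with stronger auxiliary support receive a larger prior probability of selection, while setting `b = 0` gives every SNP the same prior probability.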
Individual treatment assignment as a decision problem [Poster]
Qi Dong
Statistical Science
Duke University
Abstract. Previous studies put great weight on drawing causal inferences and assigning treatment based on the average treatment effect. This paper explores an alternative strategy.
We focus on individual level treatment assignment and frame it as a decision problem using
Bayesian modeling. We propose a scheme that helps decision makers (e.g. doctors, policy makers) to decide whether to assign a treatment (e.g. medical procedure, job training
program) to any particular individual. Under the assumption that there is no unmeasured
confounding factor in the data, we adopt the Rubin Causal Model framework and build a
Bayesian model based on past data to predict any incoming individual’s potential outcomes
(e.g. probability of survival from a disease, probability of increase in annual income) with
and without treatment applied. Based on that comparison, we assign treatment to the individual with the objective of maximizing the individual’s probability of obtaining a desirable
result. The paper examines the advantages and implications of this framework by applying
it to the RHC dataset, which was collected at five medical centers in the U.S. and contains
information on 5735 hospitalized adult patients: treatment assignment, life status on
the 30th day after admission, and measurements of 52 potential confounding factors. We
show that our framework can be used as a meaningful and reliable tool that enables decision
makers to assign treatment effectively and efficiently.
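The decision rule described above can be sketched as a comparison of two predicted probabilities for one individual. This is a minimal illustration, assuming posterior draws of the probability of a desirable outcome with and without treatment are already available from some fitted Bayesian model. The function name `assign_treatment` and the simulated Beta draws are hypothetical stand-ins, not the model or data used in the paper.

```python
import numpy as np

def assign_treatment(p_treated_draws, p_control_draws):
    """Compare posterior mean probabilities of a desirable outcome.

    Returns (decision, p1, p0), where decision is True when the posterior
    mean probability with treatment (p1) exceeds the one without (p0).
    """
    p1 = float(np.mean(p_treated_draws))
    p0 = float(np.mean(p_control_draws))
    return p1 > p0, p1, p0

# Simulated stand-ins for posterior draws for a single incoming individual.
rng = np.random.default_rng(0)
treated = rng.beta(8, 4, size=2000)   # hypothetical draws under treatment
control = rng.beta(5, 5, size=2000)   # hypothetical draws under no treatment
decision, p1, p0 = assign_treatment(treated, control)
```

Here the rule treats the individual whenever the predicted probability of a desirable result is higher with treatment. A fuller version could also weigh costs or report the posterior probability that treatment is better.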
Generalized functional concurrent model [Technical Session I]
Janet Kim
Statistics
North Carolina State University
Abstract. We consider the generalized functional model, where both the response and the
covariate are functional data and are observed on the same domain. In contrast to typical
functional linear concurrent models, we allow the relationship between the response and covariate to be nonlinear, depending on both the value of the covariate at a specific time point
as well as the time point itself. In this framework we develop methodology for estimation
of the unknown relationship and construction of point-wise confidence bands, allowing for
correlated error structure as well as sparse and/or irregular design. We investigate this approach
in finite samples through simulations and a real data application.
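For orientation, the contrast drawn in this abstract can be written as follows. The notation is an assumption made for illustration: $Y_i(t)$ and $X_i(t)$ denote the response and covariate curves for subject $i$, and $F$ is an unknown bivariate function. The abstract itself does not fix any notation.

```latex
\begin{align}
  % Typical functional linear concurrent model
  Y_i(t) &= \beta_0(t) + \beta_1(t)\, X_i(t) + \varepsilon_i(t), \\
  % Nonlinear concurrent form: the effect depends on both X_i(t) and t
  Y_i(t) &= F\{X_i(t),\, t\} + \varepsilon_i(t).
\end{align}
```

The linear model is the special case $F(x, t) = \beta_0(t) + \beta_1(t)\,x$.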
Classical testing in functional linear models [Technical Session I]
Dehan Kong
Biostatistics
University of North Carolina, Chapel Hill
Abstract. We extend four tests common in classical regression - Wald, score, likelihood ratio
and F tests - to functional linear regression, for testing the null hypothesis that there is no
association between a scalar response and a functional covariate. Using functional principal
component analysis we re-express the functional linear model as a standard linear model,
where the effect of the functional covariate can be approximated by a finite linear combination
of the functional principal component scores. In this setting, we consider application of the
four traditional tests. The proposed testing procedures are investigated theoretically when
the number of principal components diverges, and for both densely and sparsely observed
functional covariates. Using the theoretical distribution of the tests under the alternative
hypothesis, we develop a procedure for sample size calculation in the context of functional
linear regression. The four tests are further compared numerically in simulation experiments
and using two real data applications.
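The re-expression step can be sketched in standard notation. The symbols are assumptions chosen for illustration and do not come from the abstract: $X_i(t)$ is the functional covariate, $\phi_k$ are eigenfunctions from the functional principal component analysis, $\xi_{ik}$ are the corresponding scores, and $K$ is the truncation level.

```latex
\begin{align}
  % Functional linear model with scalar response
  Y_i &= \alpha + \int X_i(t)\,\beta(t)\,dt + \varepsilon_i, \\
  % Expansions of the covariate and the coefficient function
  X_i(t) &= \mu(t) + \sum_{k \ge 1} \xi_{ik}\,\phi_k(t), \qquad
  \beta(t) = \sum_{k \ge 1} \beta_k\,\phi_k(t), \\
  % Truncated standard linear model in the scores
  % (the intercept \alpha^{*} absorbs \int \mu(t)\beta(t)\,dt)
  Y_i &\approx \alpha^{*} + \sum_{k=1}^{K} \xi_{ik}\,\beta_k + \varepsilon_i .
\end{align}
```

Under this approximation, the null hypothesis of no association, $\beta(t) = 0$ for all $t$, corresponds to $\beta_1 = \dots = \beta_K = 0$, which is the form to which Wald, score, likelihood ratio and F tests apply.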
Likelihood-based estimation of structural nested mean models in
randomized clinical trials with non-compliance [Technical Session III]
Roland Matsouaka
Epidemiology
Harvard University
Abstract. Current estimating equation methods for logistic structural nested mean models
(SNMMs) either rely heavily on possibly “uncongenial” modeling assumptions or involve a
cumbersome integral equation needing to be solved, for each independent unit, at each step
of solving the estimating equation. These drawbacks have impeded widespread use of these
methods. In this paper, we present an alternative parametrization of the likelihood function
for the logistic SNMM that circumvents the computational complexity of existing methods while
ensuring a congenial parametrization of the SNMM. We also provide a goodness-of-fit (GOF)
test statistic for evaluating parametric assumptions made by the likelihood model. Our
method can be easily implemented using standard statistical software, and is illustrated via
a simulation study and two data applications.
A binary optional unrelated question RRT model [Technical Session II]
Jeong S. Sihm
Mathematics and Statistics
University of North Carolina at Greensboro
Abstract. We propose a new binary unrelated question randomized response technique
(RRT) model which allows respondents the option of answering a sensitive question directly
without using the randomization device if they find the question non-sensitive. This situation has been handled before (2013a and 2013b) using a split-sample approach. In this
work we avoid the split-sample approach, which requires a larger sample. Instead, we estimate
the prevalence of the sensitive characteristic by using an Optional Unrelated Question RRT
Model and the corresponding sensitivity level from the same sample by using a simple Binary
Unrelated Question RRT Model. We compare the simulation results of this new model with
those of the split-sample based Optional Unrelated Question RRT Model and with the usual
Unrelated Question RRT Model. Computer simulations show that the new model has the
smallest variance among the three models when they have the same sample size.
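As background for the comparison above, the usual (non-optional) binary unrelated question model has a simple moment estimator. The sketch below covers that baseline model only, not the optional model proposed in this work. It assumes each respondent answers the sensitive question with known probability `p` and otherwise answers an unrelated question with known prevalence `pi_u`; the function name and example numbers are hypothetical.

```python
def uqm_estimate(n_yes, n, p, pi_u):
    """Moment estimator for the usual unrelated question RRT model.

    P(yes) = p * pi + (1 - p) * pi_u, so pi = (P(yes) - (1 - p) * pi_u) / p.
    Returns the estimate of pi and a plug-in estimate of its variance.
    """
    lam = n_yes / n                          # observed proportion of "yes"
    pi_hat = (lam - (1 - p) * pi_u) / p      # estimated sensitive prevalence
    var_hat = lam * (1 - lam) / (n * p ** 2) # plug-in variance of pi_hat
    return pi_hat, var_hat

# Hypothetical survey: 180 "yes" answers out of 500 respondents.
pi_hat, var_hat = uqm_estimate(n_yes=180, n=500, p=0.7, pi_u=0.5)
```

The variance grows as `p` shrinks, which is the usual trade-off between respondent privacy and efficiency in randomized response designs.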
Iterative selection using orthogonal regression techniques [Technical Session II]
Bradley Turnbull
Statistics
North Carolina State University
Abstract. Variable selection techniques play a key role in analyzing high dimensional data.
Recently, penalized forward selection has been introduced as a procedure that selects
sparser models than comparable methods without compromising predictive power. The motivation for this approach comes from the fact that penalization techniques like LASSO give
rise to closed form expressions when used in one dimension. Hence, one can repeat such a
procedure in a forward selection setting until it converges. However, when predictors are
highly correlated, unnecessary duplication can occur in the selection step. We show it is
possible to improve stability and computational efficiency by introducing an orthogonalization
step. At each selection step, variables are screened on the basis of their correlation with
variables already in the model, thus preventing unnecessary duplication. This new strategy,
called the Selection Technique in Orthogonalized Regression Models (STORM), is extremely
successful in further reducing the model dimension and also leads to improved predictive
power. We carry out a detailed simulation study that compares STORM to existing methods, and we analyze a gene expression dataset.
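The closed form mentioned above for LASSO in one dimension is the soft-thresholding rule. The sketch below shows that single-predictor step only and is not an implementation of STORM or of penalized forward selection. It assumes the objective 0.5 * ||y - x*b||^2 + lam * |b|, and the function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning zero when |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_1d(x, y, lam):
    """Minimize 0.5 * ||y - x*b||^2 + lam * |b| over a scalar b (closed form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return soft_threshold(x @ y, lam) / (x @ x)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])        # y = 2 * x exactly
b_ols = lasso_1d(x, y, lam=0.0)      # no penalty: ordinary least squares slope
b_small = lasso_1d(x, y, lam=1.0)    # mild penalty: slope shrunk toward zero
b_zero = lasso_1d(x, y, lam=100.0)   # heavy penalty: slope set exactly to zero
```

Because each one-dimensional fit is available in closed form, such a step can be repeated cheaply inside a forward selection loop.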
Interaction models for functional data [Technical Session I]
Joseph Usset
Statistics
North Carolina State University
Abstract. We consider a functional regression model with a scalar response and multiple
functional predictors that accommodates two-way interactions in addition to their main
effects. We develop an estimation procedure where the main effects are modeled using penalized regression splines, and the interaction effect by a tensor product basis. Extensions to
generalized linear models and data observed on sparse grids or with error are also presented.
Our proposed method can be easily implemented through existing software. Through a numerical study we find that fitting an additive model in the presence of interaction leads to both
poor estimation performance and lost prediction power, while fitting an interaction model
where there is in fact no interaction leads to negligible losses. We illustrate our methodology
by analyzing the brain tractography data and the AneuRisk65 study data.
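For two functional predictors, a model of the kind described above can be written as follows. The notation is assumed for illustration and is not given in the abstract: $X_{1i}$ and $X_{2i}$ are the functional predictors, $\beta_1$ and $\beta_2$ are main-effect coefficient functions, and $\gamma$ is the interaction surface.

```latex
\begin{equation}
  Y_i = \alpha
      + \int X_{1i}(s)\,\beta_1(s)\,ds
      + \int X_{2i}(t)\,\beta_2(t)\,dt
      + \iint X_{1i}(s)\,X_{2i}(t)\,\gamma(s,t)\,ds\,dt
      + \varepsilon_i .
\end{equation}
```

Setting $\gamma \equiv 0$ recovers the additive model. In the estimation procedure described in the abstract, $\beta_1$ and $\beta_2$ would be represented with penalized regression splines and $\gamma$ with a tensor product basis.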
Human odor VOC elimination analysis by PCA and L-2 norm [Technical
Session III]
Christopher Vanlangenberg
Mathematics and Statistics
University of North Carolina at Greensboro
Abstract. Human scent is one of the most complex mixtures found in the human body
and is influenced by various internal and external factors. A method was developed to extract
maximum human body odor with minimum non-skin odor contamination and minimum
subject discomfort. The method was further developed to identify and compare the VOC
profiles produced by humans to determine the effect of 4 different scent control products.
Sixty-five randomly selected human subjects were tested with and without the selected products
using our novel technique with active SPME GCMS; a total of approximately 5280
unique compounds were found among the subjects. The standardized gas chromatography
data were then shortlisted by three criteria: an ad hoc method to identify the best compounds,
a ranking method, and the literature (based on historical data). Discriminant analysis (DA) and
principal component analysis (PCA) were used to simplify the complex outcomes associated
with competitive scent elimination mechanisms of various agents in each product. Finally,
an L-2 norm approach on the principal components was proposed to evaluate the scent reduction, and the 4 different scent elimination products were compared on that basis.
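One possible reading of an L-2 norm on principal components is the Euclidean distance between a subject's principal component scores without and with a product. The sketch below illustrates that reading only; it is an assumption, not the procedure used in the study. The function names and the simulated compound matrix are hypothetical.

```python
import numpy as np

def pc_scores(data, n_components):
    """Center the columns of `data` and return scores on the leading principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def l2_reduction(scores_without, scores_with):
    """Per-subject Euclidean distance between PC scores without and with a product."""
    return np.linalg.norm(scores_without - scores_with, axis=1)

# Simulated stand-in: 6 subjects, 4 compounds; the product halves every signal.
rng = np.random.default_rng(1)
without = rng.normal(size=(6, 4))
with_product = 0.5 * without
scores = pc_scores(np.vstack([without, with_product]), n_components=2)
dist = l2_reduction(scores[:6], scores[6:])
```

Under this reading, a larger distance would indicate a larger shift in a subject's profile after the product is applied, so products could be compared by their typical distances.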