Affirmation 3_29_17

This week, I've worked on more formal adaptations of the models from last week. Specifically, whereas last week I looked at how temporal breadth was associated with quantities derived from other models, this week I've incorporated temporal breadth directly into the models. Additionally, I've switched all of these models over to Bayesian rather than frequentist estimation. At a superficial level, this won't make much of a difference for what we're looking at. We're interested in the degree to which temporal breadth is associated with academic outcomes, and Bayesian and frequentist statistics each offer methods for estimating this relationship. However, the frequentist approaches carry a set of assumptions (often left unstated) that I think are undesirable. Bayesian methods also carry assumptions, but those assumptions are made explicit in the model definition (through the specification of priors and model structure), and the statements one can logically make about Bayesian estimates are more in line with what I believe we would like to say about these data. To unpack this idea a little further, I've included a short section on Bayesian estimation.

Bayes

In Bayesian estimation, the posterior distribution describes which parameter values are credible in light of the data and our prior beliefs. More succinctly, the posterior tells us what we can believe after seeing some data. In this setup, our prior beliefs can be quite vague; they describe the range of values that we think are plausible before seeing any data. For example, our prior beliefs about temperature on the surface of planet Earth might run from something like -150 to 150 degrees Fahrenheit. It's essentially impossible that we would see a temperature of ten million degrees, and so our prior would not include that value. To use a more relevant comparison, for GPA data our prior beliefs should include scores ranging from 0 to 4.3 or so.
However, we would never expect to see a GPA of -50, so including that value in our prior beliefs would be nonsensical. Practically speaking, the prior should be chosen to balance two demands: effective computation and convincing a skeptical critic. The former demand is flexible and, thanks to computational advances, is becoming increasingly so. The latter demand is obviously less flexible.

In contrast, frequentist estimation is based on a set of assumptions that are typically overlooked. I will not belabor a point that has been made numerous times elsewhere (Kruschke, 2014; Berger & Wolpert, 1988; Jeffreys, 1961; Wagenmakers, 2007), but conclusions drawn from p-values and confidence intervals estimated using standard procedures are misleading at best, and many scientists and statisticians interpret them as though they were quantities derived from Bayesian posterior distributions. Some of the logic behind these problems is quite technical, but superficially the problems boil down to three points, popularly highlighted by Wagenmakers (2007). Specifically, p-values:

1. Depend on data that were never observed. That is, the p-value depends on the null distribution, which includes hypothetical data expected under the null.
2. Depend on possibly unknown subjective intentions of the experimenter. For instance, the researcher's sampling plan may be to collect a fixed number of participants, to collect data until a fixed amount of time passes, or to collect as much data as possible from a given cohort of students in a given school. These different sampling plans lead to different null distributions, and therefore to different p-values depending on which null distribution the analyst specifies.
3. Do not quantify statistical evidence. For instance, identical p-values do NOT reflect identical statistical evidence.
A p-value of .03 computed with a sample of 5 individuals is not perceived to offer the same strength of evidence as the same p-value computed with a sample of 500 individuals (e.g. Rosenthal & Gaito, 1963; Nelson, Rosenthal & Rosnow, 1986).

Much more could be said (and indeed, has been said) about the problems of this statistical framework as it is typically applied in modern science. If this is of interest, I will be happy to give it a fuller treatment in the future. For the present, we can rest easy knowing that the parameters and intervals I present can be interpreted in the way that we want, rather than in the convoluted manner required by frequentist p-values, which necessitate invoking an imaginary sampling distribution.

Temporal Breadth

Recall once again that temporal breadth is computed using the Herfindahl-Hirschman index (H-H index). This index is based on three proportions: the proportion of words from the LIWC future category, the LIWC present category, and the LIWC past category. The denominator for each of these proportions is the total number of words across the three categories (i.e. for a given person, the three proportions sum to 1). As before, I've reversed this metric such that higher scores indicate more breadth.

A longitudinal model of academic outcomes

In my view, the best approach to modeling academic outcomes that I previously described was the one incorporating a slope and intercept for each student. To this model, I will now add condition and writing information. Specifically, I am interested in the extent to which the intercept varies as a function of temporal breadth, and how this variation differs by condition. The intercept corresponds to where the student is estimated to be at the first measurement occasion after the intervention. A second interest in this model is how the slope varies as a function of temporal breadth and condition.
The slope corresponds to a student's trajectory of performance over time. Finally, we can obtain the model's predicted academic outcomes at time points along the trajectory. These predictions combine the intercept and slope estimates into a holistic picture of whether variation in temporal breadth leads to boosts in academic outcomes at any point across the longitudinal measurement of students (e.g. a small bump at the beginning combined with a small deflection in the trajectory could yield large effects after 2 years, even if the initial changes were small).

More formally, the model estimated is of the form:

$$Y_{ti} = \beta_{0i} + \beta_{1i} Q_{ti} + \beta_2 C_i + \beta_3 B_i + \beta_4 Q_{ti} C_i + \beta_5 Q_{ti} B_i + \beta_6 C_i B_i + \beta_7 Q_{ti} B_i C_i + \beta_8 P_i + e_{ti}$$

where

$$\beta_{0i} = \gamma_{00} + u_{0i}$$
$$\beta_{1i} = \gamma_{10} + u_{1i}$$

In less formal English, this model says that the observed grade ($Y_{ti}$) at time $t$ for person $i$ is a function of quarter ($Q$), condition ($C$), temporal breadth ($B$), all interactions between these terms, pre-intervention GPA ($P$), and a random error ($e$). Additionally, the intercept ($\beta_{0i}$) and slope ($\beta_{1i}$) vary by person. For each of these parameters, there is an overall coefficient ($\gamma_{00}$ and $\gamma_{10}$ for the intercept and slope respectively) and individual deviations from these coefficients ($u_{0i}$ and $u_{1i}$).

The figure below plots the coefficient estimates from this model. Though the direction of the effect is consistent with the hypothesized relationship (bottom two coefficients), neither coefficient is estimated precisely enough for us to place much credence in the effect it describes.

Figure 3: Model fit to the data. Line represents the mean of the posterior distribution. Bands represent the 95% uncertainty interval. Plot is faceted by the approximate mean and +/-1 standard deviation of temporal breadth.

I next extend this model's predictions across the full timeline of participation.
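For reference, here is a minimal sketch of the temporal breadth score ($B$) described in the Temporal Breadth section, assuming raw per-student counts of LIWC past, present and future words are available. The report states only that the H-H index was reversed; implementing that as 1 - H is my assumption (for three categories this maps H, which ranges from 1/3 to 1, onto 0 to 2/3), as is returning None when a student used no words from any of the three categories.

```python
from typing import Optional


def temporal_breadth(past: int, present: int, future: int) -> Optional[float]:
    """Reversed Herfindahl-Hirschman index over the three LIWC time categories.

    ASSUMPTION: "reversed" is implemented as 1 - H; the original analysis
    may have reversed the index differently.
    """
    total = past + present + future
    if total == 0:
        # ASSUMPTION: with no past/present/future words, breadth is undefined.
        return None
    # All three proportions share the same denominator, so they sum to 1.
    proportions = [count / total for count in (past, present, future)]
    # H-H index: sum of squared shares; 1.0 means every word is in one category.
    hhi = sum(p * p for p in proportions)
    # Reverse so that higher scores indicate more breadth.
    return 1.0 - hhi


if __name__ == "__main__":
    print(temporal_breadth(10, 0, 0))  # only past-tense words: minimal breadth
    print(temporal_breadth(5, 5, 5))   # evenly split: maximal breadth (about 2/3)
```

In the analysis itself, this raw score would then be standardized so that the +/-1 standard deviation values used in the figures correspond to +/-1 on the $B$ scale; that step is also an assumption about the pipeline.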
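To make the step of extending predictions across the timeline concrete, the sketch below evaluates the population-level part of the model equation: with the person-level deviations $u_{0i}$ and $u_{1i}$ set to zero, the expected grade at quarter $Q$ is just the fixed-effect linear predictor. The coefficient values are hypothetical placeholders, not this analysis's estimates, and coding quarter 0 as the first post-intervention measurement is an assumption. In a Bayesian fit, one would repeat this computation for every posterior draw to obtain the uncertainty bands.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FixedEffects:
    """Population-level coefficients of the grade model (hypothetical values)."""
    gamma00: float  # average intercept
    gamma10: float  # average slope per quarter
    beta2: float    # condition C
    beta3: float    # temporal breadth B
    beta4: float    # Q x C
    beta5: float    # Q x B
    beta6: float    # C x B
    beta7: float    # Q x B x C
    beta8: float    # pre-intervention GPA P


def expected_grade(fe: FixedEffects, q: float, c: int, b: float, p: float) -> float:
    """Linear predictor at quarter q with u0i = u1i = 0 (an 'average' student)."""
    return (fe.gamma00 + fe.gamma10 * q
            + fe.beta2 * c + fe.beta3 * b
            + fe.beta4 * q * c + fe.beta5 * q * b + fe.beta6 * c * b
            + fe.beta7 * q * b * c
            + fe.beta8 * p)


def trajectory(fe: FixedEffects, c: int, b: float, p: float,
               quarters: Iterable[int] = range(9)) -> List[float]:
    """Expected grades for quarters 0..8 (ASSUMED: 0 = first post-intervention quarter)."""
    return [expected_grade(fe, q, c, b, p) for q in quarters]


if __name__ == "__main__":
    # Placeholder coefficients chosen only to show the mechanics.
    fe = FixedEffects(2.5, -0.02, 0.0, 0.0, 0.0, 0.0, 0.0, 0.01, 0.3)
    high = trajectory(fe, c=1, b=1.0, p=0.0)
    low = trajectory(fe, c=1, b=-1.0, p=0.0)
    # Here the other breadth terms are zero, so the gap at quarter 8 is 2 * 8 * beta7.
    print(round(high[-1] - low[-1], 3))
```

The $\beta_7$ term only contributes for treatment students ($C = 1$) with non-zero breadth, and its contribution grows linearly with quarter, which is how a small interaction can produce visible separation by quarter 8.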
As indicated in Figure 3, the direction of the effect is consistent with our hypothesis, though the effect is small. After 8 quarters, there is no clear separation between the control and treatment groups under this specification. However, within the treatment group, there is modest evidence for modulation by temporal breadth. Specifically, after 8 quarters, a student in the treatment group whose language is 1 standard deviation above the mean in temporal breadth is estimated to have a GPA of 2.46 (95% uncertainty interval = [2.3, 2.61]). In contrast, a comparable treatment student whose language is 1 standard deviation below the mean in temporal breadth is estimated to have a GPA of approximately 2.31 [2.21, 2.41]. Concretely, the probability that the high-breadth treatment student will have a higher GPA after 8 quarters than the low-breadth treatment student is approximately 0.94 (and approximately 0.06 that their GPA will be lower). Figure 4 plots this relationship more directly.

Figure 4: Posterior distribution of the GPA difference between treatment students who write with temporal breadth 1 standard deviation below and 1 standard deviation above the mean. Positive values on the y axis represent better outcomes for high temporal breadth. Blue shading represents 95% uncertainty intervals.

Incorporating identity threat

Values affirmations are meant to benefit students whose academic performance suffers due to identity threat. So far, however, we have been ignoring this crucial variable. I now incorporate this information into the analysis. I'm not sure there is a clear prediction about whether temporal breadth should most help stigmatized versus non-stigmatized students, so this analysis is exploratory.

Figure 5: Coefficient estimates. Points represent the mean of the posterior distribution. Inner bars represent 50% uncertainty intervals. Outer bars represent 95% uncertainty intervals.
Estimate for the intercept is excluded.

There are 16 coefficients plotted above, but not all of them are of substantive interest. The first five (from the top) are main effects and are not especially interesting for this analysis. The coefficient plot highlights two effects of particular interest. First, the quarter-condition-temporal breadth interaction (5th from the bottom) indicates that temporal breadth modulates the longitudinal trajectory of non-stigmatized students in the treatment condition. Second, the quarter-condition-breadth-stigmatized interaction (bottom) indicates that temporal breadth does not seem to modulate the longitudinal trajectory of stigmatized students in the treatment group. Indeed, if it does at all, it likely does so in the opposite direction (more breadth leads to worse performance over time).

Figure 6: Model fit to the data. Line represents the mean of the posterior distribution. Bands represent the 95% uncertainty interval. Plot is faceted by the approximate mean and +/-1 standard deviation of temporal breadth, along with group (stigmatized and not stigmatized).

More specifically, the mean of the posterior distribution for this interaction between temporal breadth and time in the non-stigmatized group is 0.15 [0, 0.3]. The probability that this effect is greater than zero is 0.98 (and 0.02 that it is less than zero). Concretely, after 8 quarters, for a non-stigmatized student in the treatment group whose writing was 1 standard deviation below the mean in temporal breadth, the model predicts a GPA of 2.64 [2.5, 2.78]. For the equivalent student whose writing is 1 standard deviation above the mean, the model predicts a GPA of 2.89 [2.68, 3.11]. The probability that the high-breadth non-stigmatized treatment student will have a higher GPA after 8 quarters than the equivalent low-breadth student is approximately 0.97.
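Posterior probabilities such as the 0.94, 0.98 and 0.97 quoted above are, in a sampling-based Bayesian fit, typically the share of posterior draws in which the quantity of interest is positive, and 95% intervals are the 2.5th and 97.5th percentiles of those draws. A minimal sketch, assuming paired posterior draws of predicted GPA for a high-breadth and a low-breadth student have already been extracted from the fitted model; the function name and inputs are hypothetical.

```python
from statistics import mean, quantiles


def summarize_difference(high_draws, low_draws):
    """Summarize the posterior of (high-breadth GPA - low-breadth GPA).

    Each list holds one predicted GPA per posterior draw; entries at the
    same index must come from the same draw so the difference is valid.
    """
    if len(high_draws) != len(low_draws):
        raise ValueError("draws must be paired by posterior iteration")
    diffs = [h - l for h, l in zip(high_draws, low_draws)]
    # 39 cut points at 2.5%, 5%, ..., 97.5%; keep the outer two.
    cuts = quantiles(diffs, n=40, method="inclusive")
    return {
        "mean_diff": mean(diffs),
        "interval_95": (cuts[0], cuts[-1]),
        # Share of draws favouring high breadth = posterior P(diff > 0).
        "p_greater": mean(1.0 if d > 0 else 0.0 for d in diffs),
    }
```

Computing the difference within each draw, rather than comparing the two separate intervals, is what makes the probability statement valid: the two predictions are correlated because they share the same coefficients.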
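The count of 16 coefficients is consistent with a model that adds a stigmatized-group indicator ($S$) to the earlier specification, fully crosses it with quarter, condition and breadth, and keeps pre-intervention GPA as an uncrossed covariate. That structure is my inference from the counts described above, not something stated explicitly. The sketch lists the non-intercept terms grouped by interaction order (main effects first), which would put the five main effects at the top, the quarter-condition-breadth term 5th from the bottom, and the four-way interaction at the bottom. The labels are hypothetical.

```python
from itertools import combinations

# Hypothetical labels: Q = quarter, C = condition, B = temporal breadth,
# S = stigmatized group; P = pre-intervention GPA (covariate, not crossed).
FACTORS = ["Q", "C", "B", "S"]


def model_terms(factors=FACTORS, covariates=("P",)):
    """Non-intercept terms of a full factorial model plus covariates,
    grouped by interaction order."""
    terms = list(factors) + list(covariates)          # main effects
    for order in range(2, len(factors) + 1):          # 2-way .. 4-way interactions
        terms += [":".join(combo) for combo in combinations(factors, order)]
    return terms


terms = model_terms()
print(len(terms))  # 16
print(terms[-5])   # Q:C:B
print(terms[-1])   # Q:C:B:S
```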
College Attendance

We can also perform a similar analysis with respect to college attendance. There is, of course, less data in such an analysis, but we can explore whether there is any support for the idea that students who write with temporal breadth are more likely to attend college. Figure 7 plots the coefficient estimates from this model. They suggest a story similar to the one we saw in the model of middle-school grades: students in the treatment condition who write with breadth appear to go on to greater achievement.

Figure 7: Coefficient estimates. Points represent the mean of the posterior distribution. Inner bars represent 50% uncertainty intervals. Outer bars represent 95% uncertainty intervals. Coefficient estimate values are in logit units.

Figure 8: Model fit to the data representing the probability of attending college (y axis) as a function of pre-performance index (x axis), condition (colored lines) and mean +/- 1 standard deviation of temporal breadth (facet).

Figure 8 plots this model in a more intuitive way. As seen above, the effect here appears somewhat larger than in the previous analyses. For instance, a student in the treatment condition who has the average pre-performance score and writes with temporal breadth 1 standard deviation below the mean is estimated to have a probability of attending college of 0.59 [0.5, 0.68]. This estimate is very close to that of a comparable student in the control condition (probability = 0.66 [0.48, 0.68]). However, a student in the treatment condition with the average pre-performance score who writes with temporal breadth 1 standard deviation above the mean is considerably better off: they are estimated to have a probability of attending college of 0.81 [0.66, 0.92]. A comparable control student writing 1 SD above the mean, by contrast, actually has a somewhat lower estimated probability of attending college than a control student writing 1 SD below the mean.
The probability of attending college for this student is estimated at 0.55 [0.45, 0.64]. A full 99% of the posterior distribution is consistent with an effect in this direction.
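Because the college-attendance coefficients (Figure 7) are in logit units, predicted probabilities like those above come from passing the linear predictor through the inverse-logit function, and a statement such as "99% of the posterior" is the share of draws in which the relevant coefficient has the stated sign. A minimal sketch, assuming a logistic model whose predictor mirrors the grade model without the time terms; the coefficient names and the dictionary layout are hypothetical, and a centered pre-performance index (so that 0 means average) is an assumption.

```python
import math
from typing import Sequence


def inv_logit(x: float) -> float:
    """Convert log-odds to a probability, avoiding overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def p_attend(coefs: dict, cond: int, breadth: float, pre: float) -> float:
    """Predicted probability of attending college for one student.

    HYPOTHETICAL linear predictor: intercept + condition + breadth +
    condition x breadth + pre-performance index (all in logit units).
    """
    eta = (coefs["intercept"] + coefs["cond"] * cond + coefs["breadth"] * breadth
           + coefs["cond_x_breadth"] * cond * breadth + coefs["pre"] * pre)
    return inv_logit(eta)


def share_positive(draws: Sequence[float]) -> float:
    """Fraction of posterior draws above zero for a single coefficient."""
    return sum(1 for d in draws if d > 0) / len(draws)
```

Applying `p_attend` to each posterior draw of the coefficients, rather than once to their posterior means, gives intervals like those reported above; the inverse-logit of the mean coefficient is generally not the posterior mean probability.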