Regression Analysis Student Project
Panera’s Nutritional Information
Fall November 2012- January 2013
xxxxx xxxxxxx

Objective:
Panera is one of my favorite places to eat, and sandwiches are one of my staple foods. I therefore decided to perform a regression analysis on the nutritional data for the most popular Panera sandwiches, as provided on the Panera website. The goal of this analysis is to examine the relationships between total calories and several nutritional values, and to find the model with the best fit. The dependent variable is calories, and there are nine independent variables. I will perform Ordinary Least Squares regression at the 95% confidence level.

Data:
The data can be seen below and were obtained from http://www.panerabread.com/pdf/nutr-guide.pdf. Panera has quite a large menu, so I chose only full-sized sandwiches from the Café and Signature lists. There are fourteen items in total.

Equation:
For the analysis, I will use the following equation:

Y = A + B1X1 + B2X2 + B3X3 + B4X4 + B5X5 + B6X6 + B7X7 + B8X8 + B9X9

The dependent variable (Y) is the calories per sandwich. A is the intercept, and the independent variables (the X’s) are listed below:

X1 = Fat (g)
X2 = Saturated Fat (g)
X3 = Trans Fat (g)
X4 = Cholesterol (mg)
X5 = Sodium (mg)
X6 = Carbs (g)
X7 = Fiber (g)
X8 = Sugars (g)
X9 = Protein (g)

Models:
To estimate the parameters of the following models, I used the regression function in Excel’s Data Analysis Add-In. First, the full model is evaluated, including all nine explanatory variables. Based on the results, variables are then eliminated one at a time, starting with the variable that has the highest P-value.
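The elimination procedure described above can be sketched as a simple loop. This is only a minimal illustration, not the Excel workflow used in this project: the names `backward_eliminate`, `fit_pvalues`, and `alpha` are hypothetical, `fit_pvalues` stands in for whatever tool produces the regression output, and the 0.05 cutoff is an assumption based on the 95% confidence level.

```python
from typing import Callable, Dict, List, Tuple


def backward_eliminate(
    variables: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,  # assumed cutoff, tied to the 95% confidence level
) -> List[Tuple[List[str], Dict[str, float]]]:
    """Fit a sequence of models, dropping the highest-P-value variable each round.

    fit_pvalues is a hypothetical callback: given the variable names in the
    current model, it returns the P-value of each variable's coefficient.
    """
    current = list(variables)
    history = []
    while current:
        pvalues = fit_pvalues(current)
        history.append((list(current), dict(pvalues)))
        # The variable with the largest P-value is the candidate for removal.
        worst = max(current, key=lambda name: pvalues[name])
        if pvalues[worst] <= alpha:
            break  # every remaining variable is significant; stop here
        current.remove(worst)
    return history
```

Each entry in the returned list is one fitted model, so the whole sequence can be compared afterwards instead of keeping only the last model.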
Model 1- The Full Model
Utilizing all nine explanatory variables in the regression yields the following results:

The initial regression on all nine independent variables produces this equation:

Y = 10.7756 + 8.8872X1 + .8393X2 + 6.8358X3 - .0950X4 + .0041X5 + 3.8823X6 + .3519X7 - .1586X8 + 3.7147X9

The R² of this regression is .99984, and the Adjusted R², which adjusts for the degrees of freedom, is .99947. Since these values are close to 1, this model is a good fit to the data. The F ratio is 2740.3738, and the Significance F is a very small number. These results also indicate that the regression equation closely predicts the actual data.

However, it may still be possible to improve the model, so we look at the P-values. A small P-value indicates that an explanatory variable has a statistically significant effect on the response variable, while a large P-value indicates that it contributes little. Therefore, we can remove the explanatory variable with the highest P-value and rerun the regression. In this case, we rerun the regression without Sugars (X8), which has the highest P-value (.7683).

Model 2- Eight Explanatory Variables
Removing Sugars from the regression and utilizing the other eight explanatory variables yields the following results:

The second regression on eight independent variables produces this equation:

Y = 10.2641 + 8.8783X1 + .8044X2 + 7.3577X3 - .0968X4 + .0043X5 + 3.8668X6 + .3000X7 + 3.7408X9

With this regression, the R² is .99983, almost exactly the same as in the full model. The Adjusted R² is .99957, which is slightly larger than that of the full model. The standard error of Model 2 is also smaller than Model 1’s (3.48537 vs. 3.84923). Lastly, Model 2 has an F ratio of 3760.1911, which is larger than Model 1’s. Together, these statistics suggest that Model 2 may be an even better fit to the data.
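The way Adjusted R² can rise while R² falls slightly can be made explicit. The sketch below assumes the standard formula Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), with n = 14 sandwiches and k explanatory variables. The function name is illustrative only. Because the R² inputs are the rounded values quoted above, the results match the reported figures only approximately.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for a model with k explanatory variables and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


N_SANDWICHES = 14  # number of items in the data set

# R-squared values as quoted in the text (rounded to five decimals).
model_1 = adjusted_r2(0.99984, N_SANDWICHES, k=9)  # full model
model_2 = adjusted_r2(0.99983, N_SANDWICHES, k=8)  # Sugars removed

print(f"Model 1 adjusted R2: {model_1:.5f}")
print(f"Model 2 adjusted R2: {model_2:.5f}")
```

Dropping a variable raises the residual degrees of freedom (n - k - 1) from 4 to 5, which shrinks the penalty factor enough to outweigh the small loss in R².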
To refine the model further, we again remove the explanatory variable with the highest P-value, which is now Fiber (X7).

Model 3- Seven Explanatory Variables
Removing Fiber from Model 2 and rerunning the regression yields the following results:

The third regression on seven independent variables produces this equation:

Y = 10.8761 + 8.8876X1 + .7632X2 + 7.533X3 - .1016X4 + .0038X5 + 3.8823X6 + 3.7648X9

The R² did not change from Model 2 to Model 3; it is still .99983. However, the Adjusted R² increased to .99963, the F ratio increased to 5051.3079, and the Standard Error decreased to 3.21475. Hence, Model 3 is not only a good fit to the data but also an improvement on Models 1 and 2. We continue refining the model by removing Sodium (X5), because its P-value is still not close to 0.

Model 4- Six Explanatory Variables
Removing Sodium from Model 3 and rerunning the regression yields the following results:

The fourth regression on six independent variables produces this equation:

Y = 6.9503 + 8.9243X1 + .8463X2 + 5.9666X3 - .1393X4 + 3.9183X6 + 4.0114X9

The R² barely decreased in this model and is now .99981. Again, the Adjusted R² increased (to .99964), the F ratio increased (to 6058.5481), and the Standard Error decreased (to 3.17054). Model 4 is still a precise fit, but we rerun the regression again, this time removing Trans Fat (X3), which has the highest P-value (.0818).

Model 5- Five Explanatory Variables
Removing Trans Fat from Model 4 and rerunning the regression yields the following results:

The fifth regression on five independent variables produces this equation:

Y = 2.6641 + 8.9607X1 + .9732X2 - .1486X4 + 3.9137X6 + 4.1566X9

For Model 5, the R² decreased slightly to .99969, the Adjusted R² decreased slightly to .99950, and the F ratio decreased to 5228.3807. The Standard Error increased to 3.73852. According to these statistics, this model is less precise than Model 4.
We nevertheless run another model without Cholesterol (X4), since it has the highest P-value of the remaining explanatory variables.

Model 6- Four Explanatory Variables
Removing Cholesterol from Model 5 and rerunning the regression yields the following results:

The sixth regression on four independent variables produces this equation:

Y = 2.3764 + 8.7752X1 + .8842X2 + 3.9926X6 + 3.8485X9

As with Model 5, Model 6 saw a decrease in R² (.99944), a decrease in Adjusted R² (.99920), a decrease in the F ratio (4038.4995), and an increase in the Standard Error (4.75526). These results mean that Model 6’s goodness of fit is worse than that of all the previous models. We will run one last regression, since there is one explanatory variable left with a P-value that is not close to 0: Saturated Fat (X2).

Model 7- Three Explanatory Variables
Removing Saturated Fat from Model 6 and rerunning the regression yields the following results:

The seventh regression on three independent variables produces this equation:

Y = .2363 + 9.0779X1 + 3.9935X6 + 3.9214X9

Once again, R² has decreased (.99882), Adjusted R² has decreased (.99847), the F ratio has decreased (2823.2085), and the Standard Error has increased (6.56518). These changes are large enough to say that Model 7 is most likely not the best fit to the data. The P-values for the remaining independent variables are all very close to 0, so the regression will not be rerun.

Correlation:
According to Excel’s Correlation Analysis, Fat has the highest correlation with Calories, at .88181, which is also the highest correlation in the data set. Sugars has the lowest correlation with Calories, at .02829.

Conclusion:
Below is a summary of each model’s results:

Model   Explanatory Variables   R²       Adjusted R²   Standard Error   F Ratio
1       9                       .99984   .99947        3.84923          2740.3738
2       8                       .99983   .99957        3.48537          3760.1911
3       7                       .99983   .99963        3.21475          5051.3079
4       6                       .99981   .99964        3.17054          6058.5481
5       5                       .99969   .99950        3.73852          5228.3807
6       4                       .99944   .99920        4.75526          4038.4995
7       3                       .99882   .99847        6.56518          2823.2085

Based on this table, I conclude that Model 4, with six explanatory variables, is the best fit to the data. This model has the largest Adjusted R² and F Ratio. It also has the smallest Standard Error.
Although Model 4’s R² is not the highest of the group, it is very close to 1 and is within .00003 of the full model’s R² and .00002 of Model 2’s. Also, the Adjusted R² is a better basis for comparison, since it adjusts for degrees of freedom. Additionally, the P-values of the explanatory variables in Model 4 are all fairly close to 0. For all these reasons, Model 4 is my choice for the most precise model. Its regression equation is:

Y = 6.9503 + 8.9243X1 + .8463X2 + 5.9666X3 - .1393X4 + 3.9183X6 + 4.0114X9

Choosing Model 4 amounts to saying that Fat, Saturated Fat, Trans Fat, Cholesterol, Carbs, and Protein are the main drivers of a Panera sandwich’s calorie total.
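The comparison behind this conclusion can be reproduced in a few lines. The sketch below hard-codes the statistics reported for each model in this report and picks the model that scores best on each criterion. The variable names are illustrative, and the script is not part of the original Excel analysis.

```python
# (model, explanatory variables, R2, adjusted R2, standard error, F ratio),
# copied from the statistics reported for Models 1 through 7.
MODELS = [
    (1, 9, 0.99984, 0.99947, 3.84923, 2740.3738),
    (2, 8, 0.99983, 0.99957, 3.48537, 3760.1911),
    (3, 7, 0.99983, 0.99963, 3.21475, 5051.3079),
    (4, 6, 0.99981, 0.99964, 3.17054, 6058.5481),
    (5, 5, 0.99969, 0.99950, 3.73852, 5228.3807),
    (6, 4, 0.99944, 0.99920, 4.75526, 4038.4995),
    (7, 3, 0.99882, 0.99847, 6.56518, 2823.2085),
]

best_adj_r2 = max(MODELS, key=lambda m: m[3])  # largest Adjusted R2
lowest_se = min(MODELS, key=lambda m: m[4])    # smallest Standard Error
highest_f = max(MODELS, key=lambda m: m[5])    # largest F ratio

for label, row in (
    ("Largest Adjusted R2", best_adj_r2),
    ("Smallest Standard Error", lowest_se),
    ("Largest F ratio", highest_f),
):
    print(f"{label}: Model {row[0]} ({row[1]} explanatory variables)")
```

All three criteria point to the same model here. Had they disagreed, the choice would have required a judgment about which statistic to weight most heavily.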