# What Wins College Football Games? (Revisited)

*(Warning: a number-heavy post lies ahead.)*

When it comes to college football, everyone has an opinion on the most effective way to win games. Some think a tenacious rush defense is the best way to come out victorious, while others believe that a full-on aerial assault is the superior way to go. While everyone has an opinion, **very little research has been done on the subject.** If you remember back to last summer, **I briefly imitated the model** used by Brian Burke of AdvancedNFLStats.com and came up with simple correlations to determine the best efficiency numbers (i.e., per-attempt rather than per-game numbers) for measuring college football teams. At that time, I concluded that offensive/defensive pass yards per attempt, offensive/defensive rush yards per attempt, and offensive/defensive turnover percentage were the best measures of success. However, **I have since updated the data with 2012’s numbers and also redefined the previous statistics to include more variables**, giving a better indication of how teams win. There are 723 observations in the analysis.

First, I’ll explain the changes in how the variables are measured. Previously, pass yards per attempt was defined as total pass yards divided by pass attempts. That traditional calculation leaves out an important part of the passing game: sacks. Since sacks are undeniably part of the passing game, I’ve added them to the calculation on both sides of the ball. The new statistic, which includes sacks and sack yards, is called true pass yards per attempt, and on its own it correlates more strongly with team success than basic pass yards per attempt does. The second change is to offensive/defensive turnover percentage. Whereas the previous model measured it as total turnovers divided by total drives, the new model measures it as total turnovers divided by total plays. I made this change because team drive data isn’t available for years before 2011. The new measure also correlates slightly more strongly. Now that we’ve gone over the new definitions, **here’s the correlation summary for all six stats when measured against winning percentage.** Keep in mind that the closer a value is to 1 or -1, the stronger the correlation.
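For readers who want to reproduce the redefined stats, here is a rough sketch in Python. It assumes that "including sacks and sack yards" means subtracting sack yards from pass yards and counting each sack as an attempt, which is the usual net-yards convention but is not spelled out in the post. The function names and the season totals are made up for illustration.

```python
from math import sqrt

def true_pass_yards_per_attempt(pass_yards, pass_attempts, sacks, sack_yards):
    """Pass yards per dropback: sack yards are subtracted and sacks count as attempts.
    (Assumed formula; the post only says sacks and sack yards are included.)"""
    return (pass_yards - sack_yards) / (pass_attempts + sacks)

def turnover_pct(turnovers, plays):
    """Turnovers as a share of total plays (the post's new definition)."""
    return turnovers / plays

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical season totals for one team (not real data)
example_pypa = true_pass_yards_per_attempt(
    pass_yards=3000, pass_attempts=400, sacks=25, sack_yards=150
)
example_to = turnover_pct(turnovers=20, plays=850)
```

Each correlation in the summary would then be `pearson_r(stat_values, win_pcts)` computed across all team seasons in the sample.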

The correlations suggest that defense, in all three categories, is more important than offense. Also, if you rank the categories by strength of correlation, passing comes out on top, followed by rushing and turnover percentage. While these numbers give a good baseline for how teams win, they aren’t the complete picture, because simple correlations don’t hold the other variables constant. To correct this, **I’ve run a multiple regression to find out which variables are the most important when all things are held constant.** Below is the chart containing the regression coefficients along with the R-squared value.

**First, every single independent variable in the regression is statistically significant at the 1% level.** The R-squared value is high: 76.8% of the variance in team winning percentage can be explained using the six variables. Naturally, there are other factors that help explain how teams win, but this model accounts for a large share of team winning percentage. With the regression coefficients, we can create a linear model to predict a team’s winning percentage.

Win%= .520 + (.076 * Off. True PYPA) + (-.070 * Def. True PYPA) + (.069 * Def. TO%) + …
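To make the regression step concrete, here is a minimal sketch of ordinary least squares with several predictors, written in plain Python so it runs without extra libraries. It is not the author's spreadsheet workflow, and the toy data below is generated from made-up coefficients purely to show that the fit recovers them; only two predictors are used to keep it short.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] from the normal equations X'X b = X'y."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, row):
    """Predicted value: intercept plus the weighted sum of the predictors."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))

def r_squared(coefs, rows, y):
    """Share of the variance in y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coefs, r)) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Toy (offense, defense) yards-per-attempt pairs; NOT the post's data
rows = [(6.0, 5.5), (7.2, 6.1), (5.1, 6.8), (8.0, 5.0), (6.5, 7.0)]
y = [0.5 + 0.08 * off - 0.07 * dfn for off, dfn in rows]
coefs = fit_ols(rows, y)  # close to [0.5, 0.08, -0.07] for this noise-free data
```

With real team-season data, `rows` would hold all six efficiency stats per team and `y` the winning percentages; the fit would then be noisy and R-squared would fall below 1.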

While the above model is a decent fit, **we need to take it one step further and standardize the coefficients.** Why? Because variables like offensive rush yards per attempt and offensive turnover percentage aren’t measured on the same scale. To get standardized values, Excel’s STANDARDIZE function is used to convert the data into z-scores (how many standard deviations above or below average a data point is). The chart below contains those values and reveals the true relative importance of the statistics.
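As a rough illustration of the standardizing step, the sketch below converts a column of values to z-scores and shows how a raw coefficient rescales when only the predictors are standardized. The sample standard deviation is used here as an assumption (Excel's STANDARDIZE takes whatever mean and standard deviation you hand it), and the sample values are invented.

```python
from statistics import mean, stdev

def z_scores(values):
    """Convert raw values to z-scores using the sample standard deviation."""
    mu = mean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]

def standardized_coefficient(raw_coef, predictor_sd):
    """Effect of a one-standard-deviation change in the predictor.
    Holds when only the predictors, not winning percentage, are z-scored."""
    return raw_coef * predictor_sd

# Invented true pass yards per attempt values for five teams
sample = [5.8, 6.4, 7.1, 6.9, 5.3]
z = z_scores(sample)

# Back-of-envelope check using two numbers quoted in the post: a raw
# coefficient of .076 and a one-SD effect of .081 for Off. True PYPA would
# imply a standard deviation of roughly 1.07 yards per attempt, assuming
# both figures come from the same regression.
implied_sd = 0.081 / 0.076
```

Because the winning-percentage column is left in its original units, each standardized coefficient reads directly as "winning percentage points per standard deviation".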

From this model, once all things are held constant, we can conclude that **efficient passing on offense is the most meaningful variable when measuring against winning percentage.** If a team finds itself one standard deviation above average in Off. True PYPA while being average in all other categories, they’d be expected to increase their winning percentage by .081 points. The same technique is used to evaluate the other coefficients as well. For example, if a team rushed the ball one standard deviation better than average while being average in all other categories, they’d be expected to increase their winning percentage by .031 points. Knowing that passing on offense and defense are the most important factors when it comes to winning percentage, **it should benefit teams to favor the pass more in play calling and recruiting.** This doesn’t mean teams should abandon their current strategies. After all, some teams are very successful with turnovers and rushing, but according to this model the passing game shows more relative importance.
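The "one standard deviation above average" reading can be sketched as a weighted sum of z-scores. Only the two standardized coefficients quoted in the text are used here; the key names are made up, and treating the .031 figure as the offensive rushing coefficient is an assumption based on the wording.

```python
# Standardized coefficients quoted in the post (winning percentage per SD).
# Key names are hypothetical; the other four coefficients are not given.
STD_COEFS = {
    "off_true_pypa": 0.081,
    "off_rush_ypa": 0.031,  # assumed to be the offensive rushing figure
}

def expected_win_pct_change(z_profile, coefs=STD_COEFS):
    """Expected change in winning percentage relative to an average team.
    z_profile maps stat names to z-scores; omitted stats are treated as average."""
    return sum(coefs[name] * z for name, z in z_profile.items())

# One SD better than average at passing, average everywhere else
passing_only = expected_win_pct_change({"off_true_pypa": 1.0})

# One SD better than average at both passing and rushing
both = expected_win_pct_change({"off_true_pypa": 1.0, "off_rush_ypa": 1.0})
```

Under the linear model the effects simply add, so a team one standard deviation better in both categories would be expected to gain about .112 in winning percentage.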

What does this mean for Kentucky going forward? As I’m sure you’re aware by now, Mark Stoops has brought in one of the nation’s top offensive minds to manage the offense. In 2012, Neal Brown’s Texas Tech offense was a model of efficiency, ranked 13th in total offense. They averaged 7.3 true yards per passing attempt (28th nationally) and 4.56 yards per rush (48th), and lost the ball on 2.4% of plays (68th). While their ability to rush and hold onto the ball wasn’t elite, they still had one of the nation’s best offensive attacks because of their passing efficiency. In the future, when talent has increased, Mark Stoops should have the Cats in the upper half of the FBS, as his Seminole defense was ranked 2nd nationally in total defense. This was accomplished by allowing 4.1 true yards per passing attempt (1st nationally) and 2.74 yards per rush (4th), and forcing turnovers on 2.4% of plays (68th). Once talent is up to par, **Wildcat fans should feel confident going forward as Stoops and Brown do the most important things well on both sides of the ball.**

## 13 Comments for What Wins College Football Games? (Revisited)

Interesting. I’m curious how this would look with a log transformation on yardage variables assuming positive skewness?

I’m going to need to see a p-value for this.

More points than your opponent ????

Whatever statistics you use UK is 2-10

Should have used a multiple linear regression with Akaike’s Information Criterion model selection, bro.

Seeking: single male. Actually, I’ll take any male as long as they are a UK fan. Warning: I will swing hard on your nuts.

I am going to go out on a limb and say that scoring more points is what wins football games…actually that also wins basketball games, baseball games, softball games, oh well you get the picture.

Are you serious? Using a multilinear regression on a boolean outcome? Have you ever even heard of a logit or probit model? Plus, there are so many problems with multicollinearity and lack of independence among the variables.

Also, if it was a valid multilinear regression (setting aside all the reasons why the axioms for such a model are violated), you by definition should have a constant of 0.50 exactly (because you list symmetrical variables–offensive and defensive counterparts for each variable.) Yet, you have 0.52 as a constant.

Congrats ‘applied math phd,’ you have a big brain. I’m sure you know that OLS is robust to all kinds of issues, and a linear probability model (what he did) will give you the same results as a logit/probit almost all the time. You must have forgotten that multicollinearity doesn’t bias the parameter estimates, though, or you wouldn’t have mentioned that. And yeah, he could have done some checks for autocorrelation, but we’re talking about different games here, so I’ll bet he’s fine there anyway. And who’s to say that he didn’t check for this stuff anyway? Do you report all diagnostic tests in your articles? I hope not, for that would make a boring read indeed. For my two cents, it’s kind of nice to see folks using some more advanced stats than just correlations.

So, if there is a problem, it’s in the way that the estimates are interpreted. When you look at those coefficients, Jonathan, don’t assume that .076 is more important than .069 just because it’s bigger. Interpret this as 1 unit change in X leads to an increase/decrease of [coefficient value] on Y, on average. Your units on the Xs aren’t the same here (e.g., a passing yard is on a different scale than turnover/play), so you’ll need to go a step further to get some good substantive interpretations out of your results.

Some decent work overall, though. Good job.

Clay has it. Calculate some standardized beta coefficients and keep up the good work. This is at least as rigorous as the crap those folks at the mit sports consortium or whatever they are called produce.
