CFB Stats Correlation

Posted on October 11, 2011 (updated March 1, 2017) by Dick Kusleika

Disclaimer: I’m not a statistician. I aced Business Statistics, but that was 20 years ago and I forgot everything the day after the final.

My premise is that Points-per-Yard on offense and Yards-per-Point on defense are predictive of the final score of a game. I went back to week three of the college football season to look at the data. The rankings used are current (week 6), which makes the results a little suspect.

I added Points-per-Yard * 100 to Yards-per-Point for each team. For each game, I compared the difference in that calculation between the two teams. Then I calculated the score difference. I used the CORREL function to see how well the data correlated. Ignoring games with D1AA teams, the correlation coefficient was .14.
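For anyone who wants to try the same calculation outside of Excel, here is a rough Python sketch of the steps above. The team totals and game scores are made up purely for illustration (they are not the data behind this post), and the function names `rating` and `pearson` are my own. The `pearson` function computes the Pearson correlation coefficient, which is what CORREL returns.

```python
from math import sqrt

def rating(points_scored, yards_gained, yards_allowed, points_allowed):
    """Offensive Points-per-Yard * 100 plus defensive Yards-per-Point."""
    return points_scored / yards_gained * 100 + yards_allowed / points_allowed

def pearson(xs, ys):
    """Pearson correlation coefficient (the quantity Excel's CORREL returns)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical season totals per team:
# (points scored, yards gained, yards allowed, points allowed)
teams = {
    "A": (150, 2000, 1500, 60),
    "B": (120, 1800, 1600, 80),
    "C": (90, 1500, 1700, 100),
    "D": (100, 1900, 1400, 90),
}

# Hypothetical games: (team 1, team 2, team 1 score, team 2 score)
games = [
    ("A", "B", 31, 24),
    ("C", "D", 17, 20),
    ("A", "C", 28, 10),
    ("B", "D", 21, 27),
]

# For each game: difference in the combined stat, and difference in score
rating_diffs = [rating(*teams[t1]) - rating(*teams[t2]) for t1, t2, _, _ in games]
score_diffs = [s1 - s2 for _, _, s1, s2 in games]

r = pearson(rating_diffs, score_diffs)
print(round(r, 2))
```

With real data, `teams` would hold each team's totals going into the week and `games` would hold that week's results, minus whatever opponents you decide to exclude.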

That’s not too predictive. Maybe week 3 is too early in the season, so I go to week 5. Maybe I need to exclude more crappy opponents. Maybe I need to exclude even more crappy opponents. Maybe instead of looking at stats against highly ranked teams, I need to look at those against opponents that are near this week’s opponent’s rank. Maybe not so near. Here’s the data:

Week	Criteria	# of Games	Corr. Coeff.
3	All D1A Games	44	.14
5	Games Against Top 60	38	.30
5	Games Against Top 80	50	.27
5	All D1A Games	53	.23
5	Games Against Opp. Rank +/-20	24	-.30
5	Games Against Opp. Rank +/-30	34	-.29
5	Games Against Opp. Rank +/-40	41	-.21

Maybe, just maybe, this stat doesn’t predict a damn thing.

3 thoughts on “CFB Stats Correlation”

Debaser says:

October 12, 2011 at 8:25 am

The lack of correlation doesn’t necessarily mean it is a bad place to start. Any model which tries to predict a result needs to be evolved and refined. There are so many variables in ~~hand egg~~ American football that it shouldn’t be surprising that the correlation for this is so poor. Keep adding variables and refining your model and you should come up with something better. Just remember, though, if the results could be predicted with any sort of accuracy then bookies wouldn’t be taking bets on it.
Dick Kusleika says:

October 12, 2011 at 1:27 pm

Yeah, I agree. Those negative coefficients surprised me though. I thought that if I got around .5 I’d have a good starting point to start tweaking, but starting this low may be starting down a bad path.

As for bookies, they don’t have good enough Excel skills to keep up with me. :)
Gavin says:

October 13, 2011 at 11:57 am

The model needs to be built out further because, as you include more fields that could also be correlated, the coefficient of the ones you have will likely decrease.

“Parents’ salary” might drive household income if that’s the only thing you look at, but when you build in more fields like “has a job or not”, the coefficient on “parents’ salary” will decrease.