CFB Stats Correlation

Disclaimer: I’m not a statistician. I aced Business Statistics, but that was 20 years ago and I forgot everything the day after the final.

My premise is that Points-per-Yard on offense and Yards-per-Point on defense are predictive of the final score of a game. I went back to week three of the college football season to look at the data. The rankings used are current (week 6), which makes the results a little suspect.

I added Points-per-Yard * 100 to Yards-per-Point for each team. For each game, I compared the difference in that calculation between the two teams. Then I calculated the score difference. I used the CORREL function to see how well the data correlated. Ignoring games with D1AA teams, the correlation coefficient was .14.
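For anyone who wants to try this outside of Excel, here is a rough Python sketch of the same calculation. The game numbers are made up purely for illustration, and the names `rating` and `pearson` are my own; `pearson` computes the same Pearson correlation coefficient that CORREL returns.

```python
from math import sqrt

def rating(points_per_yard, yards_per_point):
    # Offensive Points-per-Yard scaled by 100, plus defensive Yards-per-Point
    return points_per_yard * 100 + yards_per_point

def pearson(xs, ys):
    # Pearson correlation coefficient, the statistic Excel's CORREL computes
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical games, NOT real data:
# (team A PPY, team A YPP, team B PPY, team B YPP, team A score, team B score)
games = [
    (0.075, 18.0, 0.060, 14.0, 31, 17),
    (0.055, 13.5, 0.070, 16.0, 20, 24),
    (0.080, 15.0, 0.065, 17.5, 27, 30),
    (0.062, 19.0, 0.058, 12.0, 35, 10),
    (0.068, 14.0, 0.072, 15.5, 14, 21),
]

# Difference in the combined stat between the two teams in each game
rating_diffs = [rating(g[0], g[1]) - rating(g[2], g[3]) for g in games]
# Difference in the final score for the same games
score_diffs = [g[4] - g[5] for g in games]

r = pearson(rating_diffs, score_diffs)
print(round(r, 2))
```

With real data you would load one row per game and drop the D1AA matchups before building the two lists.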

That’s not too predictive. Maybe week 3 is too early in the season, so I go to week 5. Maybe I need to exclude more crappy opponents. Maybe I need to exclude even more crappy opponents. Maybe instead of looking at stats against highly ranked teams, I need to look at those against opponents that are near this week’s opponent’s rank. Maybe not so near. Here’s the data:

Week  Criteria                         # of Games  Corr. Coeff.
3     All D1A Games                    44           .14
5     Games Against Top 60             38           .30
5     Games Against Top 80             50           .27
5     All D1A Games                    53           .23
5     Games Against Opp. Rank +/-20    24          -.30
5     Games Against Opp. Rank +/-30    34          -.29
5     Games Against Opp. Rank +/-40    41          -.21

Maybe, just maybe, this stat doesn’t predict a damn thing.

3 thoughts on “CFB Stats Correlation”

  1. The lack of correlation doesn’t necessarily mean it is a bad place to start. Any model that tries to predict a result needs to be evolved and refined. There are so many variables in hand egg American football that it shouldn’t be surprising that the correlation for this is so poor. Keep adding variables and refining your model and you should come up with something better. Just remember, though: if the results could be predicted with any sort of accuracy, then bookies wouldn’t be taking bets on it.

  2. Yeah, I agree. Those negative coefficients surprised me though. I thought that if I got around .5 I’d have a good starting point to start tweaking, but starting this low may be starting down a bad path.

    As for bookies, they don’t have good enough Excel skills to keep up with me. :)

  3. The model needs to be built out further because, as you include more fields that could also be correlated, the coefficient of the ones you have will likely decrease.

    “Parents’ salary” might drive household income if that’s the only thing you look at, but when you build in more fields like “has a job or not”, the coefficient on “parents’ salary” will decrease.
