My fanpost on Cal vs. the Pac-10 in the Tedford Era led me to one major conclusion: it's hard to figure out anything from those numbers because the sample sizes are way too small and too easily skewed, either by individual games (for example, last year's Washington and Washington State games) or by factors that don't show up in the numbers, like special teams in the UCLA games (thanks norcalnick).
I thought that there was room for further exploration of what variables correlated the most closely to victory, and there were four potential areas that came to mind:
1. Use a sample consisting of all of our games in the Tedford Era. The advantage here would be that the sample would be bigger. The disadvantage is that the sample would include a lot of games against extremely strong teams (like USC) and extremely weak teams (again, like Washington State), and so would have a higher variance (I don't know the proper statistical terms for this, but basically whatever we come up with would be an average that may not have much predictive value for the more important games that we want to look at).
2. Use a sample consisting of all games against ranked opponents in the Tedford Era. This would be a smaller sample, but at least in theory one with relatively little variance, and one that would have some predictive value for our tougher games.
3. Look at specific intervals within the range of data for each variable. While the results wouldn't be statistically significant, this could provide examples of how Cal's performance may not match its statistics.
4. Look at specific scenarios. Even smaller samples, but interesting.
I decided to look at correlation coefficients using the first sample to get a "big picture" kind of view, and then use the second sample to explore in more detail the implications of our past record against USC and other ranked teams.
[Before I go any further: big shoutout to Royrules once again for sharing his data with me, without which this post would not be possible.
Also, note that I have no real knowledge of statistics; this is just me messing around with numbers and trying to figure out what they mean. If anyone has suggestions on how to better examine this data, that would be awesome.]
1. Correlation Coefficients (all Tedford-era games) (N=89)
I used Excel to calculate the correlation between each of the following variables and the outcome of each game (defined as points margin - the most objective measure I could think of):
(Note: close to 1 means a strong positive correlation, close to -1 means a strong inverse correlation, and close to 0 means little or no correlation.)
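For anyone who wants to try this outside of Excel: the standard Excel function for this, CORREL, computes the Pearson correlation coefficient, and assuming that's the kind of correlation being measured here, a minimal Python sketch of the same calculation is below. The four games in the example are made up purely to show the mechanics; they are not from Royrules' data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two spread terms (the 1/n factors cancel out)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a variable with no spread has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical example: rushing yards vs. points margin for four made-up games
rush_yards = [80, 120, 190, 240]
margin = [-14, 3, 7, 21]
print(round(pearson_r(rush_yards, margin), 3))
```

Run it once per variable against points margin and you get one coefficient per stat, which is what the Excel approach produces.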
A couple of thoughts here:
- The ground game, both ways, clearly correlates more closely to points margin than the passing game
- Offense correlates more closely than defense. Could this be because our defense is fairly consistent, and victory or defeat depends more on how the offense does?
- Passing offense has a negative correlation to points margin. Is this because we abandon the run and start airing it out when we're behind, like the '08 Maryland game where Riley ended up with 423 passing yards?
Of course, these numbers don't tell us much out of context. You can argue that a 7 point win over USC is more impressive than a 21 point win over Pradesh A&M, which is why I decided to look at the second set of data: ranked opponents only.
2. Correlation Coefficients (Tedford-era games against ranked opponents) (N=25)
- The passing game's correlation to points margin dwindles to almost nothing
- Turnovers are HUGE - almost twice the correlation coefficient of any other variable
Now you may be thinking: maybe these correlations are just flukes, given the small sample size. I was curious about that as well, so I decided to break down our record against ranked opponents into intervals of each individual variable. That lets us see what happened in all cases where variable = x (positive TO margin, more than 200 rushing yards, whatever).
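Here's a rough Python sketch of that kind of breakdown, assuming each game is stored as a row with the variable of interest plus the final points margin. The field names (to_margin, margin), the bin labels, and the four sample games are all hypothetical, just there to show how the tally works.

```python
import math
from collections import Counter

def record_by_interval(games, value_key, bins):
    """Tally Cal's win-loss record for each interval of one variable.

    games: list of dicts, each holding the variable plus "margin"
           (Cal points minus opponent points).
    bins:  list of (label, low, high) tuples; a game lands in the first
           bin where low <= value <= high.
    """
    records = {label: Counter() for label, _, _ in bins}
    for game in games:
        value = game[value_key]
        for label, low, high in bins:
            if low <= value <= high:
                # Positive margin = win; assumes no ties, which overtime rules out in college football
                records[label]["W" if game["margin"] > 0 else "L"] += 1
                break  # each game counts in exactly one bin
    return {label: f"{c['W']}-{c['L']}" for label, c in records.items()}

# Hypothetical games, made up only to show how the tally works
sample_games = [
    {"to_margin": 2, "margin": 10},
    {"to_margin": 0, "margin": -3},
    {"to_margin": -1, "margin": -17},
    {"to_margin": 0, "margin": 7},
]
to_bins = [
    ("TO Margin > 0", 1, math.inf),
    ("TO Margin = 0", 0, 0),
    ("TO Margin < 0", -math.inf, -1),
]
demo = record_by_interval(sample_games, "to_margin", to_bins)
print(demo)
```

Yardage works the same way with bins like ("101-150", 101, 150), since yards are whole numbers.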
3. Cal's Record for Specific Intervals of Data (Overall record: 11-14)
These are mostly what you would expect, but a few are pretty interesting.
Turnover Margin
When TO Margin > 0: 7-0
When TO Margin = 0: 3-6
When TO Margin < 0: 1-8
Observations: Res ipsa loquitur.
Third Down Percentage:
When 3D% <= 33%: 2-8
When 3D% > 33%: 6-4
When 3D% > 40%: 5-2
When Opp 3D% <= 33%: 4-5
When Opp 3D% > 33%: 4-7
When Opp 3D% > 40%: 2-5
When 3D% > Opp 3D%: 4-3
When 3D% <= Opp 3D%: 3-8
Observations: Nothing unexpected here.
Points Scored
When Cal scores < 21: 0-9
When Cal scores 21-30: 2-2
When Cal scores 31-40: 3-3
When Cal scores 41-50: 5-0
When Cal scores 50+: 1-0
When opponent scores < 21: 3-1
When opponent scores 21-30: 6-5
When opponent scores 31-40: 2-3
When opponent scores 41-50: 0-4
When opponent scores 50+: n/a
Observations: Nothing unexpected here.
Rushing Yards
When Cal has < 101: 0-8
When Cal has 101-150: 6-0
When Cal has 151-200: 1-3
When Cal has 201-250: 4-2
When Cal has 250+: 0-1
Observations: Looks like our losses come when we can't get the ground game going at all, which makes sense. When we get more than 100 yards, we're 11-6 (about .647). And if that 1-3 looks weird, note that those 3 losses were all against USC.
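For the record, adding up every bin above 100 rushing yards gives 11-6, which is about .647. A quick Python check using the records listed above (the dictionary layout is just my own choice for holding them):

```python
# Rushing-yard bins from the list above, as (wins, losses)
rushing_bins = {
    "< 101": (0, 8),
    "101-150": (6, 0),
    "151-200": (1, 3),
    "201-250": (4, 2),
    "250+": (0, 1),
}

def win_pct(records):
    """Combined winning percentage for a list of (wins, losses) records."""
    wins = sum(w for w, _ in records)
    losses = sum(l for _, l in records)
    return wins / (wins + losses)

# Every bin except "< 101", i.e. games where Cal ran for more than 100 yards
over_100 = [rec for label, rec in rushing_bins.items() if label != "< 101"]
pct = win_pct(over_100)
print(f"{pct:.3f}")  # prints 0.647 (11-6)
```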
Passing Yards
When Cal has < 151: 0-4
When Cal has 151-200: 2-2
When Cal has 201-250: 5-3
When Cal has 251-300: 3-3
When Cal has 300+: 1-2
Observations: I interpret these numbers to mean that as long as we get some kind of passing game going, the outcome is less closely tied to passing yardage than to rushing yardage.
Opponent Rushing Yards
When opponent has < 101: 5-2
When opponent has 101-150: 2-4
When opponent has 151-200: 2-5
When opponent has 201-250: 2-2
When opponent has 250+: 0-1
Observations: Seems like we have to shut down the run to win, which again makes sense. We are .333 when the opponent has more than 100 rushing yards.
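Same arithmetic for the opponent's ground game, using the records listed above: everything above 100 yards adds up to 6-12, versus 5-2 when we hold them under 101.

```python
# Opponent rushing-yard bins from the list above: (label, wins, losses)
opp_rushing_bins = [
    ("< 101", 5, 2),
    ("101-150", 2, 4),
    ("151-200", 2, 5),
    ("201-250", 2, 2),
    ("250+", 0, 1),
]

# Combine every bin where the opponent ran for more than 100 yards
wins = sum(w for label, w, _ in opp_rushing_bins if label != "< 101")
losses = sum(l for label, _, l in opp_rushing_bins if label != "< 101")
print(f"{wins}-{losses}, {wins / (wins + losses):.3f}")  # prints 6-12, 0.333
```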
Opponent Passing Yards
When opponent has < 151: 1-2
When opponent has 151-200: 2-1
When opponent has 201-250: 1-7
When opponent has 251-300: 4-2
When opponent has 300+: 3-2
Observations: Doesn't seem to have much correlation. Could the 7-4 when opponents throw for more than 250 yards be because opponents throw more when they're behind? Could the 1-7 when opponents throw for 201-250 be because opponents don't need to throw as much when they have a strong running game going? Lots of possibilities here.
Total Offensive Yards
When Cal has < 301: 0-4
When Cal has 301-350: 1-3
When Cal has 351-400: 5-2
When Cal has 401-450: 2-2
When Cal has 451-500: 3-2
When Cal has 500+: 0-1
Observations: Not much of a pattern here past 350.
Opponent Total Offensive Yards
When opponent has < 301: 1-1
When opponent has 301-350: 1-1
When opponent has 351-400: 5-5
When opponent has 401-450: 2-2
When opponent has 451-500: 2-1
When opponent has 500+: 0-4
Observations: No pattern here either under 500.
4. Cal's Record in Specific Scenarios
- Win-loss doesn't really tell us anything. Obviously statistics in games that Cal lost are going to be a lot worse.
- Cal is pretty strong on the road. Cal loses a net of about 50 yards and a TD by playing away, but that's less than I expected.
- Even when Cal loses, we're not getting blown out yardage-wise. It's scoring and turnovers that are the problem.
- Cal is surprisingly strong against USC (especially keeping in mind the caliber of some of those USC teams we played). Offense clearly suffers, as does turnover margin, but it seems like our defense holds up well against them.
- It's interesting to see how the numbers change when you remove the USC games from the mix.
- The conventional wisdom about games being won or lost on the ground holds up. Cal's rushing offense and defense correlate more closely to the final points margin than passing offense or defense do. It seems that a certain baseline amount of passing yardage is necessary (makes sense, an offense has to be somewhat balanced), but above that baseline rushing yardage is more closely correlated to success than passing yardage. Of course, this could be explained either by Cal being more successful when we are able to run the ball or by Cal running the ball more when we are already winning.
- Yardage numbers and 3rd down % don't always correlate to the final score. The most obvious explanations that come to mind: a team failing to score in the red zone, special teams plays, and turnovers.
- Cal can win on the road. Cal can beat USC, too. The fact that our average statistics against USC over the last 7 years are so close suggests as much. It's just a question of the other variables (again, things like special teams and turnovers) falling into place.
- Let's repeat that, because it seems like it might be the single biggest factor at work: turnovers. Turnovers turnovers turnovers. Here's another factoid: in our seven games against USC, Cal's final turnover margins were 0, -1, -3, -5, -2, 0, and 0.
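Adding those seven numbers up makes the point pretty starkly: Cal never finished with a positive turnover margin against USC in that stretch, and was -11 overall. A quick Python check using the margins listed above:

```python
# Cal's final turnover margins in the seven USC games listed above
usc_to_margins = [0, -1, -3, -5, -2, 0, 0]

total = sum(usc_to_margins)
average = total / len(usc_to_margins)
positive_games = sum(1 for m in usc_to_margins if m > 0)
print(total, round(average, 2), positive_games)  # prints -11 -1.57 0
```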
Obviously none of these conclusions are particularly groundbreaking (most of them are pretty damn obvious), but it's always nice to see things backed up by numbers and charts and stuff. </nerd> Again, if anyone has ideas for how we can more closely examine any of these things, that would be awesome.