Cal vs. Ranked Opponents in the Tedford Era
My fanpost on Cal vs. the Pac-10 in the Tedford Era led me to one major conclusion: It's hard to figure out anything from those numbers because the sample sizes are way too small, and too easily skewed by individual games like (for example) last year's Washington and Washington State games, or by factors that don't show up in the numbers like special teams in the UCLA games (thanks norcalnick).
I thought that there was room for further exploration of what variables correlated the most closely to victory, and there were four potential areas that came to mind:
1. Use a sample consisting of all of our games in the Tedford Era. The advantage here would be that the sample would be bigger. The disadvantage is that the sample would include a lot of games against extremely strong teams (like USC) and extremely weak teams (again, like Washington State), and so would have a higher deviation (I don't know how to express this in statistical terms - whatever we come up with would be an average that may not provide much predictive value for the more important games that we want to look at).
2. Use a sample consisting of all games against ranked opponents in the Tedford Era. This would be a smaller sample, but theoretically at least one with relatively little deviation, and one that would have some predictive value as far as our tougher games go.
3. Look at specific intervals within the range of data for each variable. While not statistically significant, this could provide examples of how Cal's performance may not match its statistics.
4. Look at specific scenarios. Even smaller samples, but interesting.
I decided to look at correlation coefficients using the first sample to get a "big picture" kind of view, and then use the second sample to explore in more detail the implications of our past record against USC and other ranked teams.
[Before I go any further: big shoutout to Royrules once again for sharing his data with me, without which this post would not be possible.
Also, note that I have no real knowledge of statistics, this is just me messing around with numbers and trying to figure out what they mean. If anyone has suggestions on how to better examine this data, that would be awesome.]
1. Correlation Coefficients (all Tedford-era games) (N=89)
I used Excel to calculate the correlation between each of the following variables and the outcome of each game (defined as points margin - the most objective measure I could think of):
(Note: Close to 1 means a strong positive correlation, close to -1 means a strong inverse correlation, and close to 0 means little or no linear correlation)
| Variable | Correlation with points margin |
| Rush Off | 0.62 |
| Pass Off | -0.10 |
| Rush Def | -0.41 |
| Pass Def | -0.15 |
| Total Off | 0.43 |
| Total Def | -0.22 |
| Net TO | 0.55 |
A couple of thoughts here:
- The ground game, both ways, clearly correlates more closely to points margin than the passing game
- Offense correlates more closely than defense. Could this be because our defense is fairly consistent, and victory or defeat depends more on how the offense does?
- Passing offense has a negative correlation to points margin. Is this because we abandon the run and start airing it out when we're behind, like the 08 Maryland game where Riley ended up with 423 passing yards?
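For anyone who wants to reproduce this kind of number outside of Excel, here is a minimal Python sketch. It assumes the Excel function involved was CORREL, which computes the Pearson correlation coefficient. The function name `pearson_r` and the sample values are made up for illustration; they are not Royrules' data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations (numerator) and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a column is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical illustration values, NOT the real game data.
rush_yards = [85, 120, 160, 210, 95, 230]
points_margin = [-14, 3, 7, 21, -10, 17]
print(round(pearson_r(rush_yards, points_margin), 2))
```

Each row of the tables in this post would come from one call like this, pairing a stat column with the points-margin column.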
Of course, these numbers don't tell us much out of context. You can argue that a 7 point win over USC is more impressive than a 21 point win over Pradesh A&M, which is why I decided to look at the second set of data: ranked opponents only.
2. Correlation Coefficients (Tedford-era games against ranked opponents) (N=25)
| Variable | Correlation with points margin |
| Rush Off | 0.39 |
| Pass Off | 0.06 |
| Rush Def | -0.26 |
| Pass Def | -0.05 |
| Total Off | 0.37 |
| Total Def | -0.28 |
| Net TO | 0.70 |
| 3D% | 0.36 |
| Opp 3D% | -0.39 |
Observations:
- The passing game's correlation to points margin dwindles to almost nothing
- Turnovers are HUGE - almost twice the correlation coefficient of any other variable
Now you may be thinking: maybe these correlations are just flukes, given the small sample size. I was curious about that as well, so I decided to break down our record against ranked opponents based on individual variables into intervals. That will allow us to see what happened in all cases where variable = x (positive TO margin, more than 200 rushing yards, whatever).
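One rough way to put a number on the "fluke" worry is an approximate confidence interval for each coefficient using the Fisher z-transformation. This is only a sketch: the method is a standard textbook approximation rather than anything from the original analysis, it assumes the games are independent and the data are roughly bivariate normal, and the function name `fisher_ci` is made up for illustration.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson r from n pairs."""
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)            # Fisher z-transform of r
    se = 1 / math.sqrt(n - 3)    # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Coefficients taken from the ranked-opponents table (N=25)
for label, r in [("Net TO", 0.70), ("Rush Off", 0.39), ("Pass Off", 0.06)]:
    lo, hi = fisher_ci(r, 25)
    print(f"{label}: r={r:.2f}, approx 95% CI ({lo:.2f}, {hi:.2f})")
```

Under those assumptions, the Net TO interval stays well above zero, while the Rush Off interval stretches down to about zero. That fits the idea that turnovers are the one signal here that is hard to write off as noise.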
3. Cal's Record for Specific Intervals of Data (Overall record: 11-14)
These are mostly what you would expect, but a few are pretty interesting.
Turnover Margin:
When TO Margin > 0: 7-0
When TO Margin = 0: 3-6
When TO Margin < 0: 1-8
Observations: Res ipsa loquitur.
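A breakdown like the one above can be tallied with a few lines of Python. This is a sketch only: the field names (`net_to`, `margin`), the helper names, and the sample rows are all hypothetical and are not the real game data.

```python
from collections import defaultdict

def record_by_bucket(games, stat, bucket_fn):
    """Tally (wins, losses) per bucket; a game counts as a win if margin > 0."""
    tally = defaultdict(lambda: [0, 0])
    for game in games:
        bucket = bucket_fn(game[stat])
        if game["margin"] > 0:
            tally[bucket][0] += 1
        else:
            # No ties to worry about: college games go to overtime
            tally[bucket][1] += 1
    return {bucket: tuple(wl) for bucket, wl in tally.items()}

def to_margin_bucket(value):
    """Label a turnover margin as positive, even, or negative."""
    if value > 0:
        return "TO Margin > 0"
    if value == 0:
        return "TO Margin = 0"
    return "TO Margin < 0"

# Hypothetical rows for illustration, NOT the real game data.
games = [
    {"net_to": 2, "margin": 10},
    {"net_to": 0, "margin": -3},
    {"net_to": -1, "margin": -14},
    {"net_to": 1, "margin": 7},
    {"net_to": 0, "margin": 4},
]
for bucket, (wins, losses) in record_by_bucket(games, "net_to", to_margin_bucket).items():
    print(f"When {bucket}: {wins}-{losses}")
```

Swapping in a different `bucket_fn` (yardage ranges, third down percentages, and so on) would give the other breakdowns in this section.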
Third Down Percentage:
When 3D% <= 33%: 2-8
When 3D% > 33%: 6-4
When 3D% > 40%: 5-2
When Opp 3D% <= 33%: 4-5
When Opp 3D% > 33%: 4-7
When Opp 3D% > 40%: 2-5
When 3D% > Opp 3D%: 4-3
When 3D% <= Opp 3D%: 3-8
Observations: Nothing unexpected here.
Points:
When Cal scores < 21: 0-9
When Cal scores 21-30: 2-2
When Cal scores 31-40: 3-3
When Cal scores 41-50: 5-0
When Cal scores 50+: 1-0
When opponent scores < 21: 3-1
When opponent scores 21-30: 6-5
When opponent scores 31-40: 2-3
When opponent scores 41-50: 0-4
When opponent scores 50+: n/a
Observations: Nothing unexpected here.
Rushing Yards
When Cal has < 101: 0-8
When Cal has 101-150: 6-0
When Cal has 151-200: 1-3
When Cal has 201-250: 4-2
When Cal has 250+: 0-1
Observations: Looks like our losses come when we can't get the ground game going at all, which makes sense. When we get 100+ yards, we're 11-6 (.647). And if that 1-3 looks weird, note that those 3 losses were all against USC.
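For anyone checking the arithmetic, the combined record at 100+ rushing yards can be added up from the intervals listed above. A small sketch (the dictionary layout and the `combined` helper are just one way to write it):

```python
# Records copied from the rushing-yards breakdown above: (wins, losses)
rushing_records = {
    "< 101": (0, 8),
    "101-150": (6, 0),
    "151-200": (1, 3),
    "201-250": (4, 2),
    "250+": (0, 1),
}

def combined(records, keys):
    """Sum wins and losses over the given intervals and compute win percentage."""
    wins = sum(records[k][0] for k in keys)
    losses = sum(records[k][1] for k in keys)
    return wins, losses, wins / (wins + losses)

over_100 = [k for k in rushing_records if k != "< 101"]
wins, losses, pct = combined(rushing_records, over_100)
print(f"100+ rushing yards: {wins}-{losses} ({pct:.3f})")  # 11-6 (0.647)
```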
Passing Yards
When Cal has < 151: 0-4
When Cal has 151-200: 2-2
When Cal has 201-250: 5-3
When Cal has 251-300: 3-3
When Cal has 300+: 1-2
Observations: I interpret these numbers to mean that as long as we get some kind of passing game going, outcome has relatively less correlation to passing yardage than to rushing yardage.
Opponent Rushing Yards
When opponent has < 101: 5-2
When opponent has 101-150: 2-4
When opponent has 151-200: 2-5
When opponent has 201-250: 2-2
When opponent has 250+: 0-1
Observations: Seems like we have to shut down the run to win, which again makes sense. We are .333 when the opponent has more than 100 rushing yards.
Opponent Passing Yards
When opponent has < 151: 1-2
When opponent has 151-200: 2-1
When opponent has 201-250: 1-7
When opponent has 251-300: 4-2
When opponent has 300+: 3-2
Observations: Doesn't seem to have much correlation. Could the 7-4 when opponents are 250+ be because opponents throw more when behind? Could the 1-7 when opponents are 201-250 be because opponents don't need to throw as much when they have a strong running game going? Lots of possibilities here.
Total Offensive Yards
When Cal has < 301: 0-4
When Cal has 301-350: 1-3
When Cal has 351-400: 5-2
When Cal has 401-450: 2-2
When Cal has 451-500: 3-2
When Cal has 500+: 0-1
Observations: Not much of a pattern here past 350.
Opponent Total Offensive Yards
When opponent has < 301: 1-1
When opponent has 301-350: 1-1
When opponent has 351-400: 5-5
When opponent has 401-450: 2-2
When opponent has 451-500: 2-1
When opponent has 500+: 0-4
Observations: No pattern here either under 500.
4. Cal's Record in Specific Scenarios
Averages
| Stat | Avg | Home | Away | Win | Loss | USC | Oregon | Not USC |
| Pts | 29.4 | 28.80 | 26.24 | 39.45 | 21.43 | 16.43 | 30.50 | 34.39 |
| Opp Pts | 27.9 | 22.30 | 27.94 | 21.55 | 32.93 | 25.14 | 22.75 | 29.00 |
| Total | 57.3 | 51.10 | 54.18 | 61.00 | 54.36 | 41.57 | 53.25 | 63.39 |
| Net | 1.44 | 6.50 | -1.71 | 17.91 | -11.50 | -8.71 | 7.75 | 5.39 |
| Rush Off | 152 | 157.80 | 130.35 | 172.45 | 135.50 | 126.14 | 173.75 | 161.72 |
| Pass Off | 231 | 211.20 | 215.76 | 241.09 | 223.43 | 208.43 | 194.50 | 240.06 |
| Rush Def | 146 | 137.90 | 133.18 | 120.82 | 165.29 | 147.86 | 152.50 | 144.89 |
| Pass Def | 264 | 218.40 | 259.18 | 269.82 | 258.71 | 225.86 | 227.50 | 278.28 |
| Total Off | 383 | 369.00 | 346.12 | 413.55 | 358.93 | 334.57 | 368.25 | 401.78 |
| Total Def | 409 | 356.30 | 392.35 | 390.64 | 424.00 | 373.71 | 380.00 | 423.17 |
| Net TO | 0.16 | 0.30 | 0.06 | 1.91 | -1.21 | -1.57 | 0.50 | 0.83 |
| 3D% | 0.35 | 0.33 | 0.30 | 0.42 | 0.30 | 0.39 | 0.36 | 0.32 |
| Opp 3D% | 0.37 | 0.31 | 0.35 | 0.33 | 0.40 | 0.38 | 0.32 | 0.36 |
Observations:
- Win-loss doesn't really tell us anything. Obviously statistics in games that Cal lost are going to be a lot worse.
- Cal is pretty strong on the road. Cal loses a net of about 50 yards and a TD by playing away, but that's less than I expected.
- Even when Cal loses, we're not getting blown out yardage-wise. It's scoring and turnovers that are the problem.
- Cal is surprisingly strong against USC (especially keeping in mind the caliber of some of those USC teams we played). Offense clearly suffers, as does turnover margin, but it seems like our defense holds up well against them.
- It's interesting to see how the numbers change when you remove the USC games from the mix.
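The home/road comparison can be checked directly from the Home and Away columns of the table. A quick sketch of that arithmetic (the variable names are just for illustration):

```python
# Averages copied from the Home and Away columns of the table above.
home = {"total_off": 369.00, "total_def": 356.30, "net_pts": 6.50}
away = {"total_off": 346.12, "total_def": 392.35, "net_pts": -1.71}

def net_yards(split):
    """Average yards gained minus average yards allowed."""
    return split["total_off"] - split["total_def"]

yard_swing = net_yards(home) - net_yards(away)
point_swing = home["net_pts"] - away["net_pts"]
print(f"Net yards: home {net_yards(home):+.2f}, away {net_yards(away):+.2f}")
print(f"Home-to-away swing: {yard_swing:.2f} yards, {point_swing:.2f} points")
```

The swing works out to a little under 60 yards and a little over 8 points, so "about 50 yards and a TD" is in the right neighborhood.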
Overall Conclusions
- The conventional wisdom about games being won or lost on the ground is true. Cal's rushing offense and defense correlate more closely to the final points margin than do passing offense or defense. It seems that a certain baseline amount of passing yardage is necessary (makes sense, an offense has to be somewhat balanced), but above that baseline rushing yardage is more closely correlated to success than passing yardage. Of course, this could be explained either as Cal being more successful when we are able to run the ball or as Cal running the ball more when we are already winning.
- Yardage numbers and 3rd down % don't always correlate to the final score. The most obvious explanations that come to mind for this: either team being unable to score in the red zone, special teams plays, and turnovers.
- Cal can win on the road. Cal can beat USC, too. The fact that our average statistics against them over the last 7 years are so close suggests as much. It's just a question of the other variables (again, things like special teams and turnovers) falling into place.
- Let's repeat that, because it seems like it might be the single biggest factor at work: turnovers. Turnovers turnovers turnovers. Here's another factoid: in our seven games against USC, Cal's final turnover margins were 0, -1, -3, -5, -2, 0, and 0.
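To put those seven USC margins in one place, here is a tiny sketch that totals and averages them (the variable names are just for illustration):

```python
# Turnover margins from the seven USC games listed above
usc_to_margins = [0, -1, -3, -5, -2, 0, 0]

total = sum(usc_to_margins)
average = total / len(usc_to_margins)
games_with_positive_margin = sum(1 for m in usc_to_margins if m > 0)

print(f"Total: {total}, average: {average:.2f}, "
      f"games with a positive margin: {games_with_positive_margin}")
```

The average comes out to -1.57, which matches the Net TO figure in the USC column of the averages table, and Cal never finished a single one of those games on the plus side.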
Obviously none of these conclusions are particularly groundbreaking (most of them are pretty damn obvious), but it's always nice to see things backed up by numbers and charts and stuff. </nerd> Again, if anyone has ideas for how we can more closely examine any of these things, that would be awesome.
Comments
Rec’d for awesomeness, flagged for misspelling the word “era”
ALL HAIL SUPREME LEADER AVINASH!
www.CaliforniaGoldenBlogs.com
Good insights, couple of thoughts
These stats could support the idea that the run game is far more critical to any success Cal has than passing, which could take some of the onus off the poor quarterback play the past few seasons.
*USC and Maryland we couldn’t run the ball at all last season
*UCLA, Furd, ASU in 07 our performances were similarly underwhelming
*All of our losses in 06 had a lot to do with inability to run the football.
However I find it hard to believe that QB play plays no part in how well our passing offense produces points. I’d have to say that passing yardage might not be the stat you want to be looking at (since yardage is not the crucial part of the passing game the way running is), but maybe passing yards per attempt per game, or completion percentage.
Contact if you want to chat: bearsnecessities@gmail.com
by Avinash Kunnath on Sep 10, 2009 2:15 PM PDT
That’s a good point.
It would be really interesting to look at rushing yards per carry, passing yards per attempt, run-pass ratio, etc. but unfortunately Royrules’ data does not include those categories, and I don’t have time right now to input them for 80-something games. Maybe as a longer-term project.
dboneisloose
To add on to that
Situational passing may matter as well.
The relative value of completions in the first half might be greater than in the second half due to its effect on the defense with respect to the running game.
The hypothesis would run something like this: If the passing game flounders in the first half, the defense is able to key in on the running game. With the running game less effective, the team may fall behind and the offense may have to abandon the run in order to keep itself in the game. The effect is predictability in play calling in both halves. Ineffective passing in the first half allows the defense to concentrate on the run. An ineffective run game plus point differential plus time constraints in the second half permits the defense to employ more effective pass defenses.
another item to look at might be scoring by quarter
and off/def production by quarter to get an idea of how we win and when it matters to play our best best best.
Go Bears Go
Interesting post
Did you calculate these correlation coefficients by constructing a new model for each variable, or are these coefficients from multiple linear regression models? If the former, it would be interesting to try different combinations of these variables to see if there is any interaction… I’d suspect that rushing/passing yards may interact, for some of the reasons you gave above. Given the small sample size though, don’t think it would be wise to include too many variables more in any given model.
No – I just used Excel’s correlate function to compare two columns of data. I would love to put together a model using multiple variables, but I don’t know nearly enough about statistics.
dboneisloose
by HolmoePhobe on Sep 13, 2009 10:20 PM PDT
0-8 in <100 yrd rushing!
This sort of shocked me to know we have had 8 games where we rushed below 100. That’s a third of the games in this sample. I’m beginning to understand our slight underachievement these past couple years.
