Cal ranked #13: Stupid computer polls
The post about ballots definitely piqued my interest, because it raised the issue of how people don't trust the computer rankings. I tend to believe them, because I'm an EECS grad. I've been using the Sagarin rankings the past few weeks to discuss the football rankings with friends, and while surprises pop up, you can always dig into them and come up with an explanation. And don't forget: serious sports gamblers tend to watch the computers more than they do the human polls.
Fundamentally, the computer rankings draw up a graph of all the football teams out there and connect them. Teams start out with a base ranking at the start of the season (it's obvious USC is better than App State, etc). Their ranking is then adjusted by their resume- how good or bad the teams they beat or lost to were. Simple enough, and that mirrors a standard "resume" poll.
Here's where it gets hard for any human to follow. The computers don't just take into account the resumes of the teams Cal beat. They also take into account the resumes of teams like Notre Dame and Michigan, because MSU beat ND and ND beat Michigan. And this goes on and on. Cal has been doing extremely well for this reason. It gets even more fun when, say, Cal beats UCLA, UCLA beats USC, and USC beats Cal. Not like that ever happened though. The computers are smart enough to take this into account. Does this work? Well, if you have any doubts about such a system, consider this: Google's famous and incredibly successful PageRank algorithm uses the same principle.
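To make the "resumes of resumes" idea concrete, here's a minimal PageRank-style sketch. It is not Sagarin's formula (his isn't published), just the general principle: every loss counts as a vote from the loser to the winner, and a vote from a strong team is worth more. The function name `rank_teams`, the damping value of 0.85, and the three-game toy schedule (built from the results mentioned above) are my own assumptions for illustration.

```python
def rank_teams(games, damping=0.85, iterations=100):
    """PageRank-style scores from a list of (winner, loser) results.

    Illustrative sketch only: the damping factor and the iteration
    count are assumed values, not anything a real poll publishes.
    """
    teams = sorted({team for game in games for team in game})
    n = len(teams)

    # Each loss is a "vote" from the loser to the team that beat it.
    beaten_by = {team: [] for team in teams}
    for winner, loser in games:
        beaten_by[loser].append(winner)

    score = {team: 1.0 / n for team in teams}
    for _ in range(iterations):
        # Every team gets a small baseline share regardless of results.
        new_score = {team: (1.0 - damping) / n for team in teams}
        for loser in teams:
            winners = beaten_by[loser]
            if winners:
                # Split the loser's current score among the teams that beat it.
                share = damping * score[loser] / len(winners)
                for winner in winners:
                    new_score[winner] += share
            else:
                # Undefeated teams have nobody to vote for, so their score
                # is spread evenly (the usual "dangling node" treatment).
                share = damping * score[loser] / n
                for team in teams:
                    new_score[team] += share
        score = new_score
    return score


# Toy schedule using only results mentioned in the post.
games = [
    ("Cal", "Michigan State"),
    ("Michigan State", "Notre Dame"),
    ("Notre Dame", "Michigan"),
]
scores = rank_teams(games)
for team in sorted(scores, key=scores.get, reverse=True):
    print(f"{team}: {scores[team]:.3f}")
```

Notice that Cal gets credit for Notre Dame's win over Michigan even though Cal never played either team. That's the ripple effect a human voter has trouble tracking.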
So that's the basis for most computer polls. The variations between computer polls are due to how the computers weigh each win or loss. You can account for things like home field advantage, margin of victory, crucial injuries, week of the season, how long it has been since you played another team, bye weeks, rushing yards vs rushing yards allowed, etc. The possibilities are endless, and can include pretty much everything sports analysts babble about every weekend.
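Here's a rough sketch of what "weighing a win" could look like in code. Everything in it is an assumption for illustration: the function name `game_weight`, the 3-point home edge, the 21-point cap on margin, the 14-week season, and the recency formula are made-up knobs, not values taken from any actual poll.

```python
import math


def game_weight(margin, winner_was_home, week, total_weeks=14,
                home_edge=3.0, margin_cap=21.0):
    """Weight for a single win. All constants are hypothetical."""
    # Discount a home win and boost a road win by the assumed home edge.
    adjusted = margin - home_edge if winner_was_home else margin + home_edge
    # Cap blowouts so running up the score stops paying off.
    adjusted = max(1.0, min(adjusted, margin_cap))
    # Diminishing returns: the 20th point matters less than the 2nd.
    margin_factor = math.log1p(adjusted) / math.log1p(margin_cap)
    # Later games count a bit more than early-season games.
    recency = 0.5 + 0.5 * week / total_weeks
    return margin_factor * recency


# Same 10-point margin in week 5, on the road vs at home.
print(game_weight(10, winner_was_home=False, week=5))
print(game_weight(10, winner_was_home=True, week=5))
```

Two polls that agree on the graph but pick different knobs here will spit out different top 25s, which is most of why the computers disagree with each other.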
Okay, so that explains computer polls. Let's take a look at them. Below are this week's Sagarin rankings. I've put an asterisk next to Pac-10 teams, and also highlighted some notable rankings.
Sagarin rankings, end of September: http://www.usatoday.com/sports/sagarin/fbt08.htm
The Top 20
1. Alabama
2. Oklahoma
3. Southern California* (note the small drop)
4. Penn State
5. Texas
6. Boise State
7. LSU
8. BYU
9. Georgia
10. Utah
11. Florida
12. TCU
13. California* (!?!?)
14. Vanderbilt
15. Georgia Tech
16. Virginia Tech
17. Oregon*
18. Texas Tech
19. Wisconsin
20. Ohio State
Notable Others
23. Oregon State*
27. Michigan State
36. Arizona*
41. Maryland
45. Michigan
47. Stanford* (Stanford > ASU, ladies and gentlemen)
65. Arizona State*
83. UCLA*
90. Washington*
92. Cal Poly-SLO (Cal Poly > WSU)
136. Washington State* (Yeah.)
So the first thing you might notice is that these rankings look almost alien. A lot of top twenty teams remain the same with different orders, but wait...Cal is in the top 15?!? Stupid computers.
Rankings to note
But there may be a method to this madness. Let's analyze why some of these rankings differ from the human polls.
3. USC: Wow, they only dropped 2 spots! But their strength of schedule is ranked #7, and the next team (Penn State) has had a fluff schedule so far. Simple enough to see.
13. Cal: Okay, the #13 ranking sounds ludicrous. But the key thing to note here is that Cal was ranked #18 last week. We rose five spots due to strong performances by MSU and Maryland in addition to our own win, but that's something pollsters have ignored. The pollsters accurately shuffled MSU and Maryland up, but didn't consider that also makes Cal a stronger team. By not being on the radar we don't budge. Even the CGB guys are reluctant to move us up that far due to True Blue syndrome.
14. Vandy. Okay, maybe the pollsters aren't that insane after all. We'll see if they stay here, and remember, this is a resume style poll, not a power poll. This is only based on preseason predictions and performance so far.
17. Oregon: We'll need to watch out for Oregon, no surprise.
27. MSU: Michigan State is the top team we've played so far, and we're getting a lot of good karma from them. And they're ranked high because their only loss is to Cal, another top caliber team.
47. Stanford is ranked #5 among Pac 10 teams! So despite the Cardinal sucking it up the past few years, last year's Big Game was a sign of things to come. This year's Big Game will be no cupcake.
65. ASU is ranked terribly. And no surprise here: they lost to UNLV, after all. The computers show them very little mercy for it. To me, this is good news because we don't have Best and ASU is clearly suffering.
92. Cal Poly/136. WSU: WSU is bad. Real bad. Bad enough that twenty five (yes, 25) I-AA teams are ranked higher than WSU. Words do not describe the incredible suckage here. High time we get Fresno State or Boise State in the Pac-10 to replace them? You decide!
Concluding thoughts
As you can see here, you can get some real useful analysis from the computers, even though the first instinct is to distrust them.
I tend to think the human polls are bogus. Because really, every Saturday over 30-40 top caliber games are played. TV and press coverage is at best uneven (just think about how much love, or lack thereof, Cal gets). As armchair pundits, we can debate the implications of a game like Oregon State vs USC this past week all we want, but the reality is that we're all very short sighted. We only look skin deep and we don't take into account the ripple effects something like that has. For example, teams that played Oregon State should have their stock increased. Likewise, Ohio State looks that much worse for getting blown out by USC. But it's near impossible for a human to take all these small but important side effects into account, because there are literally hundreds of implications every time one team beats another, and thousands more across all games played.
Are the human polls useful? Yes, absolutely. On average, humans can do a better job of analyzing things not encapsulated in a single score. For example, the computers don't know much about Dennis Dixon, and they didn't know that his injury meant the Ducks would collapse this year. Humans are probably better at predicting things, but computers are better at analyzing a team's resume.
If you distrust the computer polls, consider this: every single year that I've paid attention to the computers, they've ranked Cal higher than the human polls. And every year they rank Pac-10 teams higher than the humans do. Go figure.
So what do you think? Are the computers really all that strange?
The opinions expressed in a FanPost are, in every way, reflective of the opinions of every California Golden Blogs Marshawnthusiast. Moreover, they are reflective of every employee of SBNation, including Tyler "Blez" Bleszinski.
Comments
Luddites Anonymous
Distrust toward computer polls stems in part from the fact that humans analyze college football very differently than computers do. Human brains do not do particularly well when they are asked to work as parallel processors, at least not on the conscious level. On the other hand, college football seasons are like the intersecting concentric circles created when you throw two stones in a pond. They meet in ways that no human can fully take in or observe at once. The fact that computers, or humans using spreadsheets, can perform this task is an excellent supplement because it compensates for our chief weakness.
This is not to say that computers have no flaws either. As you noted a lot can change based on the weight that you assign particular metrics. It would be an interesting – and monumental – effort to test enough different data points that you could determine the dp with the highest correlation to victory. Is it offensive yards? Is it turnover margin? Is it red zone conversions? Or is it a hidden stat, such as the one that Ivan Maisel suggested, aggregate number of starts by the starters on the offensive line? This is a research effort that I am very happy to delegate.
Nor are assigned weights the computer polls' only problem. Computer polls fail to take into account program prestige, which I can tell you was one of the most important techniques I used to develop my year-over-year #1 recruiting classes in NCAA Football 2006.
Finally, there is another more sinister source of our distrust, one that we are all loath to acknowledge but secretly know is true. We distrust the computers because we know that in 2029 they will rule the world, thanks to Skynet, a creation of Cyberdyne Systems. Remember gentlemen, first they came for the AP poll.
P.S. First!
Someone actually did this in economics in a paper called “I just ran 1,000 regressions”. He tried to identify the variables that had the highest impact on GDP growth, one of the holy grails of economics with lots of competing theories and approaches.
The researcher basically took all the growth-related variables ever tested and ran regressions with every imaginable combination and then ranked the variables according to how often they were significant.
If you put together the data set (this is always 98% of the work), I’ll run the regressions!
Although the computer methodology is a nice component, it should never be the sole component in determining how the top 25 is constructed. Unlike baseball, where the Moneyball strategy is quite useful because the variables are simple to analyze (hitting to pitching is two dimensional at best), football requires measuring a number of variables, such as offensive blocking, receiver management, how the quarterback goes through his progressions, how quickly linebackers can get to the edge, etc. There’s still a wide gap of uncertainty in measuring a team solely by statistics. The human viewpoint is necessary in the college ranks, where a championship must be determined by polls.
Football Outsiders has made some nice inroads, but it’s still a long way from making the proper measurements of the value of a team.
by BearsNecessity on Sep 30, 2008 8:00 PM PDT
As a PoliSci major, I flagged this post.
I'm no The Maharg! But I try. Oh, how I try!
www.CaliforniaGoldenBlogs.com
Does this work?
Well if you have any doubts about such a system, consider this: Google’s famous and incredibly successful PageRank algorithm uses the same principle.
Ever notice that Google’s first few entries for a search are really good and that they get progressively worse as you flip through the pages? Ever wonder why? Basically it’s just noise after the first few entries, just like with this ranking.
That said, awesome post.
Stanfurd Delendum Est.
You’re just noise after the first few entries.
It’s times like this I wish they’d never discovered CougCenterium.
The Maharg hasn’t held forth on the Riley v Longshore debate. I suspect that given the low company he keeps he leans in the Longshore direction. Please, do tell.
Stanfurd Delendum Est.
I’m pretty sure he means Kevin Riley’s family, who you apparently tuck your jersey in with all the time.
I'm no The Maharg! But I try. Oh, how I try!
www.CaliforniaGoldenBlogs.com
interesting post.
I remember a Terry Bowden column on yahoo over the summer, where he was talking about the things good teams did well.
When he was coaching he would look at the year end stats of the best teams (highly ranked, won bowl games, etc.) and see what they did statistically well. This past year the best teams had a) good rush defense, b) high scoring offense (not necessarily pass or rush leaders), and c) good turnover margin. The top 10 were in the top 20 to 25 of each of those categories. Other categories didn’t seem to matter as much.
The Top 20 teams also did great keeping their ratio of Terry Bowden to non-Terry Bowden coaches low in there.
I'm no The Maharg! But I try. Oh, how I try!
www.CaliforniaGoldenBlogs.com
true enough
but as a way to measure your program, trying to do better what the best teams do well seems reasonable.
by Rocksanddirt on Oct 1, 2008 12:28 PM PDT
Excellent post
Since I prefer the “resume” polling philosophy over the who-would-beat-whom-on-a-neutral-field-gut-feel approach, I very much am a fan of the computer polls—especially at the end of the season when they have a lot more data points to work with.
Year in and year out, the Sagarin rankings seem the most on-target to me, although as he explains, his numbers still include this week a subjective component based on his pre-season start values. Thus, it’s pretty hard to justify the gap he has between Cal (#13) and Maryland (#41), and if both teams keep winning, I suspect the gap will close considerably as his formula becomes non-biased.
It’s interesting to note that Cal is actually ranked #8 in Sagarin’s preferred “Predictor” model, which takes into account margin of victory and predicts a 21-point win this Saturday for the Bears over the Devils. That remains to be seen, but there is a (homerish) side of me that thinks Cal isn’t that overrated by Sagarin right now. The struggles on offense, and in particular the passing game, are obvious to us. But we’re forgetting what a fantastic start the defense is off to. By my figuring, the defense has played only a single bad half of football—the first against Maryland—and that wasn’t so bad as to knock the Bears out of the game. Indeed, it only would have taken a correct Safety call here, and a key third-down stop (or two) there, and Cal is probably still looking at an undefeated season. Beyond that game, the week-one shutdown of Javon Ringer looks more and more impressive every week, and holding WSU and CSU to six and seven points respectively—despite having an offense that provided very little in terms of Time of Possession—is likewise worthy of note, no matter how bad those teams might be.
Is Cal a Top-Ten team in my subjective estimation right now? No. But Top 20 seems quite reasonable, particularly from a “resume” point of view, so I’m inclined to believe the computers more than the pollsters, who are downgrading Cal simply because of the failed promise of post-season 2004, pre-season 2006, and post-Oregon 2007. Thrice burned, Cal would have to win the Rose Bowl before some voters would be willing to put the Bears back into their Top 10. The good thing about the computer polls, they’re not programmed to have such carry-over memory from the past.
It’s also interesting to check out the Sagarin ratings by conference:
http://www.usatoday.com/sports/sagarin/fbc08.htm
Yes, he has the SEC in first, and No, he does NOT have the Mountain West (#7) ahead of the Pac-10 (#5). But he does have the Mountain West scoring in the same range as the 6 BCS conferences, with a huge drop-off to C-USA in 8th. There’s obviously room for argument here, given the Mountain West’s head-to-head record against the Pac-10. But keep in mind that his rankings emphasize a true average—the teams at the middle of each conference—rather than the marquee teams at the top.
Go Bears!
by California Pete on Oct 1, 2008 12:03 PM PDT
Isn’t it because Cal’s loss to Maryland is infinitely less embarrassing than Maryland’s loss to Middle Tennessee State?
by BearsNecessity on Oct 1, 2008 12:49 PM PDT
Certainly. And I also would imagine that Sagarin’s algorithm—particularly the one that includes margin of victory—sees the Bears’ three victories as collectively more impressive than Maryland’s.
One of the strengths, and weaknesses, of computer “polls” is that they don’t care about head-to-head results (e.g., Maryland should be ahead of Cal because Maryland beat Cal) in the way that human pollsters very much do. While I am in no way a fan of the BCS, at least they finally got it right in devising a ranking system that more or less balances the human and computer points of view, because each sees something that the other cannot.
Go Bears!
by California Pete on Oct 1, 2008 2:23 PM PDT
Thank you for the post!
Personally, I am glad that the BCS has the sense to use computers at all. They can obviously see statistical variation no human has time or energy to sift through. But then again I am also glad they know to put in the human element. If only they included the Blogpoll, though.
I have a question, though:
Teams start out with a base ranking at the start of the season (it’s obvious USC is better than App State, etc).
Who/what determines this base ranking? 2007? Sagarin? The AP poll? Monkeys? The Maharg? Because as I see it that ranking is pretty darn crucial, and the computer bases it off zero statistics gathered for the 2008 season…
Sagarin determines the starting rating
I couldn’t say what method he uses. However, the starting values are phased out once Sagarin has a completely connected graph to work with, so by mid-season, all of the preseason base rankings are thrown out. So really, the preseason ranking doesn’t matter too much.
So, basically, you gotta Go Bears!
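For anyone wondering what "completely connected graph" means in practice, here's a small sketch of the check. I'm assuming it means every team can be linked to every other team through some chain of games. The function name `schedule_is_connected` is hypothetical, and the last game in the example is a placeholder for a later conference game, not a result.

```python
from collections import defaultdict, deque


def schedule_is_connected(teams, games):
    """True if every team is reachable from every other via games played."""
    # Who won doesn't matter here, only who has played whom.
    opponents = defaultdict(set)
    for team_a, team_b in games:
        opponents[team_a].add(team_b)
        opponents[team_b].add(team_a)

    teams = list(teams)
    if not teams:
        return True

    # Breadth-first search outward from an arbitrary starting team.
    seen = {teams[0]}
    queue = deque([teams[0]])
    while queue:
        current = queue.popleft()
        for opponent in opponents[current]:
            if opponent not in seen:
                seen.add(opponent)
                queue.append(opponent)
    return seen >= set(teams)


teams = ["Cal", "Michigan State", "Maryland", "USC", "Ohio State"]
early_games = [("Cal", "Michigan State"), ("Maryland", "Cal"),
               ("USC", "Ohio State")]
print(schedule_is_connected(teams, early_games))
# Placeholder for a later conference game linking the two groups.
print(schedule_is_connected(teams, early_games + [("Cal", "USC")]))
```

Until that check passes, a rating system has no way to compare the two islands of teams except through the preseason starting values, which is presumably why they get phased out only afterward.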
I actually really like computer polls
(surprise, surprise, another EECS grad here…)
There’s a reason we link to Sagarin’s rankings just above our own Top 25 on the ‘Polls and Rankings’ widget; I generally really like his rankings, and in the past have often referenced them in my posts. However, it’s important to understand the limitations of computer rankings, as several posters have mentioned: they can’t watch the games, only the box scores. I think it’s important to look at both expert human opinions and computer-generated rankings. To disregard either would be to ignore valuable information.
I think an interesting method of coming up with a Top 25 would be to start with a computer ranking, and then make adjustments from there. Instead of starting with preseason preconceived notions of strength, we’d start with an unbiased ranking and make “corrections” for data that the computer is unable to see. Perhaps I’ll try that one of these weeks.
So, basically, you gotta Go Bears!
I am a PoliSci major and wonder why all these computers are being used for non-porno purposes. What a waste of bandwidth!
I'm no The Maharg! But I try. Oh, how I try!
www.CaliforniaGoldenBlogs.com
FIRST!
Am I doing this right?
I'm no The Maharg! But I try. Oh, how I try!
www.CaliforniaGoldenBlogs.com
Technically you are correct, you did first it first.
It’s times like this I wish they’d never discovered CougCenterium.
I want to first you in the nothing.
It’s times like this I wish they’d never discovered CougCenterium.
As long as you don’t fist me anywhere (nothing or otherwise), I’ll be fine.
I'm no The Maharg! But I try. Oh, how I try!
www.CaliforniaGoldenBlogs.com
Oh, baby, you can firrst me all night long!
I'm no The Maharg! But I try. Oh, how I try!
www.CaliforniaGoldenBlogs.com
Isn’t it always, The Maharg? Isn’t it always?
I'm no The Maharg! But I try. Oh, how I try!
www.CaliforniaGoldenBlogs.com
I tend to think the human polls are bogus.
Based on that statement, I would assume you think the BCS was justified in leaving USC out of the BCS title game after the 2003 season over the objections of the pollsters. Correct?
So, basically, you gotta Go Bears!
we hate usc around here
if they get screwed over, excellent
As a Haas alum...
I think you EECS majors spent way too much time with your heads stuffed inside a textbook. Put the Star Trek toys down, turn down the Rachmaninoff blaring in the background (ok, business majors listen to classical too from time to time), stop gulping your Mountain Dew, and recognize that controversy makes money, and that if we left everything to computers and the computer scientists that build algorithms, then there’d be no controversy and less money pumped into the machine that is sports today. Think about it….what gets us angry and excited, and hopeful and debating….it’s the subjectivity of bad referee calls on the field, the coaches poll, the AP poll, all that subjectivity makes for good conversation. If we left everything to computers and EECS majors…….

I left my heart at the Durant Food Court
I'm a gEECS major and...
I’ve never watched Star Trek in my life
The only sci-fi I like is the book 20,000 Leagues Under the Sea and the original Star Wars
The only fantasy I like is Harry Potter
I hate Mountain Dew. It kills your sperm. Beer is my alternate
I don’t listen to classical
I love money
Algorithms >> you
In other words, Go Bears!
Is this another round of ‘fill in the rest of this sentence?’
If we left everything to computers and EECS majors…….
porn sites would also have menus for you to order pizza.
as a haas alum too...
i once read a great quote in a product design book
“if you let an engineer design a nightclub, the bathrooms would be sparkling clean, the room well lit, nobody would ever crowd you, there’d always be seating, no lines, and drinks would cost $1”
Here's an algorithm I'm sorta thinking of
I haven’t tested it and I don’t know if it’ll even terminate. It’s just a speculation.
Let’s have a directed graph with nodes for each D-IA team that start out at 0 rating. An edge will lead out of node x to node y iff x defeats y. At the end (after the graph is drawn out) we add up the ratings of all teams defeated by a node i (outgoing edges) plus one and subtract from that the ratings of all teams that defeated node i (incoming edges). Applying this algorithm each week allows for the ratings to converge to the correct or optimal rating for each team. And of course the top 25 teams are the ones with the top 25 ratings.
This could be modified with a bias term and possibly weightings depending on the final score, injuries, covering a spread, etc.
The inspiration for this idea came when I stumbled upon my CS188 (AI) project from last semester on neural networks/reinforcement learning.
Again, emphasis: this could all be bullshit
In other words, Go Bears!
by royrules22 on Oct 3, 2008 12:04 AM PDT
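A minimal sketch of one pass of the algorithm described above, under one reading of it: a team's new rating is the sum of (rating + 1) over every team it beat, minus the ratings of every team that beat it, with all updates computed from the previous pass. The "+1 per win" interpretation, the synchronous update, and the names `update_ratings` and `top_n` are assumptions, since the comment doesn't pin those details down.

```python
def update_ratings(ratings, games):
    """One synchronous pass over all (winner, loser) results."""
    new_ratings = {team: 0.0 for team in ratings}
    for winner, loser in games:
        # Credit: the beaten team's previous rating, plus one for the win.
        new_ratings[winner] += ratings[loser] + 1.0
        # Debit: the previous rating of the team that beat you.
        new_ratings[loser] -= ratings[winner]
    return new_ratings


def top_n(ratings, n=25):
    """Team names sorted by rating, highest first."""
    return sorted(ratings, key=ratings.get, reverse=True)[:n]


# Toy schedule using results mentioned in the post; every team starts at 0.
games = [
    ("Cal", "Michigan State"),
    ("Michigan State", "Notre Dame"),
    ("Notre Dame", "Michigan"),
]
ratings = {team: 0.0 for game in games for team in game}
for _ in range(2):  # two passes, e.g. one per week
    ratings = update_ratings(ratings, games)
print(top_n(ratings))
```

On the worry about whether it terminates: with just two teams where X beat Y, repeated passes return to the all-zero starting point every four passes instead of settling, so under this reading convergence isn't guaranteed. A fixed number of passes or some damping might be worth trying.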
If I didn’t have research, CS184 (graphics) ray tracing project and CS162 (OS) project I’d have tried implementing this
In other words, Go Bears!
I’m calling bullshit on this one
I left my heart at the Durant Food Court
by dballisloose on Oct 3, 2008 10:00 AM PDT
What?
Like I said it’s something that I randomly came up with and I have no clue if it will even fucking work
In other words, Go Bears!
Relax, I’m just being a jerk. I’m cranky. I’m ready for tomorrow’s game. If a fellow Cal alum came up with an algorithm that changed the way college football rankings work, I’d be thrilled. It’s just that at this moment, I don’t care. And I don’t think it’s personal towards you or your CS projects. Though it might be personal, I’m a little bitter that my laptop isn’t working right, and I’m going to direct my frustration at the EECS of the world.
I left my heart at the Durant Food Court
by dballisloose on Oct 3, 2008 12:38 PM PDT
Come up with an algorithm so that when Nate throws a pick, the ball bounces off everyone’s hands.
And when Kevin Riley throws it to a WR, it becomes sticky.
Rec’d for hurting my brain! Any post that hurts my brain should be rec’d!
I'm no The Maharg! But I try. Oh, how I try!
www.CaliforniaGoldenBlogs.com
by TwistNHook on Oct 3, 2008 7:34 AM PDT
You rec’d the wrong post!
I'm no The Maharg! But I try. Oh, how I try!
www.CaliforniaGoldenBlogs.com
by TwistNHook on Oct 3, 2008 9:30 AM PDT
i'm not sure
but i think that’s what Sagarin does. some others might do that too.
So, basically, you gotta Go Bears!
yeah sagarin is a secret formula...
…which is half the problem with the computer polls. they’re secret.
it’s reasonable to assume he uses the base strategy that i outline though, given that the description he provides has terms like “bayesian” and “connected”
well shucks, i've been MIA
made this post, and then work swamped me for the next two days. thanks for all the recs though, and i’ll respond to some of these comments tomorrow.
for now, gotta drive up to berkeley tonight :)
