Memorial Stadium Attendance: When Do Fans Show Up and When Do They Stay Home?

When does Memorial Stadium host a sellout crowd and when are there more seagulls than fans?

Dating back to 2004, Cal's Memorial Stadium has hosted only eight sellouts in its thirty-eight home games.  In the past two seasons, Memorial has only sold out one game.  Cal faces no shortage of competition, as it competes with two professional football teams, two baseball teams, and a basketball team.  Still, when a #12-ranked Cal team and its Heisman candidate Jahvid Best take on Maryland to redeem themselves after 2008's embarrassing loss, why is it that only 62,367 fans show up?  Cal had room to fit 10,000 more butts in the seats, so why didn't Memorial sell out?

To figure this out, I looked over the last six years' worth of attendance data at Memorial Stadium.  I hoped to find some patterns that explained when Memorial sold out and when, like at the Maryland game, an exciting team played an important game yet still had 10,000 empty seats.  After the jump we'll look at which factors contribute to sellouts and which do not.  We'll do this in reverse order, though.  I'll spoil the ending (talk about which variables strongly affected turnout rates) and then walk you through how I came to these conclusions.

Overall, we'll look at what brought fans to games, what didn't seem to affect turnout, and what actually seemed to keep fans away from Memorial Stadium.  With these results, I can try to approximate attendance at each Cal game for the upcoming season.  We'll have to hold off on that, though, until preseason rankings and other factors for the 2010 season are finalized.

I suppose you can't spoil an interesting story if you don't know what the story is about.  So first, let's take a quick look at some attendance numbers.  Below is a chart with attendance from every home game during the 2004 through 2009 seasons (I couldn't get enough data on the 2003 season and earlier).  You can easily spot the eight sellouts (highlighted gold).  Notice that most of the other games brought between 55k and 63k fans.



It's pretty clear which games tend to sell out (USC, Stanford, UCLA).  Other than that, nothing immediately jumps out at me.  Our fans seem rather unpredictable.  They'll show up in respectable numbers to see a team they've never heard of (Eastern Washington: 58,083) but fewer go watch Cal battle a ranked Pac-10 foe in an important conference game (Arizona 2009: 53,347).

Year-to-year, the average attendance rates seem to tell a straightforward story: more people will show up to see a high-quality team...or do they?

2004: 64019
2005: 60377
2006: 64317
2007: 63136
2008: 61633
2009: 59471
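
For anyone who wants to poke at these averages, here is a minimal Python sketch that ranks the seasons and shows how far each one sits from the six-season mean.  The figures are copied from the list above; the variable names are just for illustration, and the mean is a mean of season averages rather than a per-game mean.

```python
# Season average attendance, copied from the list above.
season_avg = {
    2004: 64019,
    2005: 60377,
    2006: 64317,
    2007: 63136,
    2008: 61633,
    2009: 59471,
}

# Mean of the six season averages (not a per-game mean, since
# seasons can have different numbers of home games).
overall = sum(season_avg.values()) / len(season_avg)

# Seasons ordered from best-attended to worst-attended.
ranked = sorted(season_avg, key=season_avg.get, reverse=True)

for year in ranked:
    diff = season_avg[year] - overall
    print(f"{year}: {season_avg[year]:,} ({diff:+,.0f} vs. six-season mean)")
```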

2004 and 2006, Tedford's best teams, brought the most fans.  They also had the advantage of having two games that are almost guaranteed sellouts: UCLA and Stanford.  So which is it, are the numbers influenced by the solid teams or by the games fans favor?

Well, here's the part where I go and ruin the story for the movie while you're still waiting in line.  [SPOILER ALERT]: Fans seem to be most influenced by 1) whether Cal is playing a rival (Stanford, UCLA, or USC) and 2) opponent quality.  To a lesser extent they are influenced by 3) whether Cal is highly ranked, 4) how well Cal played the previous year, and, to a much lesser extent, 5) whether Cal lost to the opponent the previous year.


Let's look at these factors in more detail (things are not as simple as they seem):

The best predictors of positive fan attendance:

1) Rivals (+10k-12k fans)

People turn out en masse to see Cal play Stanford, USC, and UCLA.  Of the last eight sellouts, three belong to USC, two belong to Stanford, and one belongs to UCLA.  Even when Cal did not sell out against those opponents, Memorial was usually packed (70,000+).  People turn out in large numbers to see Cal take on its institutional rivals and its conference rival (a game which is often pivotal in deciding the conference standings).

2) Opponent quality (+10-15k fans for a top-15 team)

This is intuitive enough, right? The only two sellouts since 2004 that weren't against one of the rivals came when formidable opponents came to town (#11 Oregon 2006, #15 Tennessee 2007).  This bonus in fan turnout is only really noticeable against top-15 opponents though.  Top-25 opponents outside the top 15 do not bring significantly more fans than unranked teams.  For examples see ASU 2004 (52,652), ASU 2006 (58,024), Michigan St 2008 (62,956), Oregon 2008 (61,432) and Arizona 2009 (53,347).  Except for the 2008 games, those were all well below the season average.  Even the 2008 games were barely above average.

3) Whether Cal is highly ranked (+6-8k fans if Cal is a top-5 team)

When Cal was a top-5 team, significantly more fans showed up than when Cal was ranked 25th through 6th or unranked altogether.  I was surprised that a top-25 or even top-15 Cal team wasn't a significant predictor, but I suppose that explains why Cal-Maryland 2009 had "only" 62k show up.


So far it looks like this is all obvious stuff, doesn't it?  You don't need any statistics to tell you that people like to watch a) highly ranked teams and b) rivalry games.  You can get that all with everyone's favorite statistical test: the interocular percussion test.  What is most interesting is which factors do not predict attendance and which factors are associated with fewer people showing up at games.

What plays an unclear role in predicting attendance?
  • Cal's winning percentage during the current year
  • Opponent's winning percentage during the current year and previous year

What does not play a significant role in predicting attendance?

  • Opponents being ranked 25th-16th
  • Cal being ranked outside the top-5 or unranked altogether
  • "A" or "B" non-conference opponents

Counterintuitively, fans don't seem to turn out in greater numbers when Cal is ranked outside the top-5.  Additionally, they don't seem especially interested in watching teams outside the top-15 or non-conference foes from the BCS conferences.


Which factors are associated with lower attendance numbers?

How well Cal played last year (-1500 fans per win)

Believe it or not, this worked in the opposite direction!  When Cal was great the previous year, people showed up in lower numbers the following year.  And when Cal was mediocre the previous year, fans showed up in greater numbers the next year.  Don't read too much into this, though.  This is mostly explained by solid teams being followed by disappointing teams (2004 to 2005, 2006 to 2007).

Revenge! (-2k-3k fans)

When Cal lost to a team the previous year, it seems like fewer fans showed up when that team came to Memorial the next year.  This sounds like a case of Typical Pessimistic Cal Fan Syndrome: perhaps some Cal fans just can't bear to see the team beaten twice in a row by an opponent.


From this point onward, I'll explain how I derived these conclusions.  If you're interested in statistics and/or data analysis this will be a good insight into the results.  Otherwise, be warned: Here be numbers!


Most of the numbers came from with a handful of others coming from ESPN.  I gathered attendance numbers and, because the data worked for it, used a regular ol' regression model with a variety of variables.

First we'll look at my variables and then I'll post the wholesome, juicy regression tables.

I ran several regressions with a combination of the following seventeen variables (they're all fairly self-explanatory).

Top-25 opponent
Top-15 opponent
Top-5 opponent
(and I also tried collapsing this into the variable "ranked")

Cal top-25
Cal top-15
Cal top-5
(similarly, I tried collapsing this into the variable "Cal ranked")

UCLA
USC
Stanford
(I collapsed these into a "rival" variable for a few regressions)

Revenge: opponents qualified for this if they defeated Cal the previous year

Opponent's win percentage the previous year
Opponent's win percentage during the current year

Cal's win percentage the previous year
Cal's win percentage during the current year

Non-conference "A" or "B" opponent

Pac-10 opponent
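
To make the variable list concrete, here is a rough Python sketch of how one game could be turned into 0/1 indicator variables.  Everything in it is an assumption for illustration: the function names are made up, the ranking bands are coded as mutually exclusive (1-5, 6-15, 16-25), which may not match how the regressions were actually coded, and the win-percentage and "A or B" non-conference variables are left out.

```python
RIVALS = {"Stanford", "UCLA", "USC"}

# Pac-10 membership during 2004-2009, excluding Cal itself.
PAC10 = {
    "Arizona", "Arizona State", "Oregon", "Oregon State", "Stanford",
    "UCLA", "USC", "Washington", "Washington State",
}


def rank_band(rank, prefix):
    """Return 0/1 flags for the band a ranking falls in.

    Assumption: bands are mutually exclusive (1-5, 6-15, 16-25).
    `rank` is None for an unranked team.
    """
    return {
        f"{prefix}Top5": int(rank is not None and rank <= 5),
        f"{prefix}Top15": int(rank is not None and 6 <= rank <= 15),
        f"{prefix}Top25": int(rank is not None and 16 <= rank <= 25),
    }


def encode_game(opponent, opp_rank, cal_rank, lost_last_year):
    """Build the indicator variables for one home game (hypothetical helper)."""
    row = {}
    row.update(rank_band(opp_rank, "Opponent"))
    row.update(rank_band(cal_rank, "Cal"))
    row["Rival"] = int(opponent in RIVALS)    # the collapsed "rival" flag
    row["Pac10Foe"] = int(opponent in PAC10)
    row["Revenge"] = int(lost_last_year)      # opponent beat Cal last year
    return row


# #12 Cal hosting Maryland in 2009, a year after losing to them.
# Maryland is treated as unranked here.
print(encode_game("Maryland", None, 12, True))
```

Collapsing the three rival flags into one is then just a set-membership check, which is why a single "Rival" column appears here instead of separate UCLA, USC, and Stanford columns.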

Now let's take a look at the regression table for the model with the most variables.  In case you forgot everything from Stats 20, here's a quick refresher.  To interpret this table, look at the sums of the intercept (the baseline attendance number, 58468) and the coefficients (how much each variable contributes to attendance numbers).  The coefficients (in the first column) approximate how many people each variable contributes to attendance.  So if Cal is playing a top-15 opponent, you can guess that around 16,516 more people will show up than if Cal were playing an unranked opponent.  On the other hand, if a coefficient is negative (like Cal's winning percentage the previous year), then that many fewer people will show up at any given game.  For the winning-percentage variables, the coefficient is the swing between a winless season and an undefeated one, so scale it by the actual percentage.
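
If you'd rather see the "intercept plus coefficients" arithmetic spelled out, here is a minimal Python sketch.  The numbers are the estimates from the full-model table; the `predict` function and the dictionary layout are my own illustration, and it assumes the ranking, rival, and revenge variables are 0/1 flags while the winning percentages run from 0 to 1.

```python
INTERCEPT = 58468.14

# Estimates copied from the full-model regression table.
COEF = {
    "OpponentT25": -2789.09,
    "OpponentT15": 16516.16,
    "OpponentT5": 1058.07,
    "CalTop25": -892.15,
    "CalTop15": -91.44,
    "CalTop5": 6605.79,
    "UCLA": 12518.83,
    "USC": 997.34,
    "Stanford": 15652.60,
    "OOC-ab": -2162.21,
    "revenge": -2803.39,
    "OppWin%LastYr": 6377.09,
    "OppWin%ThisYr": 3555.76,
    "CalW%LastYr": -10975.14,
    "CalW%ThisYr": 5776.78,
    "Pac 10 foe": -2210.26,
}


def predict(game):
    """Intercept plus coefficient * value for every variable supplied.

    Variables left out of `game` are treated as 0.
    """
    return INTERCEPT + sum(COEF[name] * value for name, value in game.items())


# Baseline: every variable at zero.
print(round(predict({})))

# Same game, but against a top-15 opponent.
print(round(predict({"OpponentT15": 1})))
```

Keep in mind that "every variable at zero" also means both teams have winning percentages of zero, so a realistic estimate would fill in those values too.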

If the p-value is significant (i.e. if it has a * by it), then that variable predicts attendance better than random chance.  The coefficients that are not significant should not be disregarded, but take their coefficient estimates with a grain of salt: it's likely they're influenced by random chance.
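
Two of the table's columns can be checked by hand, and this short Python sketch does so for a few rows of the full-model table.  The t value is just the estimate divided by its standard error, and the stars follow the "Signif. codes" legend printed under the table.  The helper names are made up for illustration, and the p-values are copied from the table rather than recomputed.

```python
def t_value(estimate, std_error):
    """t statistic: how many standard errors the estimate is from zero."""
    return estimate / std_error


def signif_code(p):
    """Map a p-value to the marker used in the 'Signif. codes' legend."""
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    if p <= 0.1:
        return "."
    return ""


# (estimate, standard error, p-value) copied from the full-model table.
rows = {
    "OpponentT15": (16516.16, 3476.28, 0.000217),
    "CalTop5": (6605.79, 2571.99, 0.020623),
    "revenge": (-2803.39, 1585.78, 0.096151),
    "USC": (997.34, 3654.69, 0.788425),
}

for name, (est, se, p) in rows.items():
    print(f"{name:12s} t = {t_value(est, se):7.3f}  {signif_code(p)}")
```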

Here is a regression with the whole slew of variables.  Interpret away!


             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  58468.14    4956.44  11.796 2.64e-09 ***
OpponentT25  -2789.09    2207.68  -1.263 0.224559    
OpponentT15  16516.16    3476.28   4.751 0.000217 ***
OpponentT5    1058.07    3473.54   0.305 0.764590    
CalTop25      -892.15    1478.95  -0.603 0.554811    
CalTop15       -91.44    1894.05  -0.048 0.962093    
CalTop5       6605.79    2571.99   2.568 0.020623 *  
UCLA         12518.83    2217.26   5.646 3.65e-05 ***
USC            997.34    3654.69   0.273 0.788425    
Stanford     15652.60    2034.20   7.695 9.15e-07 ***
OOC-ab       -2162.21    2887.75  -0.749 0.464872    
revenge      -2803.39    1585.78  -1.768 0.096151 .  
OppWin%LastYr 6377.09    4160.35   1.533 0.144852    
OppWin%ThisYr 3555.76    2918.44   1.218 0.240743    
CalW%LastYr -10975.14    7527.77  -1.458 0.164202    
CalW%ThisYr   5776.78    4408.14   1.310 0.208534    
Pac 10 foe   -2210.26    1946.36  -1.136 0.272855    
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 2725 on 16 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared: 0.9299,     Adjusted R-squared: 0.8598 
F-statistic: 13.26 on 16 and 16 DF,  p-value: 2.396e-06  ...


So what does this tell us? Top-15 opponents, UCLA, Stanford, and a top-5 Cal team are significant, and revenge is borderline (p < 0.1).  You might be surprised that USC isn't significant, but this is likely due to a high level of collinearity among some of these variables (aka one variable overlaps with another).  USC is always a revenge game and USC is usually in the top-15, so those variables seem to pick up on why people show up for USC games.  Also, this has a very high r-squared: this combination of variables explains 92.99% of the variance in attendance numbers (85.98% after adjusting for the number of variables).  As your stats professor may have told you, though, if you stick a ton of variables into a regression, it's more likely you'll get a high r-squared.  These variables don't necessarily explain all the variance in attendance rates.
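
The table reports both a Multiple R-squared (0.9299) and an Adjusted R-squared (0.8598), and the gap between them is exactly the "ton of variables" penalty.  Here is a small Python sketch of the standard adjustment formula.  The sample size of 33 is inferred from the F-statistic line (16 predictors and 16 residual degrees of freedom, plus one for the intercept); the function name is just for illustration.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjusted R-squared: penalizes R-squared for each predictor."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Full model: 16 predictors, 16 residual degrees of freedom,
# so n = 16 + 16 + 1 = 33 games.
n_games = 16 + 16 + 1

print(round(adjusted_r2(0.9299, n_games, 16), 4))
```

With only 33 games and 16 predictors, roughly half of the unexplained share gets doubled by the penalty, which is why the adjusted figure drops as far as it does.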


If we collapse UCLA, Stanford, and USC into the "rival" variable, then it remains significant.  Also, a couple others jump into or out of the realm of significance:


             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  61477.49    6022.64  10.208 6.50e-09 ***
OpponentT25  -4657.25    2616.40  -1.780  0.09196 .  
OpponentT15  10650.36    3494.03   3.048  0.00692 ** 
OpponentT5   -1409.61    4162.36  -0.339  0.73879    
Cal top 25     345.78    1771.61   0.195  0.84744    
Cal top 15   -1500.78    2255.95  -0.665  0.51432    
Cal top 5     5807.58    3109.35   1.868  0.07816 .  
Rivalry      11715.91    1835.90   6.382 5.21e-06 ***
OOC A or B    -701.45    3519.26  -0.199  0.84425    
Revenge      -3260.28    1880.69  -1.734  0.10009    
OppWin%LastYr 4204.45    5052.53   0.832  0.41623    
OppWin%ThisYr 5420.36    3492.47   1.552  0.13806    
CalW%LastYr -21211.22    8505.97  -2.494  0.02260 *  
CalW%ThisYr   9139.35    5307.93   1.722  0.10224    
Pac 10 foe      67.13    2258.10   0.030  0.97661    
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 3369 on 18 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared: 0.8795,     Adjusted R-squared: 0.7858 
F-statistic: 9.383 on 14 and 18 DF,  p-value: 1.350e-05


As in the last table, rivals and top-15 opponents are significant predictors of attendance, and this time Cal's winning percentage the previous year joins them.  Cal top-5 and opponent top-25 are borderline significant (p < 0.1).  Cal's current winning percentage, the opponent's winning percentage, and revenge fall just short of even that threshold.


Well, some of you may say, do these coefficients change when Cal is playing someone who is not a rival?  Let's take a look...(here I've restricted the data to non-rival games, so these are the coefficients when Cal is not playing a rival).

Coefficients: (2 not defined because of singularities)
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)       59183       5209  11.362 2.03e-07 ***
Top25 Opponent    -3225       2304  -1.400  0.18916    
Top15 Opponent    16646       3583   4.646  0.00071 ***
Top5 Opponent        NA         NA      NA       NA    
Cal Top25         -1271       1739  -0.731  0.48026    
Cal Top15         -1873       2265  -0.827  0.42600    
Cal Top5           8398       2881   2.915  0.01406 *  
rival                NA         NA      NA       NA    
OOC-A or B        -1823       3022  -0.603  0.55856    
Revenge           -2227       1855  -1.201  0.25504    
OppWin%LastYr      1158       5570   0.208  0.83909    
OppWin%ThisYr      6153       3632   1.694  0.11834    
CalWin%LastYr    -12412       8306  -1.494  0.16320    
CalWin%ThisYr      7588       4627   1.640  0.12927    
Pac 10 foe        -1583       2069  -0.765  0.46036    
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 2769 on 11 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared: 0.8442,     Adjusted R-squared: 0.6741 
F-statistic: 4.965 on 12 and 11 DF,  p-value: 0.0062 


When Cal is not playing a rival, it seems like playing a top-15 opponent is the strongest and most significant predictor, with a top-5 Cal team being the only other significant variable.  A few are close to significance (Opponent's win percentage, Cal's win percentage, Cal's win percentage the previous year).
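
A note on the two NA rows in that last table: a regression can't estimate a coefficient for a variable that never changes within the data it is given, because such a column is indistinguishable from the intercept.  Once only non-rival games are kept, the rival flag is 0 in every row, which would explain its NA.  The second NA row (which by its position appears to be the opponent top-5 flag) would come out the same way if no non-rival opponent was ranked in the top 5, though that is my guess rather than something the output states.  The Python sketch below shows the idea on a few made-up rows, not the real data; `constant_columns` is a hypothetical helper.

```python
def constant_columns(rows):
    """Return names of columns that take a single value across all rows."""
    if not rows:
        return []
    return sorted(
        name for name in rows[0]
        if len({row[name] for row in rows}) == 1
    )


# Made-up rows for illustration only, NOT the real attendance data.
games = [
    {"rival": 1, "OpponentT5": 1, "OpponentT15": 0},
    {"rival": 0, "OpponentT5": 0, "OpponentT15": 1},
    {"rival": 0, "OpponentT5": 0, "OpponentT15": 0},
    {"rival": 1, "OpponentT5": 0, "OpponentT15": 0},
]

# Keep only the non-rival games, as in the last regression.
non_rival = [g for g in games if g["rival"] == 0]

print(constant_columns(games))      # columns with no variation overall
print(constant_columns(non_rival))  # columns with no variation after filtering
```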

What can we conclude from all this?  Cal fans show up in large numbers when there is any combination of the following factors: Cal is a top-5 team, Cal plays a top-15 opponent, Cal plays a rival, and, just for the lulz, they stay home when Cal was good the previous year.  I know, I know, these are all pretty obvious conclusions.  But what is most illuminating for me is what does not strongly influence fans to attend.  These are variables like 25th-16th ranked opponents, a Cal team that is ranked outside the top-5, playing an A or B non-conference opponent, and how well Cal and its opponent did the current and previous years.

If you're still reading this far, you probably have some degree of interest in statistics.  If you want to see more analysis of this data, ask away!  I'll be happy to oblige.