Monday, December 26, 2011

What's a good early season worth?

If you've been reading the Hoya interwebs since the Memphis game, you may have come across a note that Georgetown has wrapped up the pre-Big East part of the schedule with a 10-1 record, and this is the fifth consecutive season that the Hoyas have started the season 10-1.

Of course, not all 10-1 records are the same.  As I write, Georgetown sits 12th in both polls, and 14th in Ken Pomeroy's ratings.  Meanwhile, the Seton Hall Pirates have run off an 11-1 record so far, but can't get a whiff from either poll, and Ken rates them 55th overall.  While the teams have played comparably difficult schedules so far [G'town = 252nd, SHU = 274th], the Hoyas rate higher for beating the bad teams by a lot, and for beating (or losing to) some better teams - Georgetown's only loss was to Kansas, while the Hall fell to Northwestern.

But that's not why I'm writing.

What I am actually curious about is whether early season performance is a useful predictor of what comes later in the year.

The simple answer is "Yes."

I want to be careful to spell out a few definitions before I proceed.
  • The data set shown in the figure above is for all Big East teams for the seasons 2006 to 2011 (the six seasons played since the expansion with Conference USA teams)
  • Early season refers to games in November and December - I'm not discriminating for whether a game is conference or non-conference here.  Early season is represented on the x-axis
  • Later games means just that:  all games played after Dec. 31st. More pointedly, I'm not limiting the discussion to merely how a team performs in a single-elimination tournament (conf. or NCAA).  Late season is represented on the y-axis.
  • I'm not going to be using win-loss record but rather Ken Pomeroy's stats [adjusted efficiencies] to evaluate how a team performs.  The stat of interest here I'll call "net adjusted efficiency" which is simply adj. offensive efficiency [aOE] minus adj. defensive efficiency [aDE].  You can find aOE and aDE on Ken's ratings page (free to all).  I'm happy to acknowledge that his "Pythagorean" rating is probably a bit more accurate, but it's also a lot more complicated.
The scatter plot is fairly impressive - using Ken's stats, there is a clear linear trend between how well a team plays early and how well they will play later.
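
To make the definitions above concrete, here is a minimal sketch of the two pieces: splitting games into early and late season, and computing net adjusted efficiency.  The function names and the sample numbers are my own illustrative assumptions, not values from Ken's page.

```python
from datetime import date


def is_early_season(game_date: date) -> bool:
    """Early season = games played in November or December."""
    return game_date.month in (11, 12)


def net_adjusted_efficiency(adj_off_eff: float, adj_def_eff: float) -> float:
    """Net adjusted efficiency: aOE minus aDE (points per 100 possessions)."""
    return adj_off_eff - adj_def_eff


# Hypothetical team (made-up numbers): scores 112.0 and allows 95.5
# points per 100 possessions after adjustment.
early_net = net_adjusted_efficiency(112.0, 95.5)
print(early_net)
print(is_early_season(date(2011, 12, 22)), is_early_season(date(2012, 1, 4)))
```

A higher net value is better, since a good defense shows up as a low aDE.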

Lots of fancy-pants analysis, what it means for Georgetown, and predicted Big East wins for all teams after the jump.

I'm going to display that scatter plot again, but with some additional information on the figure:

I've added several lines here:
  • The red line is the 1:1 line; any dot that falls on the line indicates a team that played equally well early and late season.  Dots that lie above the line represent teams that played better late season than early.  Dots that lie below the line represent teams that got worse.
  • The solid blue line is the linear fit to the data, and it is very close to the 1:1 line.  This shows that Big East teams that play poorly early season tend to improve slightly later on.  Teams that play very well early season tend to play at about the same level later.
  • The dashed blue lines are the 95% prediction bands - dots outside of the area they bound represent the outliers in the data set, teams that played either much better or much worse later in the year than early.
  • The fit is a strong one.  For the data here, R² = 0.89, meaning that 89% of the variance in late season play can be explained simply by the fitted line.
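
For anyone who wants to reproduce the lines on the plot, here is a rough sketch of an ordinary least-squares fit, its R², and a 95% prediction band.  The data points are made up for illustration, the function names are mine, and the band uses a normal quantile (about 1.96) as a stand-in for the Student-t quantile, which is a reasonable approximation only when there are many points (16 teams over six seasons would give roughly 96).

```python
from math import sqrt
from statistics import NormalDist, fmean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    y_bar = fmean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def prediction_band(x, y, slope, intercept, x_new, level=0.95):
    """Approximate prediction interval for one new observation at x_new."""
    n = len(x)
    x_bar = fmean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(ss_res / (n - 2))  # residual standard error
    # Assumption: normal quantile in place of the Student-t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * s * sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    centre = intercept + slope * x_new
    return centre - half_width, centre + half_width


# Made-up (early, late) net efficiencies, for illustration only.
early = [-5.0, 0.0, 5.0, 10.0, 15.0, 20.0, 25.0]
late = [-3.0, 1.0, 4.0, 11.0, 14.0, 21.0, 23.0]

slope, intercept = fit_line(early, late)
r2 = r_squared(early, late, slope, intercept)
low, high = prediction_band(early, late, slope, intercept, x_new=12.0)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.2f}")
print(f"95% band at early=12.0: {low:.1f} to {high:.1f}")
```

The band is for a single new team-season, which is why it is wider than a confidence band on the fitted line itself.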

Using the fit here, we can give an estimate with 95% confidence (1 in 20 will be wrong) for how a Big East team will play later in the year, which we will soon do.  But first, if you look up at the plot, a question you may have had is "Who was the team that played so poorly later in the year?"

I'm glad you asked.

The same plot one last time, but now I've color coded and labeled Georgetown's seasons.  Sure enough, Hoyas fans were lucky to live through not only the greatest collapse by a Big East team in the previous six seasons [2009], but also two of the three greatest collapses in the data set [2009, 2011].  Only the 2007 UConn Huskies could provide an equally breathtaking flameout.

That one huge over-achiever?  The Louisville Cardinals (also in 2007).

A question that's often been asked the past few years is whether Coach John Thompson III is fairly developing a reputation for having his teams fade later in the year.

This can be answered two ways:  if you look at where the dots lie on the plot, twice the team has indeed faded (once in part from injury) [2009, 2011], twice they played about as expected [2008, 2010], and twice they exceeded early season performance [2006, 2007].  Call it a wash.

But, if we average the difference between expected and actual late season performance over the past six seasons, we find that the Hoyas play about 1.4 net eff. points worse late season than early (a difference of -1.4).  Thus the reputation has been earned.
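
The averaging step can be sketched as follows.  I'm assuming here that "expected" means the fitted line's prediction from early season play; the function name, the slope and intercept, and the team-seasons are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def mean_late_season_residual(seasons, slope, intercept):
    """Average (actual late - expected late) per team.

    seasons: iterable of (team, early_net, late_net) tuples.
    Expected late play comes from the line: intercept + slope * early_net.
    """
    residuals = defaultdict(list)
    for team, early_net, late_net in seasons:
        expected = intercept + slope * early_net
        residuals[team].append(late_net - expected)
    return {team: fmean(values) for team, values in residuals.items()}


# Made-up team-seasons, using the 1:1 line (slope 1, intercept 0) for simplicity.
sample = [
    ("Team A", 10.0, 12.0),  # improved by 2.0
    ("Team A", 8.0, 7.0),    # faded by 1.0
    ("Team B", 5.0, 3.0),    # faded by 2.0
]
corrections = mean_late_season_residual(sample, slope=1.0, intercept=0.0)
print(corrections)
```

A negative average marks a team that tends to fade, a positive one a team that tends to improve.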

Here's a table for all sixteen Big East teams, comparing late to early season:
Team           net adj. eff.
Cincinnati         -2.2
Connecticut         0.1
DePaul             -0.4
Georgetown         -1.4
Louisville          2.3
Marquette           1.2
Notre Dame          1.1
Pittsburgh          0.9
Providence         -0.6
Rutgers             0.4
Seton Hall          1.0
South Florida      -1.6
St. John's         -0.4
Syracuse            0.3
Villanova          -0.6
West Virginia      -0.2
The team that shows the biggest improvement later in the year is Louisville - attribute it to great coaching by Rick Pitino, or perhaps to that great 2007 season skewing the average (not really).  The other big improvers were Marquette (despite the coaching change) and Notre Dame (fear the turtleneck).

The worst fader is Cincinnati, much worse than even the Hoyas.  In five of the six seasons we looked at, the Bearcats played worse later in the year.  South Florida and the Hoyas are also faders.

Finally, we'll combine the fitted line, the team-by-team correction for improving or regressing during the season, and some previously discussed relationships between net efficiency and conference wins to produce the master table:  expected conference wins, along with a range of expected wins that should be correct 95% of the time.
Team            Wins     Range (95% conf.)
Cincinnati        8       5-10
Connecticut      11       9-13
DePaul            4       2-6
Georgetown       12      10-15
Louisville       13      10-15
Marquette        13      11-15
Notre Dame        8       6-11
Pittsburgh        9       7-12
Providence        5       3-7
Rutgers           6       3-8
Seton Hall       10       7-12
South Florida     4       2-7
St. John's        4       2-6
Syracuse         16      14-18
Villanova         8       5-10
West Virginia    10       8-13
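
For the curious, the way the three pieces combine into the table above can be sketched like this.  Everything specific here is a placeholder: the efficiency-to-wins relationship is not spelled out in this post, so `wins_from_net` below is a hypothetical stand-in with made-up coefficients, and the 18-game conference schedule is my assumption (consistent with the top of Syracuse's range).

```python
def predict_conference_wins(early_net, team_correction, slope, intercept,
                            band_half_width, wins_from_net, n_games=18):
    """Return (expected wins, (low, high)) for one team.

    1. The fitted line turns early season play into expected late play.
    2. The team-by-team correction shifts that estimate.
    3. wins_from_net maps net efficiency to conference wins; the same
       mapping applied at the band edges gives the range.
    """
    late_net = intercept + slope * early_net + team_correction

    def to_wins(net):
        # Round to whole games and keep the result within the schedule.
        return min(n_games, max(0, round(wins_from_net(net))))

    low = to_wins(late_net - band_half_width)
    high = to_wins(late_net + band_half_width)
    return to_wins(late_net), (low, high)


# Hypothetical mapping and inputs (made-up numbers, not the fit used above).
def hypothetical_wins_from_net(net):
    return 9.0 + 0.25 * net


wins, (low_wins, high_wins) = predict_conference_wins(
    early_net=10.0, team_correction=0.5, slope=1.0, intercept=0.0,
    band_half_width=6.0, wins_from_net=hypothetical_wins_from_net,
)
print(wins, low_wins, high_wins)
```

Passing the wins mapping in as a function keeps the sketch honest about which part is unknown: swap in the real relationship and the rest stays the same.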

Twelve conference wins for the Hoyas this season.  It's that simple - no need to even play the games, really.

A few final notes:
  • Since there are 16 teams in the Big East and the predicted ranges should be wrong only 1 time in 20, we'd expect about one team (16 × 0.05 = 0.8) to fall outside of its forecast range of wins.
  • A couple of caveats:
    • I didn't propagate all uncertainties through, so the range widths are probably a bit optimistically narrow.
    • All previous seasons used all games played through the end of December.  The table above includes only games played before Christmas.  A bit more data might change the table somewhat.  If I remember, I'll update the table here after New Year's (updated on Jan 2 - BL)
  • Yes, I am aware that St. John's managed to win a conference game already while I wrote this interminable blog post - check out the above caveats before telling me that.
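
The "1 in 20" arithmetic in the first note can be checked with a quick binomial sketch.  It assumes each team's miss is independent of the others, which is not strictly true (one team's conference win is another's loss), so treat the numbers as rough.

```python
from math import comb


def prob_at_most_k_misses(n_teams: int, miss_rate: float, k: int) -> float:
    """Binomial probability that at most k of n_teams land outside their range."""
    return sum(
        comb(n_teams, i) * miss_rate ** i * (1 - miss_rate) ** (n_teams - i)
        for i in range(k + 1)
    )


expected_misses = 16 * 0.05
p_no_misses = prob_at_most_k_misses(16, 0.05, 0)
p_at_most_one = prob_at_most_k_misses(16, 0.05, 1)
print(f"expected misses: {expected_misses:.1f}")
print(f"P(no misses) = {p_no_misses:.2f}, P(at most one) = {p_at_most_one:.2f}")
```

Under that independence assumption, there is roughly a 44% chance that every team lands in its range and roughly an 81% chance that no more than one misses.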


  1. Awesome post. Keep 'em coming!

  2. This is a really cool read. Hopefully this year our strong early season will pay some dividends during that single-elimination tournament!

  3. How did I miss this? A great post and spectacularly great work.

    I would say the sample seems a bit low from my non-trained standpoint -- by that I mean if Chris Wright doesn't get hurt last year or if the team improves this year, it seems we'd get a different result.

    Is six seasons enough?

  4. Is six seasons enough?

    Absolutely not - but when has that ever stopped us before?