Sunday, November 16, 2008

Another stats gimmick, and J'ville preview

Excuse this interruption of SFHoya99's season preview, but I thought I'd chime back in to introduce another stats feature that I've been working on behind the scenes.

If you're looking for the Jacksonville preview, you'll need to scroll down quite a bit.

My regular reader may have noticed by now that I've been loath to assign credit or blame to specific players for a single game, and instead tend to present team stats. I do this in part because I think that it is difficult to evaluate individual play (especially defense) with a simple basketball box score.

There are tools available to glean some additional information when you look at a single game, notably the individual net score box that Dean Oliver describes in Basketball on Paper. Henry Sugar over at Cracked Sidewalks is a particular proponent of this, and has been providing Marquette fans with his version (which he calls "Individual Player Ratings") for most of last season. Here's an example from last year's game between MU and Villanova (hope he doesn't mind me linking):

Note that I've previously discussed this game when I introduced my version of the HD Box Score.

I won't explain Mr. Sugar's work here, but I will point to an excellent post he wrote last season covering the basics of each stat column listed. The bottom line for most fans is in columns 5 and 7 - points produced and net points added. This gives us an idea, based on tempo-free stats, of just how many points each player contributed towards the game result (in this case, a 10 point win for Marquette).

There are some limitations to this work.

Without going into too much detail here, I can assure you that the defensive rating assigned to each player for this game is just loosely tied to reality. Defensive stats are not available for most basketball games (NBA too) at the detail-level needed, so it is somewhere between difficult and impossible to assign blame for each player's defensive effort.

But more generally, the calculations used for the stats in the table above are underpinned by a large number of estimates, which should improve as we aggregate data over the course of a season, but which can be quite a bit off during an individual game. Here are just some examples of missing information needed to make the calculations for the stats above:
  1. How many possessions did a player have on offense? Defense?
  2. How many offensive/defensive possessions ended in a score?
  3. What percentage of field goals made by a player were off of an assist?
  4. How often are a player's missed shots rebounded by a teammate?
  5. How well did the team rebound while the player was on the court?
  6. How often did a player end a possession by making at least 1 free throw?
  7. How often does a player commit a foul, after which the opponent misses at least 1 free throw (e.g. Hack-a-Shaq)?
None of these questions - and others I haven't posed - can be answered by looking at the game box score. So the only recourse is to make estimates, based on a series of formulas introduced by Dean Oliver (and presumably used by Henry Sugar).

However, all of the questions asked above can be answered by parsing the available play-by-play from the game. And that is what I propose to do.
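
As a rough illustration of what a play-by-play tally looks like, here is a minimal Python sketch that answers question 3 (the share of a player's made field goals that were assisted). The event format, field names and sample plays are all invented for the example; they are not the layout of any real play-by-play feed, and this is not the actual program discussed in this post.

```python
from collections import defaultdict

# Hypothetical, simplified play-by-play events (invented for illustration).
# Each made field goal records the shooter and the assisting player, if any.
events = [
    {"type": "made_fg", "player": "A", "assist": "B"},
    {"type": "made_fg", "player": "A", "assist": None},
    {"type": "missed_fg", "player": "A"},
    {"type": "made_fg", "player": "B", "assist": "A"},
    {"type": "made_fg", "player": "A", "assist": "C"},
]

def assisted_fg_share(events):
    """Return {player: fraction of made field goals that were assisted}."""
    made = defaultdict(int)
    assisted = defaultdict(int)
    for ev in events:
        if ev["type"] != "made_fg":
            continue  # only made baskets can be assisted
        made[ev["player"]] += 1
        if ev.get("assist") is not None:
            assisted[ev["player"]] += 1
    return {p: assisted[p] / made[p] for p in made}

shares = assisted_fg_share(events)
print(shares)
```

With a box score alone this fraction has to be estimated from team assist totals; with the play-by-play it is a simple count.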

A few points to consider:
  • While I can improve the accuracy of the final stats by replacing estimates with actual tallies of various components of the calculations, I'm not modifying the philosophy (or math) of the final stats. That is, if you don't think individual player Offensive Rating is a good measure of how a player contributes on offense, there is little here to convince you otherwise. Of course, if your main quibble is with D. Oliver's many underlying estimates, keep reading.
  • As I've said before, the drawback of using play-by-play data is that there are inevitably errors in the transcript, which can lead to uncertainty in assigning credit or blame. However, I am not convinced that these same errors aren't also in the official box score, but are just hidden from view. Just for Georgetown, I know of at least one instance where Ken Pomeroy found an error in the play-by-play that propagated to the box score.
  • I am not exploiting the play-by-play fully yet, because it takes a lot of work. I've written over 5000 lines of code so far (yes, that was a brag) and my wife keeps mentioning how much time I spend working on the program, and something about a divorce (at least I think that's what she said, I wasn't really paying attention). For instance, I could record the shooting percentage of each player making an assisted basket, but I don't yet. I could distinguish between assisted dunks, layups and jumpers, but I don't yet.
A bigger point, and it goes back to an early post, is that I don't really believe in D. Oliver's defensive stats, and frankly I don't think he does either. They are merely an estimate, using an exceedingly limited toolbox. Here's what I wrote there to briefly explain his Defensive Rating stat:

Defensive rating is an attempt to estimate the contribution of each player to the team's defensive efficiency. It is calculated as team defensive efficiency, plus one-fifth of the difference between team defensive efficiency and individual player stops per 100 possessions played. Player individual stops are estimated from the number of blocks, steals and defensive rebounds each player has, plus some team stats. Since it is not a simple ratio, it is more like being graded on a curve, such that it is limited to the range of 80% - 120% of team defensive efficiency. So, a player who literally refused to play defense (e.g. Donte Greene) could score no worse than 80% of his team's efficiency. I would describe this stat as a very rough estimate of actual defensive worth . . .
Later in that same post, I discussed an alternative method, which was simply to use available plus/minus stats to calculate the team's defensive efficiency while the player was on the court, and use that (less the team's defensive efficiency while the player was off the court) to rate that player's defensive ability.

The drawback to this method, pointed out on this thread on Hoyatalk, is that the quality of one's teammates can have a big effect.
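
For reference, the on/off calculation described above can be sketched in a few lines of Python. The function name and the sample totals are hypothetical, chosen only to show the arithmetic.

```python
def on_off_def_rating(pts_allowed_on, poss_on, pts_allowed_off, poss_off):
    """Team defensive efficiency with the player on court minus off court.

    Negative values mean the team allowed fewer points per 100 possessions
    with the player on the floor.
    """
    on = 100.0 * pts_allowed_on / poss_on
    off = 100.0 * pts_allowed_off / poss_off
    return on - off

# Hypothetical season totals for one player (not real data)
diff = on_off_def_rating(pts_allowed_on=450, poss_on=500,
                         pts_allowed_off=200, poss_off=200)
print(diff)  # 90.0 on court vs 100.0 off court
```

Nothing in this number separates the player from the four teammates sharing the floor with him, which is exactly the objection raised in that thread.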

So here, I'm proposing a new method: I am using Dean Oliver's basic statistics for player offensive and defensive rating, but the data I am feeding into the underlying equations are only those generated by his team while the player was on the court. This should especially help with defensive stats, in that the base team defensive efficiency used is now the def. efficiency while the player was on the court (i.e. the player receives no credit or penalty for great or lousy defense played by his teammates while he sat on the bench). The remainder of Dean Oliver's def. rating calc. (stops, stop %, scoring poss., etc.) is used as originally described. Additionally, as stated earlier I am removing as many of the estimates used by Oliver as I can, when I have time. The seven listed above are all incorporated, along with a few others (e.g. is a blocked shot recovered by the shooter's team?). I'll try to write up a FAQ covering all of the gory details at some point this season - likely when my wife is out of town.
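
To make the change concrete, here is a minimal Python sketch of the defensive rating step. It uses the commonly published form of Oliver's formula (team efficiency plus one-fifth of the gap between an individual figure and the team figure); the function and parameter names are my own labels, and the inputs are made up. The only difference in the on-court variant is which possessions feed `team_drtg`.

```python
def defensive_rating(team_drtg, pts_per_scoring_poss, stops, def_poss):
    """Oliver-style individual defensive rating (points allowed per 100 poss).

    team_drtg            -- team defensive efficiency; in the on-court variant
                            this comes only from possessions the player was
                            on the floor
    pts_per_scoring_poss -- opponent points per scoring possession
    stops                -- individual stops credited to the player
    def_poss             -- defensive possessions the player was on the court
    """
    stop_pct = stops / def_poss
    # Points per 100 possessions if every possession were defended at this stop rate
    individual = 100.0 * pts_per_scoring_poss * (1.0 - stop_pct)
    # Each of the five defenders is assigned one-fifth of the responsibility
    return team_drtg + 0.2 * (individual - team_drtg)

# Hypothetical inputs, not taken from any real game
rating = defensive_rating(team_drtg=100.0, pts_per_scoring_poss=2.0,
                          stops=6.0, def_poss=60)
print(round(rating, 1))
```

Because four-fifths of the result is the team number, swapping the full-game team efficiency for the on-court one moves a player's rating much more than any change to his individual stops.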

As a test case, I've run the Marq/Nova game mentioned at the top of this post. Here's what I get:

Marquette             Off    Poss           Individ     Def             Individ                             
Player                Poss   Used    ORtg   Pts Prod    Poss    DRtg   Pts Allow   Net Pts
HAYWARD, Lazar         59    12.5   111.2    13.9        59    100.9     11.9       +2.0                  
BARRO, Ousmane         51     3.5   149.3     5.2        51     95.4      9.7       -4.6                  
JAMES, Dominic         69    18.0   140.5    25.3        70     97.3     13.6      +11.7                  
MCNEAL, Jerel          66    18.6    79.0    14.7        67     96.4     12.9       +1.8                 
MATTHEWS, Wesley       42    11.7    92.8    10.8        42     86.0      7.2       +3.6                    
ACKER, Maurice         23     4.7   181.0     8.4        23     81.8      3.8       +4.7                  
FITZGERALD, Dan        16     0.3   280.0     0.9        17    104.5      3.6       -2.7                   
CUBILLAN, David        31     3.1    74.8     2.3        32    134.1      8.6       -6.3                   
BURKE, Dwight           6     0.0     -       0.0         7     62.9      0.9       -0.9             
MBAKWE, Trevor         12     2.0   100.0     2.0        12    124.4      3.0       -1.0                  
TOTALS                 75    74.3   112.4    83.5        76     98.7     74.6       +8.9          

Villanova             Off    Poss           Individ     Def             Individ                         
Player                Poss   Used    ORtg   Pts Prod    Poss    DRtg   Pts Allow   Net Pts
Pena, Antonio          62    12.9    75.6     9.8        60    123.0     14.8       -5.0                 
Cunningham, Dante      61    12.0    93.3    11.2        62    112.0     13.9       -2.7                     
Reynolds, Scottie      60    16.1    85.4    13.7        57    123.9     14.1       -0.4                     
Fisher, Corey          62    17.4    76.4    13.3        59    112.0     13.2       +0.1                 
Anderson, Dwayne       54     6.6   154.1    10.1        53    125.3     13.3       -3.1                    
Redding, Reggie        25     2.0   223.2     4.4        28     80.9      4.5       -0.1                   
Clark, Shane            8     0.8   333.3     2.5         9     70.2      1.3       +1.2                
Stokes, Corey          48     7.6   121.7     9.3        47    106.7     10.0       -0.7                 
TOTALS                 76    75.3    98.7    74.3        75    113.3     85.1      -10.8                    
The actual score of the game was MU 85, VU 75.

Several of the columns here are the same as Henry Sugar's above, but there are a few new ones as well. Briefly:
  • Off/Def Poss - the number of offensive or defensive possessions that a player was on the court; I think this is more useful than minutes played.
  • Poss Used - the number of offensive possessions used by a player (partial credit due to assists and offensive rebounds).
  • Off. Rating - the number of individual points produced, divided by the number of offensive possessions used, multiplied by 100. This is an estimate of the number of points a player would produce (not simply score) in 100 possessions.
  • Points Produced - similar to possessions used, it is an estimate of the team points scored that can be credited to an individual player; again, partial credit due to assists and offensive rebounds.
  • Def. Rating - An estimate of the number of points a player would allow in 100 possessions. See the discussion above the table for the details.
  • Points Allowed - The number of the opponent's points charged to the player - again an estimate.
  • Net Points - The difference between points produced and points allowed.
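
The arithmetic linking these columns can be checked against any row of the table. Here is a small Python sketch using Lazar Hayward's line. The ORtg and Net Pts relationships are as defined in the list; the Pts Allow relationship (DRtg per 100 possessions, times defensive possessions, split five ways) is not spelled out above but is an inference that reproduces the listed values, so treat it as an assumption.

```python
def off_rating(pts_produced, poss_used):
    """Points produced per 100 possessions used."""
    return 100.0 * pts_produced / poss_used

def points_allowed(drtg, def_poss, players_on_court=5):
    """Assumed relationship: one player's share of points allowed on court."""
    return drtg / 100.0 * def_poss / players_on_court

# Lazar Hayward's row from the Marquette table
pts_prod, poss_used = 13.9, 12.5
drtg, def_poss = 100.9, 59

ortg = off_rating(pts_prod, poss_used)       # table lists 111.2
pts_allow = points_allowed(drtg, def_poss)   # table lists 11.9
net = pts_prod - pts_allow                   # table lists +2.0

print(round(ortg, 1), round(pts_allow, 1), round(net, 1))
```

Small disagreements in the last digit for other rows (Barro's Net Pts, for example) are what you would expect from rounding each column to one decimal place before printing.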

I've also included a totals line for all stats, so you can actually check my work.

The total Off Poss & Def Poss are the actual number of possessions in the game.

The total number of possessions used by each team agrees very well with reality - for my data parser, total possessions used are typically within 5% of actual possessions played, but this game worked exceptionally well.

Total points produced for each team are also very close to actual points scored. These should be within 10%, and often within 5%.

The summed points produced divided by total possessions used gives an estimate of team off. efficiency. This is the value listed as the total of ORtg. The estimated team offensive efficiencies (112.4 & 98.7) agree extremely well with actual off. efficiencies for each team (113.3 & 98.7).
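
Readers who want to repeat this check can do so with a few lines of Python. This is only a sketch of the comparison, using the TOTALS rows and the final score quoted above; the dictionary layout is just a convenience for the example.

```python
def efficiency(points, possessions):
    """Points per 100 possessions."""
    return 100.0 * points / possessions

# "est":    (summed points produced, summed possessions used) from TOTALS
# "actual": (points scored, actual offensive possessions)
teams = {
    "Marquette": {"est": (83.5, 74.3), "actual": (85, 75)},
    "Villanova": {"est": (74.3, 75.3), "actual": (75, 76)},
}

for name, t in teams.items():
    est = efficiency(*t["est"])
    act = efficiency(*t["actual"])
    print(f"{name}: estimated {est:.1f}, actual {act:.1f}")
# Marquette: estimated 112.4, actual 113.3
# Villanova: estimated 98.7, actual 98.7
```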

At least for this game, it appears that my method is giving a quite satisfactory measure of what happened on offense. It won't always be so accurate, but this is why I want to give these totals - it will allow my reader to decide for himself (do any women read this blog?) how well the stats analysis is working.

Defensive stats are more tightly coupled to team, rather than individual, data so the totals here aren't quite so useful. The DRtg totals are simply team defensive efficiencies, calculated as team points allowed divided by defensive possessions, multiplied by 100.

Here, the summed individual points allowed for each team agree within 1 point of the actual score, another excellent result - I find typically they will agree within 5 points.

Finally, the net points totals give two estimates of the margin of victory (or loss). The average of the two [(8.9 + 10.8)/2] = 9.9 is almost exactly the true margin. It usually doesn't work quite this well!

I think this method compares favorably to the "classic" method proposed by Dean Oliver. I will keep working at it to remove additional estimated values and fix any bugs (e.g. I wasn't counting missed dunks until last week), but I think the basic framework is now in place. Any feedback would be appreciated.

Edited to add: A year later, and I did incorporate some feedback into net points. See here for the gory details.



Finally tonight, I thought I'd take a look at last year's game vs. Jacksonville, which the Hoyas won 87-55. That link will take you to my post-game post from last season, which includes the tempo-free and HD box scores (both will be part of each post-game analysis this season, when available). Here, I'll post the net points stats from last year's game - I've bolded and italicized any player who should play tomorrow.
Georgetown            Off    Poss           Individ     Def             Individ                        
Player                Poss   Used    ORtg   Pts Prod    Poss    DRtg   Pts Allow   Net Pts
Wallace, Jonathan      26     9.3    71.7     6.6        25     91.3      4.6       +2.1                         
Summers, DaJuan        39     8.8   125.0    11.0        36    101.8      7.3       +3.7                           
Sapp, Jessie           36     9.2    61.8     5.7        35     75.0      5.2       +0.4                    
Ewing, Patrick         26     2.0   101.6     2.0        26     55.5      2.9       -0.9                      
Hibbert, Roy           24     8.7   115.6    10.0        24     84.4      4.1       +6.0                    
Macklin, Vernon        46     4.4   143.3     6.4        45     97.8      8.8       -2.4                       
Wright, Chris          40    10.0   137.2    13.7        40     74.5      6.0       +7.7                     
Rivers, Jeremiah       28     4.7   152.7     7.1        28     83.6      4.7       +2.4                        
Jansen, Bryon           4     0.0     -       0.0         4     80.0      0.6       -0.6                     
Freeman, Austin        42     4.3   255.1    10.8        43     76.6      6.6       +4.3                       
Crawford, Tyler        29     4.5   123.0     5.5        29     95.5      5.5       +0.0                       
Wattad, Omar           10     2.1   129.3     2.7        10     90.9      1.8       +0.9                    
TOTALS                 70    67.9   120.3    81.6        69     79.7     58.1      +23.5                 

Jacksonville          Off    Poss           Individ     Def             Individ                          
Player                Poss   Used    ORtg   Pts Prod    Poss    DRtg   Pts Allow   Net Pts
SMITH, Ben             54    16.8    64.3    10.8        55    115.1     12.7       -1.9                      
HARDY, Ayron           34     5.4    73.4     4.0        37    125.0      9.3       -5.3                    
MCMILLAN, Andre        37     5.6   143.8     8.0        37    119.7      8.9       -0.8                       
COLBERT, Lehmon        40     9.0    87.8     7.9        40    105.8      8.5       -0.6                           
ALLEN, Marcus          30     3.8    95.6     3.6        30    126.3      7.6       -4.0                         
COHN, Travis           16     3.4    62.0     2.1        16    135.0      4.3       -2.2                        
GILBERT, Brian         30     3.1    97.2     3.0        30    143.6      8.6       -5.6                          
KOHIHEIM, Paul         26     3.8    20.9     0.8        25    120.5      6.0       -5.2                      
BROOKS, Aric           19     5.9    80.9     4.8        19    116.6      4.4       +0.4                        
LUKASIAK, Szymon       33     5.0    79.1     3.9        35    138.1      9.7       -5.7                            
JEFFERSON, Evan        26     5.1    59.6     3.1        26    139.5      7.3       -4.2                       
TOTALS                 69    66.9    77.8    52.0        70    124.3     86.9      -34.9              
DaJuan Summers had a great offensive game, but a lousy defensive game against the Dolphins, while Jessie Sapp was just the reverse (bad O, great D). Austin Freeman was his typical efficient self on offense but didn't use up a lot of possessions (~10%), while Chris Wright was player of the game on both ends of the court. Even Omar Wattad did his thing on the offensive end (1-1 2FG, 1-2 3FG).

I won't go into the Jacksonville players (you can see how they played last year).

The Dolphins lost to Florida State on Saturday, 59-57. J'ville was trailing 57-40 with 3:30 left and proceeded to go on a 15-1 run to bring the score to 58-55 with :20 left in the game, thanks in part to 2-8 FT shooting by FSU.
