Saturday, October 27, 2007

Analysis: Scorekeeper Bias?

This week, I thought I'd take a look at another of Pomeroy's articles for the new Basketball Prospectus (no relation), Hometown Scoring: How Many Assists Did Acie Law Really Have? The discussion here is whether some official scorers are taking liberties with the NCAA definition of an assist, and are crediting either more or fewer assists than is typical. To keep the discussion tempo-free, assist statistics are presented as assists per made field goal (the Div-I average was 55.1% last season).

Two cases are presented:
Liberal assists
Assist Percentage (A/FGM)
Team                Home   Away
Texas A&M           78.5   45.2
Sam Houston St.     82.6   55.0
Evansville          72.1   47.3
South Florida       74.9   52.1
Cal St. Fullerton   69.0   47.9

Conservative assists
Assist Percentage (A/FGM)
Team                Home   Away
New Mexico St.      46.3   60.1
Northern Colorado   50.9   60.6
Lafayette           57.7   67.1
Illinois            57.8   66.0
Long Island         42.3   50.4

The Aggies' numbers certainly appear skewed, and as further evidence, Pomeroy notes that U of Texas went so far as to void the assist statistic from their game in College Station last year.

This is all well and good, but I was left wondering why the analysis wasn't a bit more detailed. For instance, did A&M play a large number of creampuffs at home early in the year, padding their stats? The answer is a qualified yes, as the team certainly had an unbalanced early season schedule (only 2 non-conf. road games, which were their only 2 non-conf. losses), but I don't have the data to determine if this led to assist-padding at home versus the road. And is the difference between home and road statistically significant, or just a fluke? Here again, the underlying data needed to find out is not presented.

Since I can't run the numbers for A&M, I thought I'd take a look at Georgetown's home and road data to see if a bias was apparent at the Verizon (previously known as MCI) Center / McDonough Gym.

Split                   Games   Asst %
Pre-BE - Home           9       59.4
Pre-BE - Away           4       57.0

Big East - Home         8       59.5
Big East - Away         8       56.6

Post Season - Neutral   8       61.1

Well, there does seem to be a bias home versus road, and it showed up in both the non-conf. and conference portions of the season. Having answered the first question, we now wonder whether this is real, or can just be attributed to small sample size. [Yes, I do see that if I were to combine the neutral-court games with the road games, the results would be essentially a wash (59.4% vs. 58.6%), but I'm trying to make a much more trivial point here.]

I'll reproduce the table, combining all home and road games, still ignoring the neutral games. Additionally, I'll treat the data a second way, calculating an Asst % for each game, then averaging:

                  Home            Road
Games             17              12

Total Assists     243             169
Total FG Made     409             298
Asst %            59.4%           56.7%

Asst % (G by G)   58.3% ± 13.0%   57.8% ± 12.3%

Calculating an Asst % for each game first, then averaging may not be quite as statistically sound as just summing all home or road assists and field goals made, but it does provide us with an important piece of information, namely the variability (shown here as the standard deviation [1σ]) of the assist % from game to game.
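To make the two calculations concrete, here is a minimal Python sketch. The season totals are the ones in the table above; the three-game log is hypothetical, invented only to show how the two methods can disagree, and the function names are my own. I'm also assuming the sample (n-1) standard deviation, which may or may not match the spreadsheet behind the original numbers.

```python
from statistics import mean, stdev

def pooled_pct(assists, fgm):
    """Asst % from totals: all assists divided by all made field goals."""
    return 100.0 * sum(assists) / sum(fgm)

def per_game_pct(assists, fgm):
    """Asst % game by game: returns (mean, sample standard deviation)."""
    pcts = [100.0 * a / f for a, f in zip(assists, fgm)]
    return mean(pcts), stdev(pcts)

# Season totals quoted in the table above (Georgetown, home and road).
home_pct = pooled_pct([243], [409])
road_pct = pooled_pct([169], [298])
print(f"Home {home_pct:.1f}%  Road {road_pct:.1f}%")

# HYPOTHETICAL three-game log, for illustration only (not real box scores).
demo_assists = [10, 20, 12]
demo_fgm = [25, 28, 24]
demo_pooled = pooled_pct(demo_assists, demo_fgm)
demo_mean, demo_sd = per_game_pct(demo_assists, demo_fgm)
print(f"Pooled {demo_pooled:.1f}%  G by G {demo_mean:.1f}% +/- {demo_sd:.1f}%")
```

The pooled figure weights every made basket equally, so high-scoring games count for more; the game-by-game figure weights every game equally, which is why the two can drift apart.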

How does that help? Because now we can use a very important tool in the statistician's bag of tricks: Student's t-test. (You'll also note that the home and road splits are not as disparate using this second method, which is not unexpected for the small sample sizes and large variabilities.) Student's t-test simply tells us whether the means of 2 sets of data are significantly different. The details are beyond the discussion here, but the upshot is that it allows us to test a specific null hypothesis: "If the home and road Asst % were really the same, how likely is it that we'd see a difference at least this large just by chance?"

The result, once worked through, is that a gap this size or larger would show up about 92% of the time by chance alone, i.e. there is no significant bias at G'town home games. Moreover, we can work out what the difference would need to be in the home/road split for there to have been a significant bias (at 90% confidence): 8.2%. Therefore, even the difference calculated with the raw numbers (59.4% - 56.7% = 2.7%) is not enough to get worried about.
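For anyone who wants to work it through, here is a rough Python sketch using only the summary numbers in the table (mean, standard deviation, and games played). I'm assuming the classic pooled-variance, two-sided form of Student's t-test with 27 degrees of freedom; the post doesn't say which variant was used, but this one comes out close to the figures quoted. The helper functions are my own and do the t-distribution integral numerically so that nothing beyond the standard library is needed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density on [0, |t|] by Simpson's rule."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0

def critical_t(alpha, df):
    """Bisect for the t value whose two-sided p-value equals alpha."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Game-by-game summary stats from the table above.
n_home, mean_home, sd_home = 17, 58.3, 13.0
n_road, mean_road, sd_road = 12, 57.8, 12.3

# ASSUMPTION: pooled-variance (equal variance) t-test.
df = n_home + n_road - 2
pooled_var = ((n_home - 1) * sd_home**2 + (n_road - 1) * sd_road**2) / df
se = math.sqrt(pooled_var * (1 / n_home + 1 / n_road))
t_stat = (mean_home - mean_road) / se
p_value = two_sided_p(t_stat, df)

# Smallest home/road gap that would count as significant.
min_diff_90 = critical_t(0.10, df) * se
min_diff_95 = critical_t(0.05, df) * se
print(f"t = {t_stat:.2f}, p = {p_value:.2f}")
print(f"needed gap: {min_diff_90:.1f} (90%), {min_diff_95:.1f} (95%)")
```

Under these assumptions the p-value lands near 0.92 and the 90% threshold near 8.2 percentage points. With real game logs in hand, a stats package would be the saner route than hand-rolled integration.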

Going back to the original article by Pomeroy, we now have some context for the numbers presented. Assuming the same variability and number of games for any other team's home and road splits (lousy assumptions, for sure), you'd want to see > 8.2% difference (or > 9.8% at 95% confidence) before you start questioning the numbers.

Clearly, Texas A&M meets that criterion.

Saturday, October 20, 2007

Analysis: Possessions used

If you haven't had a chance yet, be sure to cruise over to Basketball Prospectus, where Ken Pomeroy and John Gasaway are starting to post some nice articles.

I'm particularly excited about this new site, because now I don't have to work so hard for new material. Instead, I can cherry-pick topics that they've worked through and apply them to the Hoyas.

Example - Effective Usage

Pomeroy took a look at the likelihood that a player will be able to make a great leap forward in possessions used from one season to the next. This can become important when you want to identify efficient role-players that may be able to become efficient stars the next season. As it turns out,
"Once a player demonstrates himself to be a role player, it's unlikely he'll ever be a go-to guy and, therefore, a superstar. It's not quite a law in college basketball, but players who are not very involved in the offense tend to stay that way. Any major changes in a player's usage are usually the result of filling the hole left by a departing possession eater."
Remembering that using 20% of available possessions is average (there are 5 players on the floor at any time), let's take a look at possession usage and efficiencies for G'town the last 3 seasons (the JTIII Era).

                          2005            |          2006            |          2007
Player             Min %  Poss %  O. Rate |   Min %  Poss %  O. Rate |   Min %  Poss %  O. Rate
Bowman, Brandon    83.4%    24.2    112.4 |   71.2%    24.6      101 |       -       -        -
Cook, Ashanti      80.7%    20.3    102.3 |   77.4%    18.1      113 |       -       -        -
Owens, Darrel      63.2%    14.1    124.8 |   67.1%    15.7    127.2 |       -       -        -
Green, Jeff        84.6%    23.8    111.5 |   81.3%    25.4    102.7 |   83.2%    24.9    114.4
Wallace, Jon       75.9%    14.9       98 |   77.0%    15.6    116.2 |   80.5%    18.9    119.7
Hibbert, Roy       39.6%    25.3     89.2 |   60.1%    25.6    120.9 |   65.9%    22.8    130.8
Crawford, Tyler    17.6%    19.7    102.1 |   10.7%       *        * |   19.7%    16.2     98.2
Reed, Ray          40.4%    15.9     82.8 |       -       -        - |       -       -        -
Ross, RaMell       19.9%    17.6    102.1 |       -       -        - |       -       -        -
Sapp, Jessie           -       -        - |   40.1%    12.2     96.5 |   82.1%    18.7    107.9
Egerson, Marc          -       -        - |   17.3%    19.7     94.1 |   60.2%    17.6    114.2
Summers, DaJuan        -       -        - |       -       -        - |   65.9%      22    101.8
Ewing, Pat Jr.         -       -        - |       -       -        - |   36.2%    17.7    109.9
Macklin, Vernon        -       -        - |       -       -        - |   24.4%    15.8    119.5
Rivers, Jeremiah       -       -        - |       -       -        - |   29.3%    12.5     77.9
*Crawford did not play enough minutes in 2006 (92 total) to generate meaningful stats.

It looks like the Hoyas follow Pomeroy's thesis quite well. For those players with at least 2 full seasons under JTIII, the largest change was for Jessie Sapp, who went from 12.2% to 18.7% of available possessions used. This is to be expected, as he stepped into Ashanti Cook's role directly, absorbing both his minutes and responsibilities. Most every other player used roughly the same percentage of possessions from year to year, within uncertainties (see the original article for more on this). Some other notes:
  • Jon Wallace has been able to increase both his Poss % and OR every year. A significant drop in either stat during his senior season would likely be a problem for the Hoyas.
  • Darrel Owens developed the reputation as an incredibly efficient player amongst stat-heads, but it should be noted that his selective use of possessions likely played a large role. While many of us were frustrated that he didn't "shoot more", it is not unreasonable to suggest that he wouldn't have been as good if he had. He was the quintessential efficient role-player.
  • Sapp was able to be more efficient while he used more possessions, which was critical to the team's success last year (it just seemed like he was a hucker [technical term] because Cook spoiled us with his senior season).
  • Macklin, a favorite for breakout candidate, looks to be hard pressed to make that leap due to his low possession usage, and the lack of an obvious hole to fill (I expect Ewing and Summers to take up most of Jeff Green's role). Macklin did use ~20% of available possessions in the post-season (BET and NCAA), but in very limited minutes.
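For those who like to check the arithmetic, the season-to-season comparison can be sketched in a few lines of Python. The Poss % values are copied from the table above for players with at least two seasons; the layout and function name are just my own choices, and None marks a season with no usable number (including Crawford's starred 2006).

```python
# Poss % for 2005, 2006, 2007, copied from the table above.
# None = did not play, or too few minutes for a meaningful figure.
poss = {
    "Bowman, Brandon": (24.2, 24.6, None),
    "Cook, Ashanti": (20.3, 18.1, None),
    "Owens, Darrel": (14.1, 15.7, None),
    "Green, Jeff": (23.8, 25.4, 24.9),
    "Wallace, Jon": (14.9, 15.6, 18.9),
    "Hibbert, Roy": (25.3, 25.6, 22.8),
    "Crawford, Tyler": (19.7, None, 16.2),
    "Sapp, Jessie": (None, 12.2, 18.7),
    "Egerson, Marc": (None, 19.7, 17.6),
}

def yearly_changes(table):
    """List (player, season, change in Poss %) for consecutive seasons."""
    changes = []
    for player, seasons in table.items():
        for prev, cur, year in zip(seasons, seasons[1:], (2006, 2007)):
            if prev is not None and cur is not None:
                changes.append((player, year, round(cur - prev, 1)))
    return changes

changes = yearly_changes(poss)
biggest = max(changes, key=lambda c: abs(c[2]))

# Print the jumps from largest to smallest.
for player, year, delta in sorted(changes, key=lambda c: -abs(c[2])):
    print(f"{player:16s} {year}  {delta:+.1f}")
```

Sorting by the size of the change puts Sapp's 2007 jump at the top, with everyone else within a few points of where they were the year before.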

Well, that's all for now. Feels good to shake off some of the rust from the long off-season.

Edited to add - Table updated for greycat

Sunday, October 14, 2007

News: Not dead yet . . .

While it may appear that this site has died a quiet death, rest assured that I will endeavor to keep it going with new content as the basketball season gets underway.

Some quick points:
  • I should soon be getting a big increase in free on-line storage, so I'll be moving the highlight videos in the next few days. All of the old links will be broken, so look at the right side table for new links. The new storage limit should allow me to post higher quality videos for the upcoming season.