Saturday, October 27, 2007

Analysis: Scorekeeper Bias?

This week, I thought I'd take a look at another of Pomeroy's articles for the new Basketball Prospectus (no relation), "Hometown Scoring: How Many Assists Did Acie Law Really Have?" The question there is whether some official scorers are taking liberties with the NCAA definition of an assist, crediting either more or fewer assists than is typical. To keep the discussion tempo-free, assist statistics are presented as assists per made field goal (the Div-I average was 55.1% last season).

Two cases are presented:
Liberal assists: Assist Percentage (A/FGM)

Team                Home   Away
Texas A&M           78.5   45.2
Sam Houston St.     82.6   55.0
Evansville          72.1   47.3
South Florida       74.9   52.1
Cal St. Fullerton   69.0   47.9

Conservative assists: Assist Percentage (A/FGM)

Team                Home   Away
New Mexico St.      46.3   60.1
Northern Colorado   50.9   60.6
Lafayette           57.7   67.1
Illinois            57.8   66.0
Long Island         42.3   50.4

The Aggies' numbers certainly appear skewed, and as further evidence, Pomeroy notes that U of Texas went so far as to void the assist statistic from their game in College Station last year.

This is all well and good, but I was left wondering why the analysis wasn't a bit more detailed. For instance, did A&M play a large number of creampuffs at home early in the year, padding their stats? The answer is a qualified yes, as the team certainly had an unbalanced early season schedule (only 2 non-conf. road games, which were their only 2 non-conf. losses), but I don't have the data to determine if this led to assist-padding at home versus the road. And is the difference between home and road statistically significant, or just a fluke? Here again, the underlying data needed to find out are not presented.

Since I can't run the numbers for A&M, I thought I'd take a look at Georgetown's home and road data to see if a bias was apparent at the Verizon (previously known as MCI) Center / McDonough Gym.

                        Games   Asst %
Pre-BE - Home             9      59.4
Pre-BE - Away             4      57.0

Big East - Home           8      59.5
Big East - Away           8      56.6

Post Season - Neutral     8      61.1

Well, there does seem to be a bias home versus road, and it showed up in both the non-conf. and conference portions of the season. Having answered the first question, we now wonder whether this is real, or can just be attributed to small sample size. [Yes, I do see that if I were to combine the neutral-court games with the road games, the results would be essentially a wash (59.4% vs. 58.6%), but I'm trying to make a much more trivial point here.]

I'll reproduce the table, combining all home and road games, still ignoring the neutral games. Additionally, I'll treat the data a second way, calculating an Asst % for each game, then averaging:

                   Home            Road
Games              17              12

Total Assists      243             169
Total FG Made      409             298
Asst %             59.4%           56.7%

Asst % (G by G)    58.3% ± 13.0%   57.8% ± 12.3%

Calculating an Asst % for each game first, then averaging may not be quite as statistically sound as just summing all home or road assists and field goals made, but it does provide us with an important piece of information, namely the variability (shown here as the standard deviation [1σ]) of the assist % from game to game.
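To make the two treatments concrete, here is a minimal Python sketch. The season totals are the ones from the table above. The per-game box scores are not reproduced in this post, so `sample_games` is a made-up four-game list used only to show the mechanics, and I'm assuming the sample (n-1) standard deviation.

```python
from statistics import mean, stdev

# Method 1: sum everything, then divide (season totals from the table above).
home_assists, home_fgm = 243, 409
road_assists, road_fgm = 169, 298

pooled_home = 100 * home_assists / home_fgm
pooled_road = 100 * road_assists / road_fgm
print(f"Pooled Asst %: home {pooled_home:.1f}, road {pooled_road:.1f}")

# Method 2: compute Asst % game by game, then average.
# HYPOTHETICAL (assists, field goals made) pairs, for illustration only.
sample_games = [(12, 25), (18, 24), (9, 21), (16, 28)]

per_game = [100 * a / fgm for a, fgm in sample_games]
game_avg = mean(per_game)   # every game counts equally
game_sd = stdev(per_game)   # sample standard deviation (n - 1)

# Pooled figure for the same made-up games, for comparison.
sample_pooled = (100 * sum(a for a, _ in sample_games)
                 / sum(fgm for _, fgm in sample_games))

print(f"Game-by-game: {game_avg:.1f} +/- {game_sd:.1f}")
print(f"Pooled (same games): {sample_pooled:.1f}")
```

The two methods disagree whenever games differ in the number of field goals made: the pooled figure weights each made basket equally, while the game-by-game average weights each game equally.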

How does that help? Because now we can use a very important tool in the statistician's bag of tricks: Student's t-test. (You'll also note that the home and road splits are not as disparate using this second method, which is not unexpected for the small sample sizes and large variabilities.) Student's t-test simply tells us whether the means of 2 sets of data are significantly different. The details are beyond the discussion here, but the upshot is that it allows us to test a specific null hypothesis, namely that the home and road Asst % are not really different, and to ask: "If that were true, how likely is a home/road gap at least as large as the one we see?"

The result, once worked through, is a p-value of about 0.92: if there were no real home/road difference, a gap at least this large would turn up by chance about 92% of the time. In other words, there is no significant bias at G'town home games. Moreover, we can work out what the difference in the home/road split would need to be for there to have been a significant bias (at 90% confidence): 8.2%. Therefore, even the difference calculated with the raw numbers (59.4% - 56.7% = 2.7%) is not enough to get worried about.
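For anyone who wants to check the arithmetic, here is a minimal sketch that works only from the game-by-game means, standard deviations and game counts in the table. I'm assuming a pooled-variance, two-sided, two-sample t-test (other variants would give slightly different numbers). The critical values 1.703 and 2.052 are the standard t-table entries for 27 degrees of freedom, and the function names are just for illustration.

```python
import math

# Game-by-game summary statistics from the table (Asst %, in points).
home_mean, home_sd, home_n = 58.3, 13.0, 17
road_mean, road_sd, road_n = 57.8, 12.3, 12

df = home_n + road_n - 2  # 27 degrees of freedom

# Pooled variance and standard error of the difference in means.
pooled_var = ((home_n - 1) * home_sd**2 + (road_n - 1) * road_sd**2) / df
std_err = math.sqrt(pooled_var * (1 / home_n + 1 / road_n))
t_stat = (home_mean - road_mean) / std_err


def t_pdf(x, dof):
    """Density of Student's t distribution with dof degrees of freedom."""
    coeff = math.gamma((dof + 1) / 2) / (
        math.sqrt(dof * math.pi) * math.gamma(dof / 2)
    )
    return coeff * (1 + x * x / dof) ** (-(dof + 1) / 2)


def two_sided_p(t, dof, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # integrate 0..|t|, then double (the density is symmetric)
    total = t_pdf(0.0, dof) + t_pdf(a, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return 1 - 2 * total * h / 3


p_value = two_sided_p(t_stat, df)

# Two-sided critical t values for 27 degrees of freedom (standard table values).
T_CRIT_90, T_CRIT_95 = 1.703, 2.052
crit_90 = T_CRIT_90 * std_err  # gap needed for significance at 90% confidence
crit_95 = T_CRIT_95 * std_err  # gap needed for significance at 95% confidence

print(f"t = {t_stat:.3f}, p = {p_value:.2f}")
print(f"Gap needed: {crit_90:.1f} (90%), {crit_95:.1f} (95%)")
```

Under those assumptions the sketch should land on the same 92% and 8.2% figures quoted above; the 95% threshold is included for comparison.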

Going back to the original article by Pomeroy, we now have some context for the numbers presented. Assuming the same variability and number of games for any other team's home and road splits (lousy assumptions, for sure), you'd want to see > 8.2% difference (or > 9.8% at 95% confidence) before you start questioning the numbers.

Clearly, Texas A&M meets that criterion.
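As a rough check, the same cutoffs can be run against every team in Pomeroy's two tables. This sketch borrows the 8.2 and 9.8 point thresholds worked out from Georgetown's data, so it inherits the lousy assumptions already noted; the names `splits` and `flag` are just for illustration.

```python
# Home/away assist percentages (A/FGM) from Pomeroy's two tables.
splits = {
    "Texas A&M": (78.5, 45.2),
    "Sam Houston St.": (82.6, 55.0),
    "Evansville": (72.1, 47.3),
    "South Florida": (74.9, 52.1),
    "Cal St. Fullerton": (69.0, 47.9),
    "New Mexico St.": (46.3, 60.1),
    "Northern Colorado": (50.9, 60.6),
    "Lafayette": (57.7, 67.1),
    "Illinois": (57.8, 66.0),
    "Long Island": (42.3, 50.4),
}

# ASSUMPTION: cutoffs derived from Georgetown's game-to-game variability
# and game counts are applied unchanged to every other team.
CUTOFF_90, CUTOFF_95 = 8.2, 9.8


def flag(home, away):
    """Return the home-minus-away gap and the strictest cutoff it exceeds."""
    gap = round(home - away, 1)  # inputs have one decimal; avoid float noise
    if abs(gap) > CUTOFF_95:
        level = "95%"
    elif abs(gap) > CUTOFF_90:
        level = "90%"
    else:
        level = "none"
    return gap, level


for team, (home, away) in splits.items():
    gap, level = flag(home, away)
    print(f"{team:<18} {gap:+6.1f}  exceeds: {level}")
```

A positive gap means more assists credited at home, a negative one means fewer. Teams sitting right at a cutoff should be read with caution, since the published percentages are themselves rounded.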
