Saturday, October 25, 2008

Big East Growth Charts

About this time last year, I posted about possession usage, based upon an article by Ken Pomeroy about effective usage. Possession usage, or possession percentage (%Poss), is yet another in a litany of stats that you'll find here and on other sites run by fans with a predilection towards numbers and too much time on their hands. It is simply the number of possessions that a player ends divided by the total possessions he plays. A player can end a possession by:
  1. scoring
  2. turnover
  3. missing a shot that is rebounded by the defense.
On a team that was truly democratic (socialist?), everyone would have a %Poss = 20%, meaning that team possessions were used equally. In the real world, %Poss for Big East basketball players typically ranges from 10 to 30%, with a median value around 18%*. In other words, the majority of players use fewer possessions than you'd expect.
*Edited to add: Where did this come from? See the end of the article.
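For the curious, here's a minimal sketch of the calculation in Python. The box-score proxy for "possessions ended" is my own illustration - published versions of this stat handle free throws and team stats more carefully:

    # %Poss: share of the possessions a player plays that he ends
    # (by scoring, turning it over, or missing with a defensive rebound).
    def poss_pct(fga, fta, to, oreb, poss_played):
        # Rough proxy for possessions ended: shots not kept alive by his
        # own offensive rebounds, FT trips, and turnovers.
        poss_ended = (fga - oreb) + 0.44 * fta + to
        return 100.0 * poss_ended / poss_played

    # A player ending ~180 of the 1000 possessions he's on the court for
    # sits right at that Big East median of 18%.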

Why? Simply, because someone else on the court is using possessions at a disproportionately higher rate. There are a few reasons why this could be (here comes another list):
  1. the coach has instructed the team to allow certain player(s) to use more possessions
  2. certain player(s) have decided on their own that they should use more possessions
  3. the player in question thinks other player(s) should use more possessions
Whatever the reason, the logic of the 18% player is sound so long as he is giving away those possessions to someone more efficient (i.e. with a higher Off. Rating) than he is. However, when a player has a very high Off. Rating (e.g. Darrel Owens, Colin Falls) but uses significantly less than his allotted 20% of possessions, he may be hurting his team. Of course, the counter-argument can be made that by using more possessions, he'd become less efficient, but there would still be some marginal returns until this player's efficiency is comparable to his teammates' - and yes, I've just lapsed into economic theory.

Ken Pomeroy's - remember Ken? we started out by talking about him - thesis is "role players don't usually become go-to guys from one year to the next, or at any point during their careers."

To prove his point, he provided a couple of nice charts, showing the change in %Poss from one year to the next, or one year to two years later. I'll reproduce them here (as always, click any image to enlarge).


There are four types of lines on these charts.
  • The thick black line represents the best fit of the data (all 2005, 2006 and 2007 college players, I think).
  • The thick dark blue lines indicate the 50% prediction interval around the fit - that is, 50% of all players should be between these two lines. Another way to think of the lines is that 25% of all players will fall below the lower line, and 75% of players will fall below the upper line (and therefore 25% of players will be above the upper line).
  • The thin black lines are 95% prediction intervals - only 2.5% of all players should fall below the bottom line, or lie above the upper line.
  • Finally, there is the dashed line, which is the 1:1 line. A player lying on this line would have no change in his %Poss from year 1 to year 2 (or 3).
There's actually a bit more information we can glean from these plots. For example, you'll notice that the 1:1 line is below the best-fit line until %Poss ≈ 22.5% for year 2 vs. year 1. What this means is that, for players who used less than 22.5% of their team's possessions in a given year, more will increase their usage the following season than will decrease it. For the year 3 vs. year 1 plot, this point of intersection is ~23% - very similar.

This is not to say that all players will eventually become 22.5% possession users, but rather that this is the point beyond which increasing possession usage becomes difficult, likely due to increased competition with teammates for available possessions.
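If you have the fit coefficients, that crossover point falls straight out of the algebra: set the fit y = a + b*x equal to the 1:1 line y = x, and you get x = a/(1 - b). The coefficients below are made-up stand-ins, since I don't have KenPom's actual regression:

    # Where a regression line y = a + b*x crosses the 1:1 line y = x.
    # a and b are hypothetical values, chosen only to illustrate the arithmetic.
    a, b = 9.0, 0.6
    x_cross = a / (1.0 - b)   # = 22.5, the usage level where the lines meet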


So, is there a point to all of this?

Since KenPom has %Poss data available back to 2005, I decided to plot all Big East players from the last 3 seasons (or 2 seasons) on his charts, to see if the Big East behaves as the rest of college basketball with respect to changes in usage.

Here's season 2 vs. season 1 (you'll need to click to expand to see things clearly):



A bit of explanation is in order (if there's one thing I want to be famous for, it's busy charts).
  • I've sized the markers by Season 1 %Min (% of available minutes played), so that end-of-bench players wouldn't swamp regular players on the scatter plot.
  • I've color-coded the markers by Season 2 Off. Rating. When I initially ran this analysis, I expected that the rate of increase from year 1 to year 2 would be strongly related to how well the player performed in the 2nd year, but this is obviously not the case - the color appears random.
  • I've also color-coded Georgetown players as gray rather than on the color scale, so they'd stand out. Nothing exceptional about this group.
  • I've added tags identifying a few outliers.
  • Finally, I've added a horizontal and vertical line at 22.5%, indicating the point beyond which additional possessions become scarce.
The analysis by KenPom seems to apply very well to the Big East - the data lie almost entirely within the 95% prediction interval, and follow the trend line.

What's of most interest to me are the two points in the upper left quadrant: James Holmes and Draelon Burns. These two players represent the exception to KenPom's rule, in that they made the leap from role players (%Poss = 18.7 & 19.5, respectively) to go-to players (27.5 & 28.4) in a year. Since I wasn't paying much attention to either team at the time (or now), I'll leave it to someone else to explain what happened in each case.


On to the two-year gap (season 3 vs. season 1):



Many fewer data points here (n=80, vs. n=274 for the previous plot), but again the analysis by KenPom seems appropriate for the Big East.

The players of interest here include Daryll Hill, who fell from go-to to role player due to injuries, and three rising seniors: Anthony Mason, Levance Fields, and Georgetown's own Jessie Sapp. Only Mason has made the leap into true go-to status (17.1 to 23.0 to 26.9), but Jessie Sapp has made an extraordinary rise from pass-only to important cog (12.2 to 18.7 to 22.6). It will be interesting to watch these three to see if they can continue to absorb possessions.

-------------------------------------------------------------------------------------------------
Edited 10-27-08, 10:00pm to add:

While playing with the data, one thing I did look at was the distribution of %Poss for Big East players. Here are the histograms for all players (n=611) from 2005-2008, and also for those with %Min > 40% (n=395). What's interesting is that the majority of players who play less than 40% of available minutes also use less than 18% of available possessions (note that KenPom has his own filter of %Min > 10% on the data, so players with very little playing time [< 4 min / game] are already dropped from the data set).



I suspect that the subset of players with 10% < %Min < 40% has a significant number of freshmen who are slowly being introduced to their coaches' respective systems. I used the median value of the entire population in the discussion above, since many starters began their careers as bench / role players (e.g. Jessie Sapp).

I hadn't thought to plot %Poss vs. %Min before tonight, but the histograms above imply a relationship. Here it is:



The data points are both colored and sized by offensive rating, but there doesn't seem to be much trend in that variable. The slope of the line is ~0.085; in other words, an increase in %Min by 23% will increase %Poss by 2%, on average.

You've got to love D. Caracter's freshman season at Louisville.

Saturday, October 18, 2008

Individual Offensive & Defensive Ratings - 2007-8 Review

General comment about the edits - I realized soon after originally posting this article that the underlying numbers just didn't look right. Sure enough, I tracked down at least 2 mistakes in my math. I think I've got everything correct now, but I make no warranty. I reserve the right to continue to make mistakes, but I will do my best to avoid them.


Before I get started, I just want to point out that the player +/- stats page has been updated, and now includes all available games for the last 2 seasons.

-----------------------------------------------------------------------------------------------

In my last few posts, I've been touting my new HD Box Score Maker™, which uses game play-by-play data to extract a lot more info than a standard box score yields (all available HD boxes for the last 2 seasons now posted).

In my never-ending quest to keep you, my only reader, ahead of the curve when it comes to basketball knowledge, I thought I'd start a series of posts using the data generated by my HD Box Score Maker™ to learn a bit more about your Georgetown Hoyas.

To start, I thought I'd try to take on one of the questions that I raised at the end of my intro post to HD box scores:
2. Was J. Rivers really that great of a defender? I'll look at the team's offensive and defensive efficiencies with each player on or off the court, to see if I can learn a bit more about the defensive side of things.

Analogous to fielding defense in baseball, individual defense in basketball is not well-described by traditional basketball statistics. We can talk about team defensive stats (Def. Efficiency, DReb %, Def 2FG%, Def eFG%, Block %, etc.) with some confidence that we are able to describe what is actually taking place on the court, but the difficulty comes in attributing the individual defensive stops, rather than just taking a holistic view.

There are a few ways to tackle this problem:
  • Watch each game, and chart each defensive possession for who was responsible for stopping (or allowing) a score. This has been advocated by Dean Oliver, the father (mid-wife?) of advanced basketball statistics - but, as far as I know, such charting simply isn't available for college basketball games.
  • Use the available box score data to estimate the number of stops each player makes, based on some rather large assumptions; one example of this metric is called Defensive Rating (also developed by Dean Oliver). This is similar to his Offensive Rating for individual players, which I, Ken Pomeroy and many others calculate, but uses less certain assumptions. Currently, I'm only aware of Henry Sugar at Cracked Sidewalks reporting Def. Ratings (example linked), although I'm sure there are others. I'll talk more about this stat below.
  • Use the available play-by-play data to estimate the importance of each player to total team defense. Your first thought might be that we could use the play-by-play data to determine actual defensive stops by player, but we can't. The play-by-play doesn't tell us who is guarding whom, so we'd be back to the same assumptions that Dean Oliver uses. However, there is a simple analytical tool that we now have available to us: we now know how many points each team scored when any player was on (or off) the court. That is to say, we can calculate a team's offensive and defensive efficiency (points per 100 possessions) as a function of whether any player is on the court, and thereby look at what impact each player has on team offense or defense.

At this point, I will explain some basic terms as a refresher, and also cover what's new here. All of this is explained in much more detail at the web pages linked to the right under "Tempo-Free Stats 101." Feel free to skip ahead if this is all familiar.

Possession-based (tempo-free) statistics are a concept in basketball going back at least as far as Frank McGuire, and are useful for comparing players and teams who operate at different paces, or speeds of play. A possession ends either by a made basket (including some made FTs), a turnover or a defensive rebound - that's it, that's the list. Offensive rebounds don't create new possessions, they only prolong the current one. Defined this way, two teams will end up with the same number (± 1) of possessions at the end of any game; since possessions go back-and-forth, it must be so. The equation for estimating total possessions per team per game is floating around the internet in various forms, but I will add to the clutter:

Possessions = FGA + 0.44 * FTA - OReb + TO

Since this formula is only an estimate of the actual number of possessions, I find that it is best to solve for each team, then take the average. Any team's (or player's) stats should be instantly comparable to any other with a per-possession system, since what is expressed is team (or player) efficiency rather than counting stats.
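In code, the estimate-and-average approach looks something like this (a sketch; the box-score fields are passed in as simple totals):

    # Estimate possessions for each team from its box-score totals, then
    # average the two, since both estimate the same back-and-forth count.
    def possessions(fga, fta, oreb, to):
        return fga + 0.44 * fta - oreb + to

    def game_possessions(team, opp):
        return 0.5 * (possessions(**team) + possessions(**opp))

    # e.g. game_possessions(dict(fga=55, fta=20, oreb=10, to=12),
    #                       dict(fga=58, fta=15, oreb=11, to=14))  -> ~66.7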

Offensive (and defensive) efficiency is a team statistic expressed in the units of points per 100 possessions (why per 100 possessions? so there aren't so many digits to the right of the decimal). This statistic is rather simple to calculate, once you've worked out how many possessions have been played with the equation above. Ken Pomeroy, one of the popularizers of tempo-free stats, has an additional version of this stat, called adjusted off. (or def.) efficiency. Here, he attempts to weight points per possession based upon quality of opposition.

Offensive rating (as mentioned above) was created by Dean Oliver in an attempt to better rate individual basketball players on offense. The calculation of this stat is not simple - I've had people ask me in the past for the equation, but it's actually a bunch of equations (see this book for details). In simple terms, it is the ratio of points produced (not scored) by any player, divided by possessions used (not played), with both of these terms estimated from normal box score data. It is a tempo-free statistic, since it is expressed in points per (100) possession. Since players should be credited for assists and offensive rebounds as well as actual points scored, this rating is just an estimate of actual player worth, but the underlying assumptions are well thought out (you'll have to trust me, or read the book).

Defensive rating is an attempt to estimate the contribution of each player to the team's defensive efficiency. It is calculated as team defensive efficiency, plus one-fifth of the difference between team defensive efficiency and individual player stops per 100 possessions played. Player individual stops are estimated from the number of blocks, steals and defensive rebounds each player has, plus some team stats. Since it is not a simple ratio, it is more like being graded on a curve, in that it is limited to the range of 80% - 120% of team defensive efficiency. So, a player who literally refused to play defense (e.g. Donte Greene) could be rated no worse than 120% of his team's efficiency. I would describe this stat as a very rough estimate of actual defensive worth . . .

. . . which leads me back to the point of this post (there really is a point). Now that I have access to the play-by-play of most G'town games, can I use this to better estimate the defensive contribution of each player, on a possession basis? If so, we could finally talk about the overall value of a player to his team, rather than just his offensive contribution. The play-by-play shows who was on the court at any point during the game, so we can assign partial credit to each player for how well the team plays at both ends while he's in the game; likewise, we can see if the team plays better or worse when he leaves. This is really just applying the concept of plus-minus and Net/40 (or Roland rate), but using possession info to speak in tempo-free terms rather than per-minute.

To explain explicitly here, I've taken each player, and added up the points that G'town scored and allowed while he was on the court, and how many were scored and allowed while he was off the court. In each case, I also know how many offensive or defensive possessions he participated in, so I can divide each point total by the respective possessions (times 100) to find the team's offensive or defensive efficiency while he was on or off the court. Then, I find the difference between on- and off-court efficiency (either off. or def.) and add that to the team's efficiency.

For example, to calculate Jessie Sapp's Net Offensive Efficiency:

Jessie Sapp played 1171 offensive possessions, and the Hoyas scored 1298 points while he was on the court.
[1298 / 1171 x 100 = 110.8 Off. Eff. on-court]

He sat for 561 offensive possessions, and the Hoyas scored 585 points while he sat.
[585 / 561 x 100 = 104.3 Off. Eff. off-court]

So Jessie Sapp's Net Off. Eff. is equal to:
(Off. Eff. on-court - Off. Eff. off-court) + Team Off. Eff. =
(110.8 - 104.3) + 108.7 = +6.5 + 108.7 = 115.2 pts/100 possessions
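Wrapped up as a function, that's all the arithmetic there is to it:

    # Net efficiency from on/off splits, exactly as in the Sapp example above.
    def net_eff(pts_on, poss_on, pts_off, poss_off, team_eff):
        on_court  = 100.0 * pts_on  / poss_on     # 110.8 for Sapp
        off_court = 100.0 * pts_off / poss_off    # 104.3 for Sapp
        return (on_court - off_court) + team_eff

    # net_eff(1298, 1171, 585, 561, 108.7) -> ~115.2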


Before I jump into the defensive analysis, I first want to see if my idea that player-based on-court / off-court net efficiency correlates to individual player rating holds water. To do this, I'll take a look at each player's net offensive efficiency versus individual offensive rating for last season.

One last point before I go on: the individual player ratings here won't match exactly with what either Pomeroy or I post for season totals. Since I don't have play-by-play data for all games, I re-ran the player ratings using box score data only for games that also had p-b-p data [to see which games are missing, go to the player +/- page].

Let's take a look (as always, click any image to enlarge):




This seems to work quite well! Players in the upper right of the graph are the best offensive players by either metric, while players in the lower left are not carrying their weight. The red line is a linear fit to all the data (r = 0.81), excluding Bryon Janson, who just doesn't have enough playing time to generate meaningful stats. The slope of the line is about 0.55, significantly less than 1, which is actually to be expected. The Net Team Efficiency stat doesn't completely isolate a player from his teammates in the way that Off. Rating attempts to do; since there is variability player-to-player, the range of the Net. Team Eff. stat gets compressed - a bad offensive player surrounded by good players will look better than he is.

The strong correlation indicates that the two statistics are highly coupled (co-variant). Individual offensive rating is a fairly well-accepted statistic, and it seems to do a good job of measuring how important a player is to team efficiency. Of course, the converse should also be true - team offensive efficiency as a function of each player on or off the court is a good measure of individual offensive value.

Moreover, and here's where I may be stretching the statistics a bit, the scatter plot can tell us a bit more: if a player is above the line, he makes the team more efficient than expected based upon Off. Rating (i.e. the player is underrated by Oliver/Pomeroy/etc.) while if he is below the line, he is overrated. Keep in mind that there are considerable uncertainties for the data on both axes that are not shown or even calculated, because that would make my life a lot harder. But it looks like Jessie Sapp and Roy Hibbert were underrated offensively last season, while Patrick Ewing Jr. and Vernon Macklin were overrated.
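The over-/underrated call is just the sign of each player's residual from the fitted line. In numpy terms, something like this (the arrays here are placeholders, not the real data):

    import numpy as np

    # Placeholder per-player values; the real numbers come from the HD box scores.
    off_rtg     = np.array([122.3, 118.7, 115.4, 108.9, 100.8])
    net_off_eff = np.array([120.0, 116.0, 112.5, 115.2, 102.0])

    slope, intercept = np.polyfit(off_rtg, net_off_eff, 1)  # slope was ~0.55 on the full data
    residual = net_off_eff - (intercept + slope * off_rtg)
    # residual > 0: the team is more efficient with him on the court than his
    # Off. Rating predicts (underrated); residual < 0: overrated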


Now, let's take a look at the defensive side. The math is the same as presented above, just looking at defensive possessions now.




Things here are not so clear-cut as for offense. There is poor correlation between the two data sets (r = 0.11), so I've just thrown a 1:1 line onto the chart. Note that both axes have their scaling reversed (they get smaller as you head away from the origin), since a lower defensive rating or efficiency is better. Again, players in the upper right corner are the best defenders, those in the lower left are the worst.

One thing that I notice immediately is that the scaling for the two statistics is much closer to 1:1 than for offense. We already expect that Net Def. Eff. should be compressed, since we can't isolate individual players, only their effect on the team while on the court. But here, the Def. Rating stat shows about the same scaling, meaning either a) there isn't as large a difference between a good and bad defensive player as there is for an offensive player, or b) the Def. Rating stat isn't able to isolate individual defensive skills.

To take this a bit further, we demonstrated above that the Net. Efficiency methodology for offense seems to work quite well in correlating to a "good" measure of offensive prowess, albeit on a somewhat compressed scale. Since the method is identical for defense and offense, there's no reason to expect Net Efficiency to stop working for defense. Therefore, it could be argued (I just did) that Net. Def. Eff. is a better measure than Def. Rating.

If you've not read Basketball on Paper, you should probably just skip over the next paragraph.

Digression: Before I go on, I should point out that I'm using a slightly modified version of D. Oliver's Defensive Rating calculation. His formula estimates defensive stops in two parts, and the second has a necessary assumption that he, himself, acknowledges to be poor with regards to position-specific uncredited stops. Since centers get a disproportionate number of stops by way of blocks, they tend to be overweighted by his formula (for reasons far too obscure to explain here). I've added a simple weighting factor, based on Steals/(Steals+Blocks) to correct this. For this data set, the effect of this correction ranges from -2.7% for Roy Hibbert to +2.3% for Jonathan Wallace and Chris Wright. (n.b. - The weighting factor is based on Oliver's own data).
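In sketch form, the structure is roughly as below. The 1/5 weighting and the 80%-120% clamp follow the description of Def. Rating given earlier; the steal-share piece is only illustrative, since the exact form of my correction isn't spelled out here:

    # Rough shape of Defensive Rating, per the earlier description (lower = better):
    # team efficiency plus 1/5 of the gap between the team rate and the
    # player's stop rate, clamped to the stated 80%-120% band.
    def def_rating(team_def_eff, stops_per100):
        raw = team_def_eff + 0.2 * (team_def_eff - stops_per100)
        return min(max(raw, 0.8 * team_def_eff), 1.2 * team_def_eff)

    # The steal/block correction uses this ratio to damp block-heavy (center)
    # stop totals; how it scales the stops is an assumption left out here.
    def steal_share(steals, blocks):
        return steals / (steals + blocks) if steals + blocks else 1.0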

Returning to the scatter plot, once again we can find over- and underrated players by using the plotted 1:1 line.

Now, Tyler Crawford and Chris Wright are the most overrated defenders, while Jonathan Wallace (!), Roy Hibbert and Jessie Sapp are underrated. Of course, that only looks at the comparison between two poorly correlated stats; the real take-home message is that Hibbert, Sapp, DaJuan Summers and Patrick Ewing anchored last year's excellent defense, just as Hibbert, Wallace and Austin Freeman were the most effective on offense.

And to answer the question that started this whole thing - yes, Rivers was a good defender, but not extraordinarily so, and not as important as Hibbert, Ewing or Summers. In fact, his stats are not obviously better than either Sapp's or Wallace's (!). I will speculate that Rivers was the best on-ball defender as a guard, but Wallace and Sapp were more sound within the defensive schemes used last year (I just made that up). And I'm not sure why Omar Wattad looks like Gene Smith on this plot, but I'll guess it's just the result of a small sample size (n = 58 def. possessions).


Finally, we can combine the offensive and defensive metrics on a single plot, to get a rank of the overall value of each player. I'm simply taking the average of Off. Rating & Net. Off. Eff. for the y-axis, and the average of Def. Rating & Net. Def. Eff. for the x-axis. The hope is that the average of two measures of the same variable comes closer to describing its true character than either measure on its own.
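In code, the combined coordinates are just simple averages of the stats defined above (a sketch):

    # Combined player-value coordinates for the scatter plot.
    def combined_worth(off_rating, net_off_eff, def_rating, net_def_eff):
        off_worth = 0.5 * (off_rating + net_off_eff)   # y-axis (higher = better)
        def_worth = 0.5 * (def_rating + net_def_eff)   # x-axis (lower = better)
        return off_worth, def_worth

    # Total value isopleth = off_worth - def_worth (e.g. Sapp: 108.9 - 89.8 = +19.1)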





This plot is a bit more complicated, as I'm trying to convey a lot of information.

Again, you'll need to re-jigger yourself to the axes. Offense is on the left axis, with up equaling better performance; defense is on the bottom axis, but still with reversed scaling, so that right equals better performance. Players in the upper right are most valuable, players in the lower left are least valuable; those closer to the upper left have more value on offense, those down and to the right have more value on defense. This last bit looks to be well correlated with expectation, as the two players best known for offense rather than defense (Wallace & Freeman) show up right about where you'd think.

The names are now sized by possessions played, so Omar Wattad's newly-discovered value as an all-world defender is now tempered by the fact that he rarely plays. DaJuan Summers somewhat swallows up Patrick Ewing because of this, but I think you can still make them out.

Finally, the series of diagonal lines (isopleths) on the chart mark contours of total player value (i.e. the difference between off. and def. worth). For example, Jessie Sapp's off. worth = 108.9 and his def. worth = 89.8, so you'd expect that he'd provide +19.1 (= 108.9 - 89.8) pts./100 poss. to the team. Meanwhile, Austin Freeman's off. worth = 115.4 and his def. worth = 95.8, so you'd expect that he'd provide +19.6 pts./100 poss. That is, Sapp and Freeman were essentially equally valuable to last year's team on a per-possession basis, although they did it in different ways. Because they are rated at equal value, they both lie at about the same position relative to the diagonal lines.

Moreover, if you had a team of players all equally efficient overall as Sapp and Freeman, you'd expect the overall difference in team Off. Eff. and Def. Eff. to be around +19 pts/100 poss. FWIW, last year, G'town's efficiency difference was +18.8 (raw, not adjusted), good enough for a #2 seed in the NCAA tournament.

Here's a summary table of off., def. and total player worth for all the players:
                     Off.    Def.   Total
Hibbert, Roy        122.3    90.2    32.1
Wallace, Jonathan   118.7    94.0    24.7
Freeman, Austin     115.4    95.8    19.6
Sapp, Jessie        108.9    89.8    19.1
Ewing, Patrick      106.1    90.5    15.6
Summers, DaJuan     105.6    90.6    15.0
Wattad, Omar        103.3    88.3    14.9
Wright, Chris       100.8    93.2     7.6
Crawford, Tyler      95.6    95.4     0.1
Rivers, Jeremiah     90.1    94.4    -4.3
Macklin, Vernon      94.4   100.0    -5.6

We will miss Roy Hibbert. On a team that was ranked 7th by Ken Pomeroy after the NCAA tournament, Mr. Hibbert was the best Hoya player on the court by a large margin.

Next comes the gang of five (+1), who were the other important contributors to the Hoyas' success; in very rough order of importance: Wallace, then Freeman and Sapp, then Summers and Ewing. Wattad sneaks in right behind this group, despite his few possessions (who knows, maybe the whole point of this article was to find a new player to champion; after all, the last time went so well).

By my reckoning, the two soon-to-be-transfers were not helping much last season. In the end, Rivers' defense just couldn't make up for his offensive woes, while Macklin struggled at both ends of the court. I never gave much thought to Macklin's defense, but he was easily the worst defender on the team by my numbers. Does that seem right? Tyler Crawford struggled with his outside shooting last year (3-22 3FGs), and wasn't able to make up for it with great defense.

There's one returning player that I haven't mentioned - Chris Wright. He missed the majority of the season with a foot injury, and ended up with a little more than half of Macklin's total time played. After struggling early in the season, he seemed to have a breakthrough in the 2nd half against Derrick Rose and Memphis, only to go down 2 games later. And while he played in 5 post-season games after getting healthy (3 in the BET, 2 in the NCAA), I only have the Pitt game in the BET finals in this analysis, since I don't have p-b-p for the other 4 games. From looking at the box scores, 3 of the 4 games I'm missing (Villanova, UMBC, Davidson) were among his best.

What I'm saying is that I think Wright is being undervalued here, not because there's anything wrong with the analysis, but because his underlying data doesn't do him justice. I hope I'm right.

Tuesday, October 14, 2008

News: Hoyas Graduation Rate Declines

The Graduation Success Rate (GSR) of Georgetown's men's basketball team declined this year, dropping from 82% to 70% [pdf] (thanks to Cracked Sidewalks for pointing out the release of this year's rates).

Here's the data:

Year   Cohort      GSR   Fed Rate
2008   1998-2001    70      47
2007   1997-2000    82      60
2006   1996-1999    64      47
2005   1995-1998    50      42

While I'll guess that HoyaSaxa.com will have more on this, I thought I'd put together a post providing answers to a few questions raised by this news.

1. What is the Graduation Success Rate (GSR)? Who cares?
This is a metric [pdf] that the NCAA uses to judge whether a school's sports programs are graduating student-athletes at an acceptable rate. Simply, it is the percentage of incoming athletes in a four-year cohort who either graduate or transfer in "good academic standing" within six years of their class's entrance to the school. It includes athletes who transfer in to the school as well.

The reason we care is that the data used to calculate GSR rates are also used to calculate each school's Academic Progress Rate (APR)[pdf]; here's G'town for the 2006-7 school year. If a school does not meet minimum APR requirements, penalties ensue.

1a. What's the difference between GSR and APR?
GSR is simply the percentage of athletes who either graduate or transfer out while academically eligible. APR looks at semester-by-semester academic standing and retention, so a player who completes the fall semester successfully and returns in the spring earns 2 out of 2 points, while a player who completes the fall semester but leaves the school (e.g. Marc Egerson in 2006) earns 1 out of 2 points.
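A minimal sketch of the APR bookkeeping (the NCAA reports APR on a 0-1000 scale; the per-term point logic is as just described):

    # Each player-term is worth up to 2 points: one for remaining eligible,
    # one for being retained (returning to school, or graduating).
    def apr(player_terms):
        earned   = sum(int(eligible) + int(retained)
                       for eligible, retained in player_terms)
        possible = 2 * len(player_terms)
        return round(1000 * earned / possible)

    # Egerson's fall 2006 would be (True, False): eligible but not retained, 1 of 2.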

2. What is a cohort?
A population that is being studied. In this case it is all student-athletes who belong to 1 of 4 contiguous classes. Specific to this year, it is the athletes who were part of the 1998-2001 incoming classes. As classes are sometimes thought of in terms of their targeted 4-year graduation date (especially after graduation), it would be the Classes of 2002-05.

3. What happened to the old formula?
The old formula, the Federal Graduation Rate, is still reported (it's the "Fed Rate" column above); it does not credit schools for transfers out in "good academic standing." For example, Jeremiah Rivers and Vernon Macklin left Georgetown over the summer - the old formula will count them as non-graduates, while the GSR allows the school to count them as graduates.

4. What if a player leaves early to go pro (e.g. Jeff Green)?
In this case, that player will be considered a non-graduate if he doesn't complete his coursework in 6 academic years. In the case of Jeff Green, the clock is still running, and Jeff is still working towards his degree - he was 24 credits short heading into this past summer.

5. Why are they using such old classes? John Thompson Jr. was still coaching in 1998!
Well, the NCAA wants to get an average GSR over a four-year time period, and it also wants to give athletes up to 6 years to graduate. So, the 2008 data is for players whose graduation clock ended in August 2007, as well as the previous 3 years (2006, 2005, 2004). For the August 2004 group, if they had 6 years to finish, then they would have started in 1998 (in this case, Kevin Braswell and Willie Taylor).

6. What will happen next year? How do Georgetown's classes look?
Time for another table, which is really just an update of an old HoyaTalk post by SFHoya99:
Class  Player                         Transfer In?  Graduated?  Transfer out?  Left?

1997 Boumtje-Boumtje, Ruben Yes
Burton, Nat Yes
Perry, Anthony Yes
Scruggs, Lee Yes Yes

1998 Braswell, Kevin Yes
Taylor, Willie Yes

1999 Freeman, Courtland Yes
Samnick, Victor Yes
Wilson, Wesley Yes
Hunter, Demetrius Yes
Burns, Jason Yes

2000 Faulkner, Omari Yes
Ross, RaMell Yes
Riley, Gerald Yes
Sweetney, Mike Yes

2001 Owens, Darrel Yes
Bethel, Tony Yes
Hall, Drew Yes
Thomas, Harvey Yes

2002 Bowman, Brandon Yes
Cook, Ashanti Yes

2003 Dizdarevic, Sead Yes
Ewing, Patrick Yes Yes
Causey, Matt Yes
Reed, Ray Yes

2004 Crawford, Tyler Yes
Green, Jeff Yes
Hibbert, Roy Yes
Wallace, Jon Yes
Guibunda, Cornelio Yes

2005 Sapp, Jessie On-schedule
Egerson, Marc Yes
Spann, Octavius Yes
Thornton, Josh Yes

2006 Summers, DaJuan On-schedule
Macklin, Vernon Yes
Rivers, Jeremiah Yes

2007 Wright, Chris On-schedule
Mescheriakov, Nikita On-schedule
Freeman, Austin On-schedule
Wattad, Omar On-schedule
Vaughn, Julian Yes On-schedule

2008 Monroe, Greg On-schedule
Clark, Jason On-schedule
Sims, Henry On-schedule

What's interesting is that last year's GSR (82%) and this year's GSR (70%) don't work if players count only as pass/fail. For instance, the 1998-2001 cohort had 15 players (I don't think walk-ons count), and 0.70 x 15 = 10.5. There were 8 players who stayed 4 years (and I believe all graduated), 6 who transferred and Mike Sweetney turned pro. Perhaps academically-eligible transfers out count for 0.5 points? Then, we'd get (assuming 1 non-eligible transfer [HT?]):
8 x 1 + 5 x 0.5 + 2 x 0 = 10.5 out of 15 = 70%

If this is the correct method, then it should work for the 1997-2000 cohort (close):
11 x 1 + 3 x 0.5 + 1 x 0 = 12.5 out of 15 = 83%
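Wrapping that guess up in code (to be clear, the 0.5 credit for eligible transfers is my hypothesis, not the NCAA's published method):

    # Hypothesized GSR scoring: graduates = 1, academically-eligible
    # transfers out = 0.5, everyone else = 0.
    def gsr(full, partial, none):
        return 100.0 * (full + 0.5 * partial) / (full + partial + none)

    # gsr(8, 5, 2)  -> 70.0  (matches 2008)
    # gsr(11, 3, 1) -> 83.3  (close to 2007's reported 82)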

So, assuming I'm doing the math correctly (and for some reason, I don't think I am), let's run a few more cohorts:

Year   Cohort      Full   Partial   None   GSR
2011   2001-2004     8       5        2    70%  (assumes JG does NOT graduate in time)
2010   2000-2003     8       4        2    71%
2009   1999-2002     9       4        2    73%
2008   1998-2001     8       5        2    70%
2007   1997-2000    11       3        1    83%

So there you have it - it appears that Georgetown will be sitting at our current GSR for at least the next 3 years.

-------------------------------------------------------------------------------------

I've finished running (and posting) those HD box scores for the last 2 seasons - if I haven't posted a game, it's because I don't have the play-by-play to run it. I hope to post something this weekend that will show what you can do with the underlying data, once you've collected it.

Also, I've gathered just about all of my various highlight clips onto one page (here), if you're looking to kill some time. You can also find the clips (and lots of other things) by using the Popular Tags box down in the sidebar.

Saturday, October 4, 2008

Further update on HD box scores

I've spent some more time debugging my data loader, and I think it is now working.
  • I've updated the HD boxes in the previous post, as run through my HD Box Score Maker™, Version 1.0.
    • I was wrong about there being 4 offensive fouls in the Villanova game; now it looks like just 2 (Reynolds and Sumpter).
    • I am now generating full headers and footers for each box score. If you'd like to see something else added, let me know.
    • Totals are now calculated from the play-by-play rather than the box score. Expect some differences from the official box score.

  • I have begun to post HD boxes retroactively. I've added all games from Nov. & Dec. 2006 except for the win at Vanderbilt (11/15), since the play-by-play for that game had no substitution data. I hope to have all available games from the 2006-7 and 2007-8 seasons added before the start of this season.
  • Having run about 15 or 20 games so far, I'm slowly resigning myself to the fact that both the play-by-play and the official box score contain mistakes on occasion, and these could only be corrected by video review (which I'm not likely to do). I suspect that KenPom abandoned his attempt to provide these HD boxes because of this. However, so long as the mistakes are few enough, I will stick to my pledge to run all Big East conf. games. I promise not to give up until at least December 30th.