Introduction

In basketball, there is no perfect way to determine how well a college player is likely to perform in the National Basketball Association (NBA) once he is drafted. In the NBA draft (the draft), players are selected by one of the thirty NBA teams based on how well the teams expect them to perform in the league. Generally, the best players are drafted first, and successive draft picks are widely accepted to be less talented.[1] However, at the time this paper was written, there were no good formulas in place to accurately predict how well a player might do in the NBA based on his college statistics. This research paper attempts to develop such a formula. Our goal is to predict how successful a player will be in the NBA, given only his college statistics.

[1] The NBA draft uses a lottery system. The worst-performing teams are given the highest probability of obtaining a higher draft pick and, therefore, a better player.

Beyond the academic merit of performing regression analysis, such a regression might be useful for a wide variety of sports applications. Professional teams might use a similar formula to help them make their draft selections. Sports gamblers could use it to help decide on whom to place their bets. Finally, college coaches could use it to help improve their players' chances of being drafted into the NBA.

The Process

The first step is to find a numerical score that is correlated with how good or successful a player is in the NBA. An ESPN sports analyst, John Hollinger, has developed a formula that gives such a numerical score to current and former NBA players. This score is called the Player Efficiency Rating (PER).[2] This score is widely accepted to be the best numerical
determinant of how good a player is. We will use PER, adjusted for assisted and unassisted field goals, and charges. This adjusted PER (APER) was calculated by hoopdata.com, and it will be the dependent variable in our regression. A player in the NBA is likely to hit his highest performing peak after his 3rd year in the NBA. Because of this, we will regress specifically on NBA players' 3rd-year APER scores. For reference, the highest APER is about 32 (LeBron James), and the NBA league average is 13.79.

The second step is to collect data on the 2004 through 2007 NBA draft classes. We collected data on APER (3rd year), Years of College, RPI, Min/Game, FG%, PPG, RPG, SPG, APG, BPG, PPM, RPM, SPM, APM, BPM, Height, Weight, Wingspan, Body Fat, Max Vert, Agility, and Sprint scores.[3] In collecting this data, we had to be careful to exclude pre-2005 players who did not attend college, as well as international players.

The third step is to regress APER on different combinations of variables. After many attempts, we decided that the best and most accurate regression did not include many of the variables on which we collected data. Specifically, the variables we threw out were height, sprint, weight, blocks, wingspan, and agility. We removed these variables because of their high p-values and corresponding statistical insignificance.
[2] PER = (1 / MP) * [ 3P
      + (2/3) * AST
      + (2 - factor * (team_AST / team_FG)) * FG
      + (FT * 0.5 * (1 + (1 - (team_AST / team_FG)) + (2/3) * (team_AST / team_FG)))
      - VOP * TOV
      - VOP * DRB% * (FGA - FG)
      - VOP * 0.44 * (0.44 + (0.56 * DRB%)) * (FTA - FT)
      + VOP * (1 - DRB%) * (TRB - ORB)
      + VOP * DRB% * ORB
      + VOP * STL
      + VOP * DRB% * BLK
      - PF * ((lg_FT / lg_PF) - 0.44 * (lg_FTA / lg_PF) * VOP) ],
    where:
      factor = (2 / 3) - (0.5 * (lg_AST / lg_FG)) / (2 * (lg_FG / lg_FT)),
      VOP = lg_PTS / (lg_FGA - lg_ORB + lg_TOV + 0.44 * lg_FTA), and
      DRB% = (lg_TRB - lg_ORB) / lg_TRB.
    SOURCE: http://www.basketball-reference.com/about/per.html

[3] RPI = Ratings Percentage Index (based on a team's wins, losses, and strength of schedule). FG% = field goal percentage. PPG, RPG, SPG, APG, BPG = points, rebounds, steals, assists, and blocks per game, respectively. These were also calculated per minute (i.e., PPM = points per minute). Wingspan = total length of outstretched arms. Max Vert = player's highest vertical jump.
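The third step of the process (fit a regression, inspect each coefficient's significance, and drop weak predictors) can be illustrated with a minimal sketch. This is not the procedure or the data set used in this paper: the function names `ols` and `invert`, the toy observations, the predictor names `x1` and `x2`, and the rule of thumb |t| < 2 as a rough stand-in for a "high p-value" are all assumptions made for illustration.

```python
import math

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def ols(X, y):
    """Ordinary least squares via the normal equations.

    X holds one row of predictors per observation (no intercept column).
    Returns (coefficients, standard errors, t-statistics), intercept first.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    t = [b / s for b, s in zip(beta, se)]
    return beta, se, t

# Hypothetical toy data: y depends on x1 but not on x2.
X = [[-1, -1], [-1, 1], [1, -1], [1, 1], [-1, -1], [-1, 1], [1, -1], [1, 1]]
y = [7.5, 6.5, 12.5, 13.5, 6.5, 7.5, 13.5, 12.5]

names = ["intercept", "x1", "x2"]
beta, se, t = ols(X, y)
# Rough screen: flag predictors whose |t| falls below 2 (assumed threshold).
dropped = [name for name, tv in zip(names, t) if abs(tv) < 2]
print(dropped)
```

In practice a statistics package would report exact p-values; the |t| < 2 screen here only approximates a 5% two-sided test and is used to keep the sketch dependency-free.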
The Result

\[
\begin{aligned}
\text{APER} ={}& \underset{(8.75)}{-25.83} + \underset{(.13)}{.3}\,\text{maxVert} + \underset{(30.81)}{54.04}\,\text{spm} + \underset{(14.91)}{32.69}\,\text{apm} + \underset{(8.96)}{23.46}\,\text{rpm} \\
&+ \underset{(6.23)}{4.8}\,\text{ppm} + \underset{(.10)}{.16}\,\text{fg} + \underset{(.11)}{.4}\,\text{min} - \underset{(.51)}{1.15}\,\text{years} - \underset{(.01)}{.02}\,\text{rpi}
\end{aligned}
\]

Analysis

The reasoning behind throwing out these variables is somewhat intuitive. Because the NBA has the highest and most competitive level of play in basketball, variables that are only physical attributes, such as height, weight, and wingspan, should not reasonably be indicators of success. While simply being tall or big will certainly give you an advantage at the high school or perhaps even the college level, the NBA represents the best and most skilled players in the world. A simple height or weight advantage will not be enough to make you better than everyone else.

The other two variables, sprint and agility, are timed events that all draft prospects go through to measure how fast they are. Although speed certainly gives you an advantage, the game of basketball is not a sprint. Thus, the ability to run faster than other players should not determine how good a player you are.

The last statistic we decided to omit, blocks, was the only on-court statistic we threw out. There was one notable attribute of the variable blocks: its coefficient was negative. In other words, the more blocks a player gets at the collegiate level, the lower his efficiency rating would be. How could this be possible? One reason is the volatility of drafting college big men. For every great post player such as Yao Ming, Shaquille O'Neal, and Tim Duncan, there is a Patrick O'Bryant, Michael Olowokandi, and Kwame Brown. That is, drafting centers, the players who tend to get the most blocks, is a huge risk. You could get a great player such as Yao Ming, or you could get a player such as Patrick O'Bryant, who is no longer in the league after three years. Most big
men never develop into the players that the team that drafted them envisioned. As a result, the coefficient on blocks was negative, and the outcomes for high-block players are so unpredictable that this statistic had a large p-value. That is why we removed these variables: their p-values were high and they were correspondingly statistically insignificant.

The remaining variables were maxVert, spm, apm, rpm, ppm, fg, min, years, and rpi. All of these variables were statistically significant at the 5% level except points per minute (ppm) and field goals (fg). These two variables were left in the regression because the number of points scored and the number of field goals made should be an indicator of how good a player is; games are decided by points, and with the exception of free throws, field goals are the only way to score points.

One notable adjustment we made to these variables is the adjustment for time and pace. College coaches all have different coaching philosophies. As a result, the number of minutes they let their players play and the speed at which their teams play vary across teams. Certain coaches balance the minutes allotted to their players, while other coaches allow their players to play entire games. Some teams emphasize offense and play at a very fast pace, thus accumulating more statistics, while other teams play at a sluggish pace and, as a result, their players do not build up their personal statistics. To deal with this inconsistency across teams, we decided to adjust the statistics we used. For example, instead of looking at how many points a player averaged per game, we looked at how many points he scored per minute. That way, a player who played fewer minutes in a game would not be penalized. We adjusted steals, assists, and rebounds in the same way.
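To make the per-minute adjustment concrete, the following sketch converts per-game averages into per-minute rates and plugs them into the fitted equation from The Result. The coefficients are copied from that equation; everything else is an assumption for illustration: the function names `per_minute` and `predict_aper`, the example player's statistics, and the units (fg as a percentage such as 45, min as minutes per game, maxVert in inches, rpi as a national rank).

```python
# Coefficients copied from the fitted equation in "The Result".
INTERCEPT = -25.83
COEFFICIENTS = {
    "maxVert": 0.30,
    "spm": 54.04,
    "apm": 32.69,
    "rpm": 23.46,
    "ppm": 4.80,
    "fg": 0.16,
    "min": 0.40,
    "years": -1.15,
    "rpi": -0.02,
}

def per_minute(per_game, minutes_per_game):
    """Convert a per-game average into a per-minute rate."""
    if minutes_per_game <= 0:
        raise ValueError("minutes_per_game must be positive")
    return per_game / minutes_per_game

def predict_aper(max_vert, spg, apg, rpg, ppg, fg_pct, minutes, years, rpi_rank):
    """Apply the fitted equation to one player's college statistics.

    Units are assumed (not confirmed by the paper): fg_pct in percentage
    points, minutes per game, max_vert in inches, rpi_rank as a rank.
    """
    features = {
        "maxVert": max_vert,
        "spm": per_minute(spg, minutes),
        "apm": per_minute(apg, minutes),
        "rpm": per_minute(rpg, minutes),
        "ppm": per_minute(ppg, minutes),
        "fg": fg_pct,
        "min": minutes,
        "years": years,
        "rpi": rpi_rank,
    }
    return INTERCEPT + sum(COEFFICIENTS[name] * value
                           for name, value in features.items())

# Hypothetical player, not taken from the paper's data set.
example = predict_aper(max_vert=35, spg=1.2, apg=2.4, rpg=6.0, ppg=15.0,
                       fg_pct=45.0, minutes=30.0, years=2, rpi_rank=50)
print(round(example, 2))  # prints 12.44
```

Under these assumed units, the hypothetical player lands a little below the league-average APER of 13.79 quoted earlier. Different unit conventions (for example, FG% as a fraction) would give very different numbers, so the output should be read only as a demonstration of the mechanics.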
In terms of predicting APER, you can see that all of these per-minute statistics have positive coefficients; that is, the more points you score, the more assists you get, the more steals you make, etc., the higher your
efficiency rating will be. Looking at the regression, the variable steals has the highest coefficient and thus appears to have the greatest effect on becoming a better basketball player. For every extra steal you make per minute, your efficiency rating should go up by 54.04, whereas every extra assist or rebound per minute leads to increases of only 32.69 and 23.46, respectively. However, steals occur much less frequently in a game than rebounds and assists. From our data, the average steals per minute is .04, the average assists per minute is .08, and the average rebounds per minute is .2. The average player makes twice as many assists per game as steals, and five times as many rebounds. At these averages, steals contribute about 54.04 × .04 ≈ 2.16 to the efficiency rating, assists about 32.69 × .08 ≈ 2.62, and rebounds about 23.46 × .2 ≈ 4.69. As a result, for the average player, it is much easier to become a better player by getting more assists than by getting more steals.

The remaining variables, maximum vertical jump (maxVert), years played in college (years), and team record/strength of schedule (rpi), can all be explained intuitively. Basketball is a game that depends on athleticism; the most exciting players, such as Michael Jordan, Dominique Wilkins, LeBron James, and Kobe Bryant, are all great athletes. To become a good basketball player, you have to be athletic, and the athletic attribute that translates best onto the hardwood floor is a player's vertical jump. The higher you jump, the more you separate yourself from your defender. As a result, every additional inch of a player's vertical jump should translate to an additional .3 in efficiency rating.

The next variable, RPI, had the lowest p-value of all the variables, and with good reason. RPI is basically a measurement of how good the player's team was. If the player was on the best team in the nation, his RPI would be 1, and if he was on the worst team, it would be 347. As a result, the coefficient on RPI is negative; the worse your college team is, the more it negatively affects your rating. Intuitively, this makes sense.
If you play on a great collegiate team, the level of competition is higher, which better prepares you for the next level. At the same
time, if you have comparable statistics to a player on a lesser team, chances are you will be more prepared to play in the NBA than the other player.

Finally, the last variable used was years of basketball played in college. While this variable was statistically significant, its coefficient should not be interpreted causally. Since the coefficient is negative, it suggests that the longer you stay in college, the worse a basketball player you will be. Playing less basketball in college does not make you a better basketball player. Rather, if you are a good basketball player, you cannot afford to waste your time playing at the collegiate level when you could be making millions playing in the NBA. Thus, the negative coefficient reflects the fact that good players leave college early to pursue the NBA. In other words, the variable years is a self-fulfilling prophecy.

When parsing through our data, there were some players that we did not put into our regression, despite the fact that they were drafted. Because our regression was heavily based on college statistics, we were limited to college players. Thus, high school players who decided to forgo college and directly enter the draft could not be put into the regression, and neither could international players, since we had no way of reconciling international basketball statistics with college statistics.

Furthermore, the issues of injuries and trades also forced us to reconsider certain players in our regression. The problems with injuries and trades are comparable. Injured players have fewer games played and thus a smaller sample size for examining their efficiency rating. Another impact of injuries is that a player may decide to play injured and subsequently perform at a lower level than he would when healthy. The other issue, trades, presents a similar problem.
Players who are traded to another team during the season face several hurdles: they must learn a new playbook and integrate themselves into a completely new team, coach, and
environment. As a result, players who have been traded usually do not see as much playing time while they adjust to playing on a different team. Even when they do get time on the floor, their quality of play is again likely to be lower than their capabilities. Indeed, the lack of familiarity that results from being traded devalues a player's efficiency rating. Thus, we found it best not to include players that fell into the following categories: players that did not go to college, players that came from international leagues, players that were traded in the middle of a season, and players that were injured for prolonged periods of time.

Next Steps

After completing our regression, our next step was application. We wanted to use our regression to predict future NBA success for current college basketball players. Seeing as the 2009-2010 college basketball season had just come to an end, the nation's top college players had recently entered their names into the 2010 NBA Draft, which takes place in June. Many basketball analysts have made their predictions on who the top picks in the draft will be. We must make the assumption that these predictions correlate with who they think will be the best NBA players. NBA.com created a consensus mock draft, about which they state, "The Consensus Mock Draft is a compilation of the best mock drafts around the web. We bring them together to come up with a good estimate of how the draft could play out." Their predicted top 10 picks are shown in Table 1.

We entered 36 of the best players who declared for the NBA draft into our regression; Table 2 shows our resulting top 10. Like the Consensus Mock Draft, our regression identified John Wall as the number one pick and Evan Turner as the second overall pick. Overall, we had 8 of the same 10 picks.
After we completed this project, we found out that John Hollinger did a similar regression analysis to create what he called his Draft Rater. Table 3 shows Hollinger's top nine players according to his Draft Rater. We can see that Hollinger's regression and ours predicted 8 of the same top 9. Given the similarities between our results and the other two, we are reasonably confident that our regression is valid.

      Table 1             Table 2             Table 3
      NBA.com             Our Regression      John Hollinger
  1   John Wall           John Wall           DeMarcus Cousins
  2   Evan Turner         Evan Turner         Evan Turner
  3   Derrick Favors      Wesley Johnson      John Wall
  4   DeMarcus Cousins    Greg Monroe         Greg Monroe
  5   Wesley Johnson      Darington Hobson    Derrick Favors
  6   Al-Farouq Aminu     Luke Babbitt        Xavier Henry
  7   Greg Monroe         DeMarcus Cousins    Luke Babbitt
  8   Cole Aldrich        Derrick Favors      Al-Farouq Aminu
  9   Ed Davis            Al-Farouq Aminu     Wesley Johnson
 10   Ekpe Udoh           Ekpe Udoh
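The overlap counts quoted for the three rankings can be checked directly from the tables. The sketch below simply copies the names from Tables 1 through 3; the helper name `shared` is an assumption introduced for illustration.

```python
# Rankings copied from Tables 1-3, in draft-order position.
nba_com = ["John Wall", "Evan Turner", "Derrick Favors", "DeMarcus Cousins",
           "Wesley Johnson", "Al-Farouq Aminu", "Greg Monroe", "Cole Aldrich",
           "Ed Davis", "Ekpe Udoh"]
ours = ["John Wall", "Evan Turner", "Wesley Johnson", "Greg Monroe",
        "Darington Hobson", "Luke Babbitt", "DeMarcus Cousins",
        "Derrick Favors", "Al-Farouq Aminu", "Ekpe Udoh"]
hollinger = ["DeMarcus Cousins", "Evan Turner", "John Wall", "Greg Monroe",
             "Derrick Favors", "Xavier Henry", "Luke Babbitt",
             "Al-Farouq Aminu", "Wesley Johnson"]

def shared(a, b):
    """Return the names that appear in both lists, in the order of list a."""
    b_set = set(b)
    return [name for name in a if name in b_set]

# Our top 10 against NBA.com's top 10, and our top 9 against Hollinger's top 9.
shared_with_nba = shared(ours, nba_com)
shared_with_hollinger = shared(ours[:9], hollinger)
print(len(shared_with_nba), len(shared_with_hollinger))  # prints 8 8
```

Note that this counts only membership in each list, not agreement on the exact draft position, which is the same sense in which the text compares the rankings.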
Limitations

While our regression results compare favorably with other prominent draft forecasts, the limitations of our project are the same as those of any other draft prediction: we don't know how successful these players will be in the NBA until they actually play. Also, as we noted earlier, APER isn't a perfect indicator of how good an NBA player is. It is the best quantitative estimate that we know of.

In addition, there are many unobservable variables that could be correlated with how good a player is. For example, a player's mental toughness and work ethic are probably strongly correlated with how good he turns out to be. Yet each of these variables is hard to measure. Another factor in how good a player turns out to be is the environment he plays in. For example, if a player gets drafted by a team that already has a superstar playing his position, odds are that the younger player will have limited playing time, which could limit his growth as a player.

Lastly, decisions other than measurable statistics factor into a team's choice when drafting a player. Players are also evaluated on their character. If a team believes that a player may not be as dedicated as he should be, that could factor into its drafting decision. Also, some teams choose not to draft the player they consider the best available. Rather, they draft a player based on the position at which the team has a need. Other factors that might influence a team's drafting decision include possible health or injury risks for certain players and how a player's skill set translates from the collegiate to the NBA style of basketball.