Predicting NBA Player Success or Failure
An Econometrics 120C Honors Research Project
By Aaron Chou, Daniel Rubin and William Wolfe
Introduction
In basketball, there is no perfect way to determine how well a player is likely to perform in the National Basketball Association (NBA) when he is drafted during his college years. In the NBA draft (the draft), players are selected by one of the thirty teams in the NBA based on how well the teams expect the players to perform in the NBA. Generally, the best players are drafted first, and successive draft picks are widely accepted to be less talented.1 However, at the time this paper was written, there were no good formulas in place to accurately predict how well a player might do in the NBA based on his college statistics. This research paper will attempt to develop such a formula. Our goal is to predict how successful a player will be in the NBA, given only his college statistics.
Outside of the merit of performing regression analysis for academic purposes, such a regression might be useful for a wide variety of sports applications. Professional teams might use a similar formula to help them make their draft selections. Sports gamblers could use this formula to help decide on whom to place their bets. Finally, college coaches could use this formula to help improve their players' chances of being drafted into the NBA.
The Process
The first step is to find a numerical score that is correlated with how good or successful a player is in the NBA. An ESPN sports analyst, John Hollinger, has developed a formula to give such a numerical score or rating to current and former NBA players. This score is called the Player Efficiency Rating (PER).2 It is widely regarded as the best numerical measure of how good a player is. We will use PER adjusted for assisted and unassisted field goals and for charges. This adjusted PER (APER) was calculated by hoopdata.com. APER will be the dependent variable in our regression. A player in the NBA is likely to hit his peak performance after his 3rd year in the NBA. Because of this, we regress specifically on NBA players' 3rd-year APER scores. For reference, the highest APER is about 32 (LeBron James), and the NBA league average is 13.79.

1 The NBA draft utilizes a lottery system. The worst-performing teams are given the highest probability of obtaining a higher draft pick and, therefore, a better player.
The second step is to collect data on the 2004 through 2007 NBA draft classes. We collected data on APER (3rd year), Years of College, RPI, Min/Game, FG%, PPG, RPG, SPG, APG, BPG, PPM, RPM, SPM, APM, BPM, Height, Weight, Wingspan, Body Fat, Max Vert, Agility and Sprint scores.3 In collecting this data, we had to be careful to avoid including pre-2005 players who did not attend college, and international players.
The third step is to regress APER on different combinations of variables. After many different attempts, we decided that the best and most accurate regression did not include many of the variables on which we collected data. Specifically, the variables we threw out were: height, sprint, weight, blocks, wingspan and agility. The decision to remove these variables was based on their high p-values and corresponding statistical insignificance.
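2 PER is calculated as

$$
\begin{aligned}
\text{PER} = \frac{1}{\text{MP}} \Big[\; & \text{3P} + \tfrac{2}{3}\,\text{AST} + \Big(2 - \text{factor} \cdot \frac{\text{team\_AST}}{\text{team\_FG}}\Big)\,\text{FG} \\
& + \text{FT} \cdot 0.5 \cdot \Big(1 + \Big(1 - \frac{\text{team\_AST}}{\text{team\_FG}}\Big) + \tfrac{2}{3} \cdot \frac{\text{team\_AST}}{\text{team\_FG}}\Big) \\
& - \text{VOP} \cdot \text{TOV} - \text{VOP} \cdot \text{DRB\%} \cdot (\text{FGA} - \text{FG}) \\
& - \text{VOP} \cdot 0.44 \cdot (0.44 + 0.56 \cdot \text{DRB\%}) \cdot (\text{FTA} - \text{FT}) \\
& + \text{VOP} \cdot (1 - \text{DRB\%}) \cdot (\text{TRB} - \text{ORB}) + \text{VOP} \cdot \text{DRB\%} \cdot \text{ORB} \\
& + \text{VOP} \cdot \text{STL} + \text{VOP} \cdot \text{DRB\%} \cdot \text{BLK} \\
& - \text{PF} \cdot \Big(\frac{\text{lg\_FT}}{\text{lg\_PF}} - 0.44 \cdot \frac{\text{lg\_FTA}}{\text{lg\_PF}} \cdot \text{VOP}\Big) \Big]
\end{aligned}
$$

where

$$
\begin{aligned}
\text{factor} &= \frac{2}{3} - \frac{0.5 \cdot (\text{lg\_AST} / \text{lg\_FG})}{2 \cdot (\text{lg\_FG} / \text{lg\_FT})}, \\
\text{VOP} &= \frac{\text{lg\_PTS}}{\text{lg\_FGA} - \text{lg\_ORB} + \text{lg\_TOV} + 0.44 \cdot \text{lg\_FTA}}, \\
\text{DRB\%} &= \frac{\text{lg\_TRB} - \text{lg\_ORB}}{\text{lg\_TRB}}.
\end{aligned}
$$

SOURCE: http://www.basketballreference.com/about/per.html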
3 RPI = Rating Percentage Index (based on a team's wins, losses and strength of schedule). FG% = Field goal percentage. PPG, RPG, SPG, APG, BPG = Points, Rebounds, Steals, Assists, and Blocks per game, respectively. These were also calculated per minute (i.e. PPM = Points Per Minute). Wingspan = total length of outstretched arms. Max Vert = player's highest vertical jump.
The Result

$$
\begin{aligned}
\text{APER} ={} & \underset{(8.75)}{-25.83} + \underset{(.13)}{.3}\,\text{maxVert} + \underset{(30.81)}{54.04}\,\text{spm} + \underset{(14.91)}{32.69}\,\text{apm} + \underset{(8.96)}{23.46}\,\text{rpm} \\
& + \underset{(6.23)}{4.8}\,\text{ppm} + \underset{(.10)}{.16}\,\text{fg} + \underset{(.11)}{.4}\,\text{min} - \underset{(.51)}{1.15}\,\text{years} - \underset{(.01)}{.02}\,\text{rpi}
\end{aligned}
$$

Analysis
The reasoning behind throwing out these variables is somewhat intuitive. Considering the fact that the NBA has the highest and most competitive level of basketball, variables that are purely physical attributes, such as height, weight, and wingspan, should not reasonably be an indicator of success. While simply being tall or big will certainly give you an advantage at the high school or perhaps even the college level, the fact of the matter is that the NBA represents the best and most skilled players in the world. A simple height or weight advantage will not be enough to make you better than everyone else. The other two variables, sprint and agility, are timed events that all draft prospects go through to measure how fast they are. Although speed certainly gives you an advantage, the game of basketball is not a sprint. Thus the ability to run faster than other players should not determine how good a player you are. The last statistic we decided to omit, blocks, was the only on-court statistic we threw out. There was one notable attribute of the variable blocks: its coefficient was negative. In other words, the more blocks a player gets at the collegiate level, the lower his efficiency rating would be. How could this be possible? One reason is the volatility of drafting college big men. For every great post player such as Yao Ming, Shaquille O'Neal and Tim Duncan, there is a Patrick O'Bryant, Michael Olowokandi, and Kwame Brown. That is, drafting centers (players that tend to get the most blocks) is a huge risk. You could get a great player such as Yao Ming, or you could get a player, such as Patrick O'Bryant, who is no longer in the league after three years. Most big men never develop into the players that the team that drafted them envisioned they would become. As a result, the coefficient on blocks was negative; the outcomes associated with the variable blocks are so unpredictable that this statistic had a large p-value. That is how we determined that these variables should be removed: based on their high p-values and corresponding statistical insignificance.
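To make the fitted equation in the Result section concrete, the following minimal Python sketch evaluates it for a single player. The coefficients are the ones reported there. The function name `predict_aper`, the assumption that fg is entered in percentage points (e.g. 45 for 45%) and min in minutes per game, and the sample prospect are illustrative assumptions and are not taken from our data set.

```python
# Coefficients as reported in the Result section.
COEFFICIENTS = {
    "maxVert": 0.30,   # max vertical jump, in inches
    "spm": 54.04,      # steals per minute
    "apm": 32.69,      # assists per minute
    "rpm": 23.46,      # rebounds per minute
    "ppm": 4.80,       # points per minute
    "fg": 0.16,        # field goal percentage (assumed in percentage points)
    "min": 0.40,       # minutes per game (assumed unit)
    "years": -1.15,    # years played in college
    "rpi": -0.02,      # RPI rank (1 = best team)
}
INTERCEPT = -25.83


def predict_aper(player):
    """Return the predicted 3rd-year APER for a dict of college inputs."""
    return INTERCEPT + sum(coef * player[name] for name, coef in COEFFICIENTS.items())


# Hypothetical prospect, invented purely for illustration.
prospect = {
    "maxVert": 35, "spm": 0.04, "apm": 0.08, "rpm": 0.2, "ppm": 0.5,
    "fg": 45, "min": 30, "years": 3, "rpi": 50,
}

print(round(predict_aper(prospect), 2))  # about 11.29
```

Keeping the coefficients in a single dictionary makes it easy to check each term against the reported equation.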
The remaining variables were maxVert, spm, apm, rpm, ppm, fg, min, years, and rpi. All of these variables were statistically significant at the 5% level except points per minute (ppm) and field goals (fg). These two variables were left in the regression because the number of points scored and the number of field goals made should be an indicator of how good a player is; games are decided by points, and with the exception of free throws, field goals are the only way to score points. One notable adjustment we made to these variables is the adjustment for time and pace. College coaches all have different coaching philosophies. As a result, the number of minutes they let their players play and the speed at which their teams play vary across teams. Certain coaches balance the minutes allotted to their players, while other coaches allow their players to play entire games. Some teams emphasize offense and play at a very fast pace, thus accumulating more statistics, while other teams play at a sluggish pace, and as a result their players do not build up their personal statistics. To deal with this inconsistency across teams, we decided to adjust the statistics we used. For example, instead of seeing how many points a player averaged per game, we looked at how many points he scored per minute. That way, a player who played fewer minutes in a game would not be penalized. We subsequently adjusted steals, assists, and rebounds in the same way.
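The per-minute adjustment described above amounts to dividing each per-game average by minutes per game. A minimal sketch follows; the helper name `per_minute` and the two players' numbers are hypothetical and chosen only to illustrate the idea.

```python
def per_minute(per_game_average, minutes_per_game):
    """Convert a per-game average (points, steals, ...) into a per-minute rate."""
    if minutes_per_game <= 0:
        raise ValueError("minutes_per_game must be positive")
    return per_game_average / minutes_per_game


# Two hypothetical players: a starter and a reserve who plays fewer minutes.
starter_ppm = per_minute(18.0, 36.0)  # 18 points in 36 minutes per game
reserve_ppm = per_minute(10.0, 20.0)  # 10 points in 20 minutes per game

# Both score at the same rate, so the reserve is not penalized for playing less.
print(starter_ppm, reserve_ppm)
```

The same helper applies unchanged to steals, assists, and rebounds.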
In terms of predicting APER, you can see that all of the per-minute statistics have a positive coefficient; that is, the more points you score, the more assists you get, the more steals you make, etc., the higher your efficiency rating will be. Looking at the regression, it seems the variable steals has the highest coefficient and thus the greatest effect on becoming a better basketball player. For every extra steal you make per minute, your efficiency rating should go up by 54.04, whereas every extra assist or rebound per minute leads to increases of only 32.69 and 23.46, respectively. However, the frequency of steals made in a game is much lower than the number of rebounds and assists made. From our data, the average steals per minute is .04, the average assists per minute is .08, and the average rebounds per minute is .2. The average player makes twice as many assists per game as steals, and five times as many rebounds. As a result, for the average player, it is much easier to become a better player by getting more assists.
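As a rough illustration of why the size of a coefficient alone can mislead, the contribution of each statistic to the predicted APER of a player at the sample averages is the coefficient times the average rate. This is a back-of-the-envelope calculation using only the coefficients and averages quoted above, holding the other variables fixed; it is not an additional regression result.

$$
\begin{aligned}
\text{steals:} \quad & 54.04 \times 0.04 \approx 2.16 \\
\text{assists:} \quad & 32.69 \times 0.08 \approx 2.62 \\
\text{rebounds:} \quad & 23.46 \times 0.2 \approx 4.69
\end{aligned}
$$

Although steals carry the largest coefficient, at average rates they account for the smallest of these three contributions.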
The remaining variables, maximum vertical jump (maxVert), years played in college (years), and team record/strength of schedule (rpi), can all be explained intuitively. Basketball is a game that depends on athleticism; the most exciting players, such as Michael Jordan, Dominique Wilkins, LeBron James, and Kobe Bryant, are all great athletes. To become a good basketball player, you have to be athletic, and the athletic attribute that translates best onto the hardwood floor is a player's vertical jump. The higher you jump, the more you separate yourself from your defender. As a result, every additional inch in a player's vertical jump should translate to an additional .3 in efficiency rating.
The next variable, RPI, had the lowest p-value of all the variables, and with good reason. RPI is basically a measurement of how good the player's college team was. If the player was on the best team in the nation, his RPI would be 1, and if he was on the worst team, it would be 347. As a result, the coefficient on RPI is negative; the worse your college team is, the lower your predicted efficiency rating. Intuitively, this makes sense. If you play on a great collegiate team, the level of competition is higher and thus prepares you for the next level. At the same time, if you have comparable statistics to a player on a lesser team, chances are you will be more prepared to play in the NBA than the other player.
Finally, the last variable used was years of basketball played in college. While this variable was statistically significant, it plays no role in forecasting or causal inference. Since the coefficient is negative, it suggests that the longer you stay in college, the worse a basketball player you will be. But playing less basketball in college does not make you a better basketball player. Rather, if you are a good basketball player, you cannot afford to waste your time playing at the collegiate level when you could be making millions playing in the NBA. Thus, the negative coefficient is a reflection of how good players leave college early to pursue the NBA. In other words, the variable years is a self-fulfilling prophecy.
When parsing through our data, there were some players that we did not put into our regression. Certain players had to be thrown out of the regression despite the fact that they were drafted. The fact that our regression was heavily based on college statistics limited us to only college players. Thus, high school players who decided to forgo college and directly enter the draft could not be put into the regression, and neither could international players, since we had no way of reconciling international basketball statistics with college statistics. Furthermore, the issues of injuries and trades also forced us to reconsider certain players in our regression. The problems with injuries and trades are comparable. Injured players have fewer games played and thus a smaller sample size for examining their efficiency rating. Another impact of injuries is that a player may decide to play injured and subsequently perform at a lower level than he is capable of when healthy. The other issue, trades, presents a similar problem. Players that are traded to another team during the season face several hurdles: they must learn a new playbook and integrate themselves into a completely new team, coach, and environment. As a result, players that have been traded usually do not see as much playing time, as they must deal with the adjustment of playing on a different team. Even when they do get time on the floor, their quality of play is again likely to be below their capabilities. Indeed, the lack of familiarity that results from being traded lowers a player's efficiency rating. Thus, we found it best not to include players that fell into the following categories: players that did not go to college, players that came from international leagues, players that were traded in the middle of a season, and players that were injured for prolonged periods of time.
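The four exclusion rules above can be expressed as a simple filter over the drafted players. The sketch below is only an illustration: the record layout, the field names, and the three placeholder players are hypothetical and do not reflect how our data was actually stored.

```python
from dataclasses import dataclass


@dataclass
class DraftedPlayer:
    name: str
    attended_college: bool
    international: bool
    traded_midseason: bool
    long_term_injury: bool


def is_eligible(player):
    """Apply the four exclusion rules described in the text."""
    return (
        player.attended_college
        and not player.international
        and not player.traded_midseason
        and not player.long_term_injury
    )


# Placeholder records, invented for illustration only.
drafted = [
    DraftedPlayer("Player A", True, False, False, False),
    DraftedPlayer("Player B", False, False, False, False),  # no college
    DraftedPlayer("Player C", True, False, True, False),    # traded midseason
]

sample = [p for p in drafted if is_eligible(p)]
print([p.name for p in sample])
```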
Next Steps
After completing our regression, our next step was application. We wanted to use our regression to predict future NBA success for current college basketball players. Since the 2009-2010 college basketball season had just come to an end, the nation's top college players had recently entered their names into the 2010 NBA Draft, which takes place in June. Many basketball analysts have made their predictions of who the top picks in the draft will be. We must make the assumption that these predictions correlate with who they think will be the best NBA players. NBA.com created a consensus mock draft, of which they state, "The Consensus Mock Draft is a compilation of the best mock drafts around the web. We bring them together to come up with a good estimate of how the draft could play out." Their predicted top 10 picks in the NBA draft are shown in Table 1. For Table 2, we entered 36 of the best players who declared for the NBA draft into our regression.
In comparison to the Consensus Mock Draft, we both identified John Wall as the number one pick and Evan Turner as the second overall pick. Overall, we had 8 of the same 10 picks.
After we completed this project, we found out that John Hollinger did a similar regression analysis to create what he called his Draft Rater. Table 3 shows Hollinger's top nine players according to his Draft Rater. We can see that Hollinger's regression and ours predicted 8 of the same top 9. Given the similarities between our results and the other two, it is safe to say that we have a valid regression.
      Table 1            Table 2            Table 3
Pick  NBA.com            Our Regression     John Hollinger
1     John Wall          John Wall          DeMarcus Cousins
2     Evan Turner        Evan Turner        Evan Turner
3     Derrick Favors     Wesley Johnson     John Wall
4     DeMarcus Cousins   Greg Monroe        Greg Monroe
5     Wesley Johnson     Darington Hobson   Derrick Favors
6     Al-Farouq Aminu    Luke Babbitt       Xavier Henry
7     Greg Monroe        DeMarcus Cousins   Luke Babbitt
8     Cole Aldrich       Derrick Favors     Al-Farouq Aminu
9     Ed Davis           Al-Farouq Aminu    Wesley Johnson
10    Ekpe Udoh          Ekpe Udoh
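The overlap counts quoted in the text can be checked directly from the three lists in the tables above. The short Python sketch below does this with set intersections; the helper name `shared_picks` is ours and is used only for illustration.

```python
nba_com = [
    "John Wall", "Evan Turner", "Derrick Favors", "DeMarcus Cousins",
    "Wesley Johnson", "Al-Farouq Aminu", "Greg Monroe", "Cole Aldrich",
    "Ed Davis", "Ekpe Udoh",
]
our_regression = [
    "John Wall", "Evan Turner", "Wesley Johnson", "Greg Monroe",
    "Darington Hobson", "Luke Babbitt", "DeMarcus Cousins", "Derrick Favors",
    "Al-Farouq Aminu", "Ekpe Udoh",
]
hollinger = [
    "DeMarcus Cousins", "Evan Turner", "John Wall", "Greg Monroe",
    "Derrick Favors", "Xavier Henry", "Luke Babbitt", "Al-Farouq Aminu",
    "Wesley Johnson",
]


def shared_picks(list_a, list_b, top_n):
    """Return the players appearing in the top_n of both lists, ignoring order."""
    return sorted(set(list_a[:top_n]) & set(list_b[:top_n]))


print(len(shared_picks(nba_com, our_regression, 10)))   # 8
print(len(shared_picks(hollinger, our_regression, 9)))  # 8
```

Note that this comparison ignores the order of the picks; it only counts how many names the lists have in common.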
Limitations
While our regression results compare favorably with other prominent draft forecasts, the limitations of our project are the same as those of any other draft prediction: we don't know how successful these players will be in the NBA until they actually play. Also, as we noted earlier, APER isn't a perfect indicator of how good an NBA player is. It is simply the best quantitative estimate that we know of.
In addition, there are many unobservable variables that could be correlated with how good a player is. For example, a player's mental toughness and work ethic are probably strongly correlated with how good he turns out to be. Yet each of these variables is hard to measure. Another factor in how good a player turns out to be is the environment he plays in. For example, if a player gets drafted by a team that already has a superstar playing his position, odds are that the younger player will have limited playing time, which could limit his growth as a player.
Lastly, factors other than measurable statistics enter into a team's decision when drafting a player. Players are also evaluated on their character. If a team believes that a player may not be as dedicated as he should be, that could factor into its drafting decision. Also, some teams choose not to draft the player they consider the best available. Rather, they draft a player based on the position at which the team has a need. Other factors that might influence a team's drafting decision include possible health or injury risks for certain players, and how a player's skill set translates from the collegiate to the NBA style of basketball.