Saturday, May 3, 2014

Plus Minus, Box Scores & SportVU - a bit of bridging the gap (Part I)

Hello everybody!
I recently hit 1000 viewers, which means - actually nothing (memo to myself: try to be first responder to as many Grantland posts as possible). Some weeks ago, after ESPN published Real Plus Minus (RPM), I dabbled a bit in comparing it to 'normal' box score stats - and I have to admit it was partly rubbish (but at least it looked cool!).
(Note: If you get bored during the next paragraphs, simply scroll down to the new fancy figures...)
In my opinion, the biggest points of criticism are that RPM is a stat that already includes box score stats and that I used a model that only compared one stat with RPM at a time. The first problem is directly obvious: if I compare assists with a stat that has already concluded 'assists are super!', then I don't learn anything. The second problem is a little trickier. Imagine that assists correlate with turnovers (which is actually true, so you do not have to try very hard). This distorts the analysis: turnovers are generally seen as negative and assists as positive (for some strange reason), but if players with many assists also have many turnovers, a model that looks at one stat at a time can make both come out as positive.
So, I started to use multiple linear regression, which sounds more dangerous than it is. Linear regression is basically this: you have beans lying on the floor. Put a stick on the floor so that the beans are, on average, as close to the stick as possible. In the case of two factors, your beans are floating in space and you have to put on your space suit and adjust a board so that the beans are as close to it as possible [1]. I decided not to go further than two dimensions for the moment. That's probably another post.
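For the code-minded, here is a minimal sketch of the stick versus the board. Everything in it is an assumption for illustration: the data are synthetic (no real players), the variable names are made up, and the coefficients are chosen so that 'assists' help, 'turnovers' hurt, and the two are correlated. It is not the code behind the figures in this post.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic, made-up data (assumption): turnovers rise with assists.
ast = rng.normal(size=n)
tov = 0.8 * ast + 0.6 * rng.normal(size=n)
# By construction, assists help (+1.0) and turnovers hurt (-0.5).
plus_minus = 1.0 * ast - 0.5 * tov + 0.1 * rng.normal(size=n)

# The stick: fit plus minus against turnovers alone.
# np.polyfit returns the highest power first, so index 0 is the slope.
slope_tov_alone = np.polyfit(tov, plus_minus, 1)[0]

# The board: fit plus minus against both stats at once.
X = np.column_stack([np.ones(n), ast, tov])
coef, *_ = np.linalg.lstsq(X, plus_minus, rcond=None)
intercept, coef_ast, coef_tov = coef

print(f"turnovers alone: {slope_tov_alone:+.2f}")
print(f"with assists in the model: AST {coef_ast:+.2f}, TOV {coef_tov:+.2f}")
```

In this toy setup the turnover slope should come out positive when fitted alone and negative once assists are in the model, which is exactly the trap described above.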

To give you an idea about the data analysis:
Player selection:
- To avoid noisy data, I focus on players who played at least 42 games and 20 minutes per game.
- It does not make sense to compare the assist numbers of Chris Paul with the assist numbers of Dwight Howard. So players are sorted into Point Guards, Wings and Bigs, which are separated using Time of possession (PGs dribble the ball a lot) and FG attempts against the player at the rim (bigs stand close to the rim a lot), both normalized to 36 minutes. See the figure below to convince yourself that it works pretty neatly. It also means that some players are considered as playing two positions - but we can live with that.
- These criteria give me 52 Point Guards, 87 Wings and 85 Bigs
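The selection above can be sketched roughly like this. The 42 games and 20 minutes come from the list; the two per-36 cut-offs, the rule that everybody else is a Wing, the field names and the example players are all hypothetical, since I only state which stats are used and not the exact thresholds.

```python
from dataclasses import dataclass

MIN_GAMES = 42          # from the criteria above
MIN_MPG = 20.0          # from the criteria above
PG_TOP_PER36 = 4.0      # hypothetical cut-off: minutes of possession per 36
BIG_RIM_FGA_PER36 = 5.0 # hypothetical cut-off: rim FG attempts faced per 36


@dataclass
class Player:
    name: str
    games: int
    minutes: float             # total minutes played
    time_of_possession: float  # total minutes with the ball
    rim_fga_against: float     # total FG attempts defended at the rim


def per36(total: float, minutes: float) -> float:
    """Scale a season total to a per-36-minutes rate."""
    return total / minutes * 36.0


def qualifies(p: Player) -> bool:
    """At least 42 games and at least 20 minutes per game."""
    return p.games >= MIN_GAMES and p.minutes / p.games >= MIN_MPG


def positions(p: Player) -> set:
    """A player can land in more than one group."""
    labels = set()
    if per36(p.time_of_possession, p.minutes) >= PG_TOP_PER36:
        labels.add("PG")
    if per36(p.rim_fga_against, p.minutes) >= BIG_RIM_FGA_PER36:
        labels.add("Big")
    if not labels:  # assumption: everybody else is a Wing
        labels.add("Wing")
    return labels


# Made-up players, purely for illustration.
players = [
    Player("Guard A", 70, 2400, 400, 50),
    Player("Bench B", 30, 900, 50, 20),
    Player("Center C", 80, 2000, 60, 400),
    Player("Wing D", 60, 1500, 100, 100),
]
groups = {p.name: positions(p) for p in players if qualifies(p)}
print(groups)
```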

Data selection:
RAPM (from and @talkingpractice): Regularized Adjusted Plus Minus does not use any box score stats. It's split up into offensive and defensive Plus Minus, so that we can neatly separate offensive and defensive stats.
Pace Adjusted stats from Quick explanation: assume all other players are equal and your RAPM is minus 10. If you play 100 possessions, then your team ends up 10 points behind (and you probably shouldn't play basketball). So, the stat does not depend on the pace that your team plays at. Thus, I mostly tried to use stats that work this way. In the following, if there is a stat like AST, it means Assist %, which is 'an estimate of the percentage of teammate field goals a player assisted while he was on the floor'. I will explain some of the stats on the fly. In any case, you should check both gotbuckets and basketball-reference, they give you so much information that you never needed. Speaking of so much information...
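As a small worked example of what 'per 100 possessions' and a percentage stat like AST mean in practice, here is a sketch. The function names are my own, the example numbers are made up, and the Assist % formula is the one from the Basketball-Reference glossary as I remember it, so treat it as an assumption and check it against the source before relying on it.

```python
def expected_margin(rapm: float, possessions: float) -> float:
    """Point margin implied by a per-100-possessions rating,
    all other players being equal."""
    return rapm * possessions / 100.0


def assist_pct(ast: float, mp: float, fg: float,
               team_mp: float, team_fg: float) -> float:
    """Assist %: share of teammate field goals a player assisted
    while on the floor (Basketball-Reference style estimate)."""
    # Team field goals scaled to the player's share of floor time,
    # minus the player's own field goals (nobody assists himself).
    teammate_fg_on_floor = (mp / (team_mp / 5.0)) * team_fg - fg
    return 100.0 * ast / teammate_fg_on_floor


# RAPM of minus 10 over 100 possessions: 10 points behind.
print(expected_margin(-10, 100))
# Made-up box score line: 8 assists, 36 minutes, 6 own field goals,
# on a team with 240 player-minutes and 40 field goals.
print(round(assist_pct(8, 36, 6, 240, 40), 1))
```

Because both numbers are rates rather than totals, a fast-paced team does not inflate them, which is the whole point of using them here.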
SportVU stats from You can get lost in these stats. For now, I focused on two things. First, the 'contesting shots at the rim' stats, which are a new dimension in box score stats. Second, the contested rebound stats, because people like to throw them around. They basically tell you what percentage of your rebound attempts are contested and how often you get those contested rebounds.

Okay, enough with the setup. Let's look at the data! What you will see in the following is how well different stats correlate with plus minus. If I were a commercial site, I would probably use 'explain' instead of 'correlate' and find other ways to make my data sound way more predictive than it is. Please keep in mind that a correlation I measure does not necessarily imply causation. Anyhow, we start with

The Offense

What we see here for both Point Guards and Wings is that having a good shooting percentage (TS or eFG), while being able to distribute the ball, is what mostly seems to define a good player. There are some additional factors (Steal Percentage for guards or Possible REBP for Wings - which means the percentage of rebounds a player gets compared to the number of rebounds he could get), but mostly you simply should play like LeBron James or Kevin Durant (surprise!). R² values of 0.5 or higher are pretty solid in terms of correlation.
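If you want to reproduce this kind of 'two stats against plus minus' comparison yourself, a sketch could look like the following. The helper names and the synthetic data are assumptions (the stat labels are just dictionary keys here, not real player numbers), and this is one plausible way to get an R² for every pair of stats, not necessarily how the figures were produced.

```python
import itertools
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R² of an ordinary least-squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def pairwise_r2(stats: dict, target: np.ndarray) -> dict:
    """Fit the target on every pair of stats and collect the R² values."""
    results = {}
    for a, b in itertools.combinations(stats, 2):
        X = np.column_stack([stats[a], stats[b]])
        results[(a, b)] = r_squared(X, target)
    return results


# Synthetic data (assumption): the target depends on TS and AST only.
rng = np.random.default_rng(1)
n = 200
stats = {name: rng.normal(size=n) for name in ("TS", "AST", "STL")}
offense = stats["TS"] + stats["AST"] + 0.5 * rng.normal(size=n)

results = pairwise_r2(stats, offense)
for pair, r2 in sorted(results.items(), key=lambda kv: -kv[1]):
    print(pair, round(r2, 2))
```

In this toy example the (TS, AST) pair should come out on top, because those are the only two stats the synthetic target was built from.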
The picture is far less clear for bigs, where the maximum R² values are around 0.2. One reason for this might be that the bigs contain two main player types. On the one hand you have traditional centers like Dwight Howard; on the other hand you have Dirk Diggler, and even Paul Pierce nowadays often plays as some kind of second big man. You can see this in the red square that combines USG (usage percentage of a player) and 3PAr (three-point attempt rate, the share of a player's field goal attempts that are three-pointers). GotBuckets recently came to similar conclusions, paired with deeper analysis. It is interesting to note that only a few of the bigs even attempt three-pointers - but most of them to very positive effect. It is also interesting to see that for Centers TOV (turnover percentage) has the highest R² value. Please keep in mind that offensive fouls (like brick wall blocks) also count as turnovers. So Kendrick Perkins' favorite occupation on offense is tracked. TOV & AST, combined with TS and USG, would be my WOC (weapon of choice) if I had to evaluate big men.

But you know what, enough TS and other BS for today, the sun is still shining and there are some great games on tonight. I'll tackle defense and general PM in the next few days.

I wish everybody a nice weekend,

1 I hope this helped. Otherwise it was probably the most boring sequel to Gravity that you can imagine.

Data from &


  1. Good start. I'd find this much more intriguing if the chart also had roll-up metrics on it, RAPM or RPM and PER or Offensive rating.


    1. What are roll-up metrics? My idea was to combine a Plus Minus stat with stats that are directly understandable. PER & Offensive Rating, for example, merge all kinds of stats.