Saturday, May 10, 2014

Bridging the gap, Part III - Throwing the kitchen sink

Hi everyone,

for now, Part III is going to be the final chapter of my Plus Minus / box score / tracking data comparison. Basically like 'Return of the Jedi', only with fewer Ewoks.
After looking at correlations between offensive and defensive Plus Minus and pairs of other data, I decided to look at general Plus Minus and everything at the same time. Using 17 different statistics, I am going to look at all possible combinations to find the best ones for predicting the data. 'All possible combinations' means in this case 2^17 - 1 = 131071 = a shit load of possibilities. I also decided to add a new type of player to my previous groups of Point Guards, Wings & Bigs, which I called 'Very Bigs'. It's basically only those bigs that do not attempt any three-pointers. The idea is to get a more homogeneous group.
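To see where the 131071 comes from, here is a minimal Python sketch that walks through every non-empty combination of 17 statistics and counts them. The stat names are placeholders (this excerpt does not list the 17 statistics), and the helper `nonempty_subsets` is made up for illustration, not the code used for the analysis.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Yield every non-empty combination of items, smallest first."""
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


# Placeholder names: the actual 17 statistics are not listed in this excerpt.
stats = [f"stat_{i}" for i in range(1, 18)]

# Count the combinations without storing them all in memory.
n_subsets = sum(1 for _ in nonempty_subsets(stats))
print(n_subsets)  # 131071, i.e. 2**17 - 1
```

Each statistic is either in a combination or not, which gives 2^17 possibilities; the empty combination is dropped, hence the minus one.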

Monday, May 5, 2014

Bridging the gap, Part II - Defense

Hi everybody,

last time I bored you to death, I tried to find those stats that correlate well with winning a game on offense.
Today, I will take a look at defense and check what in general correlates best with winning a game (because to win a game, you usually have to play both defense and offense).
As all the groundwork got laid out in my last post, I will directly start with the goodies.


Saturday, May 3, 2014

Plus Minus, Box Scores & SportsVU - a bit of bridging the gap (Part I)

Hello everybody!
I recently hit 1000 viewers, which means - actually nothing (memo to myself: try to be first responder to as many Grantland posts as possible). Some weeks ago, after ESPN published Real Plus Minus (RPM), I dabbled a bit in comparing it to 'normal' box score stats - and I have to admit it was partly rubbish (but at least it looks cool!).
(Note: If you get bored during the next paragraphs, simply scroll down to the new fancy figures...)
The biggest points of criticism are, in my opinion, that RPM is a stat that already includes box score stats and that I used a model that only compared one stat with RPM at a time. The first problem is obvious: if I compare assists with a stat that already indirectly concludes 'assists are super!', then I don't learn anything. The second problem is a little bit more tricky. Imagine that assists correlate with turnovers (which is actually true, so you do not have to try very hard). This distorts the analysis: turnovers are generally seen as negative and assists as positive (for some strange reason), but looked at one at a time, both could come out as positively correlated.
So, I started to use multiple linear regression, which sounds more dangerous than it is. Linear regression is basically: you have beans lying on the floor. Put a stick on the floor so that the beans are on average as close to the stick as possible. In the case of two factors, your beans are floating in space and you have to put on your space suit and adjust a board in a way that the beans are as close to it as possible [1]. I decided to not go further than two dimensions for the moment. That's probably another post.
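For those who prefer code to beans, here is a minimal sketch of the stick versus the board, using NumPy on made-up data. Nothing here is real NBA data or the model from this post: the numbers, the variable names and the assumed "true" effects (+1 for assists, -1 for turnovers) are all invented to illustrate the assists/turnovers problem described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Made-up data, not real NBA numbers: turnovers rise with assists.
assists = rng.normal(5.0, 2.0, n)
turnovers = 0.5 * assists + rng.normal(0.0, 0.5, n)

# Hypothetical "truth": assists help (+1), turnovers hurt (-1), plus noise.
plus_minus = 1.0 * assists - 1.0 * turnovers + rng.normal(0.0, 0.5, n)

# One stat at a time: the stick through the beans on the floor.
slope_single, intercept_single = np.polyfit(turnovers, plus_minus, 1)

# Two stats at once: the board through the beans floating in space.
# The column of ones lets the board sit at any height (the intercept).
X = np.column_stack([np.ones(n), assists, turnovers])
coef, *_ = np.linalg.lstsq(X, plus_minus, rcond=None)

print("turnovers alone:", slope_single)
print("intercept, assists, turnovers:", coef)
```

In this invented setup, turnovers looked at alone come out with a positive slope, simply because players with many turnovers also have many assists. Fitting both stats at once should put the turnover coefficient back near the assumed -1.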