Tuesday, December 9, 2014

Catch & Shoot vs Pull Up and the risk of binning

Hi everyone,
a short note, since I have seen several articles that rely on data binning. I am not saying that binning is automatically wrong, but it has at least one important potential flaw.

Let's use catch & shoot vs pull up shooting percentage as an example. It has been shown that catch & shoot attempts are generally more open than pull ups. Furthermore, let us say that we define an uncontested shot as any shot where no defender is within 6 feet (that is binning).
Now, a study might find that uncontested catch & shoot attempts have a higher FG% than uncontested pull ups. But if we compare hypothetical distributions of defender distance for both shot types, we can clearly see that the mean within each bin is not the same for catch & shoot and pull ups.
In both the contested and the uncontested bin, the hypothetical pull up shots are clearly more contested. A difference in FG% within a bin may therefore reflect how closely the shots were defended rather than the shot type itself.
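
To make this concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration, not data from any study: the make probability curve `make_prob`, the exponential defender distance distributions, their mean values of 6 and 3 feet, and the sample size. Only the 6-foot cutoff comes from the example above. Both shot types share exactly the same make probability at any given defender distance.

```python
import random

CUTOFF_FT = 6.0  # "uncontested" means no defender within 6 feet, as in the text


def make_prob(distance_ft):
    """Assumed make probability: rises linearly with defender distance, capped at 0.60."""
    return min(0.60, 0.30 + 0.02 * distance_ft)


def simulate(mean_distance_ft, n_shots, rng):
    """Draw (defender distance, made) pairs for one hypothetical shot type."""
    shots = []
    for _ in range(n_shots):
        distance = rng.expovariate(1.0 / mean_distance_ft)
        shots.append((distance, rng.random() < make_prob(distance)))
    return shots


def summarize_bin(shots, uncontested):
    """Return (mean defender distance, FG%) inside the contested or uncontested bin."""
    in_bin = [(d, made) for d, made in shots if (d >= CUTOFF_FT) == uncontested]
    mean_distance = sum(d for d, _ in in_bin) / len(in_bin)
    fg_pct = sum(made for _, made in in_bin) / len(in_bin)
    return mean_distance, fg_pct


rng = random.Random(0)  # fixed seed so the run is repeatable
shot_types = {
    "catch & shoot": simulate(6.0, 200_000, rng),  # assumed to be more open on average
    "pull up": simulate(3.0, 200_000, rng),        # assumed to be more contested
}

for label, uncontested in (("contested", False), ("uncontested", True)):
    for name, shots in shot_types.items():
        mean_distance, fg_pct = summarize_bin(shots, uncontested)
        print(f"{label:12s} {name:14s} mean distance {mean_distance:5.2f} ft  FG% {fg_pct:.3f}")
```

In this setup the shot type has no effect of its own on the make probability, so any FG% gap inside a bin comes only from the different defender distances within that bin.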

That's it for right now. Don't say I didn't warn you.

Update: I only mentioned the problem, but didn't offer a solution. I have never had to work on a problem like this myself, but @neuteufel mentioned propensity scores. There are probably a few more standard procedures, but the most important thing is to be aware that open is not necessarily open (that sounds weird...)

Tuesday, December 2, 2014

Looking for soulmates (even though I don't believe in them...)

Hi everyone,

I have been using hierarchical clustering quite a bit over the last month.
Two days ago I found this nice R script, so I combined the data with my clustering.
A few short notes before the figures speak for themselves:
- Used all players who played at least 10 games and 12 minutes per game
- As the list is too long to show all the names, I will use subclusters where you can find the names
- Tried to use minute-normalized data (per 36)
- The effective field goal column is not used for clustering. I did not include any FG percentages in the end (for no real reason...)
- If a value is green, that means there was a division by zero (mostly because the player never drove)
- If you have other interesting data, feel free to send it to me and I'll send you the figures back. Or, if you have Matlab, I can give you the code as well (I should probably clean it up beforehand...)
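
The steps in the notes above can be sketched roughly as follows. This is a Python illustration, not the Matlab code mentioned in the post, and most details are assumptions: the player rows are made-up numbers, the chosen stat columns are hypothetical, and the z-score scaling, Euclidean distance, and average linkage are common defaults that the post does not confirm. Only the filter (10 games, 12 minutes per game), the per-36 normalization, and the division-by-zero flag come from the notes.

```python
import math
import statistics

MIN_GAMES, MIN_MPG = 10, 12.0  # filter from the notes above

# Made-up season totals: (games, minutes, points, rebounds, assists, drives, drive_points)
players = {
    "Player A": (60, 2100, 1200, 300, 420, 400, 260),
    "Player B": (55, 1800, 1000, 260, 380, 350, 220),
    "Player C": (70, 2000, 700, 800, 90, 0, 0),    # never drove
    "Player D": (65, 1900, 650, 760, 80, 20, 10),
    "Player E": (8, 60, 20, 10, 5, 3, 2),          # fails the games/minutes filter
}

eligible = {name: row for name, row in players.items()
            if row[0] >= MIN_GAMES and row[1] / row[0] >= MIN_MPG}

# Per-36 normalization of points, rebounds, assists, and drives.
per36 = {name: [total / row[1] * 36.0 for total in row[2:6]]
         for name, row in eligible.items()}

# Ratio stat that can divide by zero; None plays the role of the "green" marker.
points_per_drive = {name: (row[6] / row[5] if row[5] else None)
                    for name, row in eligible.items()}


def standardize(feats):
    """Z-score every column so no single stat dominates the distance (assumed step)."""
    columns = list(zip(*feats.values()))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return {name: [(v - m) / s for v, m, s in zip(row, means, stds)]
            for name, row in feats.items()}


def average_linkage(c1, c2, feats):
    """Mean pairwise Euclidean distance between the members of two clusters."""
    return sum(math.dist(feats[a], feats[b]) for a in c1 for b in c2) / (len(c1) * len(c2))


def agglomerate(feats, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[name] for name in feats]
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], feats))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return clusters


clusters = agglomerate(standardize(per36), n_clusters=2)
print(clusters)
print(points_per_drive)
```

The ratio stat is kept out of the clustering features here so that a missing value never enters a distance computation; whether the original analysis handled it that way is not stated.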

Without further ado (click on the figures to enlarge):