Tuesday, March 18, 2014

Data dumps - the problem with low-hanging fruit in sports storytelling

Disclaimer: I picked recent articles by Kirk Goldsberry and John Schuhmann for this post not because they are doing a bad job, but because I like reading their work and so their articles catch my attention easily. Even my last post ignores some of the things that I'm about to criticize. But, as every up-and-coming rapper would tell you - the best way to make it in the business is by writing a diss track about the big fish. ;) (Note: I hope I won't end up as the Benzino to Kirk's Eminem.)
So, here we go:

A tale of two players
Imagine two players taking shots from the right corner of a basketball court. Both have so far made 8 of 20 (40%) from beyond the arc at that spot. Now player A makes his next 5 shots, which raises his shooting percentage to 52% (13 of 25). Player B misses his next 5 shots, which drops his percentage to 32% (8 of 25).
Question 1: How sure are you that player A is a better three-point shooter from the right corner than player B?
The scientific answer: There is only an 85% probability that A truly shoots better than B, or in vague terms 'it could be true, but you would not be able to publish it as a scientific result'.
Question 2: How sure are you if I tell you that player A is Stephen Curry from last season and player B is Stephen Curry from this season? (Note: Up to now, Curry has taken 22 shots from that position.)
So it is more than a bit misleading when Kirk uses the term 'Kryptonite' to describe his 33% shooting from that position. This leads to reader comments like 'Any theories on why he's so much worse from the right corner 3?', followed by others who try to find a reason. The true reason is most likely random noise in making or missing a shot. [1]
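A minimal sketch of one way to get a number like that 85% for the two hypothetical players, assuming a pooled two-proportion z-test on 13 of 25 versus 8 of 25 and taking one minus the two-sided p-value. The choice of test and the function names are assumptions for illustration, and one minus a p-value is only a loose confidence figure, not literally the probability that A is better.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_z_test(made_a, shots_a, made_b, shots_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = made_a / shots_a
    p_b = made_b / shots_b
    # Under the null hypothesis both players share one true percentage,
    # so the standard error is based on the pooled makes and attempts.
    pooled = (made_a + made_b) / (shots_a + shots_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / shots_a + 1.0 / shots_b))
    z = (p_a - p_b) / std_err
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Player A: 13 of 25, player B: 8 of 25 (the hypothetical from the text).
    z, p_value = two_proportion_z_test(13, 25, 8, 25)
    print(f"z = {z:.2f}, two-sided p = {p_value:.3f}, 1 - p = {1 - p_value:.3f}")
```

With only 25 shots each, one minus the p-value comes out at roughly 0.85, well short of the usual 95% bar.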

Bizarrely shaped coins
I always imagine John Schuhmann as some kind of gold miner. The amount of interesting stats he finds is close to infinite. [2] And comparing the stats for contested and uncontested shots is an interesting start for an article. In addition, I am pretty sure that these stats are not completely random. NBA players are not simply bizarrely shaped coins.
BUT - what if they were? Imagine [3] you had two coins. The 'uncontested' coin shows heads 43.3% of the time. The 'contested' coin shows heads 38% of the time. [4] Now you flip each of the coins 200 times (John set the minimum threshold to 100 shots per 'coin', so the average for all qualifying players will be a bit higher) and count the number of heads. Then you repeat this experiment a thousand times.
You will realize two things: First, your fingers are now bleeding. Second, in around 12.5% of cases, the 'contested' coin showed heads more often than the 'uncontested' coin - which is pretty close to the 11 of 92 (11.96%) players that shot a better contested percentage than uncontested.
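If you would rather spare your fingers, the coin experiment above can be sketched in a few lines of Python. This is a minimal sketch under the assumptions already stated (43.3% and 38% heads, 200 flips per coin, 1,000 repetitions); the function names and the fixed seed are made up for illustration, and the exact fraction will wobble a little with a different seed.

```python
import random


def count_heads(rng, p_heads, n_flips):
    """Flip a coin with P(heads) = p_heads n_flips times and count the heads."""
    return sum(1 for _ in range(n_flips) if rng.random() < p_heads)


def contested_wins_fraction(p_uncontested=0.433, p_contested=0.38,
                            n_flips=200, n_experiments=1000, seed=0):
    """Fraction of experiments in which the 'contested' coin shows more heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = 0
    for _ in range(n_experiments):
        uncontested = count_heads(rng, p_uncontested, n_flips)
        contested = count_heads(rng, p_contested, n_flips)
        # Only strictly more heads counts; ties go to the uncontested coin.
        if contested > uncontested:
            wins += 1
    return wins / n_experiments


if __name__ == "__main__":
    print(f"contested coin ahead in {contested_wins_fraction():.1%} of experiments")
```

Raising `n_flips` shows how quickly the effect fades: the more shots per 'coin', the less often the worse coin comes out ahead by luck alone.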
Just to make it clear: I agree that there are players who are almost as good at shooting with a hand in their face as they are at taking an uncontested jumper. And these players are not purely coins. But the article is misleading in the sense that it leads readers to believe that some players are better left more or less unguarded when they shoot, because this pattern can just as well be explained by some bizarrely shaped coins. We would need to compare at least several years of data to see what is really going on, and probably look more carefully at what really happens when a player is contested or uncontested. For example, it would be necessary to check whether uncontested shots are generally taken from further away.

Short summary (I promise I'll make it quick)
Sports data is great and it opens a new way to look at what defines greatness in players. But just like Jeff Van Gundy said at the Sloan Sports Analytics Conference (I paraphrase): 'If you don't look at the video as well, data can be pretty misleading.' [5]
I know that things like 'regression to the mean' or 'small sample size' are buzzwords that get boring at some point. But if articles that rely on data don't regularly mention the caveats of their results, they risk losing their long-term credibility for the short-term goal of a good storyline. And that's like shooting yourself in the foot. Uncontested. [6]

1. The more interesting part could be the three from the right side of center, where Curry was above his average in both years on a much bigger sample (164 shots this year). Without watching any video footage, I would guess that it is generally easier to shoot without doing a crossover beforehand (see, I'm also prone to doing this!).
2. Just a rough estimate, I haven't actually calculated his 'Digged Stats Per 48' value.
3. You might need to smoke something for this.
4. These are rough estimates for the contested and uncontested percentages. I only know that players on average shot 5.3 percentage points better on uncontested shots.
5. I guess he said it in a more vigorous way. Like when he gets riled up about a flop.
6. How is that for a 'Hot Sports Take'-style finish!? Pulitzer!
