Tuesday, December 3, 2019

Man vs Machine


Man versus Machine (Part I)

What the comparison of two annotation sources can tell us about SportVU and psychology

Preamble: I wrote this post in summer 2015. It never got published at Nylon Calculus, as we did not want to anger the NBA and SportVU gods into taking our data away. Alas, away they took. So, while there is a discussion about tracking data on Twitter right now, here's the article... (I'm sorry, links won't work and grammar errors will remain. Also, I think there is a figure missing in the beginning. I think it died with my hard drive.) (Also, Part II was about assists iirc. That's also gone.)

When you start scraping, analyzing and visualizing nba.com's tracking data, you feel like a kid in a candy store. But after the first sugar rush wears off, you realize that some of the candies may be dirty and that you should probably gobble them up a bit more carefully. Before you start to mistrust everything that has ever been written on Nylon Calculus, let me assure you that the number of bad candies we are talking about is rather small and should not affect any previously published results. We are basically talking about one booger among a thousand bonbons. But obviously it would be nice to eliminate as much discrepancy between data and reality as possible.
The most easily extracted SportVU data is shot-related. The information that SportVU gives us about each shot is the shot distance, the touch time and the number of dribbles before the shot. In the following, I will first tackle the shot distance and then look at problems with touch time and dribbles. I will compare the manual annotation of assists with the information SportVU gives us in a later article, as assists are a more philosophical question.
The data comprises all shots of the 2014-15 regular season (mandatory hat tip to Darryl). Some hiccups may have been introduced outside of the SportVU data, but I re-checked several entries manually and always found that the bizarre results already existed in the nba.com database.

Shot distance

Shot distance was already recorded before SportVU, as there have always been hardworking people who manually annotated every shot taken on an NBA court. If we compare the shot distance given by those worker bees with the shot distance we receive from our new electronic overlord, we get the following picture:
At first sight, this might look worse to you than it actually is. You have to be aware that the color scale is a log scale, so everything between blue and yellow is only a small fraction of the close to 200,000 shots. Almost all shots fall in the slightly skewed diagonal rectangle in the middle, which is also reflected in the histogram of the difference between the manually annotated shot distance and the SportVU distance.
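To make the log color scale concrete, here is a minimal Python sketch (my own illustration, not the code behind the plot). It bins (manual, SportVU) distance pairs into square cells and takes log10 of the counts. The five pairs are copied from shot records listed further down in this post; the 5-foot cell size and all function names are assumptions.

```python
import math
from collections import Counter

# (manual, SportVU) shot distances in feet, copied from records in this post.
PAIRS = [(31.0, 25.3), (23.0, 38.4), (24.0, 37.3), (19.0, 42.9), (27.0, 19.0)]


def cell_counts(pairs, cell_ft=5.0):
    """Count shots per 2D histogram cell; cell_ft is an assumed bin width."""
    return Counter((int(m // cell_ft), int(s // cell_ft)) for m, s in pairs)


def log_scaled(counts):
    """Map raw cell counts to log10, which is what a log color scale displays."""
    return {cell: math.log10(n) for cell, n in counts.items()}


def differences(pairs):
    """Manual minus SportVU distance, the quantity shown in the 1D histogram."""
    return [round(m - s, 1) for m, s in pairs]


counts = cell_counts(PAIRS)
print(counts)
print(log_scaled(counts))
print(differences(PAIRS))
```

On a log10 scale, a cell holding 10 shots and a cell holding 10,000 shots are only three color steps apart, which is why the sparse off-diagonal regions look more prominent than they are.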
So, as a batch measurement, SportVU seems to produce useful distances (Phew!). Yet, we can see two clear regions of artifacts in the 2D histogram, which I marked in red. The reason for the upper one seems obvious and a bit embarrassing: SportVU simply does not believe that somebody could shoot from his own half of the court. This explains the negative slope for shots that are manually declared as more than 50 feet away from the basket (with the length of an NBA court being 94 feet).
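A toy model of how such a negative slope could arise, as a hedged Python sketch: it assumes (my guess, not anything SportVU documents) that the system always measures to the nearest basket, that the shot is taken on the line connecting both rims, and that each rim center sits 5.25 feet inside its baseline.

```python
COURT_LENGTH_FT = 94.0        # court length as given in the text
RIM_FROM_BASELINE_FT = 5.25   # assumption: rim center 5.25 ft inside the baseline

# Distance between the two rim centers under the assumptions above.
RIM_TO_RIM_FT = COURT_LENGTH_FT - 2 * RIM_FROM_BASELINE_FT


def nearest_basket_distance(true_distance_ft):
    """Distance a hypothetical 'nearest basket' rule would report.

    true_distance_ft is the distance to the basket the shooter is aiming at,
    measured along the rim-to-rim line.
    """
    distance_to_other_basket = abs(RIM_TO_RIM_FT - true_distance_ft)
    return min(true_distance_ft, distance_to_other_basket)


for true_distance in (20, 40, 50, 60, 70):
    print(true_distance, nearest_basket_distance(true_distance))
```

In this toy model the reported distance grows with the true distance up to the midpoint between the rims and shrinks afterwards. The exact turning point depends on the assumptions and need not match the plot.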

The problem we see in the lower right corner, where manual annotations estimate a distance of less than 5 feet and SportVU goes up to 30, becomes more obvious when we use additional information given by our worker bees. Because they gave every shot an action type label, we can for example look at shots that are labeled as “Jump Shot” or shots labeled as “Dunk Shot”. In the following plot, I compare all shots for which the action type contains the words “Layup” or “Dunk” with all shots containing “Pullup”, “Fadeaway” or “Stepback”:
As you can see, we have a clear problem for layups and dunks. In the manual annotation, none of them were declared to be further away than 5 feet (Note: Actually, 3 of 10,000 were. One example is a miss by Young, followed by a missed put-back http://stats.nba.com/cvp.html?GameID=0021400385&GameEventID=199# ; no idea what happens on the other two). In comparison, according to SportVU, 30% of all dunks alone were apparently made from outside the restricted area (the 4-foot no-charge arc). That is an interesting definition of a dunk. Looking at some of the biggest offenders in dunk distance, we can get an idea of what goes wrong: you often either have very straight drives to the basket (Noel, SportVU distance 23.9 feet: http://stats.nba.com/cvp.html?GameID=0021400043&GameEventID=388# ; Olynyk, SportVU distance 23.8 feet) or alley-oops (Jordan, SportVU distance 24.5 feet: http://stats.nba.com/cvp.html?GameID=0021400391&GameEventID=071# ). But some of them just do not make any sense (like this 25-foot dunk from Gerald Henderson, where the ball is never further away from the basket than maybe 8 feet http://stats.nba.com/cvp.html?GameID=0021400742&GameEventID=049# ). Under these circumstances, we can of course end up inflating shooting percentages for shots that we think are from 3 to 5 feet.
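The grouping by action type and the "share of dunks outside 4 feet" check can be sketched in a few lines of Python. This is an illustration only: the group names and function names are made up, and the five sample shots (action type plus SportVU distance) are copied from records in this post, so the resulting share says nothing about the league-wide 30%.

```python
RIM_KEYWORDS = ("layup", "dunk")
JUMPER_KEYWORDS = ("pullup", "fadeaway", "stepback")

# (action type, SportVU distance in feet), copied from records in this post.
SAMPLE = [
    ("Dunk Shot", 23.9),
    ("Alley Oop Dunk Shot", 5.6),
    ("Alley Oop Dunk Shot", 3.1),
    ("Alley Oop Layup shot", 1.5),
    ("Pullup Jump shot", 25.3),
]


def shot_group(action_type):
    """Assign a shot to one of the two compared groups via keyword matching."""
    name = action_type.lower()
    if any(keyword in name for keyword in RIM_KEYWORDS):
        return "layup_or_dunk"
    if any(keyword in name for keyword in JUMPER_KEYWORDS):
        return "pullup_fadeaway_stepback"
    return "other"


def share_beyond(shots, keyword, limit_ft):
    """Fraction of shots whose action type contains keyword and lie beyond limit_ft."""
    distances = [dist for action, dist in shots if keyword in action.lower()]
    return sum(dist > limit_ft for dist in distances) / len(distances)


print([shot_group(action) for action, _ in SAMPLE])
print(share_beyond(SAMPLE, "dunk", 4.0))
```

Matching is case-insensitive because the labels are not consistently capitalized ("Jump Shot" versus "Pullup Jump shot").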

In comparison, shots for which we can expect motion that is not directed towards the basket seem to work quite well. An interesting observation is that for all jump shots we have a good agreement between manual and SportVU distance for those shots that are from 23 to 25 feet – basically all three-pointers. For midrange jump shots, on the other hand, SportVU sees the player a little bit further away from the basket.
There are two possible explanations for this. The more likely one is that manual observations lack precision when the 3-point line is not there to guide the estimate. The ugly alternative would be that SportVU has some kind of ridge regression, pulling shots that are likely 3-pointers towards the 3-point line. Let's hope that's not the case...
For the few shots I looked at where manual annotation and SportVU strongly disagreed, the manual annotation was right more often than not.
Examples where the manual annotation is correct:
SportVU says 38.4 ft distance, manual 23 http://stats.nba.com/cvp.html?GameID=0021400714&GameEventID=157#
SportVU says 37.3 ft distance, manual 24 http://stats.nba.com/cvp.html?GameID=0021400946&GameEventID=197#
SportVU says 42.9 ft distance, manual 19 http://stats.nba.com/cvp.html?GameID=0021401154&GameEventID=323#
On the other hand, I am pretty sure that this shot by Boris Diaw is closer to 25.3 feet than to 31 feet
http://stats.nba.com/cvp.html?GameID=0021400108&GameEventID=085#

not a 31 footer
[1] http://stats.nba.com/cvp.html?GameID=0021400108&GameEventID=085#
[1] "player ID:2564; Scoring Player:Boris Diaw; ShotType: Pullup Jump shot; Touch time: 0.9; Shot clock: 2.6; Dribbles: 0; SHOT_DISTANCE: 31.0; SHOT_DIST: 25.3; Distance defender: 6.6; Game Clock: 2:23"
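The record lines in these notes are plain key/value strings, so they can be turned back into data with a small parser. A minimal Python sketch follows; the function name is made up, and the reading of SHOT_DISTANCE as the manual value and SHOT_DIST as the SportVU value is inferred from the examples in the text rather than from any nba.com documentation.

```python
def parse_shot_record(line):
    """Parse one '[1] "key: value; key: value"' line into a dict of strings."""
    body = line.strip()
    if body.startswith("[1]"):
        body = body[3:].strip()   # drop the R print prefix
    body = body.strip('"')        # drop the surrounding quotes
    record = {}
    for field in body.split(";"):
        # Split on the first colon only, so '2:23' in the game clock survives.
        key, _, value = field.partition(":")
        record[key.strip()] = value.strip()
    return record


LINE = ('[1] "player ID:2564; Scoring Player:Boris Diaw; ShotType: Pullup Jump shot; '
        'Touch time: 0.9; Shot clock: 2.6; Dribbles: 0; SHOT_DISTANCE: 31.0; '
        'SHOT_DIST: 25.3; Distance defender: 6.6; Game Clock: 2:23"')

record = parse_shot_record(LINE)
manual_ft = float(record["SHOT_DISTANCE"])   # assumed: manual annotation
sportvu_ft = float(record["SHOT_DIST"])      # assumed: SportVU measurement
print(record["Scoring Player"], round(manual_ft - sportvu_ft, 1))
```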

much closer manually
manual is right
[1] http://stats.nba.com/cvp.html?GameID=0021400714&GameEventID=157#
[1] "player ID:2045; Scoring Player:Hedo Turkoglu; ShotType: Jump Shot; Touch time: 0.0; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 23.0; SHOT_DIST: 38.4; Distance defender: 21.0; Game Clock: 8:07"
[1] http://stats.nba.com/cvp.html?GameID=0021400946&GameEventID=197#
[1] "player ID:201228; Scoring Player:CJ Watson; ShotType: Jump Shot; Touch time: 4.4; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 24.0; SHOT_DIST: 37.3; Distance defender: 7.4; Game Clock: 2:48"
[1] http://stats.nba.com/cvp.html?GameID=0021401154&GameEventID=323#
[1] "player ID:202339; Scoring Player:Eric Bledsoe; ShotType: Pullup Jump shot; Touch time: 5.6; Shot clock: 18.3; Dribbles: 6; SHOT_DISTANCE: 19.0; SHOT_DIST: 42.9; Distance defender: 20.8; Game Clock: 6:04"

closer automatic
[1] http://stats.nba.com/cvp.html?GameID=0021400058&GameEventID=313#
[1] "player ID:101112; Scoring Player:Channing Frye; ShotType: Jump Shot; Touch time: 0.0; Shot clock: 15.8; Dribbles: 0; SHOT_DISTANCE: 27.0; SHOT_DIST: 19.0; Distance defender: 9.4; Game Clock: 5:36"
[1] http://stats.nba.com/cvp.html?GameID=0021400301&GameEventID=391#
[1] "player ID:101139; Scoring Player:CJ Miles; ShotType: Jump Shot; Touch time: 0.0; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 26.0; SHOT_DIST: 14.8; Distance defender: 11.2; Game Clock: 8:12"
[1] http://stats.nba.com/cvp.html?GameID=0021400394&GameEventID=162#
[1] "player ID:201583; Scoring Player:Ryan Anderson; ShotType: Jump Shot; Touch time: 0.0; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 26.0; SHOT_DIST: 12.1; Distance defender: 4.2; Game Clock: 8:04"
[1] http://stats.nba.com/cvp.html?GameID=0021400606&GameEventID=499#
[1] "player ID:201163; Scoring Player:Wilson Chandler; ShotType: Jump Shot; Touch time: 0.0; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 27.0; SHOT_DIST: 14.9; Distance defender: 11.3; Game Clock: 1:42"
[1] http://stats.nba.com/cvp.html?GameID=0021400668&GameEventID=289#
[1] "player ID:203081; Scoring Player:Damian Lillard; ShotType: Jump Shot; Touch time: 10.2; Shot clock: 14.0; Dribbles: 12; SHOT_DISTANCE: 25.0; SHOT_DIST: 19.9; Distance defender: 6.5; Game Clock: 8:44"
[1] http://stats.nba.com/cvp.html?GameID=0021400694&GameEventID=281#
[1] "player ID:201155; Scoring Player:Rodney Stuckey; ShotType: Jump Shot; Touch time: 2.8; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 26.0; SHOT_DIST: 18.1; Distance defender: 6.1; Game Clock: 7:06"
[1] http://stats.nba.com/cvp.html?GameID=0021401137&GameEventID=207#
[1] "player ID:203496; Scoring Player:Robert Covington; ShotType: Jump Bank Shot; Touch time: 5.6; Shot clock: 17.7; Dribbles: 5; SHOT_DISTANCE: 25.0; SHOT_DIST: 17.1; Distance defender: 4.1; Game Clock: 6:01"
[1] http://stats.nba.com/cvp.html?GameID=0021401145&GameEventID=381#
[1] "player ID:204060; Scoring Player:Joe Ingles; ShotType: Jump Shot; Touch time: 2.0; Shot clock: 5.8; Dribbles: 1; SHOT_DISTANCE: 25.0; SHOT_DIST: 17.7; Distance defender: 5.9; Game Clock: 1:16"
[1] http://stats.nba.com/cvp.html?GameID=0021401158&GameEventID=087#
[1] "player ID:203897; Scoring Player:Zach LaVine; ShotType: Jump Shot; Touch time: 2.2; Shot clock: 10.1; Dribbles: 1; SHOT_DISTANCE: 25.0; SHOT_DIST: 17.0; Distance defender: 3.2; Game Clock: 2:27"



Distant dunks
Yes
[1] http://stats.nba.com/cvp.html?GameID=0021400043&GameEventID=388#
[1] "player ID:203457; Scoring Player:Nerlens Noel; ShotType: Dunk Shot; Touch time: 3.5; Shot clock: 21.6; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 23.9; Distance defender: 5.8; Game Clock: 11:02"
Tough
[1] http://stats.nba.com/cvp.html?GameID=0021400173&GameEventID=236#
[1] "player ID:203100; Scoring Player:Tony Wroten; ShotType: Dunk Shot; Touch time: 0.0; Shot clock: 14.9; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 24.9; Distance defender: 0.5; Game Clock: 2:56"
Yes
[1] http://stats.nba.com/cvp.html?GameID=0021400371&GameEventID=353#
[1] "player ID:203482; Scoring Player:Kelly Olynyk; ShotType: Driving Dunk Shot; Touch time: 0.9; Shot clock: 18.7; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 23.8; Distance defender: 4.2; Game Clock: 11:28"
Yes
[1] http://stats.nba.com/cvp.html?GameID=0021400391&GameEventID=071#
[1] "player ID:201599; Scoring Player:DeAndre Jordan; ShotType: Alley Oop Dunk Shot; Touch time: 4.0; Shot clock: 20.6; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 24.5; Distance defender: 5.0; Game Clock: 5:50"
Tough
[1] http://stats.nba.com/cvp.html?GameID=0021400406&GameEventID=143#
[1] "player ID:101123; Scoring Player:Gerald Green; ShotType: Driving Dunk Shot; Touch time: 0.0; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 24.0; Distance defender: 3.7; Game Clock: 12:00"
[1] http://stats.nba.com/cvp.html?GameID=0021400547&GameEventID=349#
[1] "player ID:203084; Scoring Player:Harrison Barnes; ShotType: Alley Oop Dunk Shot; Touch time: 4.8; Shot clock: 18.6; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 22.4; Distance defender: 4.4; Game Clock: 4:34"
[1] http://stats.nba.com/cvp.html?GameID=0021400691&GameEventID=189#
[1] "player ID:201148; Scoring Player:Brandan Wright; ShotType: Dunk Shot; Touch time: 3.4; Shot clock: 24.0; Dribbles: 1; SHOT_DISTANCE: 0.0; SHOT_DIST: 23.6; Distance defender: 7.6; Game Clock: 6:22"
[1] http://stats.nba.com/cvp.html?GameID=0021400742&GameEventID=049#
[1] "player ID:201945; Scoring Player:Gerald Henderson; ShotType: Alley Oop Dunk Shot; Touch time: 0.0; Shot clock: 9.3; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 25.0; Distance defender: 8.4; Game Clock: 6:42"
[1] http://stats.nba.com/cvp.html?GameID=0021401101&GameEventID=147#
[1] "player ID:202687; Scoring Player:Bismack Biyombo; ShotType: Dunk Shot; Touch time: 0.6; Shot clock: 9.1; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 24.6; Distance defender: 10.1; Game Clock: 7:41"




http://stats.nba.com/cvp.html?GameID=0021400189&GameEventID=537#
walking the dog
takes after shot clock reset
misses the skip pass
shot clock stops
http://stats.nba.com/cvp.html?GameID=0021400730&GameEventID=008#
momentarily unclear position


shot clock reset

no idea

Alley-oops
"player ID:203500; Scoring Player:Steven Adams; ShotType: Alley Oop Dunk Shot; Touch time: 14.1; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 5.6; Distance defender: 3.9; Game Clock: 8:10"
"player ID:202685; Scoring Player:Jonas Valanciunas; ShotType: Alley Oop Layup shot; Touch time: 9.8; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 1.0; SHOT_DIST: 1.5; Distance defender: 0.8; Game Clock: 4:25"

Dribble alley-oops
[1] http://stats.nba.com/cvp.html?GameID=0021400456&GameEventID=239#
[1] "player ID:203081; Scoring Player:Damian Lillard; ShotType: Alley Oop Layup shot; Touch time: 11.3; Shot clock: 12.8; Dribbles: 9; SHOT_DISTANCE: 2.0; SHOT_DIST: 4.1; Distance defender: 3.9; Game Clock: 11:49"
[1] http://stats.nba.com/cvp.html?GameID=0021400500&GameEventID=415#
[1] "player ID:201163; Scoring Player:Wilson Chandler; ShotType: Alley Oop Dunk Shot; Touch time: 7.8; Shot clock: 16.9; Dribbles: 6; SHOT_DISTANCE: 0.0; SHOT_DIST: 3.1; Distance defender: 1.9; Game Clock: 7:52"
[1] http://stats.nba.com/cvp.html?GameID=0021400909&GameEventID=406#
[1] "player ID:201566; Scoring Player:Russell Westbrook; ShotType: Alley Oop Layup shot; Touch time: 12.2; Shot clock: 12.0; Dribbles: 10; SHOT_DISTANCE: 1.0; SHOT_DIST: 4.3; Distance defender: 3.4; Game Clock: 6:59"

[1] http://stats.nba.com/cvp.html?GameID=0021400031&GameEventID=327#

Thursday, July 23, 2015

Small list of ggplot2 examples

Hello everybody,

I didn't use ggplot2 that much until 2 weeks ago, which in my opinion was a big mistake on my part. I think my train of thought shifted from
"I just want to plot something, I don't directly get what's going on with this ggplot2 thing. I'll just find a quick solution"
to
"Wow! Once you get the basic idea, the world becomes your oyster!"
Personally, I blame Matlab. I'm just so used to calling a different function for each plot (do a specific thing with input data) that I did not directly realize that a plot is basically (input data) + (do something with it).
Once you get the hang of it, it gets really fun, because you often just have to change a geom_point() to a geom_smooth() to get a completely different thing - or simply put them both in a row!
Anyhow, without further ado, here is my tutorial-like pdf:

Cheers,
Hannes

Thursday, July 9, 2015

Links I favorited about data science

I had a few weeks where I used Twitter mostly on my phone, so I started blindly favoriting tweets that could be useful. This blog post is mostly for me to curate all these data-related links. If it comes in handy for others, all the better. I try to sort them a bit by topic... hat tips go to all the data scientists that show an immigrant like me some interesting things (too lazy to list them right now...)

General Methods and algorithms


R

  • R introduction plus text mining course -  @StatsInTheWild is probably my favorite twitter handle
  • dplyr tutorial - I still live in a dplyr-less world. Which I guess I should regret every time I write df[df[,'Stat1']>0 & df[,'Stat2']>2,] - people might call that dumb, I call it oldschool
  • ggrepel: I finally can use the ggplot package for messy textplots 

Python

Other languages



Thursday, March 19, 2015

NBA players are creatures of habit

Hello everybody,

this is a follow-up to my last post on 'distancology' – the science of turning all shot charts into one colorful picture. You can find halfway clean code in my GitHub account. If you run MakeHeatMap.R, you should actually be able to reproduce the result.

One question that naturally comes up when comparing the shot distributions of players is how consistent or reliable those shot distributions are. For example, in my last article I sorted around 200 players into 10 distinguishable groups, using a (vague) cutoff. But I could just as well have used 5 or 20 groups. Now, the question regarding reliability is: if you compare year to year, how many players would remain inside the same shot cluster?
Because if I label somebody as a 'corner three guy' due to his shot distance distribution in one year, but the next year there is a 50% chance that he's actually a 'typical wing player' guy – that would be pretty useless.
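The reliability question boils down to one number: the share of players who keep their cluster label from one season to the next. A minimal Python sketch, with made-up player names and labels (only the two label names come from the text), and assuming both seasons were clustered together so that labels are comparable across years:

```python
def share_staying(labels_year1, labels_year2):
    """Fraction of players present in both seasons who keep the same cluster label."""
    common_players = labels_year1.keys() & labels_year2.keys()
    if not common_players:
        raise ValueError("no player appears in both seasons")
    same = sum(labels_year1[p] == labels_year2[p] for p in common_players)
    return same / len(common_players)


# Hypothetical example data, not taken from the actual clustering.
season_a = {
    "Player A": "corner three guy",
    "Player B": "typical wing player",
    "Player C": "corner three guy",
    "Player D": "typical wing player",
}
season_b = {
    "Player A": "corner three guy",
    "Player B": "corner three guy",
    "Player C": "corner three guy",
}

print(share_staying(season_a, season_b))
```

Players who appear in only one season (Player D here) are ignored, which mirrors the minimum-attempts filter applied to both seasons.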

Long story short, what I did was combine the distance distributions for two years – and the result is pretty mindblowing. The following plot works very similarly to the one that I used in my previous article. I just changed the column on the left, so that it indicates the effective field goal percentage instead of the shot attempts. This way it is easier for people to be in awe of Stephen Curry. It shows both seasons for every player that had at least 600 attempts (data is from the 3rd of March) during this and the last season – and here it is:

Monday, March 2, 2015

Before this probable nonsense about 'fouling while leading' gets spread

Hi everyone,

I thought about letting it slide, but then I read this quote from an SI.com article:
"28. The most staggering NBA stat from Sloan? Fouling when winning can increase your chances by 11 percent, according to a paper written by Franklin Kenter of Rice University. The paper shows that fouling near the end of games pretty much makes sense in every situation, whether you’re trailing or leading. When behind, it advises fouling one minute out for every six points you are behind. When leading, it suggests fouling one minute out for every three points you lead."
Side note here: 'The paper shows that fouling...' is a typical case of 'reporter overstates what scientist says'. I guess 'The paper says that their model indicates...' would be more realistic.

But more or less, that's what the paper said. Their reasoning was the following:
"The concept of fouling when ahead may be counterintuitive. However, toward the end of the game, the main goal of the trailing team is to increase the total variance in order to widen the window of possibilities that win the game. One main component in this wider variance is the riskier 3-point shot. The trailing team can limit this variance by fouling. the leading team may give up points, on average, but limit the trailing team to 2 points per possession. This decreases the total variance and, with a sufficient lead, increases the leading team’s chances of winning."
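The single-possession arithmetic behind the quoted variance claim can be written down in a few lines of Python. This is a sketch under loud assumptions: the 35% three-point and 75% free-throw percentages are illustrative values I picked, free throws are treated as independent, and rebounds, the clock and everything else that decides a game are ignored. It shows what the quote means, not whether the strategy works.

```python
def mean_and_variance(outcomes):
    """Mean and variance of points for a dict mapping points -> probability."""
    mean = sum(prob * points for points, prob in outcomes.items())
    variance = sum(prob * (points - mean) ** 2 for points, prob in outcomes.items())
    return mean, variance


def three_point_attempt(p3):
    """Points distribution of a single three-point attempt."""
    return {3: p3, 0: 1 - p3}


def two_free_throws(pf):
    """Points distribution of two independent free throws."""
    return {0: (1 - pf) ** 2, 1: 2 * pf * (1 - pf), 2: pf ** 2}


# Assumed, purely illustrative shooting percentages.
three_mean, three_var = mean_and_variance(three_point_attempt(0.35))
ft_mean, ft_var = mean_and_variance(two_free_throws(0.75))

print("three-point attempt:", three_mean, three_var)
print("two free throws:    ", ft_mean, ft_var)
```

With these assumed numbers the free throws yield more points on average but with a much smaller variance, which is the trade-off the quote describes for one possession.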
Now, I could go on a very lengthy statistical rant about this. I tried to figure out where they made the mistake in their model, but the paper was too vague in terms of their methods.
The point is this:

Thursday, February 19, 2015

Data dump: Isolation is a meritocracy, miscellaneous is something you should avoid

Hi everybody,

this is mostly a data dump after I looked at the new nba.com Synergy Sports data (next stop tracking data!). Feel free to use the plots.
For players, I always show only those that have enough attempts.

One personal note: The Synergy data can easily be misinterpreted. For example, a cut is mostly not a play in itself. If you look at the players that attempt a lot of cuts, you find mainly centers who are most probably beneficiaries of other stuff that is going on (cue Blake Griffin throwing a lob to DeAndre Jordan).
Even though cuts have a high value for points per possession, cutting all the time is not the solution. (This is a very personal note, as my last rec league team had a disastrous knack for cutting into whatever real playtype was going on at that moment...)

Cheers,
Hannes