Man versus Machine (Part I)
What the comparison of two annotation sources can tell us about SportVU and psychology
Preamble: I wrote this post in summer 2015. It never got published at Nylon Calculus, as we did not want to anger the NBA and SportVU gods into taking our data away. Alas, away they took. So, while there is a discussion about tracking data on Twitter right now, here's the article... (I'm sorry, links won't work and grammar errors will remain. Also, I think there is a figure missing in the beginning. I think it died with my hard drive.) (Also, Part II was about assists iirc. That's also gone.)
When you start scraping, analyzing and
visualizing nba.com's tracking data, you feel like a kid in a candy store. But
once the first sugar rush wears off, you realize that some of the
candies may be dirty and that you should probably gobble them a bit more carefully.
Before you start to mistrust everything that has ever been written on
Nylon Calculus, let me assure you that the number of bad candies we are talking
about is rather small and should not affect any previously published results.
We are basically talking about one booger among a thousand bonbons. But
obviously it would be nice to eliminate as much discrepancy between data and
reality as possible.
The most easily extracted SportVU data is
shot-related. The information that SportVU gives us about each shot is the
shot distance, the touch time and the number of dribbles before the shot. In the
following I will first tackle the shot distance and then look at problems with
touch time and dribbles. I will compare the manual annotation of assists with
the information SportVU gives us in a later article, as assists are a
more philosophical question.
The data comprises all shots of the 2014-15
regular season (mandatory hat tip to Darryl). Some hiccups may have been
introduced outside of the SportVU data, but I re-checked several
entries manually and always found the bizarre results already present in the
nba.com database.
Shot distance
Shot distance was already recorded
before SportVU existed, as there have always been hardworking people who manually
annotated every shot taken on an NBA court. If we compare the shot
distance given by those worker bees with the shot distance we receive from our
new electronic overlord, we get the following picture:
At first sight, this might look worse to
you than it actually is. Be aware that the color scale is a log
scale, so everything between blue and yellow accounts for only a small fraction of the
close to 200,000 shots. Almost all shots fall into the slightly skewed diagonal
rectangle in the middle, which is also reflected in the histogram of the
difference between manually annotated shot distance and SportVU distance.
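For readers who want to rebuild this kind of comparison, here is a minimal sketch. It is not the code behind the original figure. It assumes the shots are available as two equal-length sequences of distances in feet and that NumPy is installed; `distance_histograms`, `max_ft` and `bin_ft` are names made up for illustration. In the records quoted further down, SHOT_DISTANCE carries the manual value and SHOT_DIST the SportVU value.

```python
import numpy as np


def distance_histograms(manual_ft, sportvu_ft, max_ft=94, bin_ft=1.0):
    """Bin manual vs. SportVU distances and their difference.

    Returns (counts, log_counts, diff_counts, diff_edges).
    """
    manual = np.asarray(manual_ft, dtype=float)
    sportvu = np.asarray(sportvu_ft, dtype=float)

    # One-foot bins from 0 up to the court length (assumed upper bound).
    edges = np.arange(0.0, max_ft + bin_ft, bin_ft)
    counts, _, _ = np.histogram2d(manual, sportvu, bins=[edges, edges])

    # Log10 colour scale; empty bins become NaN so they stay blank in a plot.
    with np.errstate(divide="ignore"):
        log_counts = np.where(counts > 0, np.log10(counts), np.nan)

    # 1D histogram of (manual - SportVU), the second panel described above.
    diff_edges = np.arange(-max_ft, max_ft + bin_ft, bin_ft)
    diff_counts, _ = np.histogram(manual - sportvu, bins=diff_edges)
    return counts, log_counts, diff_counts, diff_edges
```

The `log_counts` array can be handed to any heatmap routine. Rows index the manual distance and columns the SportVU distance.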
So, in aggregate, SportVU seems
to produce useful distances (Phew!). Yet we can see two clear regions of
artifacts in the 2D histogram, which I marked in red. The reason for the upper
one seems obvious and a bit embarrassing: SportVU simply does not believe that
somebody could shoot from their own half of the court. This explains the negative
slope for shots that are manually recorded as more than 50 feet away from the
basket (with the length of an NBA court being 94 feet).
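To see why such a mix-up would produce a negative slope, here is a small geometric sketch. It rests on two assumptions that are not from the data: that the tracking system measures to whichever basket is nearer, and that the rim center sits 5.25 feet inside the baseline. The function names are hypothetical.

```python
import math

COURT_LENGTH_FT = 94.0        # court length as given in the text
RIM_FROM_BASELINE_FT = 5.25   # assumption: rim center 5 ft 3 in from the baseline


def distance_to_baskets(x_ft, y_ft):
    """Distances from a court position to the attacked and the other basket.

    x_ft is measured along the court from the baseline behind the attacked
    basket, y_ft sideways from the court's long axis.
    """
    target = math.hypot(x_ft - RIM_FROM_BASELINE_FT, y_ft)
    other = math.hypot(COURT_LENGTH_FT - RIM_FROM_BASELINE_FT - x_ft, y_ft)
    return target, other


def nearest_basket_distance(x_ft, y_ft):
    """What a system would report if it always picked the nearer basket."""
    return min(distance_to_baskets(x_ft, y_ft))


# A heave from 60 ft out on the long axis would be reported as 23.5 ft.
true_ft, _ = distance_to_baskets(65.25, 0.0)
reported_ft = nearest_basket_distance(65.25, 0.0)
```

Under these assumptions every extra foot of true distance beyond the midpoint between the rims removes a foot from the reported distance, which is the kind of mirrored slope described above. Where exactly the fold sits in the real data may differ from this idealized picture.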
The problem we see in the lower right
corner, where manual annotations estimate a distance of less than 5 feet and
SportVU goes up to 30, becomes more obvious when we use an additional
piece of information provided by our worker bees. Because they gave every shot an action
type label, we can for example look at shots that are labeled as “Jump Shot” or
shots labeled as “Dunk Shot”. In the following plot, I compare all shots for
which the action type contains the words “Layup” or “Dunk” with all shots
containing “Pullup”, “Fadeaway” or “Stepback”:
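The keyword split described above could look roughly like this. It is a sketch, not the original code: the group names and the case-insensitive matching are my own choices, and the keyword spellings are taken from the text, so they may need adjusting to the exact labels in the data.

```python
# Keywords from the text, lower-cased for case-insensitive matching.
RIM_KEYWORDS = ("layup", "dunk")
JUMPER_KEYWORDS = ("pullup", "fadeaway", "stepback")


def shot_group(action_type):
    """Assign an action type label to one of the two compared groups.

    Returns "layup_or_dunk", "moving_jumper", or None when no keyword matches.
    """
    label = action_type.lower()
    if any(word in label for word in RIM_KEYWORDS):
        return "layup_or_dunk"
    if any(word in label for word in JUMPER_KEYWORDS):
        return "moving_jumper"
    return None


# Labels as they appear in the records quoted in this post.
groups = {name: shot_group(name) for name in (
    "Driving Dunk Shot", "Alley Oop Layup shot", "Pullup Jump shot", "Jump Shot",
)}
```

Plain "Jump Shot" labels fall into neither group here, which matches the comparison in the text: only shots with one of the listed keywords are included.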
As you can see, we have a clear problem for
layups and dunks. In the manual annotation, none of them were recorded as
further away than 5 feet (Note: Actually, 3 in 10,000 were. One example is a
miss by Young, followed by a missed put-back http://stats.nba.com/cvp.html?GameID=0021400385&GameEventID=199# ; no idea what happens on the other two). In comparison, 30% of all
dunks alone were apparently made from outside the restricted area (the 4-foot
no-charge arc), according to SportVU. That is an interesting definition of a dunk.
Looking at some of the biggest offenders in dunk distance, we can get an idea of what goes wrong: Often you either have very
straight drives to the basket (Noel, SportVU distance 23.9 feet: http://stats.nba.com/cvp.html?GameID=0021400043&GameEventID=388# ; Olynyk, SportVU distance 23.8 feet) or
alley-oops (Jordan, SportVU distance 24.5 feet: http://stats.nba.com/cvp.html?GameID=0021400391&GameEventID=071#). But some of them just do not make any sense
(like this 25-foot dunk from Gerald Henderson, where the ball is never further
away from the basket than maybe 8 feet http://stats.nba.com/cvp.html?GameID=0021400742&GameEventID=049# ). Under these circumstances,
high-percentage dunks and layups get booked at distances they were not taken from,
which of course inflates the shooting percentages for shots that
we think are from 3 to 5 feet.
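A check like the one behind the dunk numbers above can be sketched as follows. This is an illustration under assumptions: each shot is a dict with the keys "ShotType" and "SHOT_DIST" (the field names used in the records quoted in this post), and `far_dunks` with its `threshold_ft` parameter is a made-up helper. The four sample rows are copied from those records.

```python
def far_dunks(shots, threshold_ft=4.0):
    """Share of dunks whose SportVU distance exceeds threshold_ft.

    Returns (share, offenders) with offenders sorted by distance, largest first.
    """
    dunks = [s for s in shots if "dunk" in s["ShotType"].lower()]
    if not dunks:
        return 0.0, []
    offenders = sorted(
        (s for s in dunks if s["SHOT_DIST"] > threshold_ft),
        key=lambda s: s["SHOT_DIST"],
        reverse=True,
    )
    return len(offenders) / len(dunks), offenders


sample = [
    {"Scoring Player": "Nerlens Noel", "ShotType": "Dunk Shot", "SHOT_DIST": 23.9},
    {"Scoring Player": "Steven Adams", "ShotType": "Alley Oop Dunk Shot", "SHOT_DIST": 5.6},
    {"Scoring Player": "Wilson Chandler", "ShotType": "Alley Oop Dunk Shot", "SHOT_DIST": 3.1},
    {"Scoring Player": "Boris Diaw", "ShotType": "Pullup Jump shot", "SHOT_DIST": 25.3},
]
share, offenders = far_dunks(sample)
```

The sample is far too small to say anything about the league-wide share. It only shows the mechanics of picking out the dunks and ranking the suspicious ones.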
In comparison, the distances for shots
where we can expect motion that is not directed towards the basket seem to
work quite well. An interesting observation is that for jump shots we have
a good agreement between manual and SportVU distance for those shots that are
from 23 to 25 feet, basically all three-pointers. For midrange jump shots, on
the other hand, SportVU sees the player a little bit further away from the
basket.
There are two possible explanations
for this. The more likely one is that manual observations lack precision
when the 3-point line is not there to guide your estimate. The ugly
alternative would be that SportVU applies some kind of ridge regression, pulling
shots that are likely 3-pointers towards the 3-point line. Let's hope that
that's not the case...
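The midrange offset described above can be put into numbers by averaging the gap between the two sources for each manual distance. The helper below is a hypothetical sketch of that idea. It cannot decide between the two explanations on its own, but it shows where along the distance axis the two sources drift apart.

```python
from collections import defaultdict


def mean_offset_by_manual_distance(pairs):
    """Average (SportVU - manual) per whole-foot manual distance.

    pairs is an iterable of (manual_ft, sportvu_ft) tuples. Returns a dict
    mapping manual distance in feet to the mean offset; positive values mean
    SportVU places the shooter further from the basket.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for manual_ft, sportvu_ft in pairs:
        key = int(round(manual_ft))
        sums[key] += sportvu_ft - manual_ft
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}
```

Fed with the jump shots only, an offset that stays positive through the midrange and vanishes between 23 and 25 feet would correspond to the pattern described in the text.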
For the few shots I looked at where manual
annotation and SportVU strongly disagreed, the manual annotation was
right more often than not.
Examples where the manual annotation is correct:
SportVU says 38.4 ft distance, manual 23 http://stats.nba.com/cvp.html?GameID=0021400714&GameEventID=157#
SportVU says 37.3 ft distance, manual 24 http://stats.nba.com/cvp.html?GameID=0021400946&GameEventID=197#
SportVU says 42.9 ft distance, manual 19 http://stats.nba.com/cvp.html?GameID=0021401154&GameEventID=323#
http://stats.nba.com/cvp.html?GameID=0021400108&GameEventID=085#
not a 31 footer
"player ID:2564; Scoring Player:Boris Diaw; ShotType: Pullup Jump shot; Touch time: 0.9; Shot clock: 2.6; Dribbles: 0; SHOT_DISTANCE: 31.0; SHOT_DIST: 25.3; Distance defender: 6.6; Game Clock: 2:23"
much closer manually
manual is right
"player ID:2045; Scoring Player:Hedo Turkoglu; ShotType: Jump Shot; Touch time: 0.0; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 23.0; SHOT_DIST: 38.4; Distance defender: 21.0; Game Clock: 8:07"
http://stats.nba.com/cvp.html?GameID=0021400946&GameEventID=197#
"player ID:201228; Scoring Player:CJ Watson; ShotType: Jump Shot; Touch time: 4.4; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 24.0; SHOT_DIST: 37.3; Distance defender: 7.4; Game Clock: 2:48"
http://stats.nba.com/cvp.html?GameID=0021401154&GameEventID=323#
"player ID:202339; Scoring Player:Eric Bledsoe; ShotType: Pullup Jump shot; Touch time: 5.6; Shot clock: 18.3; Dribbles: 6; SHOT_DISTANCE: 19.0; SHOT_DIST: 42.9; Distance defender: 20.8; Game Clock: 6:04"
closer automatic
"player ID:101112; Scoring Player:Channing Frye; ShotType: Jump Shot; Touch time: 0.0; Shot clock: 15.8; Dribbles: 0; SHOT_DISTANCE: 27.0; SHOT_DIST: 19.0; Distance defender: 9.4; Game Clock: 5:36"
http://stats.nba.com/cvp.html?GameID=0021400301&GameEventID=391#
"player ID:101139; Scoring Player:CJ Miles; ShotType: Jump Shot; Touch time: 0.0; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 26.0; SHOT_DIST: 14.8; Distance defender: 11.2; Game Clock: 8:12"
http://stats.nba.com/cvp.html?GameID=0021400394&GameEventID=162#
"player ID:201583; Scoring Player:Ryan Anderson; ShotType: Jump Shot; Touch time: 0.0; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 26.0; SHOT_DIST: 12.1; Distance defender: 4.2; Game Clock: 8:04"
http://stats.nba.com/cvp.html?GameID=0021400606&GameEventID=499#
"player ID:201163; Scoring Player:Wilson Chandler; ShotType: Jump Shot; Touch time: 0.0; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 27.0; SHOT_DIST: 14.9; Distance defender: 11.3; Game Clock: 1:42"
http://stats.nba.com/cvp.html?GameID=0021400668&GameEventID=289#
"player ID:203081; Scoring Player:Damian Lillard; ShotType: Jump Shot; Touch time: 10.2; Shot clock: 14.0; Dribbles: 12; SHOT_DISTANCE: 25.0; SHOT_DIST: 19.9; Distance defender: 6.5; Game Clock: 8:44"
http://stats.nba.com/cvp.html?GameID=0021400694&GameEventID=281#
"player ID:201155; Scoring Player:Rodney Stuckey; ShotType: Jump Shot; Touch time: 2.8; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 26.0; SHOT_DIST: 18.1; Distance defender: 6.1; Game Clock: 7:06"
http://stats.nba.com/cvp.html?GameID=0021401137&GameEventID=207#
"player ID:203496; Scoring Player:Robert Covington; ShotType: Jump Bank Shot; Touch time: 5.6; Shot clock: 17.7; Dribbles: 5; SHOT_DISTANCE: 25.0; SHOT_DIST: 17.1; Distance defender: 4.1; Game Clock: 6:01"
http://stats.nba.com/cvp.html?GameID=0021401145&GameEventID=381#
"player ID:204060; Scoring Player:Joe Ingles; ShotType: Jump Shot; Touch time: 2.0; Shot clock: 5.8; Dribbles: 1; SHOT_DISTANCE: 25.0; SHOT_DIST: 17.7; Distance defender: 5.9; Game Clock: 1:16"
http://stats.nba.com/cvp.html?GameID=0021401158&GameEventID=087#
"player ID:203897; Scoring Player:Zach LaVine; ShotType: Jump Shot; Touch time: 2.2; Shot clock: 10.1; Dribbles: 1; SHOT_DISTANCE: 25.0; SHOT_DIST: 17.0; Distance defender: 3.2; Game Clock: 2:27"
Distant dunks
Yes
"player ID:203457; Scoring Player:Nerlens Noel; ShotType: Dunk Shot; Touch time: 3.5; Shot clock: 21.6; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 23.9; Distance defender: 5.8; Game Clock: 11:02"
Tough
http://stats.nba.com/cvp.html?GameID=0021400173&GameEventID=236#
"player ID:203100; Scoring Player:Tony Wroten; ShotType: Dunk Shot; Touch time: 0.0; Shot clock: 14.9; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 24.9; Distance defender: 0.5; Game Clock: 2:56"
Yes
"player ID:203482; Scoring Player:Kelly Olynyk; ShotType: Driving Dunk Shot; Touch time: 0.9; Shot clock: 18.7; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 23.8; Distance defender: 4.2; Game Clock: 11:28"
Yes
http://stats.nba.com/cvp.html?GameID=0021400391&GameEventID=071#
"player ID:201599; Scoring Player:DeAndre Jordan; ShotType: Alley Oop Dunk Shot; Touch time: 4.0; Shot clock: 20.6; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 24.5; Distance defender: 5.0; Game Clock: 5:50"
Tough
http://stats.nba.com/cvp.html?GameID=0021400406&GameEventID=143#
"player ID:101123; Scoring Player:Gerald Green; ShotType: Driving Dunk Shot; Touch time: 0.0; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 24.0; Distance defender: 3.7; Game Clock: 12:00"
http://stats.nba.com/cvp.html?GameID=0021400547&GameEventID=349#
"player ID:203084; Scoring Player:Harrison Barnes; ShotType: Alley Oop Dunk Shot; Touch time: 4.8; Shot clock: 18.6; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 22.4; Distance defender: 4.4; Game Clock: 4:34"
http://stats.nba.com/cvp.html?GameID=0021400691&GameEventID=189#
"player ID:201148; Scoring Player:Brandan Wright; ShotType: Dunk Shot; Touch time: 3.4; Shot clock: 24.0; Dribbles: 1; SHOT_DISTANCE: 0.0; SHOT_DIST: 23.6; Distance defender: 7.6; Game Clock: 6:22"
http://stats.nba.com/cvp.html?GameID=0021400742&GameEventID=049#
"player ID:201945; Scoring Player:Gerald Henderson; ShotType: Alley Oop Dunk Shot; Touch time: 0.0; Shot clock: 9.3; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 25.0; Distance defender: 8.4; Game Clock: 6:42"
http://stats.nba.com/cvp.html?GameID=0021401101&GameEventID=147#
"player ID:202687; Scoring Player:Bismack Biyombo; ShotType: Dunk Shot; Touch time: 0.6; Shot clock: 9.1; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 24.6; Distance defender: 10.1; Game Clock: 7:41"
http://stats.nba.com/cvp.html?GameID=0021400189&GameEventID=537#
walking the dog
takes after shot clock reset
misses the skip pass
shot clock stops
momentarily unclear position
shot clock reset
no idea
Alley-oops
"player ID:203500; Scoring Player:Steven Adams; ShotType: Alley Oop Dunk Shot; Touch time: 14.1; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 0.0; SHOT_DIST: 5.6; Distance defender: 3.9; Game Clock: 8:10"
"player ID:202685; Scoring Player:Jonas Valanciunas; ShotType: Alley Oop Layup shot; Touch time: 9.8; Shot clock: 24.0; Dribbles: 0; SHOT_DISTANCE: 1.0; SHOT_DIST: 1.5; Distance defender: 0.8; Game Clock: 4:25"
Dribble alley-oops
"player ID:203081; Scoring Player:Damian Lillard; ShotType: Alley Oop Layup shot; Touch time: 11.3; Shot clock: 12.8; Dribbles: 9; SHOT_DISTANCE: 2.0; SHOT_DIST: 4.1; Distance defender: 3.9; Game Clock: 11:49"
http://stats.nba.com/cvp.html?GameID=0021400500&GameEventID=415#
"player ID:201163; Scoring Player:Wilson Chandler; ShotType: Alley Oop Dunk Shot; Touch time: 7.8; Shot clock: 16.9; Dribbles: 6; SHOT_DISTANCE: 0.0; SHOT_DIST: 3.1; Distance defender: 1.9; Game Clock: 7:52"
http://stats.nba.com/cvp.html?GameID=0021400909&GameEventID=406#
"player ID:201566; Scoring Player:Russell Westbrook; ShotType: Alley Oop Layup shot; Touch time: 12.2; Shot clock: 12.0; Dribbles: 10; SHOT_DISTANCE: 1.0; SHOT_DIST: 4.3; Distance defender: 3.4; Game Clock: 6:59"