
Stats DO lie


niremetal

  

13 members have voted



I actually don't disagree with the substance of this post (i.e., that JJ shoots too many jumpers and doesn't get to the line enough). What I disagree with is the idea that "stats don't lie." I've seen lots of variations of that phrase around NBA blogs recently, and it's somewhat frustrating and somewhat amusing. I will now go off on a rant about why statistics DO lie (or rather, can be used in a highly misleading fashion).

One of my favorite quotes is this from Mark Twain's autobiography: "There are 3 kinds of lies: Lies, damned lies, and statistics." The idea behind it is that because most people are either unwilling or unable to examine statistics in depth and reflect on their significance, the tendency is to give statistics way more weight than they deserve. Thus, you can find statistics to bolster pretty much any argument, however weak or strong.

That is especially true in basketball, where the only statistics available are variations or refinements of traditional box score stats (scoring, rebounding, assists, steals, blocks, turnovers, and fouls), which themselves are a terrible measure of what matters in basketball. First off, why does a block inherently matter more than many other good defensive plays, such as knocking a ball loose from a player or forcing an opponent to take a contested, low-percentage shot? Like blocks, those are things that can lead to a change of possession, but do not necessarily do so. Other statistics are highly arbitrary - who gets "credit" for a turnover is my favorite example. I'd say at least a quarter of turnovers are "credited" to a player who was not, in fact, most responsible for his team losing possession of the ball. Similarly, steals are often credited to a player who picks up a loose ball, even if the ball was knocked loose by someone else; rebounds are credited to the player who ultimately gains possession of the ball, even if the ball simply fell into that player's lap after it was tipped to him by a teammate.

What qualifies as an "assist" is perhaps the most arbitrary decision. How many dribbles does the receiving player have to take before it's not considered an assist by the passing player? And why doesn't a pass that leads to a teammate getting free throws qualify as an assist?

And of course, there are plenty of things that happen on the court that affect the box score statistics but that aren't directly factored into the equation. Shooting and passing stats don't reflect the amount of defensive attention players receive. I often point out that the people crowing about Josh's improved three point percentage relative to JJ don't take into account the fact that Josh's perimeter shots are nearly always wide open while JJ's are nearly always contested. Similarly, there is no stat that captures the effect of setting good screens to free a player up for an easy shot, making good passes that lead to assists (think of the 2-pass assist rule in hockey), or forcing a player into the teeth of the defense to help a teammate get a block or steal.

Considering how arbitrary and poor each individual statistic is, you can imagine how I feel about measures like PER, which basically just combine and assign arbitrary weights to a bunch of different box score statistics. Those "statistics" are a lie within a lie within a lie.

And THAT doesn't even get into how people can manipulate or misuse statistics. I've pretty much stopped bothering to point it out when someone around here posts statistics that just happen (purely coincidentally, I'm sure) not to include games or sets of games that would undermine their argument.

So yeah. Stats can, and often do, lie. In fact, there are probably more ways to lie using statistics than there are ways to lie using pretty much any other mode of persuasion. That's why I try hard not to crow about conclusions from any single statistic, or even sets of statistics.

So if statistics aren't to be believed, then how do you measure?


  • Premium Member

So if statistics aren't to be believed, then how do you measure?

Why measure? Just watch the games, observe, and give your opinion on what you see. Basketball is not a simple game. There's no way to accurately reduce it to a few simple numbers.

This is a big reason why I've gotten more and more into tennis rather than basketball. There simply aren't many statistics to worry about. The only analyses that mean anything come from simply watching the matches and forming opinions based on what you see. No one would dare try to look at a "box score" or a "stat sheet" and say they know something important based on it. I find it refreshing.


Why measure? Just watch the games, observe, and give your opinion on what you see. Basketball is not a simple game. There's no way to accurately reduce it to a few simple numbers.

This is a big reason why I've gotten more and more into tennis rather than basketball. There simply aren't many statistics to worry about. The only analyses that mean anything come from simply watching the matches and forming opinions based on what you see. No one would dare try to look at a "box score" or a "stat sheet" and say they know something important based on it. I find it refreshing.

I can understand your point. Stats don't tell the whole story, we both agree. But you can't say stats are useless.


  • Moderators

The best example of this is our fantastic offensive efficiency from last year. We excelled at scoring against teams who didn't play good defense (which luckily is about 70% of the NBA), but come playoff time that meant zip. This was evident during the season if you watched our 4th quarter futility against a lot of teams, but it was really exposed in the playoffs.


  • Premium Member

AWESOME post man.

Stats tell the story of the game. Stats will tell you what to expect from a player or a team or a coach. The game itself is not won by comparing statistics. The game is won by putting certain players together in a certain system. It's about resilience and attitude - and hustle more than anything else. Statistically, two equally matched teams would play perpetual overtime games. Realistically, it's all about getting it done.

Kobe Bryant has some of the best statistics of all time. As does Phil Jackson. The Lakers would not have been in the championship game last year without either of them.

Ron Artest won the series.

Is he a "better" player than Kobe? More valuable than KG or Pierce? More reliable? Artest was the right player at the right time on the right team. On the flip side of that, what kind of team would WE be with Artest? My guess is much worse off because we don't have the same kind of environment to control him. He'd be nothing but a distraction and a failure for us.

Even though he's a solid, defensive SF that is a HIGHLY underrated scorer. On paper, he should be a 1000% improvement over Marvin. He should pair well with a defensive lineup of Hinrich, JJ, Smoove, and Al. But...the stats don't tell the whole story.

I hear you man. Stats are a guide, and they have their place. But there is SO much more to the game than that.


  • Premium Member

I can understand your point. Stats don't tell the whole story, we both agree. But you can't say stats are useless.

Useless? No. On the balance, do I think they've done more harm than good in affecting casual fans' understanding of the game? Absolutely.


To me it comes down to predictive value. If I'm trying to decide who is going to win a game, then I'll absolutely go to stats over someone's personal observations. There is no question that you can use stats to be misleading if you want to, but at least for me, when I am looking to predict what's going to happen, I'll lean heavily on stats.

I have a friend who runs a website selling sports picks. I went over 80% on my picks against the spread this season and I used stats almost exclusively. (or more accurately I looked for trends where I thought that people who don't look at stats would be wrong) I didn't put out a ton of picks and waited for the very best spots but it worked for me. Maybe there are other people who have good predictive value without looking at stats but in my experience I just laugh at people who act like statistics don't matter.

Edited by spotatl

I would never say that numbers lie. I respect the laws of Disciplines of Natural Science too much. I always say they never tell the whole story, sometimes a few chapters, sometimes only a sentence or two. The people compiling those statistics could possibly be errant, and thinking human beings will always be subjective when analyzing certain statistical data. I agree, some numbers are hard to believe, and easy to discount, like the Bulls having a higher point differential in their favor with Derrick Rose off the floor. But can you clearly throw a stat like that out of the window? There may be a film and statistic super-geek in their film room who has seen every single play and can attest to the "fact" that the Bulls indeed function optimally without the MVP frontrunner. Meanwhile, every member on this board could come up with reasons why that stat is skewed: opposing team's second team line-ups, Rose's minutes, pace with him off the floor, etc.

Stats are extremely useful, but I always use them to back up what I see and can analyze off the strength of my knowledge on the subject, never the other way around. I think most sensible people do the same in any arena where stats are depended upon and every single variable isn't perfectly exact.

Edited by benhillboy

Why measure? Just watch the games, observe, and give your opinion on what you see. Basketball is not a simple game. There's no way to accurately reduce it to a few simple numbers.

What is the difference between a .300 hitter and a .270 hitter over the course of a month? Three hits? Please, without statistics one could not actually observe the true magnitude of what occurs. Instead, one is letting inherent biases come into play that put more weight on the aesthetically pleasing or emotional events that occur.
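To put a rough number on that gap, here is a minimal sketch. The 100 at-bats per month is an assumed, illustrative workload (it is not a figure from the post), and `expected_hits` is a hypothetical helper name:

```python
# Assumed workload: roughly 100 at-bats in a month (illustrative, not from the post).
AT_BATS = 100


def expected_hits(batting_average: float, at_bats: int) -> int:
    """Hits implied by a batting average over a given number of at-bats."""
    return round(batting_average * at_bats)


# Difference in hits between a .300 hitter and a .270 hitter over the month.
gap = expected_hits(0.300, AT_BATS) - expected_hits(0.270, AT_BATS)
print(gap)
```

Under that assumption the two hitters are separated by a handful of hits spread over dozens of games, which is hard to pick up by eye alone.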

This is a big reason why I've gotten more and more into tennis rather than basketball. There simply aren't many statistics to worry about. The only analyses that mean anything come from simply watching the matches and forming opinions based on what you see. No one would dare try to look at a "box score" or a "stat sheet" and say they know something important based on it. I find it refreshing.

You must not be that familiar with First Serve Percentage and Placement. Or probability of winning a point conditional on winning the previous point within a game. Or probability of winning a game conditional on winning the previous within a set. Or second serve winning percentage. Or unforced errors from backhand and forehand. Or winners from backhand and forehand. Or net points won. But I guess when you listen to Pat McEnroe and Mary Fernandez these sort of things don't come up.

Your examples of "who gets the assist? who gets the turnover?" are silly in my opinion. It's a judgement call in each scenario to be made by the scorekeeper in each arena. To say that the data lie is to really say it was recorded with human bias. But we have human bias in just casual observation as well! So it's no better off in your ideal scenario than in the statistical sense.

I have found in most cases that people who claim that stats lie simply don't understand the stats. Notice that your Wikipedia (oh the authority!) article is about "How to Lie with Statistics" and not "How Statistics Lie"? It's because it's a foolish thing to say that an inhuman entity can take on a human trait. If you were to say that the statistician lies, then that makes sense and we can examine that. But to say the stat is a lie is foolish. It may be wrong because of an algebra error, or biased as in your scorekeeper examples, but it's not telling a lie. The stat reflects some deeper meaning that requires interpretation: one needs to know what the data are telling you and then how the stat is calculated. Sometimes these stats are calculated assuming that the data are normal, and so the stat should be interpreted with the understanding that the data are assumed to be normally distributed. Then one could have reasonable arguments about these assumptions, but not about how the stat has turned out based on that assumption. But you aren't talking about these complicated regressions that assume normality, you are referring to simple po-dunk blogger stats and throwing your arms up in the air saying "Lies! Damn Lies!" Well, the blogger might be lying. For instance, they could find a positive covariance between field goals attempted and points scored and then claim the more one shoots, the less one scores. If you believe that report, then the lie sets in, because that is NOT what the statistics are saying. This is a simple example and easy to see, but once these beasts of statistics like PER come out, we can have the case of lies stemming from the reporter/blogger. But it's not the statistics that are lying, it's that the reporter/blogger comes to the wrong conclusion and you believe them. If one doesn't understand the reasoning behind these statistics, then it may seem like lies, but it's not coming from the statistic.
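The covariance example can be made concrete with a small sketch. The game lines below are made-up numbers used only for illustration, and `sample_covariance` is a hypothetical helper written out by hand:

```python
# Made-up per-game lines for one player: (field goals attempted, points scored).
games = [(10, 12), (14, 17), (18, 21), (22, 27), (16, 20)]


def sample_covariance(pairs):
    """Sample covariance of the two columns in a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)


cov = sample_covariance(games)
# A positive value means attempts and points tend to rise together in this sample.
print(cov > 0)
```

With data like these, the number itself only says that more attempts go along with more points. A write-up claiming the opposite would be an error in the interpretation, not in the statistic.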

{I wonder if saying "false" is better than saying "lie" in this thread because lie would imply one knows what is true and avoids it directly or implicitly. It appears that sometimes reporters/bloggers simply don't understand the statistics going on, so because they don't know the truth they really can't be lying, they are just flat out wrong.}


  • Premium Member

First off, I think a few people (who know who they are, and who I suspect didn't actually take much time reading my post) are taking my post as saying "stats are useless," which is not what I said at all. Somehow, the thesis statement of my post totally escaped them:

I will now go off on a rant about why statistics DO lie (or rather, can be used in a highly misleading fashion).

Get it this time? Cool.

You must not be that familiar with First Serve Percentage and Placement. Or probability of winning a point conditional on winning the previous point within a game. Or probability of winning a game conditional on winning the previous within a set. Or second serve winning percentage. Or unforced errors from backhand and forehand. Or winners from backhand and forehand. Or net points won. But I guess when you listen to Pat McEnroe and Mary Fernandez these sort of things don't come up.

I don't pay much attention to the talking heads in tennis (except the hot woman who does the score reports on the Tennis Channel). I'm familiar with all the stats you mentioned. Jon Wertheim goes off on tangents about the need to incorporate stats (especially ones like how winning points/games affects later points/games, etc) into more tennis analyses. But the funny thing is that stat sheets showing the top 10 players on tour in every tennis stat category I've ever seen usually shows at least 3-4 guys ranked outside the top 20 and omits at least 6-7 players in the top 10.

Even in tennis, stats have their place. Andy Roddick usually has a very reliable first serve. When he's getting less than 60% of his first serves in play, it's usually a canary in the mine for him. Federer likes to take the ball on the rise. When his return hit points are mostly behind the baseline, he might be vulnerable. You can use it to tell you things players do well, less well, have more margin for error, etc. But the crux of why a player won or lost a match can almost never be gleaned from looking at a match stat summary, even some of the more in-depth ones that I've seen.

Baseball is, I think, a sport that can be reduced to stats remarkably well. In basketball, I think they are far less useful. In tennis, they are less useful still. I'm guessing that in soccer, they are borderline-useless, although I honestly don't know enough about soccer to be sure. But in sports, politics, and life generally, I have become a thoroughgoing cynic of the use of statistics as a tool of persuasion and analysis because of the tendency of people to misunderstand, misrepresent, or misreport statistics.

Sports have both tangible and intangible - or scientific and artistic, if you will - components. Even many of the scientific/tangible components are impossible to summarize, much less reduce to numbers. So while stats always have their place in sports as in other spheres of life, I think people tend to overrate and overstate their importance. I increasingly enjoy watching sports where the talking heads don't try to reduce everything to a few easy-to-remember numbers, and let the game speak for itself. I don't think there's anything wrong with that.

{I wonder if saying "false" is better than saying "lie" in this thread because lie would imply one knows what is true and avoids it directly or implicitly. It appears that sometimes reporters/bloggers simply don't understand the statistics going on, so because they don't know the truth they really can't be lying, they are just flat out wrong.}

Using "lie" was simply a rhetorical device on my part. People were saying "stats don't lie" and "numbers don't lie." It would have been more accurate for me to say "stats can be used in a misleading fashion" (oh wait... I did say that!), but that doesn't quite have the same ring to it. Better to use their own word and toss it back at them, while providing (as I did) a more accurate explanation of my point. Blog marketing.

But hey, while we're at it, I'll create my own statistic, which I will call LIEs (Lame and Inaccurate Elucidations). I will define LIE sufficiently broadly such that it includes both statements that are objectively false and also statements that are misleading. I also will deem that LIE need not be written in a case-sensitive fashion (so "lie" or "LiE" would also be accurate). It can also be used as a verb - to make a lame and inaccurate statement is to LIE. Since I invented this statistic, I get to be the final arbiter over whether a particular statement is a LIE. Thus, my statement that stats lie is accurate. And I declare that hawkfanatic's post contained no less than 263 liEs. :snowballfight:

Edited by niremetal

First off, I think a few people (who know who they are, and who I suspect didn't actually take much time reading my post) are taking my post as saying "stats are useless," which is not what I said at all. Somehow, the thesis statement of my post totally escaped them:

You claim that, yet your following argument has embedded inside it all the implications of the statistics lying and not the statistician (really the reporter/blogger). It's akin to me saying "now I won't call niremetal stupid" to then later on say "now that niremetal is just dumb as bricks, ya hear?" One could respond "hey HF, you just made an argument that niremetal is stupid!" and I could easily point back up to my first comment and say "hey look, there is a spot where I previously said niremetal was not stupid so therefore my second comment about niremetal clearly shows that I am not calling him stupid!"

{Of course that example is made absurd, I don't actually view niremetal as stupid}

I don't pay much attention to the talking heads in tennis (except the hot woman who does the score reports on the Tennis Channel). I'm familiar with all the stats you mentioned. Jon Wertheim goes off on tangents about the need to incorporate stats (especially ones like how winning points/games affects later points/games, etc) into more tennis analyses. But the funny thing is that stat sheets showing the top 10 players on tour in every tennis stat category I've ever seen usually shows at least 3-4 guys ranked outside the top 20 and omits at least 6-7 players in the top 10.

This seems to be more of a problem with aggregation than with the stats doing something funny. But one also should not jump to conclusions: just because one variable is a good predictor does not mean it is the best one, nor that it should hold in every case. There are ceteris paribus conditions that need to be imposed when looking at what determines a complex statistic like ranking (not how it is structured by the ATP in terms of points, but in terms of what individual level data make up the accumulation of those points). The ATP plays over a wide array of court surfaces, and those play a crucial role in the style of play for particular individuals. It also plays a huge role in which tournaments are entered. I don't have whatever statistics you are referring to, but my guess is there are some players ranked outside the top 20 who are in the top 10 of average 1st serve speed. My guess is that those outside the top 20 who make it in are dominant grass and fast hard court surface players. So they don't enter or perform as well on the clay and slower hard courts where average 1st serve speed is much less important. However, if the match series # is the same then those two tournaments are weighted equally. But in the ATP, there is not an equal distribution of Grass - Hard (fast/slow) - Clay, which makes the ATP ranking a biased statistic. So one needs to take that into consideration when observing these "Top 10" lists.

Sports have both tangible and intangible - or scientific and artistic, if you will - components. Even many of the scientific/tangible components are impossible to summarize, much less reduce to numbers. So while stats always have their place in sports as in other spheres of life, I think people tend to overrate and overstate their importance. I increasingly enjoy watching sports where the talking heads don't try to reduce everything to a few easy-to-remember numbers, and let the game speak for itself. I don't think there's anything wrong with that.

I do not think being a cynic of the way certain people purport statistics is a justified approach to statistics. You are rejecting an idea before it is being presented to you. I believe a skeptical approach is much more valuable, as you do not accept an idea but you also do not reject it. From my observations though, typically the main reasons why people reject a statistic are:

  1. They went to George Mason, took a class by Russ Roberts on "how math lies" and then conclude that every time we see math we should shout lies!
  2. They do not understand what the statistic is representing so they instinctively shout lies!
  3. The most rare, the person actually understands what is going on and can make the connection that the assumptions aren't justified or the conclusion isn't justified.

Using "lie" was simply a rhetorical device on my part. People were saying "stats don't lie" and "numbers don't lie." It would have been more accurate for me to say "stats can be used in a misleading fashion" (oh wait... I did say that!), but that doesn't quite have the same ring to it. Better to use their own word and toss it back at them, while providing (as I did) a more accurate explanation of my point. Blog marketing.

But hey, while we're at it, I'll create my own statistic, which I will call LIEs (Lame and Inaccurate Elucidations). I will define LIE sufficiently broadly such that it includes both statements that are objectively false and also statements that are misleading. I also will deem that LIE need not be written in a case-sensitive fashion (so "lie" or "LiE" would also be accurate). It can also be used as a verb - to make a lame and inaccurate statement is to LIE. Thus, my statement that stats lie is accurate. And I declare that hawkfanatic's post contained no less than 263 LiEs.

That seems about right, I see bloggers do that kind of stuff all the time and it seems to work. I certainly can't refute your definition of LIE now, but I certainly object to having anywhere near 263 lies in my post!


Why measure? Just watch the games, observe, and give your opinion on what you see. Basketball is not a simple game. There's no way to accurately reduce it to a few simple numbers.

This is a big reason why I've gotten more and more into tennis rather than basketball. There simply aren't many statistics to worry about. The only analyses that mean anything come from simply watching the matches and forming opinions based on what you see. No one would dare try to look at a "box score" or a "stat sheet" and say they know something important based on it. I find it refreshing.

I understand the tennis thing. My favorite sport to participate in growing up was wrestling. Why? Because win or lose, there was very little in doubt. I either did good or not good enough. No one could take credit for my win or loss but me. Tennis is very much the same thing.

My least favorite growing up was football. Two out of three years I was MVP of my little league team. Over that stretch of time the team was 6 and 21. So even though my personal stats were awesome, we never even smelled the playoffs.

But I assume for sake of this thread we are talking about basketball. For me, the most misleading stat is Points Scored. For my example I'll use Scottie Pippen. In Jordan's last year before his first retirement, Pippen averaged 19.6 PPG. The next year without Jordan, Pippen scored 22.0 PPG. The difference was 1.5 more shots per game taken and a greater leeway to shoot the 3 (he shot 2.5 a game that year...less than 1 a game the previous year) and shot the 3 with a slightly higher percentage. Pippen was not a markedly better player that year. He was just utilized 10% more.
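As a rough check on the "10% more" figure, here is a quick sketch that uses only the two scoring averages as quoted in the post (the variable names are illustrative):

```python
# Scoring averages as quoted in the post (not independently verified here).
ppg_with_jordan = 19.6
ppg_without_jordan = 22.0

# Relative increase in scoring from one season to the next, in percent.
increase_pct = (ppg_without_jordan - ppg_with_jordan) / ppg_with_jordan * 100
print(f"{increase_pct:.1f}%")
```

On those quoted numbers the jump is a little over 12 percent, in the same ballpark as the round figure used in the post.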

So too it is with the Hawks. Joe is not the leading scorer because he's better. FG% would speak otherwise. He's the leading scorer because he takes more shots a game than anyone else. This is evidenced by the increase of points scored by Josh Smith in the second half of the season. Josh's shots per game went up...his FG% was mostly unchanged...he scored more.

So PPG is in my opinion a system stat except in a few small exceptions. But RPG/APG/SPG are a constant by position. On average, a typical power forward gets a similar number of chances for a rebound based solely on his position on the floor. A point guard a similar chance at Assists....etc. Those 3 stats are very telling within a position. For example, many centers have very low points and turnover numbers. This is due to them not having the ball in their hands often due to skillset. Other Centers (Al or Dwight for examples) have point guard level turnovers but PPG and rebounds to match. This is because they are trusted based on their skillset to post up and pass out of the double team.

Stats tell an awful lot, be it in small samples or large samples. My favorite use of stats is to tell if an opposing player is playing hurt. If a consistent Blocks/Steals guy (i.e., Josh) has a marked fall off in production from an obvious game time line....it would be very telling he has a nagging leg injury. 100% accurate, of course not, but over the course of time stats tend to give a very accurate account of ability and/or hustle.


The main thing is that statistics are a tool, not an omen. There really are no "bad" statistics. There are bad uses of them.

I think Hollinger et al have done advanced metrics in sports a disservice because they've made it into a sort of a shtick. But if you look at more serious stats people, the statistics and their measures are hardly arbitrary. Some of the more advanced metrics, like adjusted +/- and win shares, are not even directly dependent on specific box score measures.

Granted, no single statistic will ever capture the entire picture. But I still like David Berri's stuff a lot more than I like most of the NBA talking heads around. Berri et al are much better analysts than Jalen Rose, JA Adande, Bill Simmons and the like.

In fact, the TrueHoop blog every year posts playoff predictions by the stats folks, and ESPN does the same for the "regular" analysts. I bet that the stats folks will do much better in their predictions than the regular folks.


When Josh Smith attempts a 3 pointer, Hawks fans cringe. Do Laker fans get aggravated when Kobe attempts a 3pt shot?

Based on this season's "stats", they should scream "NO" when Kobe pulls up at the 3pt line.

Kobe: 3Pt% .323

Josh: 3Pt% .331

Stats may not lie, but they can't possibly tell the entire story or the history.


When Josh Smith attempts a 3 pointer, Hawks fans cringe. Do Laker fans get aggravated when Kobe attempts a 3pt shot?

Based on this season's "stats", they should scream "NO" when Kobe pulls up at the 3pt line.

Kobe: 3Pt% .323

Josh: 3Pt% .331

Stats may not lie, but they can't possibly tell the entire story or the history.

See, this is where I scratch my head.

Does this stat definitively state that Josh Smith is a better 3 point shooter than Kobe? Of course not. But it's those that discount the stat as not telling that are just as frustrating.

When Josh takes the ball to the rim, and shoots 50%, he also draws fouls a certain percentage of the time. His EFG on this stat is 50% on 3's....but when including his foul shots drawn and made he's more like 62% on drives to the basket. He'll never draw a foul on 3's. So I completely understand the argument that Josh is better at driving to the basket (i.e., more efficient) and therefore should choose that over 3's. But if Josh is an efficient 3 point shooter on its own (i.e., 33% or greater), it forces his man to step out on him some and changes the dynamic of the offensive set. This year, because of his improved jumper, the floor has spaced more. The same can be said for Al. So yes, this does have an impact on the floor. I don't think it should be eliminated from his shot rotation at all, but the problem this year has been that with that space opened, our softs have not taken advantage of the lanes provided and gotten to the glass. So any positive is lost. Additionally, this creates a situation where the ball returns to Josh in the back half of the clock when time is running down.
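For anyone wondering how a 33% three-point shooter ends up at roughly 50% EFG, here is a minimal sketch using the standard effective field goal formula, (FGM + 0.5 × 3PM) / FGA. The 1,000-attempt samples are hypothetical round numbers chosen only to match the percentages quoted in the thread, and free throws are left out:

```python
def effective_fg_pct(fg_made: float, threes_made: float, fg_attempted: float) -> float:
    """Effective FG%: each made three counts as 1.5 made field goals."""
    return (fg_made + 0.5 * threes_made) / fg_attempted


# Hypothetical sample: 1,000 three-point attempts at the quoted .331 clip.
threes_efg = effective_fg_pct(fg_made=331, threes_made=331, fg_attempted=1000)

# Hypothetical sample: 1,000 two-point attempts at the rim at 50%, ignoring fouls.
drives_efg = effective_fg_pct(fg_made=500, threes_made=0, fg_attempted=1000)

print(f"threes: {threes_efg:.4f}, drives: {drives_efg:.4f}")
```

Before fouls are counted, the two shot types come out nearly even per attempt, which is why the free throws drawn on drives are what tip the comparison.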

I was looking at 82games.com and a large number of Josh's Shots are still at the end of the clock (46% in the last 8 seconds...22% in the last 4).

Sometimes you have to dig very deep to see if something (like shooting from the outside) is a problem or just a symptom. IMHO it's a symptom of Joe holding the ball and throwing off the timing of cuts and set plays. Fact is, when Joe is posting up an opposing guard and Al is under the basket, the other 3 players have to space and that is what is pushing Josh/Marvin out to the 3 point line.

From casual observation and wanting stats to back it up (not sure how), I see our offensive sets usually being more effective when we run pick and roll/pop at the elbow and/or our 2 bigs posting up off circle and making choices. Not sure if there's a "player x initiated the play" stat.
