Today's post is a long one, but it's one that I thought needed explanation. I ask that you read the entire thing and try to keep in mind the strengths and flaws.
Everyday Statistics and Their Flaws
Stats We're Talking About: Errors, Fielding %, and the Eye Test
So this should be fun.
Defensive statistics are easily the most controversial points of debate in the statistical community, but before we get to that, let's talk about what we have been using. The major point of focus is usually errors. I don't think we need to spend a lot of time talking about the issues with errors. We've all noticed points where we've argued with the official scorer over a call, and there are always the times when the players themselves argue. For a majority of the time, errors are pretty straightforward, but there is a significant portion of time when it's hard to tell whether or not a play should have been made. I think most of us can agree that errors can be somewhat iffy.
So the question becomes why you would want something based on such an iffy call. Fielding percentage is simply the number of plays made (putouts plus assists) over the number of plays made plus the number of errors. Fielding percentage does tell us something - essentially, how good the player is at making the plays he gets to - but it makes no attempt to answer the question of which plays a player should make. That's because it misses the main point of defense - to turn all balls in play, not just the ones you get to, into outs.
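To make the blind spot concrete, here is a minimal sketch of the fielding percentage calculation in Python. The two shortstops and their totals are made up purely for illustration; they are not real players or real seasons.

```python
def fielding_percentage(putouts: int, assists: int, errors: int) -> float:
    """Share of fielded chances that were converted without an error."""
    chances = putouts + assists + errors
    if chances == 0:
        raise ValueError("no chances recorded")
    return (putouts + assists) / chances


# Hypothetical totals, invented for illustration only.
# The rangy shortstop reaches more balls, so he makes more plays AND more errors.
rangy = fielding_percentage(putouts=250, assists=480, errors=20)
limited = fielding_percentage(putouts=200, assists=390, errors=10)

print(f"rangy:   {rangy:.3f}")
print(f"limited: {limited:.3f}")
```

In this invented case the limited shortstop comes out with the better fielding percentage even though the rangy one made 140 more plays. The balls the limited shortstop never reached don't show up anywhere in the formula.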
To turn plays into outs, the scouts have it right. You need range, good hands, an arm, and arm accuracy. Range is the largest component because most MLB players will make the plays they get to. They just have to get there. Errors and fielding percentage don't measure range, and they can even penalize players with good range because those players will make errors on plays other guys won't even get to - think of the David Wright-Chipper Jones Gold Glove debate from a few years ago. Either way, the result is that the play wasn't made, but one guy gets penalized with an error while the other guy gets to act like the play never existed.
Scouts have been writing scouting reports using the defensive attributes mentioned above for a long time, but the issue becomes the infamous "eye test". Scouts are actually pretty good at it because they have a lot of experience, they see the entire field, and they see a lot of different players each season. But even with that experience, people are prone to treat other people differently - even if it's not intentional - and scouts see players at different points of the season, which leaves sample size issues. Good scouts can differentiate between the good and the bad, but it's still always good to have multiple views of a player.
When fans use the eye test, they often forget what would make an eye test valuable - experience, seeing the field, and having points of comparison. This isn't to say that fans are idiots. Most of us have spent years watching the game, and we know what good defense is supposed to look like. But here's the thing - major league players are good at defense. No matter who they are, they are better than you at defense. Adam Dunn could play shortstop better than you. Here are some common misconceptions and why they're misconceptions:
- "I watch a lot of Braves games ..." - I bet you do, but even if you do, you only see 6-19 games of a certain opposing player. And American League players are seen even less. For NL East players, I'm guessing you can rank defensive players pretty well. The rest of baseball, though? I'm more skeptical, even of myself. You just don't get a great point of comparison, and being "good" is relative to your peers.
- "I've been watching for years ..." - Again, I bet you have. But memories tend to fade, and you tend to remember things you want to remember - good or bad. You'll have to forgive me for not just taking your word for it.
- "I know what good defense is ..." - Aaaagain, I bet you do. But again, we're differentiating between the best in the world. People often talk about the shift in offensive environments, but they tend to forget that defensive players are probably better now than they ever were. The defensive talent level fluctuates position-by-position just like anything else. Chances are that you can't detect those differences.
- "Player A would have made that play." - Just stop. Chances are that you're talking out of the place where the sun don't shine. For one, you didn't see where the player started because the TV just doesn't show that, and for two, I doubt you can remember a batted ball speed and exact placement for each player. Sure, you have a decent idea, and guys like Andrelton Simmons make it more obvious because they are simply the extreme. But you really can't say this with any sort of certainty.
The real issue with defense is just that it's hard for everyone. It forces us to get hypothetical and play the "what if" game. If we're going to play that game, we need hard evidence. Conjecture gets us nowhere.
Nuanced Stats and Why
Stats We're Talking About: UZR, DRS, fRAA and dWAR
I promise I'll get to the weaknesses, but for now, let's focus on what these stats do well. The actual mathematical calculations are pretty complicated - shocker, I know - but let's get the general idea.
DRS and UZR are pretty similar. They split up the field into zones, similar to this. Baseball Info Solutions (BIS) is the company that records where the ball is hit (the zone) and how hard it's hit (flyball, fliner, liner, groundball, and popup with added differentiations of slow, medium, and hard). Human scorers watch every game from a press box or on TV, and they record every play, noting where the ball is hit and how hard. This is how we get the batted ball data we have, and it gives each system the basis for its calculations.
What each system does with that data is how they differ. DRS will first focus on how many plays a player makes above the average. So what DRS will do is measure how many plays the average player will make in a given zone on a given batted ball type. The field is also marked into zones for each position so a third baseman isn't penalized for a ball into the right-center gap. Essentially, they have an area of the field a position is responsible for, and they are not penalized for someone else making the play. They are credited when they make a play, and they are penalized when a ball comes into their zone that they couldn't make the play on.
They average each zone up, and based on the data from the scorers, they will run their calculations. Depending on where the ball is hit, run values are then assigned. A play on a ball in the gap that only 15% of CF would make is going to get the player who actually catches it a lot of credit, while it won't penalize him much for not getting to it. It's a fairly common sense approach. Guys get credit for spectacular plays, some for normal plays, and are docked depending on how hard the system thinks it was to get to the ball.
The initial calculation for plays above average is fairly straightforward. If 24% of players make a certain play and Simmons gets to it, he gets .76 - or 1 minus .24 - plays above average, and if he doesn't get it, he loses .24 plays. After that, a run value is attached to each play so it can be added into total WAR. All of his plays are added up, and that's how you get his run value for that season.
UZR isn't as straightforward. It attempts to break the calculations down into range runs, error runs, and fielding arm runs. The calculations that go into it make my head hurt, but the goal is ultimately the same - look at how many players make a play on a given ball in a given spot and then compare. Once you've compared, you add up the run values on all the plays, and you get your answer. UZR uses the same zone data from BIS that DRS does, so the two start from the same observations. UZR does make park adjustments, but I'm not sure if DRS does. Essentially, they use the same data but a different methodology.
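The plays-above-average bookkeeping can be sketched in Python like this. It is a simplified illustration of the idea, not the actual DRS implementation: the function name and the list of chances are assumptions, and the step that converts plays into runs is left out because the real run values depend on the zone and batted ball type.

```python
def plays_above_average(league_rate: float, made: bool) -> float:
    """Credit (or debit) for one chance, given how often the league makes that play."""
    if not 0.0 <= league_rate <= 1.0:
        raise ValueError("league_rate must be between 0 and 1")
    return (1.0 - league_rate) if made else -league_rate


# Hypothetical chances: (share of fielders who make the play, did he make it?)
chances = [
    (0.24, True),   # tough play made: +0.76
    (0.24, False),  # tough play missed: -0.24
    (0.95, True),   # routine play made: +0.05
    (0.95, False),  # routine play missed: -0.95
]

total = sum(plays_above_average(rate, made) for rate, made in chances)
print(f"plays above average: {total:+.2f}")
```

Notice the asymmetry. Missing a routine play costs far more than missing a tough one, and making a tough play earns far more than making a routine one.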
fRAA, however, is completely different. This one is less commonly used, but it is Baseball Prospectus' version. Instead of using zone data, it uses base-out states. What are "base-out states"? *deep breath* Base-out states are basically exactly what they sound like - who is on base and how many outs are there. Using historical data, we have a run expectancy based on each base-out state. Here is the table for the run expectancies. For example, with 0 outs and a man on third, we would expect 1.433 runs to score on average. While that sounds odd because there is only one man on base, the man at the plate can score, and following players in the inning can score as well. The run expectancy is simply how many runs we can expect in this inning from this point forward.
When an out is made, we can get the run value of a certain play. Let's say Jason Heyward leads off with a walk. With a man on 1st and 0 outs, we expect .941 runs on average. Next up, Fredi has Justin Upton sacrifice Heyward to 2nd, and it's successful. The base-out state is now 1 out and a man on second, and from the table, you can see the run expectancy is now .721 runs on average. That play cost the team (.941 - .721) .220 runs. When people say sacrifice bunts cost the team runs, this is what they mean. Basically, holding on to outs is really good, even if it means sacrificing the possibility of moving a player up.
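Here is the same arithmetic as a small Python sketch. The table holds only the three run expectancies quoted in this post, and the state labels and function name are my own shorthand, not anything from Baseball Prospectus.

```python
# (runner location, outs) -> expected runs for the rest of the inning.
# Only the three values quoted in this post; a full table has 24 states.
RUN_EXPECTANCY = {
    ("3rd", 0): 1.433,
    ("1st", 0): 0.941,
    ("2nd", 1): 0.721,
}


def run_value(before, after, runs_scored=0.0):
    """Change in run expectancy caused by a play, plus any runs that scored on it."""
    return RUN_EXPECTANCY[after] - RUN_EXPECTANCY[before] + runs_scored


# Heyward on first with nobody out, then a successful sacrifice bunt.
bunt = run_value(before=("1st", 0), after=("2nd", 1))
print(f"sacrifice bunt: {bunt:+.3f} runs")
```

The negative sign is the whole point: the offense gave up about a fifth of a run, which is the same fifth of a run the defense gets credit for when it records the out.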
Okay, so that's base-out states, and I hope you grasped it because it probably deserves its own post. Anyway, fRAA uses these base-out states and run expectancies. The next thing they do is use those run values and give credit/penalize defenders based on whether or not they make the play, and they make other adjustments for pitcher batted ball tendencies, batter handedness, and park adjustments. Again, it's complicated, and people are more comfortable with the zone methodology than with this, which is why you see DRS and UZR used more often (also because BPro's site tends to be harder to use to find stats compared to FanGraphs and Baseball-Reference; amazing how usability of a site affects the use of certain statistics).
Basically, defensive metrics try to use actual information, historical data, and run values to give defensive players a defensive value, and they focus on plays made and not made instead of plays made and errors. DRS and UZR utilize BIS data and zones to assign defensive value, and they are the most commonly used. While there is some subjectivity involved because of the human scorers and the vague terms of flyball, medium, fliner, etc., it's less subjective than "He would have made that play." Again, we're looking for improvement while hoping for perfection. It's not perfect, but it is better than people making conjectures based on small sample sizes and a TV screen.
As for the "fluctuations" of defensive stats, I think it's a bit overblown. One, offensive stats have fluctuation, too. It's called BABIP, HR rates, and so on, and if you actually look, offensive run values fluctuate a lot, too. Defense slumps just like anything else, and while defensive stats will fluctuate, it's also likely that a guy could have been hot/cold for a period of time - hello, Justin Upton in left field - and along with the small sample size, things might get wonky occasionally. Personally, I think people take the weakness and run with it without really considering what's going on, but I'll admit that there are some methodological issues.
The final piece of the puzzle is positional adjustment. We've gone over this before, but this is an adjustment based on the difficulty of playing a position. The numbers here are based on comparing how players move around defensively. A guy like Martin Prado would be a good example as he's played a lot of different positions. We can use him along with other players and compare how they play at each position. From there, people ran some calculations to compare difficulty, and those adjustments are what we get. To get the adjustment for each player, you have to prorate the innings played along with the position played. Some guys are easy because they only play one position, but you have to be more careful with guys who play multiple positions. Those run values are per 600 PA (basically a season and in PA to give you an idea of the amount of time), so actual run values can differ based on playing time. There is a point of contention about positional adjustment and if these are the "right" numbers, but they've been around for a few years without being changed. Either way, it's more of a guideline than strict fact.
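A rough Python sketch of the prorating step looks like this. The adjustment numbers below are placeholders I made up to show the mechanics, not the published values, and prorating by share of a 600 PA season is an assumption about how the playing time is split.

```python
# Hypothetical full-season adjustments in runs. NOT the published values.
FULL_SEASON_ADJUSTMENT = {"SS": 7.0, "3B": 2.0, "LF": -7.0}
FULL_SEASON_PA = 600  # the "basically a season" baseline mentioned above


def positional_adjustment(pa_by_position):
    """Prorate each position's full-season adjustment by the playing time there."""
    return sum(
        FULL_SEASON_ADJUSTMENT[position] * pa / FULL_SEASON_PA
        for position, pa in pa_by_position.items()
    )


# A Prado-style utility player: half a season at third, a quarter in left.
utility = positional_adjustment({"3B": 300, "LF": 150})
print(f"positional adjustment: {utility:+.2f} runs")
```

With these placeholder numbers the utility player ends up at -0.75 runs: +1.00 for the time at third and -1.75 for the time in left.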
What's Left to Accomplish
What defensive stats really need in order to improve is FIELD f/x info. We need it desperately. For anyone who doesn't know, FIELD f/x is a set of cameras set up in stadiums, and they track where players start, where balls are hit, how hard, and if the player makes that play. It's completely objective information, and it's valuable information. It resolves the positioning issue because it tracks where players start. It resolves the batted ball type problem by giving us a time period from bat-to-ground. The initial position and time will give us the answer to range. And we can calculate arm strength by looking at the point of the throw and how long it takes to get to a given point. FIELD f/x could be a gold mine. It's a possibility for absolute facts in a world in which we need them. But it's not public information, and no one other than teams who pay for it will have the information for now.
Until then, we're left to deal with the issues of these metrics. The first is sample size. Everyone wants a seasonal value for statistics and production, but they rarely realize that 162 games is an arbitrary end point. There's no statistical significance to 162 games. Some stats stabilize earlier than that, and some, like defense, stabilize after. If you demand a season's run value of a stat that doesn't stabilize within a season, you're going to have issues. That's why people tell you to look at three years' worth of data, and why people still aren't sure if Freddie Freeman is a good defender - yes, I know I just started a fight, and yes, it was going to happen anyway. There's only so much you can do with a season's worth of defensive data. There's nothing anyone can do about it. But it is an issue.
My biggest beef with zone metrics, however, is that they don't note a player's position at the beginning of a play. From a TV standpoint, there's nothing you can do about it. You get the camera angle you get. Being at a game can give you the starting position. At that point, you can combine the batted ball type, speed, etc. to evaluate a player's defense, but none of the metrics track this important piece of information. Until they do, range is uncertain, and we can't answer the Derek Jeter Question - do some players really position themselves better than others? That would be good to know. Again, this really isn't a fault of defensive metrics as much as just something they can't do much about. But it is a problem.
Zones also have the added issue of not differentiating between one part of the zone and another. There's a difference between the edges of a zone, and sometimes, that is the difference between making a play and not making a play. The problem is that the zone metric has to draw a line somewhere, and smaller zones will increase the human error of judging which box it goes in. Again, double-edged sword.
The next issue is the use of batted ball types. Again, the terms are vague. What's the real difference between a liner, fliner, and flyball? It's not as easy as it looks. And what's the difference between hard, medium, and soft? Hard and soft are easy to differentiate, but medium and hard aren't always. We'd prefer the objective information of exactly where a ball was hit, how hard it was hit, and how long it took the ball to reach a point from the bat. But alas, we simply don't have that information until FIELD f/x is released.
The final issue is the difference between systems and how they view the same player. With pitching and hitting, the systems largely agree, though they may order the players differently. Defensive metrics sometimes vary by over 10 runs on a certain player, and that's a little troublesome. That can swing their overall WAR significantly, and that's why you have to break WAR down into the components. Why does this happen? Sample size issue and the accrued amount of human error in the initial observations are probably good starting points. Until that gets straightened out, the best thing to do is look at a combination of metrics and try to find patterns.
- Errors and fielding percentage answer a specific question - how good a player is at making an out once he gets to the ball - but they leave out a lot about range, which is crucial.
- DRS and UZR break the field into zones and assign run values to plays based on the zone, batted ball, and if the play was made, and they compare each player to his peers.
- They're best used over a period of a few years, not one year. If forced to use one-year samples, use multiple stats and understand that talent fluctuates on defense, too.
- Defensive stats get a bad rap, but they are based on better information than errors or your gut. It's likely that you're baseball smart, but defense requires more conjecture than anything else. Don't completely ditch your eyeballs, but beware of what you think you know.