Projections are one of my favorite parts of baseball analysis. There’s something that borders on magical about drawing lines in the sand about the occurrence of hundreds of future events, and then seeing those events happen (or not). While the release of projections is always an exciting time that leads to a lot of interesting discussion, a question that always comes up is, “Okay, but how accurate are these?” Unfortunately, the answer to that question is way more elusive than coverage of the projections themselves. So, for the second year, I’m trying to rectify that in the corner of Braves baseball, through a brief projections retrospective.
(As a side note, it’s worth noting that in most cases, projections are just point estimates of a wide distribution of possible outcomes, each with a different but non-zero likelihood. As a result, assessing the “point estimate” of a projection system for a given player can conceptually be a bit dicey. In other words, if a projection system says that there’s an 80 percent chance that a player puts up 2 WAR, but a 20 percent chance he puts up 0 WAR, and he ends up putting up 0 WAR, the projection system wasn’t wrong for projecting him for a point estimate of 1.6 WAR. It’s really just an issue with the fact that projection systems are meant to reflect the likelihood of outcomes occurring over hundreds or thousands of trial runs, but in reality, there’s only one season that actually ends up occurring. It’s not a problem that can really be solved. But that doesn’t mean it isn’t useful to understand the successes and limitations of projection systems as a whole.)
For reference, you may want to be familiar with the following before perusing the below:
- Last year’s projections retrospective;
- Last year’s projections summary for catchers;
- Last year’s projections summary for infielders; and
- Last year’s projections summary for outfielders.
The main summary, in case you are not too interested in the details, is the same as last year: the projection systems are very good at predicting player performance, especially in aggregate. While there will be underestimates and overestimates aplenty, they don't seem to occur in any systematic way, and even the point estimates of the projections deserve to be taken seriously.
This retrospective focuses on three different aspects of position player production:
- wRC+ and batting runs. These are your measures of hitting value. wRC+ is an indexed rate stat: it tells you how much more or less a hitter produced relative to league average, such that a 105 wRC+ means a hitter’s outcomes were five percent better than average, and a 95 wRC+ means a hitter’s outcomes were five percent worse than average. Batting runs are essentially just wRC+ converted into a counting stat. The shorthand measure is that each point of wRC+ is worth 0.75 runs over 600 plate appearances. We focus on batting runs rather than wRC+ alone because players got different levels of playing time in 2017, and it’s “costlier” for a projection to be wrong about a player who got 500 PAs than one who only got 50 PAs. This same logic also applies to using other “rate value” stats below.
- Def and Def/600. These are your measures of defensive value, after taking the positional adjustment into account. The only difference between them is that Def/600 pro-rates (scales) the player’s actual defensive value out to 600 plate appearances, whereas Def more directly answers, “for the playing time this player got, what was his actual defensive value?”
- WAR and WAR/600. Same principle as Def, but a catchall stat that agglomerates the different aspects of player value, including batting, defense, and baserunning.
Note: For the catchers presented in this retrospective, the relevant stats are scaled to 450 PAs rather than 600, to reflect the assumptions that informed the original preseason projections for these players.
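As a rough illustration, the wRC+-to-batting-runs shorthand above can be sketched in Python. (A minimal sketch: the function name is mine, and the 0.75 runs-per-point figure is the article's shorthand, not the precise wOBA-based formula FanGraphs uses.)

```python
def batting_runs(wrc_plus, pa, runs_per_point=0.75, baseline_pa=600):
    """Approximate batting runs from wRC+, using the shorthand that each
    point of wRC+ above 100 is worth ~0.75 runs over 600 PAs."""
    return (wrc_plus - 100) * runs_per_point * (pa / baseline_pa)

# A 110 wRC+ hitter over a full 600 PAs: 10 points * 0.75 = 7.5 batting runs
print(batting_runs(110, 600))  # → 7.5
# The same bat over only 300 PAs is worth half as much above average
print(batting_runs(110, 300))  # → 3.75
```

Note how the same rate of hitting translates into fewer runs of value in less playing time, which is exactly why the error calculations below weight by actual PAs.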
A complete summary of the projections and the actual performance of the relevant Atlanta Braves positional players for the 2017 season is below. You can also access it in Google Sheets here, in case you’d like to do anything else with the figures. (Be advised that you should make your own copy if you’d like to edit it, as this version does not allow editing.)
Also in that Google Sheet is a summary of the differences between what actually happened, and the point estimate of what the projection systems said would happen. It looks like this:
Note that the rows for Anthony Recker and Micah Johnson are gray-shaded - these players got very few PAs in 2017, and as a result, they are excluded from the various metrics discussed below to avoid biasing the results.
A key thing to note is that in the “differences” table above, all of the differences are estimated over the number of PAs the player actually got in 2017. For example, Adonis Garcia was projected by Steamer, ZiPS, and IWAG to produce 0.6, 0.2, and 0.7 fWAR over 600 PAs, respectively. In reality, he produced -0.4 fWAR, but that was over just 183 PAs. So, the difference between, say, Steamer and reality isn’t 1.0 fWAR (0.6 minus -0.4), but the Steamer WAR/600 projection pro-rated to 183 PAs, or 0.2 fWAR less the -0.4 fWAR that Garcia actually accumulated. This is why you see 0.6 fWAR in the “WAR” column for Steamer in the second table.
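That pro-rating step can be sketched as follows (the helper name is my own; the Garcia figures are the ones quoted above):

```python
def projection_error(proj_war_per_600, actual_war, actual_pa):
    """Pro-rate a WAR/600 projection to the PAs the player actually got,
    then subtract his actual WAR. Positive values mean an overestimate."""
    prorated = proj_war_per_600 * actual_pa / 600
    return prorated - actual_war

# Adonis Garcia: Steamer projected 0.6 WAR/600; he produced -0.4 fWAR in 183 PAs
print(round(projection_error(0.6, -0.4, 183), 1))  # → 0.6
```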
Another way of assessing the differences between projection and reality, not illustrated above, is to take the difference between each player’s rate of a given stat over 600 PAs and the projected rate. This calculation is easily done using the Google Sheet, if anyone were so inclined, but there’s nothing really “hidden” by focusing on production over the actual PAs that these players got.
The players listed above got 85 percent of the non-pitcher PAs for the Braves last year. The others were largely taken up by players either not really expected to contribute in 2017, or otherwise not on the roster: Matt Adams, Lane Adams, Johan Camargo, and Danny Santana.
Below, I quickly discuss each of the three metrics reviewed, and the performance of the projection systems for these metrics.
Batting Runs (i.e., hitting)
- Average error: Steamer = 2.4 run underestimate; ZiPS = 2.4 run underestimate; IWAG = 2.7 run overestimate.
- Average error, absolute value (prevents under- and overestimates from cancelling each other out): Steamer = 8.9 runs, or ~4.5 runs in either direction; ZiPS = 8.6 runs, or 4.3 runs in either direction; IWAG = 8.8 runs, or 4.4 runs in either direction
- Root Mean Square Error (a way of capturing the effectiveness of the prediction beyond just simple averages, which more heavily penalizes bad predictions): Steamer = 10.7 runs; ZiPS = 10.5 runs; IWAG = 11.4 runs.
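For the curious, the three error measures in these bullets can be computed like so. (An illustrative sketch with made-up numbers, not the actual Braves data.)

```python
import math

def error_summary(projected, actual):
    """Return (mean error, mean absolute error, RMSE) for paired
    projected/actual values. Mean error lets over- and underestimates
    cancel; MAE does not; RMSE penalizes large misses more heavily."""
    diffs = [p - a for p, a in zip(projected, actual)]
    n = len(diffs)
    mean_err = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return mean_err, mae, rmse

# Hypothetical batting-run projections vs. actuals for three players
mean_err, mae, rmse = error_summary([10.0, -5.0, 3.0], [8.0, -2.0, 3.0])
```

A single big miss inflates RMSE much more than MAE, which is why the RMSE figures above always run a bit higher than the average absolute errors.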
In other words, the projection systems are hardly far off for offensive performance, especially in aggregate. IWAG continues to be way more optimistic than Steamer or ZiPS, but in terms of average magnitude of deviation, it isn’t really “more off,” at least not for Braves players. This is consistent with the retrospective for the 2016 season.
Note that this exercise uses a formula to convert wRC+ to batting runs, and did not use the actual batting runs projected by the systems as reported on the Fangraphs player pages, due to the author failing to record those for posterity when they occurred. As a result, there’s a bit more error added to the batting run projections than there should be, but we’re talking an error of a partial run relative to an average error of only a couple of runs.
- Closest projections: Steamer = 4; ZiPS = 4; IWAG = 6.
- Best projection: IWAG and Brandon Phillips, as the veteran was projected for, and hit for, a 93 wRC+. Steamer (95 wRC+) also came very close to Nick Markakis’ actual production (96 wRC+).
- Worst projection: IWAG and Dansby Swanson. IWAG projected Swanson for a 108 wRC+, and instead he put up only a fraction of that, at 66. Avoiding misses like this is actually what’s driven the major adjustments I’ve made to IWAG since its last iteration. Meanwhile, Steamer and ZiPS also whiffed, but their estimates for Swanson were 89 and 92 wRC+, respectively, which were much closer. By contrast, Steamer and ZiPS were lower on Kurt Suzuki, at a 78 wRC+, while IWAG had him at a marginally higher 83 wRC+; Suzuki, of course, busted out with a remarkable 129 wRC+ across 309 PAs.
- Markakis was once again, collectively, the player with the most spot-on hitting projections, followed by other veterans (Matt Kemp, Ender Inciarte, and Brandon Phillips). Meanwhile, the aforementioned Suzuki had the most unexpected breakout, while Sean Rodriguez’ injury-derailed season featured the second-biggest collective deviation. Rodriguez was projected to be an average-ish bat, but ended up with a miserable 55 wRC+ over 153 PAs.
As usual, if you are curious about any other summary statistics, let me know and they can easily be calculated. Or, just grab the Google Sheets link and try it yourself!
Def (i.e., defense)
- Average error: Steamer = 0.3 run underestimate; ZiPS = 0.2 run overestimate; IWAG = 0.0 runs.
- Average error, absolute value (prevents under- and overestimates from cancelling each other out): Steamer = 2.1 runs, or ~1.1 runs in either direction; ZiPS = 2.4 runs, or 1.2 runs in either direction; IWAG = 2.9 runs, or 1.5 runs in either direction
- Root Mean Square Error (a way of capturing the effectiveness of the prediction beyond just simple averages, which more heavily penalizes bad predictions): Steamer = 2.9 runs; ZiPS = 3.1 runs; IWAG = 4.3 runs.
Def values are, in some ways, biased towards matching the projections, because the positional adjustment is a huge chunk of Def for most players, and that adjustment is pretty static, as it doesn’t depend on player performance. Therefore, analyzing Def isn’t super-interesting. Really, the only big whiff here was Inciarte’s defensive value, across all three projection systems. Inciarte, as you may know, had a very down defensive season by UZR and DRS, driven by a serious decline in his arm metric (by DRS), and a joint decline in his arm and range metrics (by UZR). The only other miss even remotely notable was IWAG being a bit too high on Swanson’s defense; all the other projections were within five runs of actual Def, and often much closer than that.
WAR (i.e., overall value)
Here we go: this is probably what folks care about.
- Average error: Steamer = 0.3 win underestimate; ZiPS = 0.2 win underestimate; IWAG = 0.2 win overestimate.
- Average error, absolute value (prevents under- and overestimates from cancelling each other out): Steamer = 0.9 wins, or ~0.5 wins in either direction; ZiPS = 1.0 wins, or 0.5 wins in either direction; IWAG = 1.0 wins, or 0.5 wins in either direction.
- Root Mean Square Error (a way of capturing the effectiveness of the prediction beyond just simple averages, which more heavily penalizes bad predictions): Steamer = 1.1 wins; ZiPS = 1.2 wins; IWAG = 1.1 wins.
Again, this is the critical thing: these projections tend to be pretty close. Nor is this just an artifact of scaling down to the limited PAs that players actually got, rather than a full 600:
- Across the 14 projected players, 50 percent of the PAs went to players whose WAR/600 estimates were within 1.0 WAR/600 of their actual totals (average was an overestimate of 0.3 WAR/600);
- Another 17 percent of the PAs went to players whose WAR/600 estimates were between 1.0 WAR/600 and 2.0 WAR/600 of their actual totals (average was an overestimate of only 0.5, because again, underestimates and overestimates will cancel each other out); and
- The remaining third of PAs went to players whose WAR/600 projections were further off, though only one of these was a regular with more than 400 PAs (Swanson).
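The PA-weighted breakdown above can be reproduced with a small helper like this. (A sketch under my own naming; the rows passed in would be the projected/actual WAR/600 pairs and PA totals from the Google Sheet, and the three-player example here is hypothetical.)

```python
def pa_share_within(rows, low, high):
    """Share of total PAs belonging to players whose absolute WAR/600
    projection error falls in the half-open band [low, high).
    Each row is a (projected_war_per_600, actual_war_per_600, pa) tuple."""
    total_pa = sum(pa for _, _, pa in rows)
    in_band = sum(pa for proj, act, pa in rows
                  if low <= abs(proj - act) < high)
    return in_band / total_pa

# Hypothetical three-player example: errors of 0.5, 1.5, and 2.5 WAR/600
rows = [(2.0, 1.5, 600), (1.0, 2.5, 300), (3.0, 0.5, 100)]
print(pa_share_within(rows, 0.0, 1.0))  # → 0.6 (600 of 1,000 PAs)
```

Weighting by PAs, rather than counting each player once, keeps a 63-PA bench bat from counting the same as a 600-PA regular.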
I don’t think the story has changed much: projections are fun, and they’re pretty good, in aggregate. There’s a good chance even the point estimate of the projection won’t whiff, and if it does, it will probably be counterbalanced by a whiff from someone else on the roster in the opposite direction. (And, again, we’re just judging point estimates here, and not probability distributions, which are largely impossible to judge from a single season trial, anyway.)
Some more fun trivia:
- Closest projections: Steamer = 5.5; ZiPS = 4; IWAG = 4.5 (ties are half a point each).
- Closest projections, WAR/600 basis, weighted by number of PAs: Steamer = 31% of PAs; ZiPS = 32% of PAs; IWAG = 36% of PAs.
- Best projection: ZiPS and Ender Inciarte (2.5 WAR/600); Steamer and Rio Ruiz (Steamer had Ruiz at -0.4 WAR/600 and he ended at -0.3 WAR/600).
- Worst projection #1: ZiPS and Kurt Suzuki. Suzuki took everyone by surprise, as mentioned, but ZiPS had him even closer to replacement level than IWAG or Steamer, at 0.4 WAR/450 (compared to the others at 0.7 WAR/450). Suzuki, of course, ended with 3.9 WAR/450.
- Worst projection #2: IWAG and Dansby Swanson. No real surprise here; IWAG had Swanson at 2.6 WAR/600, and he ended at 0.1. However, IWAG wasn’t really alone on this whiff, as Steamer and ZiPS had him at 1.8 WAR/600 and 2.5 WAR/600, respectively.
- Depending on how you tally it, Phillips, Inciarte, Markakis, and even Jace Peterson were the “easiest” to project across all three systems. Markakis ended up in the exact middle of the three projections. Meanwhile, Suzuki deviated the most from his projections, followed by Swanson. Also, on a humorous note, Chase d’Arnaud was projected for between 0.2 and -1.0 WAR/600, and ended up with -0.6 WAR in just 63 PAs, across three teams, which actually made him the biggest collective WAR/600 miss (that works out to roughly -6 WAR/600, and no projection system is going to spit out -6ish WAR/600).
Some charts are below, in case you find them helpful. Stay tuned for the same treatment of the pitchers, which will actually be not too interesting this time, because the Braves only used a few starters in 2017.