The main takeaway from the prior post in this series was that the Atlanta Braves’ position players rode a wave of projection outperformance in 2018, carrying the team into the playoffs. For the pitchers, though, there was no such clear-cut narrative when comparing the projections to the results. The projections underestimated some players and overestimated others; the net result was a fairly modest underestimate.
This post is going to focus largely on starting pitchers, though it’ll include some brief notes about relievers as well. (Relievers get so few innings, and are so hard to project as a result, that words written about them versus their projections feel especially empty.) The Braves used 13 pitchers in a starting role in 2018. Before the season, nine pitchers* were thought of as candidates to open the year in the rotation, and seven of them made more than one start. These seven pitchers, who are shown in the tables below, accounted for about 90 percent of the innings thrown by the team’s starting pitchers. The remaining frames were mostly taken up by Kevin Gausman (midseason acquisition), Mike Soroka (early-season callup who spent most of the season on the shelf), and Touki Toussaint (five starts down the stretch); these three do not feature in this analysis. In addition, Luiz Gohara (one start, lost season) and Lucas Sims (zero starts) appear in the tables but aren’t assessed in the comparison, because each threw so few major league innings in 2018.
* Anibal Sanchez was not assessed in my original projections piece before the season, as he was a late-spring signing. However, given that he threw a substantial chunk of the Braves’ innings (about 15 percent of the total innings pitched by starters), it didn’t seem appropriate to leave him out, so I fetched his preseason projections and have added them below.
This post will be organized somewhat similarly to the one for position players. In addition to the tables below, there will be a brief discussion of each starting pitcher’s performance relative to their projections. That will be followed by a quick comparison of the projection systems, and then some tables and notes about relievers used by the Braves in 2018 and their respective projections.
For reference, if you’re stumbling across this post without seeing things from past years or months, you may want to review the following:
- Last year’s projection retrospective for pitchers;
- The pre-2018 projections for the starting pitchers; and
- The pre-2018 projections for the relievers.
A brief note: for pitchers, this analysis considers only two stats: FIP- and fWAR. While I understand the desire for a comparison of results to projections to focus on “what actually happened,” I focus specifically on FIP- and fWAR because, in my view, these reflect the cleanest separation of “what actually happened” between fielders (whose WAR already credits them for making or failing to make outs on balls in play) and pitchers (who get or lose credit for everything else). As a team, the Braves’ ERA beat their FIP by the fourth-largest margin in the majors, which is consistent with their assessment as one of the league’s top defensive teams (they’re generally top 10 by most defensive measures, and fourth by DRS). For ease of reference, FIP- rather than FIP is used. I personally find raw FIP numbers confusing because I never know whether a 4.00 FIP (or even a 4.00 ERA) is good or bad; I always need the comparison to league average, which is all the little minus indicates. Like wRC+, FIP- is scaled so that 100 is league average, except that lower is better: a 90 FIP- means a pitcher prevented runs at a rate 10 percent better than league average (based on the things he could control). Just like for position players, the fWAR values in the second table below are scaled to compare a pitcher’s projected rate to his actual rate over the actual number of innings he pitched.
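To make the two conversions concrete, here is a minimal sketch of how FIP- and fWAR/200 relate to raw numbers. It is a simplification, not FanGraphs’ exact method: the real FIP- also applies a park adjustment, which is skipped here, and the 3.60/4.00 FIP pair is purely hypothetical. The fWAR/200 checks use McCarthy’s and Sanchez’s actual lines as quoted later in this post.

```python
def fip_minus(fip: float, league_fip: float) -> float:
    """Scale a pitcher's FIP to league average: 100 is average, lower is better.

    Simplification (assumption): FanGraphs also park-adjusts FIP-; that step is omitted.
    """
    return 100.0 * fip / league_fip


def innings(whole: int, thirds: int = 0) -> float:
    """Convert e.g. '136 and two-thirds' (box-score notation 136.2) to a decimal."""
    return whole + thirds / 3.0


def fwar_per_200(fwar: float, ip: float) -> float:
    """Express a season's fWAR as a rate per 200 innings."""
    return fwar * 200.0 / ip


def scale_rate_to_innings(rate_per_200: float, ip: float) -> float:
    """Apply a (projected) fWAR/200 rate to the innings a pitcher actually threw."""
    return rate_per_200 * ip / 200.0


# Hypothetical league context: a 3.60 FIP against a 4.00 league-average FIP.
print(f"FIP-: {fip_minus(3.60, 4.00):.0f}")

# Actual 2018 lines quoted in this post.
print(f"McCarthy: {fwar_per_200(0.2, innings(78, 2)):.1f} fWAR/200")
print(f"Sanchez: {fwar_per_200(2.4, innings(136, 2)):.1f} fWAR/200")
```

Scaling a projected rate by actual innings, rather than comparing raw season totals, keeps an injury or a demotion from being scored as a projection miss.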
A brief player-by-player review of the projections
The Braves came into the season with Julio Teheran drawing the Opening Day start, and with Mike Foltynewicz expected by all three projection systems to prevent runs at a slightly below-average rate. The Braves finished the season with Foltynewicz having hurled a dominant 3.9-fWAR season with an 84 FIP-. Since the projection systems largely agreed about him before the season, they all missed by about the same amount. Foltynewicz also threw the most innings of any Atlanta starter, so the aggregate miss was larger: the projections didn’t miss by more for any other player (except, arguably, Max Fried, who pitched very few major league innings in 2018).
On a distributional basis, IWAG did project Foltynewicz with a “hump” around 2.5 fWAR/200, but it diluted this with a lower but widespread possibility that he would be wholly ineffective and finish between -1.0 and 1.0 fWAR/200. In reality, he blew right past both of these possibilities, finishing outside the “likely” bounds that IWAG put up. On an innings basis, this was partly because his 183 frames exceeded his previous career high by a substantial margin; on a rate basis, he simply ended up being dramatically more awesome than expected. The real question is to what extent his 2018 success will stick, and to what extent he can keep outpitching his xFIP (which was 0.40 higher than his FIP, or nine points on a minus basis).
Fried didn’t get much usage in 2018, compiling just 33 and two-thirds innings across five starts and nine relief appearances. The projection systems were split but not all that sanguine: Steamer saw him as a 2.0 fWAR/200 starter with below-average run prevention (because relievers post better run prevention than starters, a starter does not need a 100 FIP- to post 2.0 fWAR/200, and can sit somewhat above that); ZiPS and IWAG agreed that he’d be more of a fifth starter, under 1.0 fWAR/200 with an FIP- in the 120s.
Fried’s results were somewhat skewed because he worked out of the bullpen for part of the year, but he finished with an unexpected 91 FIP- and 2.4 fWAR/200. He was actually much better as a starter (72 FIP-, 84 xFIP-), albeit in even fewer innings. Because of the low innings total, the projection systems didn’t miss by very much in aggregate; he didn’t give them much of an opportunity to fall behind. Still, he ended up outside IWAG’s distributional bounds on a rate basis, as IWAG’s range of likely outcomes topped out at a league-average arm.
Projected as a solidly league-average, if injury-prone, starter, Brandon McCarthy ended up suffering an unfortunate final major league season in 2018. He finished with a 119 FIP- and a barely positive 0.2 fWAR in 78 and two-thirds innings, though his 92 xFIP- was actually better than was likely expected. He somehow allowed homers at more than twice his previous career rate, and posted a pop-up rate just north of one percent, compared to nearly 10 percent for his career. As a result, the projection systems whiffed pretty badly despite him hurling fewer than 100 frames, overestimating his contributions by nearly a win of value.
Distributionally, his paltry 0.2 fWAR and 0.5 fWAR/200 were something like fifth-percentile outcomes as laid out by IWAG. On a rate basis, IWAG saw McCarthy as most likely to end up somewhere between 2.0 and 5.0 fWAR/200, and he missed that range widely. Generally, these types of disparate results are not unexpected, and they cancel each other out; this is why projections “work,” in a general sense. Again, what I come back to is that McCarthy’s unexpectedly poor season was part of the counterbalancing to, say, Foltynewicz’s unexpectedly good one. The main takeaway is that the place where this counterbalancing didn’t happen was essentially the entire array of position players.
The projection systems saw hard-throwing young lefty Sean Newcomb as very similar, at least on a rate basis, to softer-tossing old righty Brandon McCarthy: league-average run prevention and somewhere around 2.5 fWAR/200. Those projections fell wide of the mark for McCarthy, but they were quite close for Newcomb, who finished with a 103 FIP- (projections were 97 to 101) and 2.3 fWAR/200 (projections were 2.4 to 2.7). The projections overestimated Newcomb’s success, but only slightly.
IWAG gave Newcomb a very unbalanced bimodal projection: a very high likelihood of an fWAR/200 somewhere between the mid-2.0s and mid-3.0s, and a much lower likelihood of one somewhere between 0.7ish and 2.1ish. He, like a variety of other players, ended up finishing somewhere in between these two possibilities, an outcome around the 20th to 30th percentile.
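How IWAG builds its distributions isn’t described here, so the following is only an illustration of why an in-between result lands at a lowish percentile of a lopsided bimodal projection. It assumes, hypothetically, that the projection can be represented as a two-component normal mixture; the weights, centers, and spreads in `newcomb_like` are invented to loosely echo the shape described above, not taken from IWAG.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    weight: float  # share of probability mass in this hump
    mean: float    # center of the hump, in fWAR/200
    sd: float      # spread of the hump, in fWAR/200


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Probability that a normal draw falls at or below x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mixture_percentile(outcome: float, modes: list[Mode]) -> float:
    """Percentile (0-100) of an actual outcome under a mixture of normal humps."""
    total_weight = sum(m.weight for m in modes)
    below = sum(m.weight * normal_cdf(outcome, m.mean, m.sd) for m in modes)
    return 100.0 * below / total_weight


# Hypothetical parameters: a dominant hump near 3.0 and a small one near 1.4.
newcomb_like = [Mode(weight=0.75, mean=3.0, sd=0.35), Mode(weight=0.25, mean=1.4, sd=0.35)]
print(f"{mixture_percentile(2.3, newcomb_like):.0f}th percentile")
```

An outcome between the humps sits above nearly all of the small hump’s mass but below nearly all of the big hump’s, so its percentile ends up close to the small hump’s weight.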
Anibal Sanchez was one pitcher on whom the projection systems disagreed before the season. Steamer saw him as having above-average run prevention potential (96 FIP-, 2.3 fWAR/200). ZiPS was more middling (109 FIP-, 2.0 fWAR/200). IWAG was very skeptical of his ability to rekindle success (117 FIP-, 0.9 fWAR/200). In reality, Sanchez blew past all of these, making IWAG look very wrong in the process: a 90 FIP- and 2.4 fWAR in 136 and two-thirds innings, or 3.5 fWAR/200.
As a result, Sanchez was the other Braves starter, along with Foltynewicz, to hurdle past expectations. Even Steamer missed by 0.8 fWAR across his actual innings pitched; IWAG blew it badly, finishing 1.8 fWAR off the mark. For what it’s worth (which isn’t much), IWAG did give Sanchez a slight chance (around 10 percent or so) of rediscovering a way to be effective, so he finished around the 80th and 85th percentiles of its distributions by fWAR and fWAR/200, respectively.
A lot of this exercise has been “haha, player xyz did better than the projections.” Even in McCarthy’s case, where the performance was worse, the result was more odd and idiosyncratic than a straightforward whiff. Unfortunately for the Braves, none of this really applies to Julio Teheran.
The projection systems uniformly saw Teheran as basically a number-four starter, i.e., an FIP- around 110 and an fWAR/200 around 1.5 (and below 2.0). Instead, Teheran pitched more like a number-five starter over the course of the year, with a 120 FIP- and 0.8 fWAR/200. The result was that the projection systems overestimated his run prevention by about 10 points of FIP-, and his fWAR by somewhere between 0.5 and 1.0 fWAR/200.
(For those who want to talk about Teheran’s FIP-beating ways: for his career, he’s had about a 15-point gap between his ERA- and FIP-, which ballooned to 23 this year, likely due to the Braves’ superior defense relative to earlier in his career. So even if we apply that “deserved” 15-point gap to his 120 FIP-, he still ends up preventing runs at a below-average rate. This is the real danger of using only ERA or RA9-WAR and the like to track player value: even if a player is an FIP-beater, he starts absorbing the value of his defense when doing so.)
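The Teheran arithmetic can be spelled out in a few lines. This sketch assumes, as the paragraph does, that his career ERA-/FIP- gap simply carries over additively; `credit_career_gap` is a hypothetical helper, not a standard stat.

```python
LEAGUE_AVERAGE = 100  # ERA- and FIP- are both scaled so 100 is league average; lower is better


def credit_career_gap(fip_minus: float, career_gap: float) -> float:
    """Estimate an ERA- by assuming the pitcher's career ERA-/FIP- gap persists (assumption)."""
    return fip_minus - career_gap


teheran_2018 = credit_career_gap(fip_minus=120, career_gap=15)
verdict = "below average" if teheran_2018 > LEAGUE_AVERAGE else "average or better"
print(f"ERA- equivalent: {teheran_2018} ({verdict})")
```

Even with the full career gap credited, the result stays on the wrong side of 100.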
Distributionally, IWAG featured a pretty simple bimodal curve for Teheran: a slightly higher chance of being somewhere around 1.0 fWAR/200, and a slightly lower one of being around 3.0 fWAR/200. It makes sense that with his poor performance, he ended up around the 10th to 15th percentile of both fWAR and fWAR/200 in terms of this distribution.
Before the year began, the projections saw Matt Wisler as a number-four-ish starter. Steamer was the least positive (120 FIP-, 0.7 fWAR/200), ZiPS was middling (115 FIP-, 1.0 fWAR/200), and IWAG was weirdly positive (109 FIP-, 1.7 fWAR/200).
(The tendency for IWAG to overrate guys who seem to do fine at Triple-A but can’t successfully transition to the majors is one of the main tweaks I am working on this offseason.)
Wisler ended up making three starts for the Braves, in addition to four relief appearances, before he was traded to the Reds, where he made another 11 appearances out of the bullpen. He was actually okay as a starter in those three games (89 ERA-, 100 FIP-, albeit a 119 xFIP-), but his aggregate stats were probably closer to what an observer who wasn’t a projection system would expect: 116 FIP-, 0.5 fWAR/200. Distributionally, IWAG saw Wisler as perhaps centered around 1.0 fWAR/200, but with a “fat tail” extending to the right, toward better performances. In reality, his performance was more akin to the ninth percentile of the IWAG distribution, but as noted, this is probably an issue with the underlying IWAG calculus rather than Wisler performing particularly poorly; at this point, this might just be who he is.
Projection system comparison
The real story here is that, yes, they were all close, but Steamer kind of clean-swept the fWAR/200 component of the assessment. For each of these seven players, Steamer had the smallest gap between the player’s actual fWAR and Steamer’s rate applied to the player’s actual innings pitched. Some of these gaps were still quite large (e.g., a 2.2-fWAR underestimate of Mike Foltynewicz), but seven-for-seven is pretty good. IWAG had roughly the same projection as Steamer for Foltynewicz and McCarthy, so it “tied” Steamer for those two; ZiPS technically got the closest to Wisler’s FIP- (and really, the important thing is that they were all pretty similar in aggregate anyway) but never got closer than the other systems to an fWAR value.
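The per-player scoring described above can be sketched as follows, using Sanchez’s three preseason fWAR/200 rates and his actual line from earlier in this post. The `tie_tolerance` parameter is a hypothetical stand-in for how near-identical projections might be counted as “tied”; the post doesn’t specify a threshold.

```python
from __future__ import annotations


def closest_systems(
    actual_fwar: float,
    ip: float,
    rates: dict[str, float],
    tie_tolerance: float = 0.05,  # hypothetical threshold for calling two misses a tie
) -> tuple[list[str], dict[str, float]]:
    """Score each system's fWAR/200 rate, applied to actual innings, against actual fWAR."""
    gaps = {name: abs(actual_fwar - rate * ip / 200.0) for name, rate in rates.items()}
    best = min(gaps.values())
    # Every system whose miss is within the tolerance of the best miss shares the "win".
    winners = sorted(name for name, gap in gaps.items() if gap - best <= tie_tolerance)
    return winners, gaps


# Sanchez: 2.4 fWAR in 136 and two-thirds innings; preseason rates from the text.
winners, gaps = closest_systems(2.4, 136 + 2 / 3, {"Steamer": 2.3, "ZiPS": 2.0, "IWAG": 0.9})
for name, gap in gaps.items():
    print(f"{name}: missed by {gap:.1f} fWAR")
print("closest:", ", ".join(winners))
```

Repeating this for each pitcher, with the rates from the tables, yields the per-player tally of which system came closest.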
The sum total differences accumulated by these seven hurlers across their actual innings pitched were:
- For Steamer, an underestimate of 1.6 fWAR.
- For ZiPS, an underestimate of 1.5 fWAR.
- For IWAG, an underestimate of 2.1 fWAR.
For FIP-, Steamer, ZiPS, and IWAG averaged overestimates of 3, 5, and 5 points, respectively (an overestimate of FIP- is an underestimate of run prevention effectiveness). In absolute value terms, the average errors were 11, 16, and 17 points; in terms of root mean square error, they were 21, 26, and 23.
For fWAR, the three systems averaged an underestimate of 0.2 fWAR, 0.2 fWAR, and 0.3 fWAR, respectively. On an absolute value basis, the average error was 0.6, 0.8, and 0.9. For root mean square error, the values are 0.9, 1.0, and 1.0, respectively.
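The three summary measures used above can be computed as in this sketch. It assumes unweighted per-player errors defined as projected minus actual, so an fWAR underestimate comes out negative; the post doesn’t say whether its averages were weighted by innings. The three-pitcher input is made up for illustration.

```python
from __future__ import annotations

import math
from typing import Sequence


def error_summary(projected: Sequence[float], actual: Sequence[float]) -> dict[str, float]:
    """Net, mean signed, mean absolute, and root-mean-square projection error."""
    if len(projected) != len(actual) or not projected:
        raise ValueError("need two equal-length, non-empty sequences")
    errors = [p - a for p, a in zip(projected, actual)]  # negative = projection too low
    n = len(errors)
    return {
        "total": sum(errors),                                  # net miss across all players
        "mean": sum(errors) / n,                               # misses in both directions cancel
        "mae": sum(abs(e) for e in errors) / n,                # every miss counts as a miss
        "rmse": math.sqrt(sum(e * e for e in errors) / n),     # large misses weigh extra
    }


# Hypothetical fWAR figures (projected rate scaled to actual innings vs. actual).
summary = error_summary(projected=[1.0, 2.0, 0.5], actual=[1.5, 1.0, 1.5])
print({name: round(value, 2) for name, value in summary.items()})
```

Because squaring amplifies big misses, RMSE is never below the mean absolute error, and a wide gap between the two (as with 21 versus 11 for Steamer’s FIP-) signals that a few large whiffs drove the total.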
Steamer and ZiPS both got very close on Matt Wisler, but he threw a limited number of innings. Steamer hitting Sean Newcomb within 2 points of FIP- and 0.1 fWAR over his actual innings might be more impressive. The biggest whiff was ZiPS on Mike Foltynewicz on an fWAR basis (2.3 off the mark), but IWAG also missed badly on Anibal Sanchez, while ZiPS and IWAG severely overestimated Max Fried’s 2018 FIP. Generally, and as noted above, Mike Foltynewicz (and Max Fried, for FIP) were the projection-busters; Sean Newcomb and Matt Wisler fell most in line with expectations.
Some small notes on relievers
Of the relievers projected before the season to contribute to at least some extent, seven ended up throwing a meaningful chunk of innings for the Braves. These seven pitchers combined for 74 percent of the team’s relief innings in 2018; midseason acquisitions Brad Brach and Jonny Venters, along with the usual reliever churn (see Luke Jackson), made up the remaining innings.
Reliever projections, as shown below, tend to be all over the place. But the same thing that makes relievers hard to project also limits the damage: they don’t throw many innings, so the overall “miss” from a projection being too optimistic or pessimistic relative to reality tends to be fairly minimal.
For Jesse Biddle, the expectations were between “quite bad” and “decent.” He ended up being okay. Shane Carle outperformed all of his expectations, but his xFIP- of 108 belies his 88 FIP- and should be pretty scary for his future employer(s) going forward. Sam Freeman had a weird year that was basically wholly expected in terms of peripherals, but not so much for the agonizing way in which he was used and the way ball-in-play luck messed up parts of his season. A.J. Minter was expected to be good, and indeed, was good. Peter Moylan was expected to be pretty decent, and instead was super-awful. Arodys Vizcaino ended up somewhat worse than generally expected. Dan Winkler was the counterbalance to Peter Moylan — he was perhaps expected to be decent but instead he ended up being dominant.
On net, Steamer underestimated the production from these seven relievers by 1.5 fWAR, while ZiPS did so by 1.7 fWAR. IWAG was somewhat closer, at a 0.6-fWAR underestimate. On average, the three systems had a per-player error of -0.2, -0.2, and -0.1 fWAR, respectively. Steamer generally had the closest projection, except for the two relievers with standout seasons (A.J. Minter and Dan Winkler), where IWAG was more positive to begin with and therefore ended up nearer the mark, though in reality everyone missed on Winkler by quite a bit.
As mentioned in the other posts, all charts, etc. are available on request; just ask.