FanPost

Lineups, optimality, scoring runs, and Fredi: an attempted smorgasbord

Hi friends. This is gonna be long. It's gonna have math. It's also largely an experiment with me messing around with the numbers at my disposal and my own understanding of statistics and baseball to attempt to quantify the impact of the Braves using very poor lineups in 2014.

We commonly hear that lineup construction doesn't really matter except for teams on very specific points on the win curve. The 2014 Braves were likely not one of those teams, as they enjoyed no career seasons from any players, and got very poor seasons all around the diamond from a number of guys who really had to be key contributors for the Braves to get anywhere. With that said, I'm undertaking this exercise for a couple of reasons, aside from interest. First, a lot of the research into how much lineups matter was done back when the run environment was a bit more robust, and as a result, squeezing a marginal run here and there wasn't as important. That's likely not the case anymore. On top of that, the Braves magnified this issue horribly, with a terrible offense that ranked second-worst in runs (573, or 3.54 a game), 18th in baseball at out avoidance (position players made outs in 68.5% of their plate appearances), and 21st in wRC+ for position players (94). The fact that the Braves outplayed their Pythagorean expectancy by one win is just a depressing coda on a sad season at the plate.

(I'll pause here for astute readers to note the slight absurdity between context-neutral stats indicating that the Braves were in the bottom third of offenses, but the game stats indicating they were second-worst at scoring runs. Taking park adjustments out doesn't adjust their placement as 21st, so on top of having a poor hitting season, they also sequenced and Barved their way into not scoring any runs ever. Thanks, guys. This would also be a segue into broader issues regarding non-linearity of context-neutral hitting stats for run-scoring, but that's a completely separate post with way more math.)

WiD2D1A.0.png

The Christmas-themed chart above summarizes total season runs scored and allowed. It's not all that informative by itself, but it's going to form the basis for some stuff below. We can see that the Braves were shut out more than they held the opposition to zero runs, and while they most commonly scored four runs, the pitching most commonly held the other team to two runs. Still, the curves are close to mirrors of each other. Another, more helpful way to visualize this data is to actually look at the run differential in each game.

90ozq9H.0.png

You can see that the most common outcome for the Braves in 2014 was to lose a 1-run game, followed by winning a 1-run game. Over 54% of games played by the team were decided by 2 runs or fewer; the Braves lost two more such games than they won.

One thing I've been interested in is whether the actual sequence of runs scored/allowed over the course of the season is representative of how a team with similar run scoring/allowing patterns can be expected to play over infinite games, in terms of wins and losses. To do this, I simply run a simulation where runs scored and allowed are selected independently of one another, and see how many games are won at the end of 162. (I throw out any ties.) I acknowledge that in many cases, these two items are not entirely independent due to managerial choices about which relievers to use, or whether to play for one run or more runs, and so on. But it's a good step into seeing whether the Braves in 2014 were lucky to end with 79 wins, or horribly unlucky to finish there despite a bad offense because of their good pitching.

The chart below does pretty much exactly that - it plots the likelihood of winning a specific number of games based on 10,000 simulated seasons where, for each game in each season, runs scored/allowed were randomly drawn based on the distribution of runs scored/allowed in the first chart.
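A minimal sketch of this kind of simulation in Python, for anyone who wants to play along. It is not necessarily the exact model behind the chart: the function names are made up for illustration, the two run lists are placeholder numbers rather than the real 2014 game log, and it assumes that "throwing out ties" means redrawing the game until someone wins.

```python
import random
from collections import Counter

def simulate_season_wins(runs_scored, runs_allowed, rng, n_games=162):
    """Return the win total for one simulated season.

    runs_scored and runs_allowed are lists of per-game run totals; each
    simulated game draws one value from each list, independently.
    Assumption: a tied draw is thrown out and redrawn, so every season
    has exactly n_games decisions. (If the two lists could only ever tie,
    this loop would never finish.)
    """
    wins = 0
    decided = 0
    while decided < n_games:
        scored = rng.choice(runs_scored)
        allowed = rng.choice(runs_allowed)
        if scored == allowed:
            continue  # tie: discard and redraw
        decided += 1
        if scored > allowed:
            wins += 1
    return wins

def win_distribution(runs_scored, runs_allowed, n_seasons=10_000, seed=2014):
    """Map each win total to the share of simulated seasons that produced it."""
    rng = random.Random(seed)
    counts = Counter(
        simulate_season_wins(runs_scored, runs_allowed, rng)
        for _ in range(n_seasons)
    )
    return {wins: n / n_seasons for wins, n in sorted(counts.items())}

# Placeholder per-game run totals (hypothetical, NOT the real 2014 game log).
example_scored = [0, 1, 2, 2, 3, 3, 4, 4, 4, 5, 6, 7]
example_allowed = [0, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 8]

if __name__ == "__main__":
    dist = win_distribution(example_scored, example_allowed, n_seasons=1_000)
    most_likely = max(dist, key=dist.get)
    print(f"most common win total: {most_likely} ({dist[most_likely]:.1%})")
```

Feeding in the real per-game runs scored and allowed in place of the placeholder lists would give the kind of win-total histogram described above.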

ZH9Kf4O.0.png

In what is perhaps not at all surprising given the way the season ended, the most common outcome is the one actually experienced by the Braves this season: 79 wins, which has about a 6.5% chance of happening in any given season simulation of 162 games. Overall, the Braves ended pretty much where they should have given their end-of-game runs scored/allowed results. Given said results and the simulation, they end up with a 68% chance of finishing below .500 and a 29% chance of finishing above .500. The simulations give the team just a 7% chance to finish with at least 88 wins, the win total that would've given them a Game 163. On the flip side, this sorry run scoring total results in a season of 72 or fewer wins 22% of the time, which would've placed the Braves last in the division. But hey, at least the pitching was good enough to yield just a 3% chance of a 64-win season or worse, so suck it, Diamondbacks?

Now, I'm sure everyone clicked this post to talk about lineups. And don't worry, I'm getting there, albeit very slowly. (You've already read 900 words and 3 charts about how much our offense sucked, and I apologize for that.) So, okay, lineups. Here's my overall plan, laid out here without results just so it's clearer when I do go through it:

  1. Pick a lineup used this season, simulate that lineup over 162 games.
  2. Scale the simulated result in terms of runs per game to the actual runs per game achieved during those games, just to have a fair basis for comparison. (Basically to make sure that the lineup simulator is scaled appropriately.)
  3. Use a lineup optimizer tool to optimize a lineup with the same players in different spots.
  4. Run this new, optimized lineup through the same simulation, and apply the same scaling factor as before.
  5. Compare the runs per game with the optimized lineup against the original. Once this comparison is done, it is easy to re-run the simulation (blue chart) above with an adjusted runs scored number, and see how wins would've changed while holding the pitching results constant.
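Steps 2 through 5 boil down to a bit of arithmetic, sketched below. The sketch assumes the scaling is multiplicative (actual runs per game divided by simulated runs per game); an additive offset would be another reasonable reading of the plan. The function names and the numbers are hypothetical, for illustration only.

```python
def scaling_factor(actual_rpg, simulated_rpg):
    """Ratio that maps the simulator's runs per game onto what really happened."""
    return actual_rpg / simulated_rpg

def scaled_gain(actual_rpg, sim_original_rpg, sim_optimized_rpg):
    """Scale both simulated lineups by the same factor and return the
    optimized lineup's runs per game minus the original lineup's."""
    k = scaling_factor(actual_rpg, sim_original_rpg)
    return k * sim_optimized_rpg - k * sim_original_rpg

# Hypothetical inputs: the lineup really scored 3.50 runs per game, the
# simulator says 3.00 for that order and 3.10 for the optimized order.
gain = scaled_gain(actual_rpg=3.50, sim_original_rpg=3.00, sim_optimized_rpg=3.10)
print(f"scaled gain: {gain:+.3f} runs per game")
```

The scaled gain is the number that then gets added to the runs scored side of the win simulation.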

One issue is that the difference in runs per game between two lineups is unlikely to be a whole number. For example, I'm not sure that the gap between Optimized Lineup A and Terrible Lineup B is likely to amount to even one whole run. As a result, the integer-valued charts above aren't very helpful here, because even if we add 0.9 runs to each game the Braves lost in 2014, they'd still lose that game. So, to deal with this issue, I've converted the runs scored/allowed distributions from discrete curves to continuous ones, using a modified smoothed-normal distribution. Because of the pretty weird patterns of the two histograms on the first chart, it was hard to get any kind of equation-based, smoothed line that correctly assessed their contours. Note that the results are similar but not exactly the same: where previously it was a 68% chance to finish below .500, it's now a 63% chance; the previous 29% chance of finishing above .500 is now about 32%.

J9MyqEe.0.png

A lot of the difference is specifically around the .500 mark; the chances of the Braves finishing with 88 wins or more are very similar in both simulations, as are the chances of a total above 73 wins or so. This whole "discrete runs" issue is a considerable source of uncertainty going forward, so we'll just have to keep it in mind as we go along and extrapolate stuff. Because the 88-win plateau probability is similar across the discrete and continuous distribution simulations, I'm going to reference that number more often than others going forward.
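The exact "modified smoothed-normal" fit isn't spelled out here, so the sketch below swaps in a simpler stand-in: pick an observed game at random and add a little Gaussian noise, floored at zero, which is roughly how you would sample from a kernel density estimate. The names, the bandwidth of 0.5 runs, and the sample data are all assumptions for illustration. The point is only to show why continuous draws let a fractional bonus flip the occasional game.

```python
import random

def draw_smoothed(run_totals, bandwidth=0.5, rng=random):
    """Draw one continuous run value: an observed total plus Gaussian noise.

    Assumption: noise is normal with standard deviation `bandwidth`, and
    negative results are floored at zero since a team cannot score
    negative runs.
    """
    return max(0.0, rng.choice(run_totals) + rng.gauss(0.0, bandwidth))

def simulate_smoothed_wins(runs_scored, runs_allowed, bonus=0.0,
                           n_games=162, seed=0):
    """Count wins over one season of continuous draws.

    `bonus` is a fractional runs-per-game bump added to every offensive
    draw. The rare exact tie (both sides floored at zero) counts as a loss.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_games):
        scored = draw_smoothed(runs_scored, rng=rng) + bonus
        allowed = draw_smoothed(runs_allowed, rng=rng)
        if scored > allowed:
            wins += 1
    return wins

# Placeholder run totals (hypothetical, not the real 2014 distributions).
sample_scored = [0, 1, 2, 3, 3, 4, 4, 5, 6]
sample_allowed = [0, 1, 2, 2, 3, 3, 4, 5, 7]

if __name__ == "__main__":
    base = simulate_smoothed_wins(sample_scored, sample_allowed, seed=1)
    bumped = simulate_smoothed_wins(sample_scored, sample_allowed,
                                    bonus=0.5, seed=1)
    print(f"wins without bonus: {base}, with +0.5 runs per game: {bumped}")
```

Because both calls use the same seed, they see the same underlying draws, so the only games that change hands are the ones where the bonus is bigger than the margin.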

Okay, now it's been 1,300 words about not-lineups, so let's talk about lineups. Talking about lineups with the 2014 Braves is difficult. The Braves started 66 total different defensive arrangements in 2014, and the top 6 such arrangements accounted for just 43% of the team's total games. The standard arrangement of Gattis-Freeman-La Stella-Johnson-Simmons-Upton-Upton-Heyward played for just 21 games together. Similarly, and as a result, the Braves used 101 different batting orders across 162 games, and the most common order didn't even include Gattis. As a result, it is very difficult to even find out what to compare where and how, because the Braves didn't have one standard batting order or even one standard defensive alignment.

As a starting point, I think it makes sense to examine: 1) the most common lineup; and 2) the most common lineup with the "starters" playing. The most common lineup is BJU-AS-FF-JU-JH-CJ-TLS-CB; the most common lineup with starters is JH-BJU-FF-JU-EG-CJ-TLS-AS. Evaluating the former tells us about the value of not batting two of the worst hitters (BJ, Simmons) at the top; evaluating the latter is largely about the value of not hitting BJ second. When I first envisioned this post, I thought it would be doable to take the common lineups, or at least the common lineups with starters, and do this analysis for all of them. But it's not that simple: of the 101 lineups the Braves used, the 6 most common lineups were used for just 37 games. And in those 37 games, 10 had C-Beth, 11 had Uggla, and 5 had Laird. So really, we have no "generic" lineup to compare or analyze. As a result, my plan is to just look at the two lineups above, extrapolate from there, and then maybe test some other lineups to see what the impacts of those were. So I'll muddle through, but it's a thornier exercise than it might seem to really come up with the overall impact of Fredi's lineups. Which is a shame, because I think if we didn't keep using different lineups for the whole season, there could be something pretty interesting and straightforward here.

So let's start with the most common lineup. BJ Upton and Simmons at the top (agh), followed by Freddie and Justin, with Heyward, CJ, Tommy, and C-Beth following up. No Gattis, so it's not the most accurate representation of the team as assembled, but Gattis missed a lot of time down the stretch and somehow this lineup snuck into the lead. And it's used 67% more than the next-most-common lineup, so it's got that going for it, which is nice? Except for the whole part where BJ and Simmons got the most PAs, which is certainly not nice. Believe it or not, this lineup, used exclusively between June 30 and July 19 (Mets, Diamondbacks, Mets again, Cubs, Phillies), actually scored 48 runs in 10 games, or 4.8 runs per game. Using a really crappy lineup simulator that I built myself, which incorporates only station-to-station baseball and steals/caught stealings with no double plays, sacrifice flies, bunts, and the like, this lineup is estimated to score 2.96 runs per game (assuming a composite pitcher consisting of all Braves pitchers' PAs in aggregate for 2014). 2.96 is really bad, well below the 3.54 runs per game the team scored overall, but this is not surprising because A) no Gattis and B) my lineup simulator leaves out double plays, which helps the offense, but it also doesn't allow for runners to score from first on a double, and the like, which hurts it. Overall, I think the net effect is to depress the run output, in general. However, it's hard for me to "test" or scale this 2.96 number, because 4.8 runs per game with this lineup is clearly an aberration and I know from earlier tests that my lineup simulator is not off by 2 runs per game (also that would be insane).
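For a feel of what a station-to-station simulator looks like, here is a minimal sketch. It is not the simulator used for the numbers in this post: it leaves out steals and caught stealings entirely, the function names are made up, and the three batter profiles are hypothetical outcome probabilities rather than real 2014 lines. Every runner advances exactly as many bases as the batter does, walks only move forced runners, and there are no double plays, sacrifice flies, or bunts.

```python
import random

# Bases gained by the batter (and by every runner) on each kind of hit.
HIT_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}

def plate_appearance(probs, rng):
    """Pick one outcome from a dict of outcome -> probability."""
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights)[0]

def simulate_game(lineup, rng, innings=9):
    """Play one nine-inning game for the offense and return runs scored."""
    runs = 0
    spot = 0  # position in the batting order, carried across innings
    for _ in range(innings):
        outs = 0
        bases = [False, False, False]  # first, second, third
        while outs < 3:
            result = plate_appearance(lineup[spot % len(lineup)], rng)
            spot += 1
            if result == "OUT":
                outs += 1
            elif result == "BB":
                # Only forced runners move up on a walk.
                if bases[0]:
                    if bases[1]:
                        if bases[2]:
                            runs += 1  # bases loaded: run forced in
                        bases[2] = True
                    bases[1] = True
                bases[0] = True
            else:
                n = HIT_BASES[result]
                moved = [False, False, False]
                for i, occupied in enumerate(bases):
                    if occupied:
                        if i + n >= 3:
                            runs += 1  # runner reaches home
                        else:
                            moved[i + n] = True
                if n == 4:
                    runs += 1  # batter scores on a home run
                else:
                    moved[n - 1] = True
                bases = moved
    return runs

def runs_per_game(lineup, n_games=10_000, seed=0):
    """Average runs per game over many simulated games."""
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(n_games)) / n_games

# Hypothetical batter profiles (each sums to 1.0); NOT real player stats.
GOOD = {"OUT": 0.64, "BB": 0.10, "1B": 0.16, "2B": 0.05, "3B": 0.01, "HR": 0.04}
WEAK = {"OUT": 0.74, "BB": 0.05, "1B": 0.15, "2B": 0.04, "3B": 0.00, "HR": 0.02}
PITCHER = {"OUT": 0.86, "BB": 0.03, "1B": 0.09, "2B": 0.02, "3B": 0.00, "HR": 0.00}

if __name__ == "__main__":
    weak_on_top = [WEAK, WEAK, GOOD, GOOD, GOOD, GOOD, WEAK, WEAK, PITCHER]
    good_on_top = [GOOD, GOOD, GOOD, GOOD, WEAK, WEAK, WEAK, WEAK, PITCHER]
    for name, order in (("weak on top", weak_on_top), ("good on top", good_on_top)):
        print(f"{name}: {runs_per_game(order, n_games=2_000):.2f} runs per game")
```

Swapping in real per-batter outcome rates and reordering the list is all it takes to compare two batting orders with the same nine players.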

My first "test" is to use the lineup optimizer here. Now, that tool is old and outdated, and it has been heavily critiqued in a few different places, including on Fangraphs (search Google for "Be Cautious with Lineup Analysis Tool" or something similar). With that said, using the 1989-2002 (steroids) run environment, you get an estimate of 2.95 to 3.71 runs per game, depending on how those players are arranged. It also says that the lineup as listed would score 3.42 runs per game. So basically, we know that this tool is overly inflated due to steroids era hijinks and it's saying 3.42 runs per game, my own model is saying 2.96 runs per game, and the increase from optimizing this lineup is 0.28 runs per game. The lineup optimizer I'm talking about says that the best lineup is a rather silly Freddie-Heyward-Bethancourt-Justin-CJ-Simmons-BJ-Pitcher-TLS; when I run that through my own model I get 2.95 runs per game, which just shows the incompatibility of these two methods, and also that that optimized lineup is very silly for this day and age. If we use the broader 1959-2004 run environment (less steroid-y) in the lineup optimizer, we get an even sillier estimate of 3.46 runs per game from the optimizer for the lineup as used, or 3.59 at the top end. The difference between the two is 0.12 runs. So while the optimizer is saying that the lineup Fredi used in these 10 games was 0.12-0.28 runs per game worse than an optimized lineup, its optimized lineups are so silly (this latter one is Heyward-Freddie-BJ-Justin-CJ-Pitcher-TLS-Simmons-Bethancourt) that we can just move past it and try some other stuff. For the record, that second lineup via my tool yields 2.97 runs per game, so basically we're just in the land of irrelevance no matter what. On to bigger and brighter things:

Using The Book's simple lineup optimization, we can come up with something akin to Heyward-Freddie-TLS-Justin-CJ-BJ-Simmons-Bethancourt-Pitcher. The three best hitters are in the 1-2-4 spots: Heyward's OBP leads off, Freddie is the most balanced and bats second, and Justin's SLG puts him 4th. TLS and CJ are the next-best hitters, so TLS's OBP hits third while CJ's better SLG hits 5th; the remaining hitters are just descending in quality (or ascending in uselessness). This lineup yields an even 3.00 runs per game by my simulator, and it's hard to come up with anything else that's better by playing with the numbers. So basically we've got +0.04 runs per game by not hitting BJ/Simmons up top, if Gattis is out of the lineup. +0.04 runs per game would mean 6.48 runs over a 162-game season, so if we call it 6, that's really about half a win in a context-neutral sense. No real damage done there.
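The back-of-the-envelope conversion above can be written out as a tiny helper. The 10-runs-per-win figure is the usual rule of thumb and is an assumption here, not something derived from the 2014 run environment; the function names are made up for illustration.

```python
GAMES_PER_SEASON = 162
RUNS_PER_WIN = 10.0  # common rule of thumb; an assumption, not a fitted value

def season_runs(delta_rpg, games=GAMES_PER_SEASON):
    """Extra runs over a full season from a runs-per-game improvement."""
    return delta_rpg * games

def approx_wins(delta_rpg, games=GAMES_PER_SEASON, runs_per_win=RUNS_PER_WIN):
    """Rough context-neutral wins added by a runs-per-game improvement."""
    return season_runs(delta_rpg, games) / runs_per_win

extra_runs = season_runs(0.04)
extra_wins = approx_wins(0.04)
print(f"+0.04 runs per game: {extra_runs:.2f} runs, about {extra_wins:.2f} wins")
```

With 10 runs per win, +0.04 runs per game lands a bit above half a win, the same ballpark as the estimate above.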

But of course, that's all with no Gattis. The most common White Bear-containing lineup is Jason-BJ-Freddie-Justin-Evan-CJ-TLS-Simmons-Pitcher. My tool indicates 3.31 runs per game from this lineup; in reality, this lineup was used only in early June (June 3 through June 14) and included two games at Coors, with the team scoring 31 runs over 6 games for an average of 5.2 runs per game. So yeah, again, not happening. But 3.31 runs per game is not that far off from what the team did all year. Interestingly, when these players (the starters) were on the field together, the team scored 3.13 runs on average, which is actually worse than both what my crappy tool predicts and the team's average runs per game tally. I'm going to ignore the lineup optimizer exercise for the time being with this lineup, just because it has really wacky results. Using The Book's simple optimization, and bear with me, something like Freddie-Justin-TLS-Gattis-Heyward-CJ-BJ-Simmons-Pitcher makes sense. The logic is the same as above, but since Freddie/Justin/Gattis are now the best three hitters, Heyward is relegated to the second-tier fifth spot instead. This lineup doesn't result in any real change via my tool. Meanwhile, a facsimile of the 2013 lineup that really clicked (Heyward-Justin-Freddie-Gattis-TLS-CJ-BJ-Simmons-Pitcher) yields only 3.33 runs per game per my tool, so an improvement of +0.02 runs per game. That's basically 3 runs over the course of a season, or a third or so of a win. Ultimately, it seems like it doesn't make a huge difference who bats where.

To bring it full circle, I used the less conservative +0.04 runs per game estimate and added it to my smoothed runs scored/runs allowed distributions from above. In general, you can probably already foretell that it's not going to have any real impact: even in cases where the distributions coincide with runs scored/allowed to the nearest integer, the 0.04 will rarely make a difference.

rSANsLp.0.png

That's the modified win distribution simulation chart, adding +0.04 runs to every game outcome. It's pretty much identical: the expected value jumps about 1 win (but only due to rounding), and in general it results in about a 3% greater chance of finishing above .500 and a 1% greater chance of finishing with at least 88 wins. These aren't substantial or noteworthy improvements. Lastly, as a lark, I ran the same simulation but assumed that the Braves scored runs like the Marlins did - they were about average for the NL, a bit below the major league average in terms of scoring runs at around 4 runs per game. This is essentially adding 0.47 runs per game to the Braves, but keeping their pitching results the same.

OeQt0kr.0.png

The expected value of doing this jumps from 78-79 wins all the way to 86. It's not quite 88, but it's something. Chances of finishing above .500 are 78%; chances of finishing with at least 88 wins and playing a Game 163 are 43% under this "I wish" scenario. The fact that they're not 100% shows that it takes more than just an average offense to make the postseason; though if the Braves were able to have even an average offense, they'd have made it a much more exciting September.
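The headline numbers quoted for each scenario (expected wins, chance of finishing above .500, chance of reaching 88) are just summaries of a list of simulated win totals. A minimal sketch, with a made-up function name and a toy five-season list standing in for the 10,000 simulated seasons:

```python
def summarize_wins(win_totals, n_games=162, target=88):
    """Summarize simulated season win totals.

    Returns the average win total plus the share of seasons finishing
    above .500, below .500, and at or above the target win total.
    """
    n = len(win_totals)
    half = n_games / 2
    return {
        "expected_wins": sum(win_totals) / n,
        "p_above_500": sum(w > half for w in win_totals) / n,
        "p_below_500": sum(w < half for w in win_totals) / n,
        "p_target_or_more": sum(w >= target for w in win_totals) / n,
    }

# Toy input for illustration only; real use would pass thousands of seasons.
toy_seasons = [79, 81, 85, 88, 70]
summary = summarize_wins(toy_seasons)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

Seasons that land exactly on 81 wins count as neither above nor below .500, which is why those two shares don't have to add up to 100%.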

Anyway, thanks for reading. I was hoping to end this with some kind of decent estimate of "Fredi cost the team X wins with his stupid lineups" but at this point I'd be hard-pressed to say it was more than one win. The net effect of adding more poor hitters into the lineup actually makes their order matter less because you can't go lower than zero in terms of scoring runs, so on an average basis, it's hard to see where he robbed the team of more wins by virtue of filling out the lineup card in a weird order. It was bad process, and there were bad results, but if BJ and Simmons weren't batting at the top, they'd just be giving away free outs at the bottom. Missing Gattis for that last month hurt the team way more than any kind of lineup shenanigans.

In conclusion - I just want to say that I have all my models and everything up, so if there are specific things you're wondering about - the actual impact of this over that or what have you, that can easily be quantified. So just ask away, and I'll try to answer.

Side note if you got this far: my results are pretty similar to the ones here from someone far smarter than me, who also did a better job of making this same point in not quite a Braves-specific context. So yeah, no one should be surprised given that MGL finished his post in August.

This FanPost does not express the views or opinions of Battery Power.