Braving New Territory: Being at Peace with WAR

The WAR puns will only get worse, I promise.

Christian Petersen

This is the first installment of the primer series. For an explanation of the goals of this series, click here. Today, we'll take a look at WAR (Wins Above Replacement) in general, and over the next few weeks, we'll look at offensive, defensive, and pitching WAR.

Everyday Stats and Their Flaws

Everyday Stats We're Talking About: ...

I think one of the biggest issues when discussing WAR is that there's nothing to connect it to. There is no real ancestor for WAR (or, more precisely, for its family of stats, since Win Shares and the like were its precursors). Among everyday statistics, no single stat ever attempted to capture what a player means to his team. No one really even tried until Bill James. We had pitcher wins, which I guess sort of tried, but not really. We had the Triple Crowns for hitting and pitching, but those didn't come around very often. There simply was no stat that captured a player's overall value.

And even if you bent some of them enough, none could do the magical thing WAR does - compare hitters TO pitchers. Not hitters to hitters. Not pitchers to pitchers. Hitters TO pitchers. That's the real brilliance here. WAR and its family were the first real attempt at comparing two incredibly different parts of baseball to each other, and it's really hard if you think about it. On one hand, you have hitters who contribute relatively little on any given day but play up to 162 games; on the other, you have starting pitchers who contribute a great deal but take the mound only about 34 times. It's a monumental task.

Which is basically why no one ever did it in a concrete way. The only way it was ever done was arbitrarily. "Okay guys, we have a really good hitter and a really good pitcher. They're the best at what they do. But who's better than who?" And that's when you get the arbitrary and controversial "team player," "winning player," "he's on a winning team," "well he plays in more games," "well he has a bigger impact in those games," and so on. They're all perfectly logical, but there's never any real evidence that those things exist. And even if they did, no one ever said how much they were worth, which makes that stuff look convenient - "How much do I need for my favorite player to be the best? Oh okay, that's how much it's worth." No one - and I mean NO ONE - ever explains how much the intangibles are worth or is even remotely consistent about it. That's a big reason why intangibles have been derided by "the sabermetric community." It's not that they don't exist. It's that they're often used without rules or restrictions to justify a certain belief.

What WAR also does is let us see the guts of the process. Prior analysis was primarily hitting-focused and left out defense and baserunning, and when those did enter the discussion, it was mostly as a positive - "He steals a lot of bases" or "He's a Gold Glover." Essentially, people picked and chose when to add those elements, even though they play a significant part in every player's value. WAR doesn't pick and choose.

Nuanced Stats and Why

Nuanced Stats We're Talking About: fWAR, bWAR, rWAR, WARP

Wins Above Replacement is a pretty massive topic, so you'll have to bear with me through the basics here. Over the course of the offseason, we'll cover things like DRS, wOBA, FIP, etc. and how they play into WAR, but for today, what you need to understand is that WAR is one metric made up of many statistics. The objective is to measure a player's value to a team.

The first step is establishing a baseline, something to compare every player against. You could set it anywhere you'd like - HoF, All-Star, Average, or Replacement Level. Replacement Level is chosen because of WAR's economic purpose: it's around to help decide how much to pay players. If you have a finite amount of money to spend (i.e., a budget), you'd like to know how to spend it. As you might expect, you want to spend more money on good players and less on bad ones. WAR helps because, as we talked about above, it lets you compare hitters to pitchers, and Replacement Level works really well as a baseline because we KNOW what replacement players cost: about $500,000, the league minimum.

As the word "replacement" implies, the players who define this level are easily replaceable. They're the 25th man on the roster and might spend the season bouncing between AAA and the majors. Essentially, this player is Jose Constanza. Sure, he does some good things, but if another team wanted him, the Braves would take a few thousand dollars, a similar AAA player, or a duffel bag to be named later. If you want Gory Mathematical Details (h/t to Russell Carleton for the term), you can look here. But the basic idea is to look at a bunch of Jose Constanzas and analyze how they perform.

Now that we have a baseline, we can compare relative production and, more importantly, relative cost for that production. How that is done is really hairy, and if you're really that interested, here is enough to keep you busy. But the general idea is to break down the areas in which players contribute. For position players, that means hitting, baserunning, and defense over the number of PAs or innings played, and for pitchers, that means pitching performance over the number of innings pitched. Those components are derived from stats like wOBA and FIP, which give relative production, while PA and IP give the amount of time a player performed at that level - obviously, more is better at a given production level (well, unless you suck and make everything worse).
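
To make the "rate times playing time" idea concrete, here is a minimal sketch of the batting component in the style of FanGraphs' wRAA (weighted Runs Above Average). The league wOBA and the wOBA scale (which hovers around 1.2 and changes every season) are assumed values, and the hitters are hypothetical.

```python
def batting_runs_above_average(woba: float, league_woba: float,
                               woba_scale: float, pa: int) -> float:
    """wRAA-style batting runs relative to a league-average hitter.

    Dividing the wOBA gap by the wOBA scale converts it to runs per PA;
    multiplying by PA turns that rate into a season total.
    """
    return (woba - league_woba) / woba_scale * pa


LEAGUE_WOBA = 0.315  # assumed league average
WOBA_SCALE = 1.25    # assumed; the real scale varies by season

full_season = batting_runs_above_average(0.350, LEAGUE_WOBA, WOBA_SCALE, 600)
half_season = batting_runs_above_average(0.350, LEAGUE_WOBA, WOBA_SCALE, 300)
below_average = batting_runs_above_average(0.290, LEAGUE_WOBA, WOBA_SCALE, 600)

print(round(full_season, 1), round(half_season, 1), round(below_average, 1))  # 16.8 8.4 -12.0
```

The same .350 hitter is worth twice as many runs over 600 PA as over 300, while the .290 hitter's extra playing time only digs a deeper hole, which is the "unless you suck" caveat in code form.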

These components use linear weights and are converted into run values because runs are the currency of wins - again, we break things down to build them back anew. Roughly every ten runs means another win (here is why). It is worth noting that these run values are compared against the "average" MLB player, not replacement level. The replacement-level adjustment is added afterward based on playing time, pro-rated the same way for everyone; that adjustment is what moves the baseline from average down to Replacement Level.
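
One common way to see where "roughly ten runs per win" comes from (not necessarily the explanation linked above) is Pythagorean expectation: add ten runs to an otherwise average team and see how many expected wins it gains. The 700-run team and the 1.83 exponent are assumptions for illustration.

```python
def pythag_win_pct(runs_scored: float, runs_allowed: float,
                   exponent: float = 1.83) -> float:
    """Pythagorean expected winning percentage."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


def runs_per_win(team_runs: float = 700.0, games: int = 162,
                 exponent: float = 1.83, extra_runs: float = 10.0) -> float:
    """Runs an average team must add to gain one expected win."""
    base_wins = pythag_win_pct(team_runs, team_runs, exponent) * games
    new_wins = pythag_win_pct(team_runs + extra_runs, team_runs, exponent) * games
    return extra_runs / (new_wins - base_wins)


print(round(runs_per_win(), 1))              # exponent of 1.83
print(round(runs_per_win(exponent=2.0), 1))  # classic exponent of 2
```

Both versions land a little under ten, so the rule of thumb is a friendly rounding. The true rate drifts with the league's run environment: the more runs teams score, the more runs it takes to buy a win.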

If all of that isn't enough for you, the last part is the positional adjustment. If I asked you to list the players who could play first base in one column and the players who could play shortstop in the other, the first column would be waaaaaaay longer. That part is common sense. If I then asked you to pick the best 30 hitters from each group, the first column would be more likely to produce 30 good hitters. This is why we need an adjustment - we expect better offensive production from certain positions because of the number of possible hitters involved. It's not exactly fair to compare a shortstop's bat to a first baseman's, but we obviously need both. The general defensive hierarchy, from hardest to easiest, is C, SS, CF, 2B/3B, RF/LF, 1B, DH, and each player's positional adjustment is prorated by the number of innings he played at each position. Here are the Gory Mathematical Details for the adjustments.
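
A minimal sketch of the innings-based proration follows. The per-season run values are assumptions, approximately the scale FanGraphs has published; note that on this scale CF, 2B, and 3B share a value, which is a little flatter than the hierarchy above. DH time is treated as "innings" purely for simplicity.

```python
# Assumed runs per full season (1,458 defensive innings) at each position.
POSITIONAL_RUNS_PER_SEASON = {
    "C": 12.5, "SS": 7.5, "CF": 2.5, "2B": 2.5, "3B": 2.5,
    "LF": -7.5, "RF": -7.5, "1B": -12.5, "DH": -17.5,
}
INNINGS_PER_SEASON = 162 * 9  # 1,458


def positional_adjustment(innings_by_position: dict[str, float]) -> float:
    """Prorate each position's full-season adjustment by innings played there."""
    return sum(
        POSITIONAL_RUNS_PER_SEASON[position] * innings / INNINGS_PER_SEASON
        for position, innings in innings_by_position.items()
    )


everyday_shortstop = positional_adjustment({"SS": 1458})
split_1b_lf = positional_adjustment({"1B": 729, "LF": 729})
print(everyday_shortstop, split_1b_lf)  # 7.5 -10.0
```

A full-time shortstop banks the whole bonus, while a player who splits a season between first base and left field gets half of each penalty.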

So let's do a little recap. The first thing we need is a baseline, which is Replacement Level because it gives us a nice production and economic starting point of comparison. Then, we look at the components - hitting, baserunning, defense, and/or pitching - and add them up for each player, along with the positional adjustment (for position players) and the replacement-level adjustment. That gets us to WAR. We can then use the league minimum that Replacement-Level players make, along with a helping hand from free-agent market forces, to get the "value" of a "win."
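
The recap can be strung together as a toy calculation for a position player. Every constant here (10 runs per win, about 20 replacement runs per 600 PA, and the dollars-per-win figure) is an assumption for illustration, not the actual FanGraphs or Baseball-Reference recipe, and `market_value` is a hypothetical helper.

```python
from dataclasses import dataclass

RUNS_PER_WIN = 10.0                 # rule of thumb from the text
REPLACEMENT_RUNS_PER_600_PA = 20.0  # assumption: rough hitter replacement gap


@dataclass
class PositionPlayerSeason:
    batting_runs: float      # runs above average at the plate
    baserunning_runs: float  # runs above average on the bases
    fielding_runs: float     # runs above average in the field
    positional_runs: float   # positional adjustment, prorated by innings
    plate_appearances: int


def war(season: PositionPlayerSeason) -> float:
    """Add components vs. average, add pro-rated replacement runs, convert to wins."""
    runs_above_average = (season.batting_runs + season.baserunning_runs
                          + season.fielding_runs + season.positional_runs)
    replacement_runs = (REPLACEMENT_RUNS_PER_600_PA
                        * season.plate_appearances / 600)
    return (runs_above_average + replacement_runs) / RUNS_PER_WIN


def market_value(wins: float, dollars_per_win: float,
                 league_minimum: float = 500_000) -> float:
    """Hypothetical salary estimate: a replacement player earns the minimum,
    and each win above replacement (negative WAR floored at zero) is priced
    by the free-agent market."""
    return league_minimum + max(wins, 0.0) * dollars_per_win


season = PositionPlayerSeason(batting_runs=16.8, baserunning_runs=2.0,
                              fielding_runs=5.0, positional_runs=7.5,
                              plate_appearances=600)
season_war = war(season)
print(round(season_war, 2))
print(market_value(season_war, dollars_per_win=5_000_000))  # made-up $/win
```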

To give you some context, let's talk about what the numbers mean relative to the league. Anything between -0.5 and 0.5 WAR means you're replaceable. Worse than that, and I don't know why you even showed up (see: Francoeur, Jeff). Here's a general range (Position Player / Pitcher) for everything else in fWAR and rWAR terms (WARP is a little different, but the other two are more common):

  • 0.5 to 1.5 - Bench Player / Reliever or Swing Man
  • 1.5 to 3.0 - Decent Regular / Elite Reliever or No. 3 or 4 Starter
  • 3.0 to 5.0 - All-Star / No. 2 Starter
  • 5.0 to 7.0 - Awesome / Ace
  • 7.0 to 10.0 - MVP / Cy Young
  • 10.0+ - Best Seasons Ever
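
The ranges above translate directly into a lookup table. This sketch treats each lower bound as inclusive (an assumption where the ranges touch) and, like the list, applies to fWAR and rWAR rather than WARP.

```python
# (lower bound, position-player label, pitcher label), checked from the top down.
TIERS = [
    (10.0, "Best season ever", "Best season ever"),
    (7.0, "MVP", "Cy Young"),
    (5.0, "Awesome", "Ace"),
    (3.0, "All-Star", "No. 2 starter"),
    (1.5, "Decent regular", "Elite reliever or No. 3/4 starter"),
    (0.5, "Bench player", "Reliever or swing man"),
    (-0.5, "Replaceable", "Replaceable"),
]


def describe(war: float, pitcher: bool = False) -> str:
    """Map a single-season fWAR/rWAR figure to the rough tiers listed above."""
    for floor, hitter_label, pitcher_label in TIERS:
        if war >= floor:
            return pitcher_label if pitcher else hitter_label
    return "Below replacement"


print(describe(3.5))                # All-Star
print(describe(3.5, pitcher=True))  # No. 2 starter
print(describe(-1.2))               # Below replacement
```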

Now, what about these different versions? First of all, don't cross the streams. Each of these is an isolated system - players compare well within a system but not across systems. FanGraphs and Baseball-Reference finally settled on an identical Replacement Level to help with this, but Baseball Prospectus, for their own reasons, decided against it. Compare players within each system, and then look at how another system compares those same players. If they disagree, you'll need to look into the components, figure out where they disagree, and use good old-fashioned common sense to decide which is more accurate. Yes, that requires more effort. And yes, stop being lazy.
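
As I understand it, the shared FanGraphs/Baseball-Reference scale puts replacement level at a .294 winning percentage, which works out to roughly 1,000 WAR available per season across the league. The arithmetic is a short sketch; Baseball Prospectus' WARP isn't built on this scale, so its totals won't add up the same way.

```python
TEAMS = 30
GAMES_PER_TEAM = 162
REPLACEMENT_WIN_PCT = 0.294  # shared FanGraphs / Baseball-Reference replacement level

total_wins = TEAMS * GAMES_PER_TEAM // 2  # 2,430 games, each producing one win
# If every team were made of replacement players, each would play .294 ball.
replacement_wins = TEAMS * GAMES_PER_TEAM * REPLACEMENT_WIN_PCT
war_pool = total_wins - replacement_wins  # wins left over to hand out as WAR

print(f"{total_wins} total wins, {replacement_wins:.0f} replacement wins, "
      f"about {war_pool:.0f} WAR to go around")
```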

I do understand that having several systems causes some concern, and no, none of them is perfect. There will always be some arbitrariness because you have to pick and choose what to count and what not to count, and there are philosophical differences - for instance, fWAR for pitchers is based on straight peripherals (FIP), while rWAR uses the number of runs allowed and makes an adjustment for defense. But "arbitrary" is relative, and this is way better than just winging it and saying, "Well, his leadership makes him better." We're aiming for "perfect," but what we EXPECT is "better." Evidence is better than winging it.
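
Here is that philosophical difference in miniature: an FIP-style view only looks at homers, walks, hit batters, and strikeouts, while a runs-allowed view counts everything (rWAR then adjusts for defense, which this sketch leaves out). The FIP formula is the standard one; the constant and the pitcher's line are assumptions, and fWAR's actual pitching calculation layers more on top.

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = 3.10) -> float:
    """Fielding Independent Pitching: only homers, walks, hit batters, and strikeouts.

    The constant (assumed here; it changes every season) puts FIP on the ERA scale.
    ip is true innings, so 200 1/3 innings is 200.333..., not 200.1.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def ra9(runs_allowed: int, ip: float) -> float:
    """Runs allowed per nine innings, defense and luck included."""
    return 9 * runs_allowed / ip


# A hypothetical pitcher: 200 IP, 20 HR, 50 BB, 5 HBP, 180 K, 90 runs allowed.
print(f"FIP {fip(20, 50, 5, 180, 200.0):.2f}, RA9 {ra9(90, 200.0):.2f}")
```

With these made-up numbers, the FIP view (about 3.4) is more than half a run per nine kinder than the runs-allowed view (4.05), which is exactly the kind of gap that makes fWAR and rWAR disagree about pitchers.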

What's Yet to Accomplish

Oh, where to start. First, let's talk about production vs. talent. They are two different things. For instance, Jason Heyward "produced" 3.5 wins this year, but I wouldn't call him a 3.5-win talent. He's better than that, but injuries cut into his playing time and overall production. Chris Johnson produced 3 wins this season, but he's not that good - his massive BABIP helped him produce above his "talent level" (in all likelihood). In cases like these, WAR is basically telling you what happened; it doesn't account for regression or growth. Once you've accumulated enough data, you can start saying what talent level a player is, but it's hard to declare that from just one season. You have to look at peripherals, etc., to see if the performance is sustainable, and then you have to ask whether he'll improve or decline with age. WAR doesn't really account for that in-season. WAR is better at measuring production than talent, though it does start to describe talent after a few seasons.
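
One rough way to pull playing time apart from production is to put WAR on a per-600-PA pace. The two hitters below are hypothetical and are not meant to reproduce Heyward's actual playing time.

```python
def war_per_600(war: float, plate_appearances: int) -> float:
    """Scale WAR to a 600-PA pace so playing time and production rate separate."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return war / plate_appearances * 600


# Two hypothetical hitters with identical 3.5 WAR seasons:
part_time = war_per_600(3.5, 420)  # missed time with injuries
full_time = war_per_600(3.5, 700)  # played every day
print(round(part_time, 1), round(full_time, 1))  # 5.0 3.0
```

A rate like this still can't tell you whether a BABIP spike is skill or luck, which is why peripherals and multiple seasons still matter.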

It also doesn't account for the difference in parks (Edit: for position players; pitchers do get an adjustment, as noted in the comments). This is by design, and it has upsides and downsides. The upside is that we get a picture of what the player actually did, and it treats everyone the same. The downside is that players aren't in the same situations. Inflated offensive numbers in Coors Field, for example, will inflate position player values and deflate pitcher values (the opposite happens in places like Petco and Safeco) because they are ultimately based on raw numbers. Parks affect performance and the raw stats at the root of WAR, but WAR leaves them out in order to avoid some of the murkiness of what players "deserve." It would, however, be nice to have a version that did.
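
A minimal sketch of what a park-adjusted batting component could look like: subtract the runs the home park adds (or removes) over a hitter's home plate appearances. The one-factor model, the park factors, and the league scoring rate are all assumptions; real park factors are usually built from several years of data.

```python
def park_adjusted_batting_runs(batting_runs: float, home_pa: int,
                               park_factor: float,
                               league_runs_per_pa: float) -> float:
    """Strip out the runs a home park adds (or removes) over a hitter's home PAs.

    park_factor is scaled so 100 is neutral: 115 means about 15% more runs
    than an average park, 92 means about 8% fewer.
    """
    park_effect = (park_factor / 100 - 1) * league_runs_per_pa * home_pa
    return batting_runs - park_effect


LEAGUE_RUNS_PER_PA = 0.115  # assumed league scoring rate

hitters_park = park_adjusted_batting_runs(20.0, 300, 115, LEAGUE_RUNS_PER_PA)
pitchers_park = park_adjusted_batting_runs(20.0, 300, 92, LEAGUE_RUNS_PER_PA)
print(round(hitters_park, 2), round(pitchers_park, 2))
```

The same 20 raw runs shrink in a hitter's park and grow in a pitcher's park, which is the Coors-versus-Petco effect described above.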

Some of the components also have issues. Remember, overall WAR is built from different components that combine into one stat. So when it comes to something like defensive metrics (we'll talk about this more in a few weeks), issues in the components can throw off the total - and to be honest, I'm not convinced we have offense and pitching "figured out," either. You can make some corrections yourself by simply paying closer attention, and the overall system works pretty well. That being said, it could work "better."
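
As an example of "adjusting it yourself," here is a sketch that swaps a noisy single-season fielding number for a multi-season average pulled toward average, then measures how much that moves WAR. The 50% shrink weight, the seasons, and the helper name are hypothetical.

```python
RUNS_PER_WIN = 10.0  # rule of thumb from earlier in the article


def regressed_fielding_runs(seasons: list[float], shrink: float = 0.5) -> float:
    """Average several seasons of a fielding metric, then pull it toward average (0 runs).

    shrink is a hypothetical regression weight: 0 keeps the average as-is,
    1 throws the defensive data away entirely.
    """
    if not seasons:
        raise ValueError("need at least one season of data")
    return sum(seasons) / len(seasons) * (1 - shrink)


latest_season = 15.0           # hypothetical single-season fielding runs
three_year = [15.0, 3.0, 9.0]  # hypothetical multi-season history
adjusted = regressed_fielding_runs(three_year)
war_swing = (latest_season - adjusted) / RUNS_PER_WIN
print(f"Fielding: {latest_season} -> {adjusted} runs, WAR changes by {war_swing:.2f}")
```

A single component swing like this moves the total by about a full win, which is why it pays to look under the hood before trusting the headline number.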

The last issue is a matter of use. There are some who use it as a conversation-ender. We need to avoid that. When we ask the question, "Who is the MVP?", WAR narrows the field. But you can't just take it for granted for the reasons we've just mentioned. Don't blame the stat for something it didn't do. WAR puts together a player's contributions in the most reasonable manner we've ever seen. But there's work yet to do.

Central Lessons

  • WAR gives us an overall idea of a player's production.
  • It adds up the contributions of all possible areas of production into one number for easy reference.
  • This number is really good for an "impression," but if you want a more exact analysis, you'll need to examine the components in context.
  • Don't declare anything to be "true" based on a glance at WAR. You MUST look deeper in order to start making definitive statements about a player.