Everyday Stats and Their Flaws
Stats We're Talking About: W-L
Kill the Win.
It's simple and elegant, but it kinda misses the point. One of the main themes for today is isolating what a pitcher does from outside contributing factors - defense, bullpen, run support. While that sounds like "duh", ignoring it is the basic misconception behind the statistics we'll be discussing. It's fairly easy to see what hitters and fielders are responsible for - hitters either get out or they don't, and fielders either make the play or they don't - but pitchers aren't quite so easy. They have the cannon arms and the break-neck curves, but what happens after the ball leaves their hand is still a bit of a mystery.
But pitcher win-loss records aren't particularly good at isolating what a pitcher does. Although a starting pitcher will likely have a huge influence over a particular game, there's a lot he has little influence over that can still make a major impact. The first thing is the run support from the offense. No matter how well the pitcher pitches, a good or bad offense can save him or kill him. Defense is another thing. Once the pitch leaves his hand and is put in play, it's up to the defense to make the play and avoid errors. And then we get to the bullpen. The starting pitcher could have done everything well, and the bullpen can blow it with a few swings of the bat. If you're looking for something to indicate a pitcher's performance, a W-L record tells very little of the story, and it's a story that will need a lot of context.
But wins are often used to tell the story of a pitcher who overcame, who dominated, and who did what he could for his team. The issue is that the story leaves out the rest of the team. One of the beautiful things about baseball, in my opinion, is that you can't hide players like in other sports. Everyone gets a turn through the order, and as much as you try to hide Evan Gattis in left field, you can't prevent the ball from being hit there. So trying to ascribe a cumulative effort - even if the contributions are not equal - to one person seems ... well, inaccurate.
That doesn't mean pitcher W-L records say absolutely nothing, of course. Good pitchers tend to rack up more wins than bad pitchers because they tend to put their teams in better positions more often. But it's just not as simple as giving a starting pitcher the win. It never was, even back in the day when pitchers pitched 9 innings and a win was more appropriate. It was more appropriate, but it wasn't actually appropriate - a guy pitching for the '20s Yankees was more likely to win than if he pitched for the Senators. But someone wanted a way to differentiate between how a team did with one pitcher and how it did with others, and because it had some logic to it, it stuck. A few decades later, here we are trying to undo the mess of unintended consequences caused by implicit acceptance (yeah, you reread that because it's briliant ... brilliant).
One of the other problems with win-loss records is, of course, the rules surrounding what counts as a win. Starting pitchers have to go at least 5 innings. Relievers who happen to be in when the team takes the lead - no matter how many outs they get or runs they give up - can get/vulture a win. The scorer can even give it to whomever he or she wants by pulling rank, though that doesn't happen often. There are a number of convoluted rules that go into a pitcher's W-L record, so while nuanced stats will take some learning, the rules for getting a win took some learning, too. It's just that you have to learn again.
But we're smart people, and we can recognize the issue here. Pitcher W-L records simply have a lot of other things going on besides just the pitcher who won or lost. As we go through these posts, I'll be showing you other statistics that are able to more specifically key in on things that pitchers, and only pitchers, do. You'll notice there are some differing theories, and you can feel free to choose the one that makes the most sense to you. But I think we can all move past pitcher wins at this point.
Nuanced Stats and Why
Stats We're Talking About: fWAR, bWAR, and WARP
Instead of win-loss records, let's try to focus on stats that better isolate pitcher performance. For the most part, these stats are guided by DIPS (Defense-Independent Pitching Statistics) Theory. The idea behind this theory is that pitchers have little control over the ball once it leaves their hand. From that point, the batter is primarily responsible for what happens next. If he recognizes or guesses the pitch correctly and has the hand-eye coordination to match, he can smash the ball or let a bad pitch go, but if he fails, he'll either mishit it or miss it entirely. Once the ball is hit, the defense takes over, and they either make the play or they don't depending on positioning, where the ball is hit, and how fast the runners are.
After much research, it was surmised that pitchers didn't have much control over what happens to balls in play (measured by BABIP, batting average on balls in play), but they did influence the number of strikeouts, walks, and home runs they gave up. Essentially, the quality and location of their pitches determine their strikeout rate. Location affects the walk rate. And location and quality affect home run rate, since good pitches avoid becoming easy-to-hit meatballs over the plate. None of those - what are called "peripherals" because they aren't the runs scored part but what leads into runs scored - involve the defense at all, which is how they are "defense-independent".
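For anyone who wants to see what BABIP actually counts, here's a minimal sketch using the commonly published formula, (H - HR) / (AB - K - HR + SF). The function name and the season line are made up for illustration; they aren't from any real pitcher.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF).

    Home runs and strikeouts are removed because the defense never
    gets a chance to field them.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical season line allowed by a pitcher (made-up numbers).
value = babip(hits=180, home_runs=20, at_bats=750, strikeouts=200, sac_flies=5)
print(f"BABIP: {value:.3f}")  # prints "BABIP: 0.299"
```

Notice that strikeouts and home runs drop out of both the top and the bottom, which is exactly why BABIP is the piece DIPS theory hands to the defense and to luck.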
Not everyone was fully convinced that pitchers had "no" control over batted balls, however. Some guys are groundball pitchers, and some are flyball pitchers. Those tendencies do affect the number of home runs to an extent - giving up more fly balls, as you might expect, generally leads to more home runs - but people still thought/think pitchers could/can affect balls in play as well. After all, some pitchers "broke" the DIPS theory by not having a BABIP in the .290-.310 range, and they did it fairly regularly. Those guys, of course, were Hall of Famers and weirdos that threw special pitches like knuckleballs. This debate has led to the major nuanced pitching statistics.
Let's first talk about fWAR (FanGraphs' version). It's based on FIP (Fielding-Independent Pitching). We'll get into more detail on it in a later post, but for now, we'll focus on the major theory. And FIP is a firm believer in DIPS theory, basing its calculation strictly on Ks, BBs, and HRs. But FIP only describes the quality of pitching performance, so if we want a pitcher's overall WAR, we need to also find how long he can keep that up. The more times he can replicate that performance, the better. Essentially, that means the quality of pitching (FIP) over how many innings he's pitched. Add in a park adjustment, a replacement-level adjustment, and a scale for the run environment, and we get his fWAR.
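To make the FIP idea concrete, here's a rough sketch of the commonly published formula: (13*HR + 3*(BB + HBP) - 2*K) / IP plus a constant. The published version lumps hit batters in with the walks. The constant changes a little every season so that league FIP matches league ERA; the 3.10 below is just an assumed ballpark figure, and the stat line is made up.

```python
def fip(home_runs, walks, hit_by_pitch, strikeouts, innings, constant=3.10):
    """Fielding-Independent Pitching from the peripherals only.

    `innings` must be a true decimal (200 1/3 innings is 200.333, not
    the box-score shorthand 200.1). `constant` is an assumed value; the
    real one is recalculated for each league season.
    """
    if innings <= 0:
        raise ValueError("innings must be positive")
    weighted = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return weighted / innings + constant


# Hypothetical season line (made-up numbers).
value = fip(home_runs=20, walks=45, hit_by_pitch=5, strikeouts=200, innings=200)
print(f"FIP: {value:.2f}")  # prints "FIP: 3.15"
```

Nothing in there knows about hits, errors, or who was standing in left field, which is the whole point. This is only the FIP step; the park, replacement-level, and run-environment adjustments that turn it into fWAR aren't shown.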
On the flip side, FanGraphs also caters to the crowd that believes pitchers do have control over sequencing and BABIP. RA/9 wins - what Brian Kenny has been screaming about all week - is pretty straightforward. It's simply the number of runs - earned and unearned - allowed per nine innings with the run values from above added in. Basically, it's like ERA with a run value to make it a win value. BIP (Balls in Play) and LOB (Left on Base) Wins give run and win values to being good/bad at preventing hits (BIP) or runners from scoring (LOB). For the most part, BABIP and LOB% are two key spots for looking for regression, but these stats give an alternate option if nothing else. BIP and LOB-Wins add together for FDP-Wins (Fielding-DEPENDENT Pitching).
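Here's a small sketch of the two raw rates sitting underneath those win values. RA/9 is just 9 * R / IP, and the LOB% formula is the one FanGraphs publishes, (H + BB + HBP - R) / (H + BB + HBP - 1.4*HR). The function names and numbers are made up for illustration, and the conversion from these rates into wins is not shown.

```python
def ra9(runs, innings):
    """All runs allowed (earned and unearned) per nine innings."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9 * runs / innings


def lob_pct(hits, walks, hit_by_pitch, runs, home_runs):
    """Share of baserunners stranded, using the published LOB% formula."""
    baserunners = hits + walks + hit_by_pitch
    denominator = baserunners - 1.4 * home_runs
    if denominator <= 0:
        raise ValueError("not enough baserunners to compute LOB%")
    return (baserunners - runs) / denominator


# Hypothetical season line (made-up numbers).
print(f"RA/9: {ra9(runs=80, innings=200):.2f}")  # prints "RA/9: 3.60"
print(f"LOB%: {lob_pct(hits=180, walks=45, hit_by_pitch=5, runs=80, home_runs=20):.1%}")
```

Unlike FIP, both of these take the runs exactly as they scored, so the defense and the sequencing are baked right in.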
That's a bit of a crash course on those statistics, but let's recap. fWAR is based on FIP, which is based on the peripheral statistics of Ks, BBs, and HRs. RA/9-Wins gives full credit to the pitcher for the runs that score, but it adjusts those runs to a win value. BIP-Wins tells us the run value of having an above or below-average BABIP, and LOB-Wins tells us the run value of having an above or below-average LOB%. BIP and LOB-Wins add together for FDP-Wins, basically the opposite of FIP.
Moving on to Baseball-Reference, bWAR is probably a little closer to FDP than FIP. It takes the runs allowed like RA/9, but it makes an adjustment for defense, park, and opponents. So bWAR believes that a pitcher has some control over balls in play, but it does make an adjustment because ... well, defense and park factors do play a role. Mix in IP and replacement-level, and you get bWAR (or rWAR - same thing). Some will argue that bWAR gives the pitcher too much credit (or blame) for run prevention, but others would argue that fWAR sticks too closely to DIPS theory. At this point, it's a bit of a judgment call because we're still not sure how much a pitcher does or does not have control over balls in play.
Finally, we roll over to Baseball Prospectus. They use SIERA - again, we'll discuss these more in the coming weeks - which is probably a bit closer to FIP than FDP. It concentrates on the peripherals, but it also adds in effects of batted balls. It's a complicated algorithm, but it follows a certain logic. For instance, walks aren't as bad for groundball pitchers because they can turn into double plays, but they're worse for flyball pitchers because they turn into two-run homers. That's a bit of an oversimplification, but I hope you get the idea - SIERA uses the main ideas of DIPS theory, but it adjusts for batted balls. Add in the IP and a different replacement level - it's a bit higher than the level fWAR and bWAR agreed on, which tends to make WARP numbers smaller than the other two - and you get WARP.
So pitching statistics have a bit of whatever you want. If you are all into peripherals, FIP and fWAR are your go-to stats. If you don't like those so much and think runs allowed is a pitcher skill, RA/9 wins and rWAR are more your style. If you like peripherals but also like a little batted ball thrown in, go SIERA and WARP. But which do I prefer?
When it comes to FIP, SIERA, and the other run estimators - estimating how many runs "should" have scored is basically what they do - they're all pretty good, and there's little difference in the overall scheme of things. I'd prefer SIERA if I had to pick one end-all-be-all, but because FIP is a bit more prevalent and essentially as good, I will usually use it. Baseball Prospectus, for as good as their analysis is, makes using their statistics much more difficult than FanGraphs does. If I don't get a major advantage using SIERA, I'll just stick to the one that I can look up more easily. For the most part, I'll just use fWAR.
What We Have Left to Accomplish
The big issue in pitching is just how much defense plays a role. At this point, we know strikeouts are good while walks and home runs are bad. But do pitchers have some control over balls in play? And how much? Is it just the extremely good and the extremely bad that have control? Or is it the extreme sinkerballers and flyballers? Until we've sorted out defense, we can't be sure. And maybe not even then.
I'd also like to see a metric more focused on PITCHf/x. PITCHf/x has a lot of wonderful data, and I'd love to see some sort of algorithm that compares velocities, movement, etc. with location and results. I can't even imagine the breadth of calculations needed for it, but I think it would be worth pursuing.
But nuanced pitching statistics are pretty good. What they do is, at the very least, extract some of the defense and completely ignore any effects of other pitchers relieving them. You may not be fully convinced by any of them, but they do a better job than W-L records or single-season ERA (we'll discuss that issue in like two weeks). And again, what we're looking for is "better". We may eventually get to "perfect", but we're more focused on better for now. While W-L records, etc. do describe pitching performance to a degree, they include a lot more outside factors than the more nuanced stats.
- Pitcher W-L records and other everyday statistics do measure pitching talent to a certain degree, but they also incorporate large amounts of outside factors - offense, defense, and bullpen.
- Nuanced statistics focus more on DIPS Theory and using peripherals - K, BB, HR - along with batted balls to isolate pitcher performance from other contributions.
- fWAR focuses strictly on DIPS Theory and uses the peripherals only.
- bWAR uses runs allowed with adjustments for defense and park.
- WARP uses the DIPS peripherals, but it also includes batted ball information.