It feels a bit odd to start an introduction with, “I’m not sure this object really needs to exist,” but, well, I’m not sure this object really needs to exist. This entire exercise is essentially being done due to demand. In reality, I’d probably rather not make this, for multiple reasons (discussed below). But, in the hopes that doing this saves time in the future by allowing people to just link to this rather than explaining a concept anew, I guess it’s worth doing.
Before getting into an actual introduction, I want to put some critical disclaimers up front. Baseball analysis, like pretty much all analysis, discussion, and thought, is enveloped in a blanket of norms, and the primary norm is that nothing is authoritative just because of what it is. Instead, the usefulness and primacy of any idea or bit of information should be derived from how accurate, correct, truthful, reproducible, and so on it is. To that end, I want to make it very clear that the entirety of this is not some kind of baseball canon -- it’s not a bible. If you think the contents of this exercise are helpful, it should be because the arguments are logical and therefore compelling, rather than because someone took the time to write up a thing and make it sound vaguely authoritative. So, with that said, disclaimers:
- All content here is pretty much an explanation of how I understand baseball analysis with publicly-available data as of the start of 2019. This document makes no pretense that this is how anyone else understands baseball analysis, and no claim that you too should understand things exactly this way.
- The body of baseball knowledge grows every year by leaps and bounds. While I’ve tried to emphasize comprehensiveness here, it’s possible that certain things are already outdated due to the limited extent of my knowledge. New things should be treated as any other thing: evaluated for usefulness and then either incorporated and relied upon, discarded, or put into a wait-and-see holding area. In other words, just because something isn’t mentioned here doesn’t mean it’s better or worse than things that are mentioned here.
Before charging ahead to actual explanations, a refresher on principles might be helpful. If these principles don’t seem compelling to you, it’s unlikely that the rest of this will; the whole idea of analysis is to think of things logically and systematically, so if you’re more interested in thinking of things anecdotally and speculatively, that’s certainly your prerogative. It just means you may have a harder time finding common ground with those who want to “work through” questions rather than taking a different approach.
Principle 1: Descriptive stats versus stats for forecasting. A lot of confusion tends to arise as people talk past each other regarding these two types of stats. Descriptive stats tell you what happened. Other stats aren’t concerned with what happened, but with what should have happened or what’s likely to happen in the future. When you say, “Batter McHitFace is a good hitter,” someone could either take this to mean that he has already had good results or that the inputs he provides to hitting will generally lead to good outcomes. Those two things aren’t the same. It’s important to clarify which of these two things a given discussion is focusing on. This document discusses both types of stats separately.
Principle 2: Crediting players for stuff they do. Baseball is uniquely suited to analysis because, unlike in other sports/games, all action on the field can be discretely diced up into specific events. (You probably already knew this.) Because of that, it makes sense that baseball analysis focuses on things players did/didn’t do, as well as things they may/may not do in the future. It doesn’t focus on anything else; if that statement seems odd, it’s only because it’s unclear how else to even analyze baseball if not in the context of, “A player does xyz when involved in the game.” That makes it important to separate things the player did from things he did not do. The obvious example here is RBI: when a player hits a double, what he’s done is hit a double. Whether it drives in a run or not is not dependent on him -- it’s dependent on whether a teammate was already on base. If we lose sight of the idea that players should only get credit for their own achievements, we stop being able to analyze baseball in any useful fashion.
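To make the RBI point concrete, here is a minimal sketch. Everything in it is made up for illustration: the function name, the four-event batting line, and especially the simplified scoring rule (a home run scores everyone, a double scores every runner, a single scores runners from second and third). Real baserunning is messier. The only point is that an identical batting line produces very different RBI totals depending on who happened to be on base.

```python
def runs_batted_in(event, bases):
    """Runs driven in under a simplified, hypothetical scoring rule.

    bases is a (first, second, third) tuple of booleans: is a runner there?
    """
    first, second, third = bases
    if event == "home_run":
        return first + second + third + 1  # every runner plus the batter
    if event == "double":
        return first + second + third      # assume every runner scores
    if event == "single":
        return second + third              # assume only runners in scoring position score
    return 0                               # an out drives in nobody here

# The same hitter doing exactly the same things at the plate...
events = ["double", "single", "out", "home_run"]

# ...once with nobody ever on base, once with teammates constantly on base.
empty_bases = [(False, False, False)] * 4
busy_bases = [
    (True, True, False),
    (False, True, True),
    (True, False, False),
    (True, True, True),
]

rbi_with_empty_bases = sum(runs_batted_in(e, b) for e, b in zip(events, empty_bases))
rbi_with_busy_bases = sum(runs_batted_in(e, b) for e, b in zip(events, busy_bases))

print(rbi_with_empty_bases, rbi_with_busy_bases)  # 1 8
```

The hitter did nothing differently between the two totals; only his teammates did.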
Principle 3: Probabilistic thinking. Baseball is a game of repeated attempts to do things at uncertain likelihoods. This is enshrined even in rote, old-timey stuff like “batting average,” which is essentially the probability that a hitter gets a hit in a given at-bat. This scales all the way up and down to every aspect of baseball: because nothing is certain but things are more likely or less likely, it’s important to remember that everything is driven by some kind of underlying probability distribution and nothing is deterministic. A terrible hitter can homer off a great pitcher. An awful team can sweep a good one in a series. A slick fielder can bobble a ball. All these things just happen; the question is whether they happen rarely (in which case the adjectives used to describe the player/team in question seem reasonable) or often (in which case we should think of new adjectives). In other words, if a slick fielder constantly bobbles balls, he’s probably not that slick. If we think probabilistically, we can acknowledge that a good hitter could have a worse game/week/month/season than a bad one. If we limit ourselves only to, “Okay but here’s what happened,” we stop ourselves from being able to forecast anything, and force ourselves to backfill narratives for certain outcomes, when in reality, those outcomes are just the result of the inherent uncertainty in baseball.
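As a rough illustration of the “good hitter can have a worse week than a bad one” idea, here is a small sketch. It leans on assumptions that are not from anything above: each at-bat is treated as an independent coin flip with a fixed hit probability, the two “true” averages (.300 and .230) are invented, and a week and a season are taken to be 25 and 550 at-bats. Under those assumptions, hit totals follow a binomial distribution, so the chance that the worse hitter collects strictly more hits can be computed exactly.

```python
from math import comb

def hit_distribution(p_hit, n_at_bats):
    """Binomial probabilities of getting 0..n hits in n independent at-bats."""
    return [
        comb(n_at_bats, k) * p_hit**k * (1 - p_hit) ** (n_at_bats - k)
        for k in range(n_at_bats + 1)
    ]

def prob_worse_hitter_gets_more_hits(p_good, p_bad, n_at_bats):
    """P(worse hitter's hits > better hitter's hits) over the same number of at-bats."""
    good = hit_distribution(p_good, n_at_bats)
    bad = hit_distribution(p_bad, n_at_bats)
    total = 0.0
    bad_tail = 0.0  # running P(worse hitter gets more than g hits)
    for g in range(n_at_bats, -1, -1):
        total += good[g] * bad_tail
        bad_tail += bad[g]
    return total

# Hypothetical true-talent averages and sample sizes (assumptions, not data).
week_prob = prob_worse_hitter_gets_more_hits(0.300, 0.230, 25)
season_prob = prob_worse_hitter_gets_more_hits(0.300, 0.230, 550)

print(f"worse hitter out-hits the better one over a week:   {week_prob:.3f}")
print(f"worse hitter out-hits the better one over a season: {season_prob:.3f}")
```

With these made-up inputs, the weekly probability comes out far larger than the season-long one. That is the whole point: short stretches are noisy, and the noise washes out only as the attempts pile up.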
Principle 4: Measurement and evidence. Let’s say you are a GM. You are debating signing a few different but similar players for a position of need. The agent for one of these players comes to you with a pitch.
“Look, buddy. You need to sign my guy, Player McVeteranFace. He’s not the best hitter anymore and he’s definitely slowed down a step or two, but he makes everyone around him better with his clubhouse presence.”
None of the agents for the other players made a claim like this one. What should you do? You can take it at face value, and improve your valuation of that player accordingly. You can do some research, make some calls, talk to scouts and acquaintances, see if others agree with the claim, and improve your valuation of that player accordingly. Or, maybe you could go back to the agent and ask him what exactly he means by “makes everyone better.” The agent might come back and give you the names of the player’s former teammates who had career years when playing with him. Maybe that would be enough to improve your valuation of the player.
Or maybe it’s not enough. Like, what does “improve your valuation” even mean? By how much? To what extent did he make players better? Did he make all players better, or just some? Are there players that got worse? How do you know that he made those players better, rather than something else? The thing is, these aren’t trick questions meant to be annoying. Each of those questions is answerable. Not easily, perhaps, but with enough effort, sure. None of the questions are asking for something that can’t be measured. In other words, “he makes everyone around him better” is a verifiable claim, so it should be possible to verify it. What if you’re not convinced when the agent rattles off some names of players with career years, so you ask him for better quantitative evidence, and he doesn’t provide any? What then? It stops seeming like a compelling case, right?
I probably could have summarized the above more succinctly, but the general idea is that it only makes sense to consider things that have measurable evidence in some way. If they don’t, you have no way of knowing anything about them, so it’s weird to consider them alongside things you do know quite a bit about. The good news is that there’s so much baseball data out there that you can test pretty much any hypothesis; the bad news is that explanations for things happening or not happening are rarely so simple as to be able to say, “Ha, it’s because of this one specific thing.”
This principle is discussed here mostly as a blueprint for the sorts of things that are/are not discussed in this primer. If something can’t be measured right now, that doesn’t mean it doesn’t exist; it just means there’s not much reason to discuss it alongside things that can be measured. Maybe it matters. Maybe it doesn’t. We have no way to know, so let’s spend time on things whose importance we actually can determine.
So, with all that said, here’s how this is going to work. There will be one post per day on a baseball concept, posted at 10:00 am ET, Monday through Sunday. As will be mentioned in a companion post on the site today, comments are disabled on these posts, and limited to that companion comments post. Our hope is to maintain this primer in the future as the state of baseball research evolves, but for now, this is what we have. I look forward to further discussing these concepts with anyone who’s interested.