If a team projected to win 72 games starts the season winning 11 of their first 14, are they more likely to win 61 games or 66 games the rest of the way?
Many may recognize this mystery team as the Florida Marlins, who have surprised many by jumping out to an 11-3 start (they now hold an 11-4 record). This is obviously a small sample of the full regular season, and most people would agree that the Marlins are unlikely to end the season with the same .786 winning percentage, which would be good for a whopping 127 wins. But are these early wins significant, in that they will likely lead to a final record of more than 72 wins? Or are they statistically irrelevant, leaving the Marlins still on pace for 72 wins? Before we can answer that question, let’s review dependent and independent probability.
When you flip a coin, each flip is independent of all the flips that preceded it, and each has a fifty-fifty chance of landing heads. So when you begin a series of 50 coin flips, you would project 25 heads and 25 tails. However, if your first ten flips all come up heads, you should no longer expect only 25 heads at the end. Your next 40 flips are independent of your first ten, so each of them still has a fifty-fifty chance of landing heads, which projects to 20 heads over the final 40 flips. So once you know that the first ten flips all landed heads, your projection for heads over all 50 flips will be 30: the 10 heads that already happened, plus 20 more projected heads to come.
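To check that arithmetic, here is a small Python simulation. It is only an illustrative sketch, and the function name and trial count are arbitrary choices. It banks the first ten flips as heads, flips the remaining 40 with a fair coin, and averages the total number of heads over many trials. The exact expectation is computed alongside it for comparison.

```python
import random


def average_heads_after_hot_start(
    trials: int = 20_000,
    total_flips: int = 50,
    opening_heads: int = 10,
    seed: int = 42,
) -> float:
    """Average total heads over `total_flips` when the first `opening_heads` flips were all heads."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    grand_total = 0
    for _ in range(trials):
        # The opening streak has already happened, so those heads are banked.
        heads = opening_heads
        # Every remaining flip is a fresh, independent 50/50 flip.
        heads += sum(rng.random() < 0.5 for _ in range(total_flips - opening_heads))
        grand_total += heads
    return grand_total / trials


# Exact expectation: banked heads plus half of the remaining flips.
exact = 10 + 0.5 * (50 - 10)
simulated = average_heads_after_hot_start()
print(f"exact: {exact}, simulated: {simulated:.2f}")
```

The simulated average lands near 30, not 25. The streak stays banked, and only the 40 flips still to come behave like a 50/50 coin.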
Monty Hall Paradox
The Monty Hall Problem was famously posed in Parade Magazine:
Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?
The answer surprised many people: it is decidedly to your advantage to switch. Switching raises your chance of picking the right door by a full 1/3, from 1/3 to 2/3.
There is an important detail in the problem that is easy to overlook: the host knows which door hides the prize. Therefore his choice of which door to open depends on what came before it, namely your pick and the location of the prize.
When you first select a door, there are three choices, so your chance of selecting the prize door is 1/3. When the host goes to open a door, he knows which doors hide goats and which hides the prize, so he will always open a door with a goat behind it. That leaves two closed doors, one with a goat and one with the prize.
You probably think that each remaining door has a 1/2 chance of being the prize door, since there are only two of them. You would be correct if the host had opened one of the other doors at random, without regard to where the prize was, and it just happened to hide a goat. But the host does know which door holds the prize, and he deliberately opened a door that did not hold it. He can always do that no matter which door you picked. So revealing that goat tells you nothing new about your own door, and it does not change your door’s initial odds.
Instead, the door you selected still has only a 1/3 chance of holding the prize. The odds for the door you did not select, however, did change. Initially there was a 2/3 chance that one of the other two doors held the prize. That is still true, but those odds are no longer split between two doors. They now rest on a single door. So the unopened door you did not select now has a 2/3 chance of holding the prize, and it is a good idea to switch. Feel free to read more about the Monty Hall Paradox.
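The role of the host’s knowledge can be checked by brute force. The Python sketch below enumerates every equally likely prize location and every door the host might open, using exact fractions. It computes the chance that switching wins, given that a goat was revealed. It assumes you pick door 0 by default, and that a knowing host picks at random when both other doors hide goats. The `host_knows=False` case is an illustrative contrast: a blind host who opens either other door, with the cases where he accidentally reveals the car thrown out.

```python
from fractions import Fraction

DOORS = (0, 1, 2)


def switch_win_probability(host_knows: bool, pick: int = 0) -> Fraction:
    """Exact P(switching wins | host opened a door showing a goat)."""
    goat_shown = Fraction(0)  # total probability of scenarios matching the puzzle
    switch_wins = Fraction(0)  # probability of those scenarios where switching wins
    for prize in DOORS:
        p_prize = Fraction(1, len(DOORS))  # the car is equally likely behind any door
        others = [d for d in DOORS if d != pick]
        if host_knows:
            # A knowing host only ever opens a door with a goat behind it.
            choices = [d for d in others if d != prize]
        else:
            # A blind host opens either of the other doors, car or not.
            choices = others
        for opened in choices:
            p = p_prize * Fraction(1, len(choices))
            if opened == prize:
                continue  # the car was revealed, which is not the puzzle's situation
            goat_shown += p
            switch_to = next(d for d in others if d != opened)
            if switch_to == prize:
                switch_wins += p
    return switch_wins / goat_shown


print(switch_win_probability(host_knows=True))   # 2/3
print(switch_win_probability(host_knows=False))  # 1/2
```

The knowing host produces the 2/3 advantage for switching. The blind host’s choice does not depend on where the prize is, so he leaves the two closed doors at 1/2 each.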
Back to Baseball
Now back to the Marlins. If they are projected to win 72 games and they start the season 11-3, are they still projected to finish the season with 72 wins, or should that projection rise to 77? Are future baseball game outcomes dependent on past game outcomes, or independent of them?
There is a chance that future games are slightly dependent on past games. A winning team will likely get challenged a bit harder later in the season, especially by fellow contenders who know that a head-to-head win swings the standings by two full games. A winning team will also have lower waiver priority, and may miss out when some team dumps that situational lefty they badly need. On the other hand, opposing teams may let up against a weaker team, perhaps resting a regular or two, or just subconsciously slacking in their play. However, those factors are relatively small in the grand scheme of things, and the outcome of a game in April really doesn’t affect the outcome of a random game in July.
Another key point is that when a team is predicted to win 72 games, that is a projection; it’s a guess. It may be an educated guess, but it is not made with any actual knowledge of the future (at least not until Nate makes a few updates to PECOTA). For now, a team’s final record is conjecture. Unlike in the Monty Hall Paradox, there is no host opening a door he knows does not contain the prize. No one knows that the Marlins will definitely win 72 games.
This means that the Marlins’ next 148 games are more or less independent of their first 14 games, and their current 11 wins do not affect any future wins or losses. The Marlins were thought of as a 72-win team with a .444 winning percentage, so they should still be considered a .444 team. That doesn’t mean they should be projected to finish the season with a .444 winning percentage. It means they should be projected to win the remaining 148 games at a .444 clip, good for 66 more wins and 77 total wins for the season.
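The projection above is simple arithmetic: wins already banked, plus the preseason winning percentage applied to the games that remain. Here is a minimal Python sketch of it, assuming a 162-game season and a flat per-game rate. The function name is only for illustration.

```python
def project_final_wins(
    wins_so_far: int,
    games_played: int,
    preseason_wins: float,
    season_games: int = 162,
) -> tuple[float, float]:
    """Return (projected wins in the remaining games, projected final wins)."""
    true_rate = preseason_wins / season_games  # 72 / 162 is about .444
    remaining_games = season_games - games_played
    rest_of_way = true_rate * remaining_games  # only future games use the projected rate
    return rest_of_way, wins_so_far + rest_of_way


rest, final = project_final_wins(wins_so_far=11, games_played=14, preseason_wins=72)
print(f"rest of way: {rest:.1f}, final: {final:.1f}")
# Holding the total at 72 instead would leave only 72 - 11 = 61 wins to come.
```

This gives about 65.8 wins over the last 148 games and 76.8 for the season, which round to the 66 and 77 above.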
Let me explain this further with a more extreme real-life example: Chien Ming Wang. He has thrown perhaps the worst six innings in baseball history over his last three starts. In those six innings Wang has given up 23 earned runs, to the tune of a 34.50 ERA. PECOTA predicted Wang to finish the season with a 4.28 ERA. Wang’s next start is being skipped thanks to an off day, and he’s been sent to Tampa to work on his mechanics, with the possibility of a DL stint.
Let’s say he comes back either with his unknown injury fixed, or with a flaw in his mechanics found and corrected so that his sinker sinks again. Now he is the same pitcher PECOTA predicted to finish the season with a 4.28 ERA. Except that, thanks to the six innings and 23 earned runs that already happened, he’s just not going to finish the season with that 4.28 ERA. Throwing the worst six innings in baseball history to start the season doesn’t mean he’s going to turn into an ace and post a 1.50 ERA the rest of the way to finish at 4.28. Instead, he’s going to be the 4.28 ERA pitcher PECOTA predicted for the rest of the season. Once those six innings and 23 earned runs are added back in, his final season line will be decidedly worse than a 4.28 ERA.
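The same banking logic applies to ERA, which is earned runs allowed per nine innings. The Python sketch below assumes, purely for illustration, that Wang ends up with 180 innings. That total is hypothetical and does not come from PECOTA. The sketch computes his final ERA if he pitches to the 4.28 projection from here on, and the rest-of-season ERA he would need to finish at 4.28 anyway.

```python
def era(earned_runs: float, innings: float) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings


def final_era(banked_er: float, banked_ip: float, rest_era: float, total_ip: float) -> float:
    """Season ERA when the remaining innings are pitched at `rest_era`."""
    rest_ip = total_ip - banked_ip
    rest_er = rest_era * rest_ip / 9  # earned runs implied by the projected rate
    return era(banked_er + rest_er, total_ip)


def required_rest_era(banked_er: float, banked_ip: float, target_era: float, total_ip: float) -> float:
    """Rest-of-season ERA needed for the full season to land exactly on `target_era`."""
    allowed_er = target_era * total_ip / 9
    return era(allowed_er - banked_er, total_ip - banked_ip)


HYPOTHETICAL_TOTAL_IP = 180  # assumed season workload, for illustration only
print(f"{era(23, 6):.2f}")  # 34.50, the ERA over the first six innings
print(f"{final_era(23, 6, 4.28, HYPOTHETICAL_TOTAL_IP):.2f}")
print(f"{required_rest_era(23, 6, 4.28, HYPOTHETICAL_TOTAL_IP):.2f}")
```

With 180 innings, the final ERA comes out around 5.29. He would need roughly a 3.24 ERA the rest of the way to get back to 4.28. Both figures shift with the inning total you assume.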
There may be a caveat, depending on the variables the projection system takes into account. Part of that original 72-win projection may have accounted for the Marlins playing the pitiful Nationals 19 times this season. Their first 14 games include 6 of those 19 games against the Nationals, as well as two against the Pirates, so the remaining 148 games may carry a slightly tougher projected strength of schedule. Taking strength of schedule into account, the Marlins might have been expected to play .500 ball over the first 14 games, or 7 wins (a number pulled out of thin air for this hypothetical). They would then be expected to play .439 ball the rest of the year, or 65 wins. The actual 11-3 record would not change the projected 65 wins and .439 winning percentage for the rest of the year. That leads to a final projection of 76 wins and a .469 winning percentage.
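Schedule effects fit naturally into a per-game view. Each game gets its own win probability, games already played count their actual results, and the probabilities for games still to come are summed. The Python sketch below uses the hypothetical split from this paragraph: a .500 chance in each of the first 14 games, and a flat rate totaling 65 wins over the last 148. A real projection system would presumably assign a different probability to every game.

```python
def projected_wins(game_probs: list[float], actual_results: list[bool]) -> float:
    """Expected season wins: actual wins for played games plus win probabilities for the rest."""
    played = len(actual_results)
    banked = sum(actual_results)  # each True counts as one win
    future = sum(game_probs[played:])  # expected wins still to come
    return banked + future


# Hypothetical per-game probabilities (not real PECOTA output).
early_games = [0.5] * 14
later_games = [65 / 148] * 148
probs = early_games + later_games

preseason = sum(probs)  # expected wins before any games are played
actual = [True] * 11 + [False] * 3  # an 11-3 start; the order doesn't matter here
final = projected_wins(probs, actual)
print(f"preseason: {preseason:.1f}, now: {final:.1f}, pct: {final / 162:.3f}")
```

The preseason total comes out to 72. Swapping in the actual 11-3 start moves only the banked portion, giving 76 wins and a .469 percentage.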
Early-season blips like 11 wins in 14 games, a batter hitting .400, or a pitcher allowing only a run or two over his first three starts may be prime examples of small sample sizes at play. But those results are not insignificant for final records or totals. If you thought a hitter would hit .300 for the season and he’s now hitting .400, he’ll still likely hit .300 for the rest of the season, which would leave him above .300 at season’s end. Players and teams will regress toward their projected averages: that hitter won’t hit .400, the Marlins won’t win 127 games, and Chien Ming Wang (hopefully) won’t break the record for most earned runs allowed in a season. But that doesn’t mean the regression will go all the way back to those averages either. What happens in April doesn’t stay in April. Early-season results matter, whether they are good or bad.
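The hitter example works the same way. The Python sketch below assumes, hypothetically, a 16-for-40 start (.400) followed by 560 more at-bats at the projected .300 rate. Both at-bat counts are made up for illustration.

```python
def season_average(hits_so_far: int, at_bats_so_far: int, projected_avg: float, remaining_at_bats: int) -> float:
    """Projected season batting average: banked hits plus projected hits over the remaining at-bats."""
    projected_hits = projected_avg * remaining_at_bats
    return (hits_so_far + projected_hits) / (at_bats_so_far + remaining_at_bats)


avg = season_average(hits_so_far=16, at_bats_so_far=40, projected_avg=0.300, remaining_at_bats=560)
print(f"{avg:.3f}")
```

Under these assumptions the hitter finishes around .307. That is most of the way back from .400, but still above his .300 projection.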