Optimization Pressure
From Games to Goodhart's Law (with a bonus about math contests)
All of the following have something in common:
The Civilization computer games
High School Debate
The U.S. News college rankings
Quiz Bowl
Scrabble
What is that thing? Briefly, it’s that all of them have their character transformed by optimization pressure. Like metamorphic rock, the nature of something at the beginning can be quite different from what it becomes after undergoing this pressure. This transformation is the commonality between Goodhart’s law (“When a measure becomes a target, it ceases to be a good measure”) and Soren Johnson’s dictum that “Given the opportunity, players will optimize the fun out of a game.”
Every game, competition, or metric can be an optimization target. The more the players take the target seriously, the more they optimize for it. Sometimes this optimization is benign — the players just do the same thing as before, but better, because they spend more time becoming good at it. But in many cases, the actual character of a game or competition, or value of a metric, varies quite a lot depending on how much the participants are optimizing.
For simplicity, I’ll talk about the participants in all of these as “players” who are playing a “game”. It is, after all, no accident that we refer to the subverting of metrics by optimizing for them in a perverse way as “gaming” the system.
Three Levels of Play
While the amount and type of optimization that can be applied to any game forms an entire spectrum, it seems to me that we can usefully divide it into three separate categories.
At the lowest level of optimization, we have Playing for Fun. Those who play a game for fun are not really attempting to optimize against the metric. They might want to do well, but they don’t really act as though that is their goal. They will do whatever seems fun to them, regardless of whether it actually helps them win. (This doesn’t mean they won’t be bitter if they lose, of course, but at least in some domains, many such people don’t really care.)
At the next level, we have Playing to Win. People who play to win make the objective of the game their goal, at least while they are playing. They will think strategically, act in disciplined ways, and may do some outside-the-game strategizing and preparation. They act, in other words, as if they have something at stake in the outcome.
At the final level, we have Winning at All Costs. This level of optimization extends far beyond the game itself. To such players, winning is not merely a goal within the game, but a life goal; they train and practice extensively to eke out any edge, or they maximally exploit the rules to gain an advantage in the game.
It’s worth pointing out that none of these levels of optimization is a priori better or worse for a game, competition, or ranking metric. There are, in fact, competitions where competitor behavior falls very close to the winning at all costs end of the spectrum, where the results are meaningful and participants enjoy the competition. And there are plenty of “casual” games which are designed primarily around players who are merely playing for fun. But for most competitions, nearly all metrics, and many games, the sweet spot is when play occurs at the playing to win level, and breaks down the further typical behavior deviates from that level.
Some Examples
As an illustration, consider the Civilization games — one of Soren Johnson’s examples in his original essay (he is primarily famous for being the lead designer on games in that series). As in many strategy games, playing for fun is actually not very fun. If you are not giving a decent amount of attention to your strategic play, you will lose badly (unless you give yourself tons of arbitrary advantages), and you will also miss out on much of the point of the game. On the other hand, when subjected to too much optimization, many such games end up with degenerate, unfun strategies being dominant; one example is the “Infinite City Sleaze” of early Civilization games, where the best strategy was to build as many settlers as possible and found as many cities as possible as close together as possible, ending up with a huge grid of as many as a hundred or more tiny cities. This strategy made beating high difficulty levels easy, while being completely unfun for most players.
In a primarily single-player game like Civilization, one could ask why players would engage in this kind of optimization to the point where they find the game unfun. And in fact, many players don’t. But some, including many of the most dedicated players, do, because it’s very natural behavior to try to do something as well as possible — and the rules of the game determine that “playing well” amounts to deploying such degenerate, unfun strategies.
It isn’t always the case that the transformation of play due to optimization pressure is bad. Sometimes, the only players who adopt a winning at all costs approach to a game are exactly the ones who find doing so fun, even though most players would find the resulting gameplay to be tedious or frustrating. A great example of this is video game speedrunning. Most people would find the idea of speedrunning a game like Stardew Valley to be an exercise in missing the point; but there’s a dedicated set of players who seem to greatly enjoy engaging in speedruns and challenge runs (explicit optimization challenges where something other than completion time is the target), even though the strategies they use are frequently degenerate and bizarre compared to typical gameplay. 1
On the other hand, Civilization and Stardew Valley are both primarily single player games, engaged in purely for the entertainment of the individual player. Things become messier when the players are in competition with each other, and at least some of the players value not just the gameplay itself but the outcome. In this case, the optimization pressure can transform the game not just for the players who choose to employ a win at all costs level of optimization, but for their competitors (direct and indirect) as well.
Take the board game Scrabble. Casual players, playing merely for fun, will often just play words that they find interesting, or try to find high-scoring plays from the letters currently in their rack. The winner between such players frequently comes down to luck, with a bit of a bonus for having a larger vocabulary and better pattern-recognition. A player who is playing to win will approach the game more strategically: aggressively exchanging their tiles when no high-value plays are available, saving key letters for future plays, and sculpting their rack to maximize the chance of getting the extremely valuable bonus for playing all seven tiles at once. It’s fundamentally the same game, but even still, many casual players would get frustrated playing against a player who is playing to win.
But a highly optimized winning at all costs approach looks very different. Any serious player will have memorized the list of Scrabble-legal two-letter words (did you know that “mm”, “ka”, and “oe” are all allowed?), and many more besides. No need to actually know the meanings, of course — it’s not that uncommon for Scrabble tournaments in some languages to be won by players who don’t even speak the language. And there’s no need to play only letter combinations that you know are words, either; if you can pull a fast one and your opponent does not challenge your non-word, you get the points just the same. I suspect that most people would find playing against such players to be a frustrating experience.
Still, in the case of Scrabble, you can avoid playing in tournaments if you don’t want to engage in this kind of gameplay. Scrabble is primarily a game that friends or family play together, and not many put a lot of value on winning Scrabble competitions. Even though competitive Scrabble has effectively become a different game than kitchen-table Scrabble, few people mind.
This is not the case for some other activities, which exist primarily as competitions. This category includes things like Quiz Bowl, Debate, and other activities which mostly consist of high school or collegiate tournaments. Some debate formats, in particular, seem to have largely lost their original purpose; rather than functioning as a contest of reasoned arguments, they have degenerated into the execution of purely exploitative strategies that make a mockery of true intellectual exchange. Quiz Bowl does not suffer as badly (I don’t think), but at a high level, doing well is less about a deep knowledge base, and more about memorizing frequently-asked-about topics, people, and characters, and guessing quickly where the question is likely to be going.
In both these cases, it’s practically impossible to really participate without being impacted by this degeneration under optimization pressure. Almost nobody does Quiz Bowl or Debate outside of a tournament format, and participants generally actually care about winning. If you just want to play to win, without engaging in the dominant, degenerate strategies, you are at best going to be at a substantial disadvantage, and at worst going to be effectively locked out of meaningful participation.
The worst outcomes are associated with metrics with real-world import that are subject to these transformative pressures. This is classic Goodhart’s law, and the U.S. News college rankings are a good example. It’s unclear whether the original scheme was actually a reasonable metric. What is clear is that “gaming” these metrics — that is, adopting a win at all costs attitude towards reaching as high a ranking on them as possible — has led to perverse behaviors (such as trying explicitly to decrease acceptance rate) and degeneration of the value of the ranking, as it more and more represents success in targeting the ranking metric rather than college quality.
Transformation vs. Barriers to Entry
It’s worth making explicit that there are two separate effects that impact games, competitions, and metrics that are subject to optimization pressure. The first is the one I’ve been talking about above: optimized winning at all costs behavior is sufficiently different in character from less optimized playing to win behavior that it effectively transforms the game or competition or subverts the value of the metric. The other is when investment in optimization has such good returns to performance that players who do not invest substantial resources into winning at all costs are at such a disadvantage that they are effectively excluded from high level competition.
These two effects need not always co-occur. In many sports, for instance, there is a very large return to intensive training — the winning at all costs approach in this context — which excludes from high level play nearly anyone who does not engage in it. But a sport played at a high level is (usually) recognizably the same thing as a sport played at a lower level, and retains its good qualities.
Conversely, one can imagine a degenerate, optimized strategy for a (competitive) game which is only marginally better than a less-optimized and less investment-heavy strategy that engages with the game as intended. In that case, the fully optimized gameplay is very different from intended play, but not engaging with it provides only a minor disadvantage, and so players who don’t want to invest the time, or warp their play, can still compete.
Either of these effects alone might, or might not, be deleterious. The barrier-to-entry effect alone is only a problem if there are no “amateur leagues” where lower-investment players can still participate, or if the competition exists for some additional purpose which is not well-served by limiting the high ranks to people who devote an inordinate amount of time and effort to it. The transformation effect alone is more likely to be a problem (because people will do lots of things for just a tiny advantage if they care about the outcome), but as long as the mere existence of a cohort of degenerate-strategy players does not make the game (or competition) unrewarding for others, it can be worked around.
The real trouble comes when employing highly optimized strategies both provides a substantial advantage, and also transforms the game, competition, or evaluation in a deleterious way. The designers of games and metrics, and organizers of competitions, must keep this in mind if they want their products to remain valuable.
Dealing with Optimization Pressure
Most games, competitions, and target metrics are designed, either implicitly or explicitly, with a moderate level of optimization in mind. That is, they are designed around people playing to win. But when people find winning valuable enough, they will no longer merely play to win; they will try to win at all costs. This must be taken into account.
There are, roughly speaking, three options for dealing with this.
First, you can segment your audience. That is, try to divide those who approach the game or competition at different optimization levels into groups that generally don’t need to interact with each other. This can work for single-player games, competitive games where the fun of the gameplay (rather than being good at the game or winning) is the dominant motivation for most players, and many sports. It doesn’t always succeed, though, because in many cases you can really only sort by skill rather than by optimization, and a player who is highly skilled but not optimizing hard can end up competing with the win at all costs players, to their frustration if the hyperoptimized gameplay really is degenerate.
Second, you can accept that the most highly optimized version is now the actual game; that winning at all costs is the dominant approach, even when that approach fundamentally changes the game. This is, of course, disastrous when it comes to metrics — here we have Goodhart’s law in action. Giving up like this is generally a bad plan for games and competitions as well. Unless you are very lucky, players will have less fun, and competition outcomes will diverge from their original purposes.
Finally, you can actively attempt to design the game, competition, or metric to be resistant to optimization pressure. As noted in the previous section, optimization pressure need not always cause a serious problem. For instance, you can try to minimize the extent to which hyperoptimized strategies diverge from the intended good strategies. This is a very difficult problem, but in some domains, and with enough attention, it’s possible to solve it well enough. When game developers patch their games to buff or nerf certain strategies, this is what they are attempting to do. Alternatively, you can try to minimize the returns to hyperoptimization relative to normal engagement. This is what test makers who want to minimize the trainability of their tests (such as designers of aptitude tests like the SAT) are attempting to do.
The third option is by far the hardest, and not the default approach. Nevertheless, it is often necessary. If you find yourself in charge of developing a high-profile game, running a serious competition, or designing a high-stakes metric, you need to be aiming to make it resilient against transformative hyperoptimization. Otherwise, only blind luck will keep it from being subverted.
Optimization Pressure on Math Competitions
I’m concerned about math competitions.
The current reality in the United States is that math competitions play three critical roles in the lives of mathematically inclined pre-college students, and two of them subject the competitions to a substantial amount of optimization pressure.
First, math competitions are one of the few ways that students get exposed to interesting mathematics. Most curricular mathematics in US primary and secondary schools is, frankly, rather boring. Certainly it does not evoke the kind of joy and sense of fun and beauty that math lovers want; exercises are routine, repetitive, and have none of the thrill of discovery that motivates mathematicians. For many students, math competitions are their first indication that there is something to mathematics beyond doing the even-numbered problems between 16 and 50 of this week’s textbook section for homework. And for some of these students, math competitions will be their primary source of these interesting ideas and problems until they get to college.
Second, math competitions give mathematically talented students a “sport” of their own. There are few venues where such students can exercise their skills and compete against others, and math competitions fulfill a similar role as sports do for the athletically oriented. Of course, like sports, this means that people will train specifically to do better in math competitions: our first contribution to optimization pressure.
Third, math competitions are one of the very few ways for talented young mathematicians to legibly demonstrate their aptitude. The usual standardized tests (such as the SAT, AP exams, and the like) simply do not have a high enough ceiling, and are also testing something quite a bit removed from mathematics as done by mathematicians. Taking college classes is not a consistent signal, and is not available to many students. Math Olympiads (and even lower-level competitions like the AMC and AIME) have a much higher ceiling and correspond much more closely with real mathematics than almost anything else most talented students have access to. A solid performance on the USA Math Olympiad can establish someone as one of the top hundred or so mathematics students in their graduating class in the country; there is almost no other accomplishment that can do that. And with this comes the biggest source of optimization pressure: college admissions. If good performance on math competitions is a massive leg up in getting into a top college (and it is), you can bet that many students (and their parents) will be looking for every way to maximize their performance on math competitions.
So math competitions have all the warning signs of being subject to large amounts of optimization pressure. They also — by virtue of being a metric, a way of legibly demonstrating aptitude — stand to lose a lot of their value if the optimization pressure produces fundamentally different behavior in students than merely getting better at math. After all, the goal is to reward mathematical problem-solving ability in general, not the narrower ability to solve the specific problem types that appear on the exam.
Moreover, it appears that competitors have gotten much better at math competitions in the last two decades. Scores on the AIME are up quite a lot (as I observed in a previous piece), and subjectively, it seems that many current students are much more well-versed in problem-solving techniques than I was. So it appears that the optimization pressures have, in fact, had a substantial effect. The question is: what kind of effect?
There are two things to look out for. First, is hyperoptimization distorting the competition landscape in such a way that top performance on contests corresponds more to mastery of some set of skills that is so narrow or divergent from general mathematical ability that the top performers are no longer actually among the best young mathematicians? Second, are the returns to contest-specific optimization such that students who don’t train much specifically for competitions (and instead focus their efforts on learning math in general) are at a massive disadvantage compared to equally talented students who focus on contest preparation?
I can’t answer these definitively, but as far as I can tell, the answer to the first question is in the negative. Contra the common accusation that people who do well on tests are merely good at taking tests, it appears that people who do well on math competitions really are quite good at math in general, and not just at doing the specific things needed to do well on competitions. In this way, at least, math competitions as a signal of talent have not succumbed to Goodhart’s law.
About the answer to the second question I am much less certain. It certainly seems that current top performers on math competitions spend much more time, and at much younger ages, training for the competitions than I and others did twenty-odd years ago. On the other hand, it’s not clear exactly how much advantage this gives them over someone who just does a lot of math without specifically preparing for contests that much; or whether they are truly stuck in a Red Queen’s race competing for the top positions, to the detriment of their general mathematical development. And it’s hard to know how to separate the effects of more contest-specific training from the generally better availability of mathematics resources.
I don’t know what steps the major math competition organizers are taking to protect against these pitfalls. Whatever they are doing appears to mostly be working, but the pressures are large, and the situation seems precarious. Given the important role that math competitions currently play in the lives of these students, I really hope they succeed in keeping math competitions about mathematics.
I can’t resist a digression into the absurdities of highly optimized Stardew Valley play. A couple years ago, some players attempted a challenge run where the goal was to maximize the amount of cash on hand (net earnings) at the end of one in-game year. The strategies looked nothing at all like normal gameplay. See, the most efficient way of making money in the game is to buy seeds for the most expensive crop, Starfruit, grow as much of it as possible, and put it into kegs to make Starfruit Wine. But doing this requires a lot of resources, including money for the seeds, wood, metal bars, and oak resin for the kegs, and rain totems to force it to rain every possible day in summer (the season when Starfruit grows) because hand-watering is prohibitively time-consuming. Producing oak resin requires tappers on oak trees; the trees must be planted from harvested acorns, and the tappers made from metal bars and wood. All the metal and wood can’t be effectively gathered; it must be purchased (and smelted, in the case of metal) in absurd quantities from in-game vendors; this also requires lots of money. So how does one get the money for all this? The second most efficient way of making money in the game: diving deep into the late-game mining cavern area, mining iridium (not with a pickaxe, of course, that’s too slow, but with bombs and explosives that can be put in a slingshot, all of which must themselves be crafted), smelting it, and selling the bars. Also, the rain totems can’t be crafted with readily available resources, so they must be gotten via random drops from treasure chests in these same mines. So the player must get to this late-game area as soon as possible — in the first 10 in-game days — so as to have a chance to start harvesting and building up resources and all the ways to process them, to fund and enable all the other stuff. 
And even the farming part is stupidly optimized: to be able to plant seeds quickly enough, the farm must be cleared and largely tilled with bombs, with the remaining squares tilled by hand, then fertilized and planted by carefully stepping square by square and repeatedly pausing the game between actions to make in-game time pass more slowly. The whole situation is just absolutely insane and entirely divorced from the chill, casual play that the game is usually associated with. But people had fun with it!