
The Math Behind Wordle: Information Theory and Optimal Play

Explore the fascinating mathematics behind Wordle, from information theory to entropy calculations. Understand why certain words are mathematically superior.

Alex Mitchell

Alex is a Wordle enthusiast and data analyst who has been playing Wordle since January 2022. With a current streak of 340+ days, Alex combines statistical analysis with practical gameplay experience to help players improve their Wordle skills.

Published February 20, 2025

Wordle Is a Math Problem Disguised as a Word Game

Every time you make a guess in Wordle, you are running a calculation. You might not realize it — most people do not think about information theory while typing five-letter words over morning coffee — but the game is fundamentally about minimizing uncertainty. Each guess takes a pool of possible answers and splits it into smaller groups based on the color pattern. The better your guess, the more evenly it splits the pool, and the faster you converge on the answer.

I have been fascinated by the math behind Wordle since I first read a paper from MIT researchers who formally analyzed optimal play. You do not need to understand the math to play well, but understanding the principles changed how I think about each guess — not because I am computing entropy at the keyboard, but because the framework gives me better intuitions about what makes a guess good or bad.

Wordle as an Information Theory Problem

Information theory, developed by Claude Shannon in the 1940s, quantifies uncertainty. The core unit is the bit — one bit cuts possibilities roughly in half. Eight bits cuts them by a factor of 256.

In Wordle, you start with about 2,309 possible answers. How many bits do you need to identify one specific word? The base-2 logarithm of 2,309 is roughly 11.2 bits. Each guess produces one of 3^5 = 243 color patterns, so a single guess can convey at most log2(243), about 7.9 bits. If every guess gave maximum information, two guesses would be enough to pin down the answer and a third would be enough to type it in: you could always solve Wordle in about 3 guesses.
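The arithmetic here is easy to check in a couple of lines of Python:

```python
from math import log2

ANSWERS = 2309        # size of the Wordle answer pool
PATTERNS = 3 ** 5     # 243 possible color patterns per guess

print(log2(ANSWERS))   # ~11.17 bits needed to single out one answer
print(log2(PATTERNS))  # ~7.92 bits: the most one guess can convey
```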

In practice, guesses do not give maximum information because color patterns are not equally likely. A guess like CRANE will produce diverse patterns — some all-gray, some with greens and yellows. A guess like XYLYL (a real word, a chemical group) will almost always produce all-gray because X and Y rarely appear. Both give you some information, but CRANE gives substantially more on average.

What Entropy Means in Wordle Context

Entropy measures how much a guess reduces uncertainty on average. It is calculated by looking at all possible color patterns a guess can produce, determining what fraction of answers produce each pattern, and computing expected information gain.

A guess splitting answers into 100 equally-sized groups has higher entropy than one where 90% of answers produce the same pattern and the remaining 10% are scattered across 99 others. The first resolves uncertainty reliably. The second usually tells you little, with a rare chance of being very informative.
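The calculation just described can be sketched in a few lines of Python. The `feedback` helper implements the standard Wordle coloring rule (greens first, then yellows drawn only from unmatched answer letters, so repeated letters are handled correctly), and `entropy` averages information gain over every pattern a guess can produce:

```python
from collections import Counter
from math import log2

def feedback(guess, answer):
    """Return the color pattern for a guess as a string of G/Y/- characters."""
    pattern = ["-"] * 5
    unmatched = Counter()
    # First pass: mark greens; tally the answer letters that were not matched.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "G"
        else:
            unmatched[a] += 1
    # Second pass: mark yellows against the unmatched letters.
    for i, g in enumerate(guess):
        if pattern[i] == "-" and unmatched[g] > 0:
            pattern[i] = "Y"
            unmatched[g] -= 1
    return "".join(pattern)

def entropy(guess, answers):
    """Expected information (in bits) of a guess over a pool of possible answers."""
    counts = Counter(feedback(guess, a) for a in answers)
    n = len(answers)
    return -sum((c / n) * log2(c / n) for c in counts.values())
```

Running `entropy` for every valid guess against the full 2,309-word answer list is exactly how the opener rankings later in this article are produced.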

The highest-entropy openers are SALET and SLATE, both producing roughly 5.8 bits of entropy. That means the first guess, on average, cuts possible answers by a factor of about 55 (2 to the power of 5.8). Starting from 2,309, you are down to roughly 42 on average. A poor opener like OUIJA produces only about 4.2 bits, cutting the pool by roughly 18 — leaving you with about 128 possible answers.

How Many Bits Each Guess Provides

  • Optimal opener (SALET, SLATE): approximately 5.8 bits
  • Good opener (CRANE, TRACE, RAISE): approximately 5.5 to 5.7 bits
  • Vowel-heavy opener (ADIEU, AUDIO): approximately 4.8 to 5.0 bits
  • Mediocre opener (QUICK, JUMBO): approximately 4.0 to 4.5 bits
  • Poor opener (OUIJA, XYLYL): approximately 3.5 to 4.2 bits

Second guesses typically provide 3 to 5 bits. By guess three, you are often in the 1 to 3 bit range because remaining uncertainty is small and it is harder to split a small pool evenly.

Identifying the answer requires about 11.2 bits in total. Optimal play extracts roughly 16.3 bits over four guesses, so there is substantial slack, which is why most strategic players average a solve in 4 guesses or fewer.

Why Some Words Are Worth More as Guesses

A guess's value depends on two things: which letters it tests and where they are positioned. Testing common letters is more valuable because you are more likely to get non-gray feedback, and non-gray feedback splits the possibility space.

Position matters too. A letter can be common overall yet rare in a particular slot: E, for instance, appears at the end of five-letter words far more often than at the beginning. SALET (S-A-L-E-T) and SLATE (S-L-A-T-E) test the same five letters, but SALET places E at position 4 and T at position 5, while SLATE places E at position 5 and T at position 4. The competing positional frequencies of E and T nearly cancel, leaving SALET a tiny edge in entropy, about 0.01 bits.

In practice, this difference is invisible. You would need millions of games for it to show up in stats. But the math is clear: SALET is technically the optimal first guess by expected information gain.
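Positional letter frequencies are easy to tabulate yourself. A minimal sketch, run here over a tiny hypothetical pool rather than the real answer list:

```python
from collections import Counter

def positional_frequency(words):
    """For each of the five positions, count how often each letter appears there."""
    return [Counter(w[i] for w in words) for i in range(5)]

# Tiny illustrative pool; in practice, use the full 2,309-word answer list.
pool = ["slate", "salet", "crane", "trace", "stone", "pause"]
freq = positional_frequency(pool)
print(freq[4]["e"])  # how many words in the pool end in E -> 5
print(freq[0]["e"])  # how many start with E -> 0
```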

The Calculation: Expected Remaining Words

For each possible color pattern a guess can produce, count how many answers would produce that pattern. Multiply by the probability of that pattern (count divided by total answers). Sum across all patterns. This gives you the expected pool size after one guess. Lower is better.

SALET: roughly 70.8 expected remaining words. CRANE: about 78.4. ADIEU: about 119. OUIJA: about 213.

These numbers are not linear with entropy because entropy and expected pool size measure different things. Entropy measures how evenly the pool splits. Expected pool size measures average remaining candidates regardless of distribution. They are correlated but not identical.
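The expected-pool-size formula reduces to a one-liner once the pattern counts are in hand. A sketch, assuming the counts have already been computed by scoring a guess against every possible answer:

```python
def expected_remaining(pattern_counts):
    """Expected pool size after a guess, given {color pattern: answer count}."""
    n = sum(pattern_counts.values())
    # Each pattern occurs with probability count/n and leaves `count` candidates.
    return sum(c * c for c in pattern_counts.values()) / n

# Hypothetical splits of a 100-word pool:
even = {f"p{i}": 10 for i in range(10)}  # ten equal groups of 10
skew = {"all_gray": 90, "other": 10}     # one dominant pattern
print(expected_remaining(even))  # 10.0
print(expected_remaining(skew))  # 82.0
```

The skewed split illustrates the point above: even though both guesses face the same 100-word pool, the lopsided one leaves you expecting over eight times as many candidates.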

Why CRANE Beats XYLYL on Average

XYLYL contains X and Y, two of the least common letters in the answer pool. The most likely outcome is all five gray — eliminating maybe 30 to 40 percent of answers. Not useless, but not great.

CRANE contains five of the most common letters. The most likely outcome is a mix of grays, yellows, and maybe a green. The diverse pattern variety means CRANE splits the answer pool into more, smaller groups — exactly what you want.

Expected remaining words after XYLYL: roughly 430. After CRANE: 78. Same guess slot, five times the elimination power.

The Optimal First Guess Depends on Your Metric

Optimizing for expected remaining words? SALET wins — 70.8 versus 71.2 for SLATE.

Optimizing for worst-case remaining words? SLATE wins — its worst-case pattern leaves fewer words than SALET's worst case.

Optimizing for minimax (fewest total guesses needed in the worst case)? SALET wins again, guaranteeing a solve in at most 5 guesses.

The differences are marginal. Any top-10 opener is within a few percentage points of optimal. The math matters more for understanding why some words work better than for choosing between SALET and SLATE.

Why the Best Guess Is Not Always a Possible Answer

The answer pool (roughly 2,309 words) is a subset of valid guesses (roughly 12,000+). Words like TARSE, LARES, or AESIR are valid guesses that will never be answers.

Sometimes a non-answer word splits remaining candidates more evenly than any answer word, because it can use letter combinations absent from the answer pool. This matters most in the late game with few candidates. In normal mode, you can probe freely with non-answer words. In Hard Mode, they must still include all green and yellow letters, limiting their usefulness.

How AI Solvers Work: Minimax vs Expected Value

Two main approaches, optimizing for different goals.

Expected value solvers minimize the average remaining words — the entropy-maximizing approach. Over many games, this minimizes average guesses. SALET is the optimal first guess here.

Minimax solvers minimize the worst-case outcome — asking "what is the biggest pool I could face after this guess." This guarantees solving any answer within a fixed number of guesses (5 for optimal play). It sacrifices average-case performance to ensure you never need more than 5 guesses.

Neither is "better" — they optimize for different goals. Streak players should prefer minimax (bounds worst case). Average-seekers should prefer expected value.

In my play, I use simplified expected value for the first two guesses and switch to minimax thinking for guesses 3 through 6. Not optimal, but practical.

My Simplified Approach: Using Math Without a Calculator

I do not compute entropy at the keyboard. But I do use the principles the math reveals. Before typing any guess, I ask two questions: which untested letters am I testing? And how many possible answers does this help me distinguish? If the answer to the first is "none," I am wasting a guess. If the answer to the second is "one," I had better have enough guesses left for the alternatives.

No spreadsheets, no entropy calculations. Two questions that encapsulate the core insight of information theory: a good guess reduces uncertainty, and the best guess reduces it the most evenly.

The math is elegant and worth understanding. But the daily puzzle is played by humans, not algorithms. Use the math to choose a good opener and understand why some guesses feel productive. Then close the spreadsheet and play the game.

Tags: math, information theory, entropy, probability, optimal play