Everybody's playing Wordle. It looks as if computers should be able to play it well. But how well? And what is the best first guess?
If you have been living under a rock, or if you are from The Future when the Wordle fad has passed, here's a summary. It's a simple web page where you have to guess a hidden five-letter word. There's a different five-letter word every day, but on any given day it is the same word for everyone. So people can compare their success in guessing the word.
To play, you enter a series of guesses as to what the word might be. Each guess has to be a real five-letter word. The game shows a score for each guess. If a letter in the guess is also in the hidden word, in the same position, then that letter is coloured green. If a letter is in the hidden word but in a different place, the letter is coloured yellow. Letters in the guess that are not in the hidden word are coloured grey. For example if the hidden word is CREEP and I guess TRACE then the R of my guess will be green, because it is in the hidden word at the same position. The C and E of my guess will be yellow, because the hidden word does contain those letters, but not in the same place. So I know that the hidden word contains R, C, and E; that R is the second letter; that C is not the fourth and E is not the fifth letter; and that T and A are not present. I don't know whether any of R, C, and E occurs more than once, as in fact E does.
Old-timers like me may recognize that this game is very similar to Mastermind, a board game that was very popular in the 1970s. Mastermind is played with coloured pegs rather than letters. There is no restriction on what you can guess, like the Wordle restriction that your guess must be a word. And you learn how many pegs are the right colour in the right place but not which ones they are. The games are otherwise the same, so strategies for one translate into the other.
The aim, of course, is to guess the hidden word or code in as few guesses as possible. There is a limit on the number of guesses allowed, which is six in the case of Wordle. So how should we proceed? Before talking about that, let's look at the details of Wordle a bit more.
Wordle was created by Josh Wardle, which explains the name. Mr Wardle has made a simple game with a straightforward and elegant user interface. I'll note that (at the time of writing) there are no ads and no communication between the game and the outside world. Mr Wardle could have been tempted by the game's popularity to look for ways to make a quick buck. Instead he has deliberately created a little island of play where you are not bombarded with ads or flashy graphics, and where you can only play for a minute or two every day. I think that's a great gift to the world.
Another great thing about Wordle is that everything is out in the open. I initially supposed that when you play it in your browser, each guess would be sent to a server somewhere. The server would know the hidden word and would send back the score for your guess. But that's not it at all. The code that runs in your browser allows you to play the game without any network communication. It contains a complete list of all the possible hidden words, and it uses today's date to figure out which word is today's. So we can see that there are 2,315 possible hidden words, and if we are so inclined we can see what tomorrow's word will be, or the word for any other date past or future. There's also a longer list of words that are allowed in guesses, 12,972 in all, many of them quite obscure and some arguably not really words at all. But the 2,315 words are all common words, so there's no frustration because the hidden word is ROATE or whatever.
Since everything is out in the open, it's fairly straightforward to change the wordlists, so people have created Wordle variants in many other languages. It's also straightforward to test computer strategies against the wordlist and see which ones perform the best.
Like many computery people, I was interested in what an optimal strategy for Wordle might be and how I might try programming it. It turns out that there's been quite a lot of research on strategies for Mastermind, and as I mentioned that translates quite well to Wordle.
The Wikipedia article on Mastermind mentions an article by Donald Knuth from 1976. That was probably close to the height of Mastermind's popularity, which may have been what inspired Knuth to look at it. Or maybe it was just an interesting problem. (If you haven't studied Computer Science, Knuth's name may not be familiar, but if you have, it should be. He is a giant in the field.)
Here's how Knuth's technique might apply to Wordle. Let's say again that I have guessed TRACE and the hidden word is CREEP, though of course I don't know that. So I see TRACE: R green, C and E yellow, others grey. If I am a computer then I can just try every one of the 2,315 possible hidden words to see whether it would result in that score for TRACE. I'll determine that the ones that do are: CREDO, CREED, CREEK, CREEP, CRESS, CRIED, CRIER, CRUEL. So what should my next guess be?
Importantly, I am not limited to guessing a word that might be the solution. My strategy should be to guess the word that reduces this list of possible solutions as much as possible, and often that guess will not actually be a possible solution itself.
Let's say for example that I did guess one of the possible solutions, CREDO. Then if that is the hidden word I have won! But if it isn't, my scores will be as follows. The word on the left is the hidden word in each case, with the scores I'll get if I guess CREDO.
CREDO: CREDO (+++++)
CREED: CREDO (+++/-)
CREEK: CREDO (+++--)
CREEP: CREDO (+++--)
CRESS: CREDO (+++--)
CRIED: CREDO (++//-)
CRIER: CREDO (++-+-)
CRUEL: CREDO (++-+-)
(For people who may not be able to see or distinguish the colours here, I'm adding a code where + means a green letter in that position, - means a grey letter, and / means a yellow letter.) We can see that some hidden words lead to the same score. If the hidden word is CRIER or CRUEL we will see CREDO (++-+-). If it is CREEK, CREEP, or CRESS, we will see CREDO (+++--). So this guess doesn't do a great job of distinguishing the possibilities.
Suppose instead I guess PLIED. This is what I'll get for all possible hidden words that are consistent with what I've already seen:
CREDO: PLIED (---//)
CREED: PLIED (---++)
CREEK: PLIED (---+-)
CREEP: PLIED (/--+-)
CRESS: PLIED (---/-)
CRIED: PLIED (--+++)
CRIER: PLIED (--++-)
CRUEL: PLIED (-/-+-)
That's much better! Now, whatever score I get for my guess, I will immediately know what the hidden word is, because each possible hidden word leads to a different score.
Knuth's method, then, is to consider each possible guess and calculate the scores it would get for each possible hidden word that is consistent with the scores so far. Then choose a guess where the maximum number of scores for any possible solution is the least. That's not very easy to follow when stated in words, but it should be clearer from the examples. After TRACE (-+-//), we considered CREDO as a guess, and we saw that there is one score, CREDO (+++--), where 3 possible words could have produced it. So the maximum number of scores is 3. Then we considered PLIED as a guess and we saw that every score corresponded to exactly 1 possible word. So the maximum number of scores is 1. We can't do any better than that, so we can guess PLIED, or any other word where every score corresponds to a different possible word. (The others are EIKED, EUKED, KIDEL, LIKED, PIKED, PILED, PSEUD, SPELD. I did say many of the allowed words are obscure!)
That's basically Knuth's method. There's just one further trick: if the method produces several possible guesses, and some of them are also possible solutions, then obviously those ones are better because your guess might win immediately.
Guessing the first word
Knuth's method can be applied even if we haven't guessed any words at all yet. We can try every one of the allowed 12,972 initial guesses with every one of the 2,315 possible hidden words to look for the guesses that minimize the maximum number of words for any score. If we do this, the method will tell us that our best guess is RAISE.
The method is good, but it isn't perfect. We can instead consider every possible first guess and see how many total guesses it will need for every possible hidden word, if we apply Knuth's method for the guesses after the first. That's the sort of exhaustive search that computers are good at. It might take a day or two (it did when I tried it), but eventually it will tell you how each first guess fares. It turns out that if you start with RAISE you will need an average of 3.563 guesses, averaged over all 2,315 possible solutions. That's pretty good, but we can do better. The best word to start with is TRACE, which leads to an average of 3.493 guesses.
How many guesses are needed?
If you start with either RAISE or TRACE and apply Knuth's method, you will definitely guess the word in at most 5 guesses. Here's an example of a word where the method needs 5 guesses starting from TRACE:
(SOLEI is the plural of SOLEUS, a muscle in the leg. Obviously I knew that and didn't have to look it up in the dictionary.)
Robert W. Irving published an improvement to Knuth's method in 1979. I haven't hunted down the exact article, but from a brief summary in this article I think it is as follows. Previously, when we started with TRACE (-+-//) and considered guessing CREDO, we saw that the 8 possible hidden words produced scores as follows: 3 scores corresponding to exactly 1 hidden word, 1 score corresponding to 2 hidden words, and 1 score corresponding to 3 hidden words. Knuth's method then used that last bit of information only (3 words for 1 score) to rank this choice as 3 when looking for the lowest-ranked choice. Irving instead considers the expected number of hidden words that remain possible after guessing CREDO. If the hidden word is CREDO, CREED, or CRIED, the number of hidden words possible after guessing CREDO is 1, since those all produce unique scores. If the hidden word is CRIER or CRUEL, the number of hidden words possible after guessing CREDO is 2, since those words both produce the same score for CREDO. If the hidden word is CREEK, CREEP, or CRESS, the number of hidden words possible after guessing CREDO is 3, since those words also all produce the same score as each other. So if we summarize the number of possible scores for each hidden word, we get CREDO(1), CREED(1), CRIED(1), CRIER(2), CRUEL(2), CREEK(3), CREEP(3), CRESS(3). The expected number of hidden words that remain possible after guessing CREDO is then the average of these numbers: (1+1+1+2+2+3+3+3)/8 = 16/8 = 2.0. Similarly if the guess is PLIED again then the expected number is (1+1+1+1+1+1+1+1)/8 = 8/8 = 1.0.
The two methods produce the same results for this example, but if we pit them against each other for all 2,315 words, with TRACE as the initial guess, we find that the average number of guesses is 3.444 for Irving versus 3.493 for Knuth. So Irving is a bit better. It isn't better in every case: of the 2,315 words, Irving does better for 561 while Knuth does better for 418. But Irving is better on average.
Earlier we used Knuth's method to determine a first guess, and got RAISE. We saw that TRACE is actually better than that. If we use Irving's method to determine a first guess, we get ROATE, which is an older spelling of ROTE and also an acronym for “return on average tangible equity”. However, TRACE is still better, and indeed is still the best starting word for this algorithm too.
Neuwirth's entropy-based method
A 1982 paper by Erich Neuwirth (which I have not read) apparently suggests using the concept of “entropy” from information theory. One way to look at entropy is that the more of it you have, the more information. Since we're trying to choose the guess that will give us the most information, it makes sense that we would try the one that gives the greatest entropy. Without entering into the mathematical details, I'll say that I tried this, and indeed it gives slightly better results. The average number of guesses is 3.493 for Knuth, 3.444 for Irving, and 3.435 for Neuwirth, still starting with TRACE. The first guess determined by Neuwirth's method is SOARE (an obsolete word for a young hawk) but again TRACE does better. With this method, CRATE does slightly better than TRACE.
An optimal method
In principle the computer could compute the results of every possible guess at every possible stage of the game and use that to determine the best sequence of guesses for every possible hidden word. That would lead to what we in the business call a “combinatorial explosion”, meaning that there would be so many possibilities to consider that it could take years if not millennia. Various optimization techniques could probably be applied to make it tractable (taking days or weeks, say), but I'm not really motivated to do that. This paper by Geoffroy Ville gives lots of background on the techniques he used to make an exhaustive analysis of Mastermind with 4 holes and 7 colours, which has 2,401 possible codes, compared with the 2,315 hidden words for Wordle. So it may be doable. But, the 12,972 possible guesses might mean this is still out of reach.
Which is the best first guess for people?
We've seen that TRACE is the best first guess for the Knuth and Irving methods. It might also be the best first guess for the optimal method. But is it the best word for people?
There are a few reasons why it might not be. First, the computer has an unfair advantage, namely that it knows the exact list of 2,315 possible hidden words. Earlier we saw a list of 9 words that are all allowed guesses and that can all be used to the same effect after a certain score for TRACE: EIKED, EUKED, KIDEL, LIKED, PIKED, PILED, PLIED, PSEUD, SPELD. Which of those might be the solution to a Wordle game? Probably not SPELD (a splinter) or KIDEL (a fishweir, not that that helps). But you might be surprised to learn that only PLIED is possible. Not LIKED or PILED?
Second, computers are very good at trying every possibility to see what works. People not so much. For example, it's probably much easier for most people to think of words that do contain certain letters than words that don't. So that suggests that people may prefer a strategy that allows them to find the most letters that are present, rather than one that technically allows them to reduce the number of possible words but doesn't make it easy to see what those words are. The commonest letters in the possible solutions are EAROTLISNUCYHDPGMBFKWVXZQJ in that order, so TRACE has 4 of the 5 commonest, but then C is far behind. ROATE, which we saw earlier, is in fact the best word to tell you the most letters on average, and the most green letters if possible.
Finally, people are not necessarily trying to get the lowest average number of guesses. If you follow an information-maximizing strategy you will almost never succeed in two guesses, because your second guess usually serves to refine the information you got from the first one. If your second guess is instead a word that might be the solution then you might end up taking more guesses overall, but you might also get the solution in two guesses, which is bragworthy. I played this a while ago:
There's an I, and it's almost certainly the fourth letter, so I made a guess straight away, which turned out to be right. The entropy-based method would have played:
Impressive in its own way – who'd have thought of CLAPT, an alternative to CLAPPED? And CLAPT allows this method to get 11 of the 14 possible solutions on the next word (MANIA, MANIC, and MAXIM take one more guess). But getting the solution in 3 guesses is not that special.
Summary and code
Computers can solve any Wordle puzzle in at most 5 guesses and on average about 3.4. The best initial guess for computers is TRACE, though it might not be for people. (The worst initial guess is QAJAQ, an alternative spelling of KAYAK, for computers and almost certainly people too.) To minimize the average number of guesses, computers and people should try to extract the most information possible with each guess. That may mean guessing a word that you know cannot be the solution but that will tell you more about which words might be.
I put the code I used for the experiments mentioned here in this GitHub repository.