This post is also available in Dutch.
For a few weeks now, I’ve been obsessed with the game Semantle. The goal is to guess a secret word each day based on its meaning (its semantics). With each guess, the game tells you how similar in meaning your guess is to the answer, as a score between -100 (very different from the answer) and 100 (exactly the answer). For example, if I were to type in the guess dog right now, that word would get a similarity score of 2.20. This is quite low, so dog is not very similar to the answer. But how does a computer know how similar two words are?
How is the similarity calculated?
To calculate how similar two words are, the game uses an algorithm called Word2Vec. The main idea behind Word2Vec is that you can infer the meaning of a word from the words it frequently appears with, just as you can judge people by the company they keep. Words with the same “friends” are likely to have similar meanings.
The Word2Vec algorithm combs through a large collection of texts to find connections between words and their “friends”. Based on the connections found, each word is “translated” into a series of numbers, since computers cannot calculate with words, only with numbers. The similarity between two words is then calculated from the distance between their series of numbers: the closer together the numbers, the more similar the words.
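For the curious, here is a minimal sketch of how such a comparison can work. The three-dimensional vectors below are made up purely for illustration (real Word2Vec vectors have hundreds of dimensions, learned from text), and the use of cosine similarity scaled to a 100-point score is an assumption about how a Semantle-style score could be produced, not a description of the game’s exact internals:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors:
    1 = same direction, 0 = unrelated, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors (invented for this example): "dog" and "cat"
# point in similar directions, "car" points elsewhere.
vectors = {
    "dog": [0.8, 0.6, 0.1],
    "cat": [0.7, 0.7, 0.2],
    "car": [0.1, 0.2, 0.9],
}

# Scale to a Semantle-style score between -100 and 100.
print(round(100 * cosine_similarity(vectors["dog"], vectors["cat"]), 2))
print(round(100 * cosine_similarity(vectors["dog"], vectors["car"]), 2))
```

With these toy numbers, dog and cat score much higher together than dog and car, which is exactly the kind of signal the game reports after each guess.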
Applications beyond the game
This way of making words understandable to computers was, of course, not invented just for this game; it has all kinds of useful applications. For example, it is used to improve machine translation software such as Google Translate, as well as spam filters. Interestingly, these calculated similarities can also predict language phenomena, for example the extent to which a word like cat activates another word like dog in our mental lexicon (the dictionary in our brain). It is a fascinating question whether the way computers learn and store word meanings might resemble how our brain does it!
Smart Semantle tips
- Semantle’s algorithm learned the words and their relationships by reading newspapers, so typical newspaper words like politics or policy are often a good guess.
- The type of word also matters, e.g., whether it is a noun or a verb, because verbs often occur in similar contexts to other verbs (“I enjoy _”).
- Antonyms (e.g., hot and cold) have similar meanings to the algorithm because they occur in similar contexts (“The tea is too _”).
It’s a difficult game, so if you still can’t figure it out despite my tips, you can always check the Reddit forum for hints 🙂
Image by Marlijn ter Bekke