The challenges with synonyms in search and how to solve them
Why the meaning of words matters more than how they're spelled.
In this blog post we provide an overview of why synonyms in search matter so much, what some of the major challenges are and how to get synonym configuration right.
If you are only curious about how to solve these challenges, jump to the last section, or visit this page to get an idea of what vector search can do for you.
What are synonyms and how do we use them in search?
The Cambridge Dictionary defines a synonym as “a word or phrase that has the same or nearly the same meaning as another word or phrase in the same language.” Simple, but less so when it comes to search.
In most cases, people have no idea how the information they’re looking for is presented in a database. Imagine wanting a broken window repaired and having to search for the very specific phrase, “glass repair service”. A good search engine should allow you to search for “window repair” and surface the right results regardless of exact matches. Handling synonyms is evidently useful, but doing so with traditional keyword matching can be a painful process.
Semantic search, on the other hand, seeks to understand a user’s intent by matching queries and content on contextual meaning rather than exact wording. In the following sections, we’ll discuss why implementing an AI-powered semantic search engine (akin to Google) is the way to go when you want to handle synonyms effectively.
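To make "matching on meaning" a little more concrete, here is a minimal sketch of how a vector-based search might rank results. The three-dimensional vectors are written by hand purely for illustration (an assumption, not output from any real model), and the names `TOY_VECTORS`, `cosine_similarity` and `rank` are hypothetical.

```python
import math

# Hypothetical hand-written "embeddings" for illustration only.
# A real model would produce vectors with hundreds of dimensions.
TOY_VECTORS = {
    "window repair": [0.9, 0.8, 0.1],
    "glass repair service": [0.8, 0.9, 0.2],
    "bank loan": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: values near 1.0 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query, documents):
    """Order documents by similarity to the query, most similar first."""
    query_vector = TOY_VECTORS[query]
    return sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vector, TOY_VECTORS[doc]),
        reverse=True,
    )

results = rank("window repair", ["bank loan", "glass repair service"])
print(results)
```

With these toy vectors, "glass repair service" ranks above "bank loan" for the query "window repair", even though the two phrases share only the word "repair".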
Why is it so important to handle all synonyms?
People search because they want to find things and they want to find things that are relevant to what they’re looking for. It sounds obvious but doing so with speed and accuracy isn’t an easy thing to achieve.
Most people only use 2,000-3,000 words of the English language, so even asking users to try every synonym they know would only get you so far. In practice, people often do this kind of manual synonym search, retyping a query with different words until something matches. If a word has multiple synonyms, trying them all is an arduous, time-consuming task and not an onus we should place upon users.
Content is written with jargon
Finding relevant information written with jargon is another example of users struggling to search. In such instances, being able to handle synonyms is paramount for a search engine. An estimated 70% of site search tools only allow for searching in the website’s own jargon, making users’ lives very difficult.
Take the term “bank guarantee”, a phrase specific to the banking industry. A guarantee indicates that a bank is willing to offer you a loan, and it would be natural to look up “bank loan” if you were interested. A good search engine should be able to surface information regarding “bank guarantee” for such a query, given its strong relevance to “bank loan”.
What’s so hard about synonym configuration?
Let’s take the example of “search” and “explore”. Users will want to be able to use these words interchangeably, so we want our search engine to treat them as having the same meaning. “Search” has 8 synonyms, so to cover all of them we’d need to consult a thesaurus and hardcode 8 rules that specify the connection between “search” and each of its synonyms.
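As a rough sketch of what such hardcoded rules look like in practice, the snippet below expands a query using a hand-maintained thesaurus. The `THESAURUS` entries are a partial, illustrative list and `expand_query` is a hypothetical helper, not part of any particular search engine.

```python
# Hypothetical hand-maintained thesaurus: every entry is a rule someone has to write
# and keep up to date. Only a few illustrative synonyms are listed here.
THESAURUS = {
    "search": ["explore", "seek", "look up"],
    "explore": ["search"],
}

def expand_query(query):
    """Return the original query plus one variant per known synonym of each word."""
    words = query.lower().split()
    variants = [" ".join(words)]
    for i, word in enumerate(words):
        for synonym in THESAURUS.get(word, []):
            variants.append(" ".join(words[:i] + [synonym] + words[i + 1:]))
    return variants

variants = expand_query("search products")
print(variants)
```

Note that "products" gets no variants at all because nobody has written a rule for it yet, which is exactly the maintenance problem with this approach.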
There are an estimated 171,000 words currently in use in the English language! Most words have several synonyms, making it unrealistic to hardcode every rule and connection between each one.
Too many permutations to configure
We can easily see that implementing ‘thesaurus logic’ for most of the English lexicon becomes overly complicated and quickly fails. The permutations become too numerous, searching gets even more complex and performance drastically worsens.
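A quick back-of-the-envelope calculation shows how fast the numbers grow. This sketch assumes two things that are our own simplifications: that every word in a synonym group needs a rule pointing to every other word in the group, and that each word in a query is expanded independently. The function names are hypothetical.

```python
from math import prod

def directed_rules(group_size):
    """Rules needed if every word in a synonym group must map to every other word."""
    return group_size * (group_size - 1)

def expanded_variants(synonyms_per_word):
    """Query variants when each word can stay as-is or be swapped for any of its synonyms."""
    return prod(1 + count for count in synonyms_per_word)

# "search" plus its 8 synonyms form a group of 9 words.
rules_for_search = directed_rules(9)

# A three-word query where each word has 8 synonyms.
variants_for_three_words = expanded_variants([8, 8, 8])

print(rules_for_search, variants_for_three_words)
```

Under these assumptions, a single group of 9 words already needs 72 rules, and a three-word query fans out into 729 variants to match against the index.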
When you implement such an approach you also need to maintain it and update the rules behind its hardcoded logic every time a new entry is added to the database. Your reward is more development effort, less accurate search results and increased time to value for your users.
Contextual meaning complicates things further
The situation gets even more complicated if we consider that words can have different meanings in different contexts and phrases. A search engine shouldn’t always treat words listed as synonyms as interchangeable. For example, “smart” and “clever” are synonymous in the context of a “smart student” or “clever student”, yet “clever” cannot stand in for “smart” in “smart contracts”.
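The "smart contracts" problem can be sketched in a few lines. Below, a context-blind synonym rule produces a nonsensical variant, and the usual rule-based patch is yet another hand-written list of exceptions. `SYNONYMS`, `PROTECTED_PHRASES` and `expand` are hypothetical names used only for this illustration.

```python
# Hypothetical single-word synonym rule and a hand-written exceptions list.
SYNONYMS = {"smart": "clever"}
PROTECTED_PHRASES = {"smart contracts"}

def expand(query, respect_phrases=True):
    """Expand a query with synonyms, optionally leaving protected phrases untouched."""
    normalized = query.lower()
    if respect_phrases and any(phrase in normalized for phrase in PROTECTED_PHRASES):
        return [normalized]
    words = normalized.split()
    variants = [normalized]
    for i, word in enumerate(words):
        if word in SYNONYMS:
            variants.append(" ".join(words[:i] + [SYNONYMS[word]] + words[i + 1:]))
    return variants

print(expand("smart student"))
print(expand("smart contracts", respect_phrases=False))
print(expand("smart contracts"))
```

The exception list works for this one phrase, but every new context-dependent term means another entry to discover, write and maintain.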
Fun fact: According to Guinness World Records, the English word with the most meanings is the verb ‘set’. You’d need 60,000 words to describe its 430 different meanings!
Typos multiply the scope of the challenge
Perhaps you’ve spent the time to integrate a thesaurus approach and even logic to handle contextual understanding. To top it off you’ll need to throw in typo handling. In a practical sense, typos are also synonyms: a misspelled word or phrase is a different string with the same intended meaning as the correct one.
It would be ridiculous to list every possible typo (e.g. relevanc, relevants, eelevance…), so you’re left with distance algorithms, complicated string manipulations and unwieldy RegEx. Combining that with hardcoded thesaurus logic quickly (or slowly, if you actually do it!) exposes the challenge of working with synonyms.
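For reference, one common distance algorithm is Levenshtein edit distance, which counts the single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a plain dynamic-programming version; the `correct` helper and its `max_distance` threshold of 2 are assumptions chosen for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]

def correct(word, vocabulary, max_distance=2):
    """Return the closest vocabulary word, or None if nothing is close enough."""
    best = min(vocabulary, key=lambda candidate: edit_distance(word, candidate))
    return best if edit_distance(word, best) <= max_distance else None

for typo in ["relevanc", "relevants", "eelevance"]:
    print(typo, edit_distance(typo, "relevance"), correct(typo, ["relevance", "reference"]))
```

Even this simple version has to compare each query word against the whole vocabulary, and it knows nothing about synonyms or context, so it is one more layer to tune alongside the thesaurus rules.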
Synonym configuration taken care of
Besides the world being Cadbury, wouldn’t it be nice if you didn’t have to think about all this? Wouldn’t it be nice if this was the last article on the challenges of synonyms you had to read? Wouldn’t it be nice if the blog you were reading had a solution to it all?
No more synonym search with a thesaurus. Relevance AI’s Qualitative Cloud provides synonym-capable search in just minutes of configuration. Synonym configuration is taken care of automatically. Typos are handled out of the box, and its understanding of synonyms is strong enough that it even returns metonyms: words and phrases with the same connotation. Explore the Qualitative Cloud yourself or get in touch to learn more.