State of the art e-commerce search: the Taobao case study

  • How search drives half a billion dollars in GMV
  • What techniques they use to achieve this

The Taobao case study at a glance

Taobao reports that search is the most important and influential component of the user experience on its platform. We infer that the same holds for e-commerce in general.

According to Taobao, the user experience enabled by its vector search engine contributes significantly to converting website visitors into buyers, substantially increasing the company’s revenue.

Based on Taobao’s experience, the combination of traditional keyword search and vector search (aka contextual search and natural language search) provides the best semantic search results in e-commerce.

Because product titles and descriptions in e-commerce are short and carry little context, text search is just one part of the solution. Vector image search and visual search are necessary to provide a state-of-the-art search experience to users.

Why the Taobao case study matters

The purpose of this case study is to give decision makers an insider perspective on the benefits of vector search systems in e-commerce. Using the example of one of the most successful e-commerce companies in the world, we argue that vector-based systems are the state-of-the-art solution to search.

Vector search has profound implications for the top-line metrics of the business. Today this technology is available to any organisation, regardless of its size and of whether it has internal machine learning capabilities. Open semantic search is here and provides the right solution for semantic product search in e-commerce.

Taobao is eating e-commerce

The Taobao Marketplace was founded by Alibaba Group in 2003. Like eBay, it is a two-sided marketplace that connects merchants and consumers. After its launch, Taobao competed fiercely with eBay in China, and in 2007-2008 it overtook eBay completely in that market. Today it is the largest Chinese consumer-to-consumer (C2C) e-commerce platform and the 8th most visited website in the world.

Taobao contributes a significant share of Alibaba Group’s revenue, most of which comes from its core commerce business. In the 2021 financial year, 43% of Alibaba’s total revenue, approximately $41 billion, was attributed to its retail marketplaces, of which Taobao is one of the biggest. Alibaba’s revenue increased 42% compared to the previous year.

Taobao has hundreds of millions of daily active users on its platform. In 2021 Taobao and Tmall together attracted 811 million annual active users (AAUs).

Search is a key component of Taobao’s success

According to Taobao, search is the most commonly used way to navigate the Taobao store. Taobao’s semantic search, built on natural language processing (NLP), improves the purchasing experience to the extent that it plays one of the most important roles in converting website visitors into buyers.

Search is the most important and influential component of the user experience, not only on Taobao but on e-commerce platforms in general. However, it is not easy to get right, and there are many ways to build a search engine.

Taobao’s stellar success can be attributed to its vector search system that uses natural language search combined with traditional keyword matching and image search. We wrote about the underlying vector technology of the Taobao search in our blog post on the Relevance AI website.

The most common way to measure the success of a C2C business is gross merchandise value (GMV), the total value of merchandise sold over a given period. Taobao’s GMV in 2021 was $76 billion.

Vector technology essentially enables contextual search. Taobao’s latest improvements to its vector search produced a 0.77% increase in GMV. Applied to $76 billion, that is roughly $585 million.

Solutions to the challenges that Taobao set out to solve

How to assist users when they struggle to accurately describe the products they want to find?

Let’s say I’m looking to protect my laptop that I just purchased for a fortune. I’m not sure how to search for laptop cases that are made of “tough material” on the outside (impact resistant laptop cases actually). So I just go ahead and type in “tough laptop case”.

A traditional search engine is not a contextual search engine. It won’t surface the right results, because the titles and descriptions of the products I’m looking for do not contain the word “tough”. They are listed as “impact resistant laptop cases”.

Taobao’s vector search engine overcomes this challenge because it is able to infer that “impact resistant” is related to the word “tough”. Furthermore, we can combine text search with vector image search and actually find those laptop cases that are made of hard and tough materials. Vectorisation of images makes image search as well as image clustering possible.
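To make the “tough laptop case” example concrete, here is a minimal sketch of the idea in Python. It is not Taobao’s implementation: the three-dimensional vectors are hand-written for illustration (a real system would get them from a trained text encoder), and the names keyword_match and vector_search are hypothetical.

```python
import math

# Hypothetical, hand-written embeddings for illustration only.
# Rough meaning of the dimensions: durability, softness, "is a laptop case".
PRODUCT_VECTORS = {
    "impact resistant laptop case": [0.9, 0.1, 0.9],
    "soft felt laptop sleeve": [0.1, 0.9, 0.8],
    "impact resistant phone cover": [0.9, 0.1, 0.1],
}
QUERY = "tough laptop case"
QUERY_VECTOR = [0.8, 0.2, 0.9]  # assumed encoder output for the query


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def keyword_match(query, title):
    """True only if every query word appears verbatim in the title."""
    title_words = title.split()
    return all(word in title_words for word in query.split())


def vector_search(query_vector, product_vectors):
    """Return product titles ordered from most to least similar."""
    return sorted(
        product_vectors,
        key=lambda title: cosine(query_vector, product_vectors[title]),
        reverse=True,
    )


# Keyword matching finds nothing because no title contains "tough".
keyword_hits = [t for t in PRODUCT_VECTORS if keyword_match(QUERY, t)]
# Vector search still ranks the impact resistant laptop case first.
ranked = vector_search(QUERY_VECTOR, PRODUCT_VECTORS)
print(keyword_hits)
print(ranked)
```

The point of the sketch is only the contrast: exact word matching returns no results, while nearest-neighbour ranking in vector space can still surface the intended product.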

How to distinguish between the relevancy of products when there are thousands of matching products to a keyword?

In Chinese, Taobao means “searching for treasures”, and for a reason. You can literally find anything on Taobao, because merchants can sell anything on the platform: clothing, food and beverages, consumer electronics, smart devices, health products, supplements, sports equipment, plane tickets and much more.

Fancy a Mona Lisa fly swat? Or looking for mop shoes with a built-in rug?

Taobao is often described as a “one-stop super platform for shopping”, thanks to its growing catalogue of more than 2 billion product and service listings. That’s great, because we can literally find anything on the platform.

However, the huge product database poses challenges for search. The more products in the database share the same or a similar title or description, the harder it is to pick the most relevant one and surface it to the user.

Contextual search, achieved by combining vector image search and text search, can alleviate this challenge because vector space captures more nuanced differences between product attributes.

How to handle synonyms and typos?

One of the root causes of the limitations of traditional keyword search in finding the most relevant products is its inability to handle synonyms and typos.

Traditional keyword matching cannot identify synonyms by default. Engineers need to hardcode rules and then maintain them to support the endless number of synonyms that exist.

Just imagine how ridiculous it would be to define synonyms for every possible typo (e.g. laptop case, lapotp case, latop case, laptopc ase and so on…). We have written about the challenges associated with synonym search in depth in this blog post.

As a solution, Taobao implemented a vector search system for text search to handle synonym configuration. The Taobao vector search system is able to infer the meaning of words, supporting synonyms by default. A vector search engine knows that “laptop” and “personal computer” as well as “latop” with a typo are closely related to each other because of their semantic meaning.
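The typo half of this claim can be illustrated without any trained model. The sketch below swaps in a much simpler technique than a learned embedding: it turns each string into a bag of character trigrams and compares the bags with cosine similarity. This is an assumption made for illustration, not Taobao’s method, and it only covers typos. Recognising true synonyms such as “laptop” and “personal computer” requires learned embeddings. The function names are hypothetical.

```python
import math
from collections import Counter


def trigram_vector(text):
    """Represent text as counts of its character trigrams (padded with spaces)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def best_match(query, titles):
    """Return the title whose trigram vector is closest to the query's."""
    query_vector = trigram_vector(query)
    return max(titles, key=lambda title: cosine(query_vector, trigram_vector(title)))


TITLES = ["laptop case", "phone charger", "desk lamp"]

# None of these misspellings equals a title, yet each maps to "laptop case".
for typo in ["lapotp case", "latop case", "laptopc ase"]:
    print(typo, "->", best_match(typo, TITLES))
```

No synonym table or per-typo rule is needed here: a misspelling still shares most of its trigrams with the intended title, so it lands close to it in vector space.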

How to overcome lack of context in title and description?

The text of product titles and descriptions in e-commerce is usually very short. It doesn’t contain enough words and usually lacks proper grammatical structure, which makes contextual search difficult.

As a result, traditional keyword search struggles to identify the relevant keywords, and vector search may fail to infer meaning because of the lack of context.

Taobao image search provides a solution to these challenges. It gives users the ability to search images without relying on the text of the product title and description.

Taobao image search scans the image and recognises colours, shapes, sizes, brands etc. The vector search engine then takes the text-based query and aims to match it with a visual result, the image of a relevant product. Using vectors we can also perform image clustering, purposefully omitting the less relevant images from a cluster.

Vector search is not a one size fits all solution. So what?

Although vector search provides a solution to most of the challenges mentioned above, traditional keyword search is still a vital part of e-commerce search.

When it comes to finding products via their IDs and SKUs (e.g. PO-TS-34-BK), contextual search has weaknesses. In these cases traditional keyword matching is the right solution. Similarly, keyword search can find products with acronyms more effectively (e.g. AE shoes – Allen Edmonds shoes).

As a result, Taobao and other e-commerce platforms with state-of-the-art search engines use hybrid text search, a weighted combination of traditional keyword matching and vector search, to deliver high-performing semantic search. As the Taobao image search example above shows, it is also important to extend the hybrid approach to a weighted combination of text search and vector image search.
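A weighted combination of this kind can be sketched in a few lines of Python. Everything specific here is an assumption for illustration rather than a description of Taobao’s system: the two-dimensional vectors, the second SKU (PO-TS-35-WH), the 0.5 default weight, and the function names are all made up. The keyword score is a simple token-overlap fraction, where a production system would more likely use a ranking function such as BM25.

```python
import math


def keyword_score(query, title):
    """Fraction of query tokens that appear verbatim in the title."""
    query_tokens = query.lower().split()
    title_tokens = set(title.lower().split())
    if not query_tokens:
        return 0.0
    return sum(token in title_tokens for token in query_tokens) / len(query_tokens)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_score(query, query_vector, title, title_vector, keyword_weight=0.5):
    """Weighted sum of the keyword score and the vector similarity."""
    vector_weight = 1.0 - keyword_weight
    return (keyword_weight * keyword_score(query, title)
            + vector_weight * cosine(query_vector, title_vector))


def rank(query, query_vector, catalogue, keyword_weight=0.5):
    """Return catalogue titles ordered by descending hybrid score."""
    return sorted(
        catalogue,
        key=lambda t: hybrid_score(query, query_vector, t, catalogue[t], keyword_weight),
        reverse=True,
    )


# Hypothetical catalogue with made-up embeddings.
CATALOGUE = {
    "PO-TS-34-BK polo t-shirt black": [0.20, 0.90],
    "PO-TS-35-WH polo t-shirt white": [0.25, 0.90],
}
SKU_QUERY = "PO-TS-34-BK"
SKU_QUERY_VECTOR = [0.5, 0.5]  # assumed: an encoder gives a SKU a vague vector

print(rank(SKU_QUERY, SKU_QUERY_VECTOR, CATALOGUE, keyword_weight=0.0))
print(rank(SKU_QUERY, SKU_QUERY_VECTOR, CATALOGUE, keyword_weight=0.5))
```

With the keyword weight set to zero, the ranking depends on a vector that carries little meaning for a SKU, and in this toy data the wrong product comes first. Giving exact matches half the weight puts the correct product on top. In practice the weight would be tuned against relevance data.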

Benedek Zajkas
October 26, 2021