Google’s new vector search technology – Vertex AI Search

  • Covering Groovenauts’ MatchIt Fast
  • Use Cases of vector similarity search
  • Deep Overview of Google’s Vertex AI Vector search technology

Keeping up with new technologies is vital to a company’s ability to remain competitive and meet emerging needs, regardless of its industry. Businesses could soon benefit from vector similarity search, which is a relatively new option. 

Tech giants like Google, Microsoft, Facebook, and Amazon are all building SaaS solutions for your data science needs, but you may still feel lost in this ocean of new tech and discoveries. With Machine Learning (ML), computers can understand language semantics and images well enough to answer abstract queries. In this article, we’ll walk through everything you need to know about Google’s latest vector similarity search product. Let’s get started!

Unstructured data, such as text, images, videos, and audio, is estimated to make up 80% of all data – but only 1% is ever analyzed. It is possible to extract valuable information from these massive unstructured datasets with vector similarity search. Using this technique, raw unstructured data is converted to feature vectors, a machine-friendly numerical data format that is easily analyzed and processed in real time. 

MatchIt Fast – a demonstration of Google’s new vector search technology

Google Cloud partner Groovenauts, Inc. recently developed a demo powered by vector search called MatchIt Fast which can find images and news article text similar to a selected sample from a collection of millions in a matter of milliseconds. 

There are two broad problems involved in finding similar text or images in a large collection.

The first is representing the text or image in a way that a computer can understand and make sense of.

The second is comparing the query against the collection to find the most similar items. This step needs optimization: with collections of millions of images or documents, comparing a query embedding against every other embedding one at a time takes millions of operations per query.

There are two components to MatchIt Fast. 

  1. Image similarity search, where you can either select a preset image or upload one of your own. You will then instantly receive the top 25 most similar images from a collection of Wikimedia images, without any caching involved.
  2. Similar news article search. Just copy and paste a few paragraphs from any news article, and you will receive similar articles from a corpus of 2.7 million articles from the GDELT Project within a second. Just a few years ago, the same task would have taken minutes rather than a fraction of a second, which gives a sense of how much work has gone into this problem.

Google handles a whopping 5.6 billion search queries a day and returns relevant webpages for each of them within a second. It is the most popular search engine in the U.S., and for good reason.

It has cost Google a fortune to build a search engine that provides fast and accurate results for users. The Google search engine we are all so familiar with works on the same basis as MatchIt Fast. The only major difference between the two is the backend data over which similarity is computed: MatchIt Fast searches over a few million images or news articles, while Google’s search engine does so over around 56.5 billion webpages.

Vertex AI – Google Cloud’s end-to-end ML platform

Among the tools in Vertex AI is the Matching Engine. With Matching Engine, you can build use cases that match semantically similar items. From a large corpus of candidate items, Matching Engine finds the items with the greatest semantic similarity to a query item. Searching for semantically related or similar items has many real-world uses and is a vital part of applications such as:

  • Recommendation engines 
  • Search engines 
  • Ad targeting systems 
  • Image classification or image search 
  • Text classification 
  • Question answering 
  • Chatbots

Vector similarity search is a well-studied problem, with hundreds of research papers each proposing the most efficient approach known at the time of publication. Before vector search came into existence, text-sequence matching or keyword matching was used.

Keyword matching is a fairly simple kind of search: a system takes your query, scans all available information, and then presents the items that contain an exact text match for the keywords you entered.

The drawback of this approach is that it acts more like a literal-minded robot than a smart human. A search like this will return no results if you mistype something in a hurry or if your grammar is off.
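As a toy illustration (plain Python, not tied to any particular search library), here is what strict keyword matching looks like and how a single typo breaks it:

```python
docs = [
    "How do I reset my password?",
    "Shipping times for international orders",
]

def keyword_search(query, docs):
    # Return only documents that contain every literal query term.
    return [d for d in docs
            if all(term in d.lower() for term in query.lower().split())]

print(keyword_search("reset password", docs))  # finds the first document
print(keyword_search("reset pasword", docs))   # [] - one typo and the match fails
```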

Vector similarity search is a step ahead of this approach: it can handle more abstract queries, and large groups of smart people are still working to optimize it further. Before we dive into the workings of the Vertex AI Matching Engine, let’s look at some of the benefits of vector similarity search.

Applications of vector similarity search range from artificial intelligence and deep learning to traditional vector calculations. It is not limited to image or text content: it can be used for information retrieval over anything in your business, as long as you can define a vector to represent each item.

Here are a few examples:

E-commerce: Providing visitors to e-commerce sites with similar or exact matches to what they are looking for can greatly increase their engagement and conversion rates (an example e-commerce case study on Taobao can be found here). Vector search results are well suited to showing shoppers such content.

Physical & Cyber Security: Video AI is just one of many applications for vector similarity search in the security field. Other scenarios include facial recognition, behavior tracking, identity authentication, intelligent access control, and more. Additionally, vector similarity search plays an important role in thwarting increasingly common and sophisticated cyberattacks.

Recommendation Engines: Recommendation engines are systems that use machine learning and data analysis to suggest products, services, content, and information to users. With enough data, algorithms can be trained to understand relationships between entities and invent ways to represent them autonomously. They have broad applicability and are something people already interact with every day, including movie recommendations on Netflix, shopping recommendations on Amazon, and news feeds on Twitter (an example vector recommendation system case study can be found here).

Chatbots: Traditionally, chatbots are built on a knowledge graph that requires a large training dataset. Chatbots built using deep learning models, however, don’t need that preprocessing; instead, a map between frequent questions and answers is created. Using a pre-trained natural language processing (NLP) model, feature vectors can be extracted from the questions and then stored and queried using a vector data management platform.

Image or Video Search: Deep learning networks have been used to recognize visual patterns since the late 1970s, and modern technology trends have made image and video search more powerful and accessible than ever before. (View our image search demo here)

How does Vertex AI Matching Engine work?

Vertex AI Matching Engine provides:

  1. Pre-built models capable of producing embedding representations.
  2. A low-latency Approximate Nearest Neighbor (ANN) service for finding similar embeddings.

Together, these address the major challenges of applying the technology to real-world business problems. Let’s discuss each in turn.

Vector Embeddings

Vector search involves representing images or pieces of text as vectors, also known as embeddings. The closer two embeddings are, the more similar the items they represent; the farther apart they are, the less similar.

Vector embeddings are at the heart of concepts like recommendation engines, voice assistants, language translators, etc. Computers can only handle numbers, so we need a way to represent our unstructured data numerically. 

Of course, a single number cannot capture all of that complexity, so we need a whole list of them.

An ordered list of numbers, such as [24, -5.14, 0, -14], is called a vector, and the length of that list is called the vector dimension. These vectors act as input for the machine learning models to be trained and evaluated.
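As a toy sketch (the vectors below are made up for illustration, not produced by any real model), proximity between embeddings can be measured with cosine similarity:

```python
import numpy as np

# Made-up 3-dimensional embeddings; real embeddings have hundreds of dimensions.
cat    = np.array([0.9, 0.1, 0.0])
kitten = np.array([0.8, 0.2, 0.1])
truck  = np.array([0.0, 0.1, 0.9])

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(cat, kitten))  # ~0.98: close together, similar items
print(cosine_similarity(cat, truck))   # ~0.01: far apart, dissimilar items
```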

In the case of the MatchIt Fast demo, the application simply uses a pre-trained MobileNet v2 model for extracting vectors from images, and the Universal Sentence Encoder (USE) for text. 

Both MobileNet v2 and USE were developed by Google, and using the concept of transfer learning, one can easily create embeddings with these two models.

These models can help you extract “embeddings”, which are vectors that map each data row to its “meaning”. The embedding space of MobileNet puts images with similar patterns and textures closer together, and the embedding space of USE places texts with similar topics closer together.
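A minimal sketch of extracting such embeddings with TensorFlow Hub; the model handles and versions below are assumptions on my part, so check tfhub.dev for the current ones:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Universal Sentence Encoder: maps each piece of text to a 512-dimensional vector.
use = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
text_embeddings = use(["markets fall on inflation fears",
                       "stocks slide as prices keep rising"])

# MobileNet v2 feature-vector model: maps a 224x224 RGB image to a
# 1280-dimensional vector.
mobilenet = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/5")
image = tf.random.uniform((1, 224, 224, 3))  # stand-in for a real, preprocessed image
image_embedding = mobilenet(image)

print(text_embeddings.shape)   # (2, 512)
print(image_embedding.shape)   # (1, 1280)
```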

Vector search service (Approximate Nearest Neighbor)

Once we have the embeddings of the query and the corpus, all we need to do is compare the query embedding with each embedding in our database and return the most similar ones. We could use a brute-force approach and compare them one by one, linearly. Unfortunately, most modern applications have massive datasets with high dimensionality (hundreds or thousands of dimensions), so a linear scan takes a while.
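A minimal sketch of that brute-force scan, using synthetic data and the dot product as the similarity score:

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(100_000, 128)).astype(np.float32)  # 100k embeddings
query = rng.normal(size=128).astype(np.float32)

# One dot product per corpus vector: O(N x d) work for every single query.
scores = corpus @ query
top5 = np.argsort(-scores)[:5]  # indices of the 5 most similar vectors
print(top5, scores[top5])
```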

It’s important to note that despite all recent advances on the topic, the only available method for guaranteed retrieval of the exact nearest neighbor is exhaustive search. Approximate nearest neighbor solutions have therefore been developed to identify nearest neighbors in reduced time and space.

A similarity search can be orders of magnitude faster if we’re willing to trade some accuracy. 

Many approximate nearest neighbor algorithms construct a k-Nearest Neighbors graph over a set of objects based on a chosen similarity measure. The similarity of items is typically computed using Jaccard similarity, cosine similarity, Euclidean distance, or Pearson correlation.
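For reference, minimal implementations of those four measures (cosine, Euclidean, and Pearson operate on dense vectors; Jaccard operates on sets, such as sets of tokens):

```python
import numpy as np

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    return np.linalg.norm(a - b)

def pearson_similarity(a, b):
    return np.corrcoef(a, b)[0, 1]  # Pearson correlation coefficient

def jaccard_similarity(s, t):
    # Overlap of two sets, e.g. sets of tokens or tags.
    return len(s & t) / len(s | t)
```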

In 2020, Google released ScaNN (Scalable Nearest Neighbors), which at the time outperformed other state-of-the-art vector similarity search libraries by a massive factor of two in queries per second. ScaNN builds on the ICML 2020 paper “Accelerating Large-Scale Inference with Anisotropic Vector Quantization”, which focuses on compressing vectors to enable fast approximate distance computations and proposes a new compression technique that significantly boosts accuracy compared to prior work. However, this comes at the cost of limiting the problem scope to maximizing inner products (rather than L1 or L2 distances, etc.).

In the context of nearest neighbors, vector quantization means compressing the embeddings. High-resolution images carry millions of bits of information, and the corresponding embeddings are very high-dimensional. Compressing those embeddings into lower-dimensional vectors is therefore an important step for improving both memory usage and efficiency.

Quantization is a technique for reducing dataset size by defining a function (a quantizer) that encodes the data into a compact, approximate representation.
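To make the idea concrete, here is a crude scalar quantizer (deliberately simple, and not the anisotropic technique ScaNN actually uses): each 32-bit float is encoded as one of 256 levels, i.e. an 8-bit code.

```python
import numpy as np

x = np.array([0.12, -0.53, 0.88, -0.07], dtype=np.float32)  # 32-bit floats

# Encode: map each value onto one of 256 levels between min and max.
lo, hi = float(x.min()), float(x.max())
codes = np.round((x - lo) / (hi - lo) * 255).astype(np.uint8)  # 4x smaller

# Decode: recover an approximation of the original values.
x_approx = lo + codes.astype(np.float32) / 255 * (hi - lo)

print(codes)     # [118   0 255  83]
print(x_approx)  # close to x, at a quarter of the memory
```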

Anisotropic vector quantization, a form of such compression, allows ScaNN to better estimate the inner products that are likely to appear in the top-k results of maximum inner product search (MIPS), and therefore to achieve higher accuracy.
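The open-source ScaNN library exposes this pipeline directly. A minimal sketch following its README (the tuning parameters here are illustrative, not recommendations):

```python
import numpy as np
import scann  # pip install scann

rng = np.random.default_rng(0)
dataset = rng.normal(size=(100_000, 128)).astype(np.float32)

# Partition the corpus into leaves, score candidates with quantized
# (anisotropic hashing) dot products, then re-rank the best 100 exactly.
searcher = (
    scann.scann_ops_pybind.builder(dataset, 10, "dot_product")
    .tree(num_leaves=1000, num_leaves_to_search=100, training_sample_size=25_000)
    .score_ah(2, anisotropic_quantization_threshold=0.2)
    .reorder(100)
    .build()
)

neighbors, distances = searcher.search(dataset[0])
print(neighbors[:5])
```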

Quantization-based techniques are the current state of the art for scaling nearest neighbor search to massive databases. Traditional approaches to quantization aim to approximately represent the database points themselves.

MIPS has become a popular paradigm for solving large-scale classification and retrieval tasks.

A visual summary of MatchIt Fast

[Figure: MatchIt Fast workflow with Google’s Vertex AI Matching Engine]

Some notable capabilities of Vertex Matching Engine:

Scale: It enables searching over billions of embedding vectors, at high queries per second, with low latency.

Lower TCO: To design a service that works at the scale of billions of vectors, one has to make it resource efficient. After all, just throwing more hardware at the problem will not scale. Due to the resource efficiency of Google’s technology, in real world settings, Vertex Matching Engine can be ~40% cheaper than leading alternatives (Google Cloud internal research, May 2021).

Low latency with high recall: Searching over a really large collection of embedding vectors at high queries per second, while serving results with low latency, would not be possible if the system performed a brute-force search. It necessitates sophisticated approximation algorithms that exchange some accuracy for massive speed and scale. Fundamentally, the system tries to find most of the nearest neighbors at very low latency. Recall is the metric used to measure the percentage of true nearest neighbors returned by the system. Vertex Matching Engine can achieve recall of 95-98% while serving results with a 90th percentile latency of less than 10 ms (Google Cloud internal research, May 2021).

Fully managed: The Matching Engine is an autoscaling fully managed solution so that you don’t have to worry about managing infrastructure, and can focus on building cool applications.

Built-in filtering: For real-world applications, semantic similarity via vector matching often goes hand in hand with other constraints used to filter results. For example, while making recommendations, an e-commerce website may filter based on the product category or the user’s region. Vertex Matching Engine enables users to perform simple or complex filtering using boolean logic.

How to use Vertex Matching Engine

Users upload pre-computed embeddings to Google Cloud Storage (GCS). There is a unique ID associated with each embedding, as well as optional tags (a.k.a. tokens or labels) that can be used for filtering. Indexes are created by ingesting the embeddings into Matching Engine. The index is then deployed on a cluster, at which point it can receive online queries for vector similarity matching. Clients specify the number of nearest neighbors to return for a query vector, and the vector IDs and their similarity scores are returned to the client.
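For illustration, a sketch of writing such a file. The field names (“id”, “embedding”, “restricts”) follow the JSON input format described in the Matching Engine documentation at the time of writing, but treat them as assumptions and verify against the current docs:

```python
import json

records = [
    {"id": "item-42",                           # unique ID per embedding
     "embedding": [0.12, -0.03, 0.88],          # the vector itself
     "restricts": [{"namespace": "category",    # optional tags for filtering
                    "allow": ["shoes"]}]},
]

# One JSON object per line, ready to upload to a GCS bucket for indexing.
with open("embeddings_0000.json", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```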

Periodically updating embeddings or generating new embeddings is common in real-world applications. Hence, users can provide an updated batch of embeddings to perform an index update. An updated index will be created from the new embeddings, which will replace the existing index with no downtime or latency impact.

While creating an index, it is important to tune the index to adjust the balance between latency and recall. Matching Engine also provides the ability to create brute-force indices, to help with tuning. A brute-force index is a convenient utility to find the “ground truth” nearest neighbors for a given query vector. It performs a naive brute force search. Hence it is slow and should not be used in production. It is only meant to be used to get the “ground truth” nearest neighbors, so that one can compute recall, during index tuning.
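Computing recall during tuning is then a simple set comparison between the approximate results and the brute-force ground truth, for example:

```python
def recall_at_k(approx_ids, exact_ids):
    # Fraction of the true nearest neighbors that the ANN index returned.
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# e.g. tuned index returned items 3, 7, 9; brute-force ground truth is 3, 9, 12
print(recall_at_k([3, 7, 9], [3, 9, 12]))  # 0.666...
```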

Relevance AI – The vector platform for rapid experimentation

This investment in vectors by Google is yet another indicator of how powerful this technology is for business and where the future is heading. The ability to represent data in a numerical format is unprecedented, and it has enabled a wide array of companies to deliver better products. However, while Vertex’s Matching Engine is a good solution for some production workloads, it is missing many of the steps required to get to production: from testing and identifying the best model, to clustering and interpreting the data through rapid experiments.

At Relevance AI, our mission is to accelerate developers solving similarity and relevance problems. We are investing heavily in a developer-first vector platform that provides all the tooling needed to make working with vectors a breeze and to meet business goals.

  • Experimentation-first platform for rapid experimentation with vectors
  • One line of code to visualize your vectors in a dashboard, making interpretation, collaboration and sharing with stakeholders easier while finding the best vectors
  • Highly customisable clustering and visualization drill downs to see how vectors are interpreting the data
  • Advanced customizability of vector search that allows for quick experimentation with different search configurations, which can then be deployed to other production vector databases such as Vertex AI, HNSWlib, FAISS, Elasticsearch, etc.

Get started today by joining our wait list.

Alternatively, book a time with our experts to discuss how you could be empowered by vectors.

Benedek Zajkas
January 21, 2022