Building a Semantic Search with Vertex AI

Fermin Blanco
Google Cloud - Community
8 min read · Oct 21, 2023


Abstract

Semantic search is a more sophisticated mechanism for finding relevant content than traditional keyword-based search. It can surface relevant information that may not be literally present in the query. It does so by understanding the context of the text the user types into the search box.

In Semantic Search, who you are, where you are, and what you want all shape the results of your query.

STOP THIS MADNESS AND SHOW ME THE CODE

Semantic Search

Semantic Search adds a deeper level of understanding to search intent, as the algorithms continue learning through bounce rates, conversion rates and other types of key performance indicators (KPIs). With a sharper understanding of search queries, these algorithms can boost user satisfaction and brand loyalty.

Step by step

  1. Train the model using AutoML
  2. Create and deploy an Index in Vector Search
  3. Perform queries to the index

Vertex AI

Google has built a platform that integrates various solutions around Generative AI. We will be using the following to make our dream come true:

Vertex AI tools

  1. Text embedding API
  2. Vector Search API

Before anything let’s enable the API. Vertex AI is a machine learning platform from Google Cloud to build experiences around Generative AI.

gcloud services enable aiplatform.googleapis.com

Installing the AI Platform SDK

At the time I wrote this article, the official docs for installing the AI SDK didn’t work for me, so here are the instructions that did.

# Installing Vertex AI SDK libraries
go get cloud.google.com/go/aiplatform/apiv1/aiplatformpb

# Installing dependencies to use Vertex AI SDK
go get github.com/googleapis/gax-go/v2
go get golang.org/x/oauth2
go get github.com/google/s2a-go
go get github.com/googleapis/enterprise-certificate-proxy/client
go get go.opencensus.io/plugin/ocgrpc
go get golang.org/x/sync/semaphore

The Vertex AI service recommends that you configure the endpoint to the location that has the features you want.

Vertex AI Regional Endpoint

Alternatively, you can set a default region:

gcloud config set ai/region us-central1

gcloud will then confirm the endpoint it uses:

Using endpoint [https://us-central1-aiplatform.googleapis.com/]

aiplatform

The whole SDK is condensed inside the aiplatform library; if we can import it on our machine, we are all set.

Application default credentials

Assign an appropriate role to the service account that will make requests to Google Cloud. The AI SDK needs credentials to authenticate requests to Google Cloud on your behalf.

gcloud auth application-default login

Training a model

For this project I only took semantic similarity into account, even though a semantic search is a lot more complex.

A Semantic Search Engine relies on embeddings. But …

What the heck are text embeddings?

Text embeddings capture the contextual meaning of an input text, enabling applications to capture the relationships between the words that compose the input query, and its semantics.

To put it simply, text embeddings are a way to convert words, phrases, or complete texts into their mathematical representation given a context. For instance:

// The query              =>    its mathematical representation
What is the meaning of life => 0.2333222222

But wait a minute: in reality its mathematical representation will be a collection of numbers (a vector 😱)

// The query              =>    its mathematical representation
What is the meaning of life => [0.2333222222, 0.223877644444, 0.678e949494, ...]

By training a model I mean converting all the phrases we possess in our dataset into embeddings, so each phrase will get a vector. (We’ll be using this dataset.)

Nowadays, the cutting-edge way to build recommendation systems is with embeddings.

Let’s talk a bit about the algorithms that power the text-to-embeddings conversion.

word2vec

Word2vec is not a single algorithm; rather, it is a family of model architectures and optimizations that can be used to learn word embeddings from a large dataset. Embeddings learned through word2vec have proven successful on a variety of downstream natural language processing tasks.
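As a toy illustration of what these learned embeddings make possible, here is a sketch of the classic word-analogy arithmetic. The 2-d vectors below are invented for the example; real word2vec vectors have hundreds of dimensions and are learned from text:

```go
package main

import "fmt"

// Invented 2-d "embeddings" for illustration only; real word2vec
// vectors are high-dimensional and learned from a corpus.
var emb = map[string][]float64{
	"king":  {0.90, 0.80},
	"man":   {0.50, 0.10},
	"woman": {0.45, 0.95},
	"queen": {0.85, 1.65},
}

func sub(a, b []float64) []float64 { return []float64{a[0] - b[0], a[1] - b[1]} }
func add(a, b []float64) []float64 { return []float64{a[0] + b[0], a[1] + b[1]} }

func main() {
	// The classic analogy: king - man + woman lands near queen.
	v := add(sub(emb["king"], emb["man"]), emb["woman"])
	fmt.Printf("king - man + woman = [%.2f %.2f]\n", v[0], v[1])
	fmt.Printf("queen              = [%.2f %.2f]\n", emb["queen"][0], emb["queen"][1])
}
```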

The Prediction API

The Prediction API is what allows us to train our model. In other words, it is the API that, given a text or a set of texts, returns their vector representations.

The Vertex API has a limit of 5 instances per prediction request. What?? Yes, I am as shocked as you possibly are!

So how do we get around this API limitation?

There are a couple of ways to circumvent this restriction, but here we’ll go with the following strategy:

  1. For each request to the Prediction API we will include 5 instances (texts/phrases from our dataset).
  2. Then we make as many HTTP requests to the API as we need. To avoid a ResourceExhausted error from the API, we will space the requests 12 seconds apart.
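A minimal sketch of this batching strategy in Go. The dataset here is a placeholder; in a real run, each group would be sent to the Prediction API:

```go
package main

import "fmt"

// batch splits the dataset into groups of at most n phrases,
// matching the Prediction API limit of 5 instances per request.
func batch(phrases []string, n int) [][]string {
	var groups [][]string
	for i := 0; i < len(phrases); i += n {
		end := i + n
		if end > len(phrases) {
			end = len(phrases)
		}
		groups = append(groups, phrases[i:end])
	}
	return groups
}

func main() {
	dataset := make([]string, 12) // stand-in for the real phrases
	for i := range dataset {
		dataset[i] = fmt.Sprintf("phrase %d", i)
	}
	for i, group := range batch(dataset, 5) {
		// In a real run, call the Prediction API here and then
		// time.Sleep(12 * time.Second) to avoid ResourceExhausted errors.
		fmt.Printf("request %d carries %d instances\n", i, len(group))
	}
}
```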

After training the model (getting the embeddings for our dataset), we need to deploy it to a place where all those numbers make sense; it will be like finding a home for each embedding (vector).

Vector embeddings

We decided this vector space (the town where all the embeddings are going to live) is going to be named the index. So when we say creating an index, it really means building the town for the embeddings. After creating the index, we need a way to find this town, a sort of address: an Endpoint. Deploying an index to an endpoint is like drawing on a map the location of our brand-new vector hometown. To do all of this, Vertex AI has a different API called:

Vector Search

Disclaimer: you need patience here! (It took 1 hour to create an index.)

'@type': type.googleapis.com/google.cloud.aiplatform.v1.CreateIndexOperationMetadata
genericMetadata:
  createTime: '2023-10-14T15:28:39.420701Z'
  updateTime: '2023-10-14T16:15:36.346195Z'
nearestNeighborSearchOperationMetadata:
  contentValidationStats:
  - sourceGcsUri: gs://luillyfe-text-embeddings2/input.json
    validRecordCount: '1'
    dataBytesCount: '3081'

Vector Search is a technique to find the most similar vectors in a dataset, and Google Cloud provides an API for it.

Configuring your index

Input Config (Remove comments)
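For reference, here is a sketch of what such an index metadata file can look like. The field values (dimensions, neighbor counts, bucket URI, tree-AH parameters) are assumptions to adapt to your own dataset; 768 matches the output size of textembedding-gecko:

```json
{
  "contentsDeltaUri": "gs://luillyfe-text-embeddings2",
  "config": {
    "dimensions": 768,
    "approximateNeighborsCount": 150,
    "distanceMeasureType": "DOT_PRODUCT_DISTANCE",
    "algorithmConfig": {
      "treeAhConfig": {
        "leafNodeEmbeddingCount": 500,
        "leafNodesToSearchPercent": 7
      }
    }
  }
}
```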

Creating an index

Building the hometown where our vectors are going to live.

Creates an Index
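A sketch of the index creation with gcloud; the display name and metadata file path are assumptions:

```shell
gcloud ai indexes create \
  --metadata-file=index_metadata.json \
  --display-name=text-embeddings-index \
  --region=us-central1
```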

Deploying to Public Endpoint

Deploying index to endpoint
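A sketch of the deployment with gcloud. The endpoint and index IDs shown in caps are placeholders for the IDs returned by the previous commands:

```shell
# Create a public index endpoint first
gcloud ai index-endpoints create \
  --display-name=text-embeddings-endpoint \
  --public-endpoint-enabled \
  --region=us-central1

# Then deploy the index to it
gcloud ai index-endpoints deploy-index INDEX_ENDPOINT_ID \
  --index=INDEX_ID \
  --deployed-index-id=text_embeddings_deployed \
  --display-name=text-embeddings-deployed \
  --region=us-central1
```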

Performing queries

When performing queries against the model we trained, the following steps are involved:

  1. When a query is made, the semantic search transforms the query into text embeddings.
  2. Then the k-nearest neighbor algorithm matches it against the vectors of existing documents.
  3. Finally, the semantic search generates results and ranks them based on conceptual relevance.
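The matching-and-ranking steps above can be sketched in plain Go as a brute-force nearest-neighbor search by cosine similarity. The toy 3-d vectors stand in for real embeddings; Vector Search does this at scale with approximate methods:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	// Toy 3-d document embeddings; real ones come from the embedding model.
	docs := map[string][]float64{
		"doc A": {0.9, 0.1, 0.0},
		"doc B": {0.1, 0.9, 0.2},
		"doc C": {0.5, 0.5, 0.5},
	}
	query := []float64{0.85, 0.15, 0.05}

	type hit struct {
		name  string
		score float64
	}
	var hits []hit
	for name, vec := range docs {
		hits = append(hits, hit{name, cosine(query, vec)})
	}
	// Rank the matches by similarity, highest first.
	sort.Slice(hits, func(i, j int) bool { return hits[i].score > hits[j].score })
	fmt.Println("best match:", hits[0].name)
}
```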

Often we come up with searches that are too complex to be solved with simple word-matching algorithms.

Abstract queries

To understand this concept better, let’s go through some examples:

What is the meaning of life?

What is the future of work?

What is the best way to learn?

What is the nature of reality?

What is the relationship between mind and body?

How would a word-matching algorithm respond to those questions? Well, a word-matching algorithm works by looking for documents that contain the exact words or phrases that compose the query. Then how would a semantic search engine be different? Let’s see a comparison:

Word-matching algorithm:

  • The meaning of life is a question that has been pondered by philosophers for centuries.
  • The meaning of life is to love God and to love others.
  • The meaning of life is to find happiness.

Semantic search engine:

  • The meaning of life is a complex question that has no one answer.
  • Some possible answers to the question of the meaning of life include finding happiness, making a difference in the world, and connecting with something larger than oneself.
  • Different people have different answers to the question of the meaning of life.

Then, how does Semantic Search work?

The Two-Tower neural network model

Having computed all the high-dimensional vectors (embeddings) for each entry in our dataset, we are left to introduce our queries into the same vector space where the embeddings live. But before that happens, we need to compute the feature vector for the query we want to perform.

“… machine learning models are trained to map the queries and database items to a common vector embedding space, such that the distance between embeddings carries semantic meaning, …”

https://blog.research.google/2020/07/announcing-scann-efficient-vector.html

Common Issues (panic by panic)

rpc error: code = Unimplemented desc = unexpected HTTP status code received from server: 404 (Not Found); transport: received unexpected content-type “text/html; charset=UTF-8”

// Set a proper Vertext AI regional endpoint when setting the Prediction Client
aiplatform.NewPredictionClient(ctx, option.WithEndpoint(vertexAIEndpoint))

rpc error: code = PermissionDenied desc = Vertex AI API has not been used in project dauntless-arc-398505 before or it is disabled.

Enabling the Vertex AI API

rpc error: code = PermissionDenied desc = Permission ‘aiplatform.datasets.create’ denied on resource ‘//aiplatform.googleapis.com/projects/dauntless-arc-398505/locations/us-central1’ (or it may not exist).
error details: name = ErrorInfo reason = IAM_PERMISSION_DENIED domain = aiplatform.googleapis.com metadata = map[permission:aiplatform.datasets.create resource:projects/dauntless-arc-398505/locations/us-central1]

Setting the proper role to the application service account

cannot use instances (variable of type *structpb.Value) as []*structpb.Value value in struct literal

// ... predictRequest context
Instances: []*structpb.Value{instances},

rpc error: code = Unavailable desc = connection error: desc = “transport: Error while dialing: dial tcp: lookup us-central1-aiplatform.googleapis.com: i/o timeout”

// Network connection issues ... try again later

panic: rpc error: code = ResourceExhausted desc = Quota exceeded for aiplatform.googleapis.com/online_prediction_requests_per_base_model with base model: textembedding-gecko. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/quotas.

  😱😱😱

panic: rpc error: code = InvalidArgument desc = 5 instance(s) is allowed per prediction. Actual: 96


// The Vertex API has a limit of 5 instances per prediction request.
// ...
// WHATTTTTT?????

ERROR: (gcloud.ai.indexes.create) FAILED_PRECONDITION: The Cloud Storage bucket of `gs://text-embeddings/input.jsonl` is in location `us`. It must be in the same regional location as the service location `us-central1`.

Let's create a bucket in the same region where the index is going to be created.

ERROR: (gcloud.ai.indexes.create) FAILED_PRECONDITION: Found file `gs://text-embeddings2/input.json` with unknown format, please make sure your files include the supported file extension (e.g. `.json`, `.csv` or `.avro`) in your file name.

jsonl is not a valid format, so let's change the extension to .json.
