End to End Semantic Search via API

Programmatic access to the Qualitative Cloud.

Ron Au

Perhaps you’ve explored the Qualitative Cloud and implemented a Search App, or you’re a developer keen to integrate qualitative insights into your own app. If you’re ready to code against some endpoints, this article takes you through working with vectors in our APIs, from uploading a demo dataset of gym members to performing semantic searches.

For full documentation, refer to our API Reference. There, you can run requests on endpoints from within the docs and even see real responses using your uploaded data.

Semantic search for gym member goals

We’ll create a dataset, upload rows of gym member data, encode a text field into vectors and finally perform a semantic search using those vectors. The goal is relevant goals!

To demonstrate interacting with the endpoints, each step includes example API requests in cURL and in frontend JavaScript. If you’d prefer another language, our API Reference has code examples for every endpoint in a multitude of languages.

Authorization

Most endpoint paths will require you to set an authorization header with your API credentials in order to make a successful request. To find your credentials, log in to the Relevance.ai dashboard and head to the Settings page where you’ll see them displayed:

Settings page showing fields for API username and key

These example credentials are gobbledygook so they won’t work for you, but yours will look similar.

The authorization string is your username and key separated by a colon (:). Be sure to replace <username>:<key> in our code examples with your own details.

cURL

curl --request GET --url 'https://api-dev-aueast.relevance.ai/latest/' --header 'Accept: application/json' --header 'Authorization: <username>:<key>'

JavaScript

fetch('https://api-dev-aueast.relevance.ai/latest/', {
  method: 'GET',
  headers: {
    Accept: 'application/json',
    Authorization: '<username>:<key>'
  }
})
.then(response => response.json())
.then(json => console.log(json))
.catch(error => console.error(error))

If you run the code above, you should see a friendly message!

Response

{"message":"Welcome to VectorDB API, If you are seeing this message then VecDB servers are up"}
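If you’d rather not paste the credential string by hand, the colon-separated format described above can be assembled in one place. This is only a minimal sketch: buildAuthorization is a hypothetical helper name of ours, not something the API provides.

```javascript
// Builds the Authorization header value in the "<username>:<key>" format.
// buildAuthorization is a hypothetical helper, not part of the Relevance AI API.
const buildAuthorization = (username, key) => {
  if (!username || !key) {
    throw new Error('Both username and key are required');
  }
  return `${username}:${key}`;
};

// Replace the placeholders with the credentials from your Settings page.
const authorizationHeader = buildAuthorization('<username>', '<key>');
console.log(authorizationHeader);
```

Failing early on a missing value makes a forgotten credential easier to spot than a rejected request would.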

JavaScript helper function

For conciseness, we’ll use the helper function defined below from here on out. Feel free to make network requests using any method or library you prefer.

const authorizedFetch = async (
  method,
  { path, headers, body, parameters, authorization },
  endpoint = 'https://api-aueast.relevance.ai/latest/'
) => {
  const url = new URL(path, endpoint);
  const searchParameters = new URLSearchParams(parameters);
  url.search = searchParameters;

  try {
    const response = await fetch(url, {
      method,
      headers: {
        ...headers,
        'Authorization': authorization,
        'Accept': 'application/json',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    });

    if (!response.ok) {
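      // Passing the Response object itself would only print "[object Response]"
      throw new Error(`Request failed: ${response.status} ${response.statusText}`);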
    }

    return await response.json();
  } catch (error) {
    console.error(error);
  }
};
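The helper relies on the standard URL and URLSearchParams classes to combine the endpoint, path and query parameters. The sketch below pulls that step out into a standalone function (buildUrl is a name we made up for illustration) so you can see what URL a given path produces. Note that paths should not start with a slash, otherwise the /latest/ segment of the endpoint is dropped.

```javascript
// Mirrors the URL handling inside authorizedFetch so it can be inspected on its own.
// buildUrl is a hypothetical name used only for this illustration.
const buildUrl = (
  path,
  parameters,
  endpoint = 'https://api-aueast.relevance.ai/latest/'
) => {
  const url = new URL(path, endpoint);
  url.search = new URLSearchParams(parameters).toString();
  return url.toString();
};

// A relative path is appended to the endpoint.
console.log(buildUrl('datasets/create'));
// https://api-aueast.relevance.ai/latest/datasets/create

// A leading slash resolves from the host root and loses "/latest/".
console.log(buildUrl('/datasets/create'));
// https://api-aueast.relevance.ai/datasets/create

// Parameters become the query string.
console.log(
  buildUrl(
    'task_status',
    { dataset_id: 'gym-members', task_id: 'abc' },
    'https://vectorhub-api-text.westus2.azurecontainer.io/'
  )
);
// https://vectorhub-api-text.westus2.azurecontainer.io/task_status?dataset_id=gym-members&task_id=abc
```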

Creating a dataset

To begin working with data we’ll want to create a dataset to contain it. Think of this as a project or collection, where different configurations can be applied to the same source of data.

Endpoint: https://api-aueast.relevance.ai/latest/datasets/create
API Reference: Create a dataset

cURL

curl --request POST \
     --url https://api-aueast.relevance.ai/latest/datasets/create \
     --header 'Accept: application/json' \
     --header 'Content-Type: application/json' \
     --header 'Authorization: <username>:<key>' \
     --data '{"id":"gym-members"}'

JavaScript

const createDataset = await authorizedFetch('POST', {
  path: 'datasets/create',
  body: {
    id: 'gym-members',
  },
  authorization: '<username>:<key>'
});

Response

{
  "status": "complete",
  "message": "gym-members created"
}

Inserting documents

Now we can make use of the dataset and populate it with some data! Your data must be structured as either an array of objects or an object of objects. Each nested object becomes a document and is automatically given an _id field if one isn’t provided. We recommend defining the _id field yourself so that documents are easier to reference in later writes.

Endpoint: https://api-aueast.relevance.ai/latest/datasets/<dataset_id>/documents/bulk_insert
API Reference: Bulk insert documents

cURL

curl --request POST \
     --url https://api-aueast.relevance.ai/latest/datasets/gym-members/documents/bulk_insert \
     --header 'Accept: application/json' \
     --header 'Content-Type: application/json' \
     --header 'Authorization: <username>:<key>' \
     --data '
{
  "documents": [
    { "_id": "1", "name": "Carl Hyatt", "age": "31", "goal": "weight loss", "image_url": "https://randomuser.me/api/portraits/men/54.jpg" },
    { "_id": "2", "name": "Janice Ramsay", "age": "46", "goal": "build endurance", "image_url": "https://randomuser.me/api/portraits/women/17.jpg" },
    { "_id": "3", "name": "Dean Gardner", "age": "27", "goal": "general fitness", "image_url": "https://randomuser.me/api/portraits/men/82.jpg" }
  ]
}
'

JavaScript

const insertDocuments = await authorizedFetch('POST', {
  path: 'datasets/gym-members/documents/bulk_insert',
  // Pass a plain object here: authorizedFetch already calls JSON.stringify on the body
  body: {
    documents: [
      { _id: '1', name: 'Carl Hyatt', age: 31, goal: 'weight loss', image_url: 'https://randomuser.me/api/portraits/men/54.jpg' },
      { _id: '2', name: 'Janice Ramsay', age: 46, goal: 'build endurance', image_url: 'https://randomuser.me/api/portraits/women/17.jpg' },
      { _id: '3', name: 'Dean Gardner', age: 27, goal: 'general fitness', image_url: 'https://randomuser.me/api/portraits/men/82.jpg' },
    ],
  },
  authorization: '<username>:<key>'
});

Response

{
  "documents_received": 3,
  "inserted": 3,
  "inserted_ids": [
      "1",
      "2",
      "3"
  ],
  "failed_documents": []
}
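If your source data is an object keyed by ID, you may want to turn it into an array with explicit _id values before sending it, and to check the response afterwards. The sketch below is one way to do that locally. Both toDocuments and allInserted are hypothetical helpers of ours, and using each key as the _id is an assumption about what you want, not a description of what the API does with an object of objects.

```javascript
// Converts an object of objects into an array of documents, using each key as
// the _id unless the record already defines one. Hypothetical helper.
const toDocuments = (records) => {
  if (Array.isArray(records)) {
    return records;
  }
  return Object.entries(records).map(([key, record]) => ({
    ...record,
    _id: String(record._id ?? key),
  }));
};

// Returns true when a bulk_insert response reports every document as inserted.
// The field names come from the sample response shown above.
const allInserted = (response) =>
  response.inserted === response.documents_received &&
  response.failed_documents.length === 0;

const members = {
  1: { name: 'Carl Hyatt', goal: 'weight loss' },
  2: { name: 'Janice Ramsay', goal: 'build endurance' },
};

console.log(toDocuments(members));
```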

Encoding fields into vectors

As it is right now, the dataset can be queried using traditional text search but that’s not all you came here for, is it? To enter the magical world of semantics, qualitative data and vectors, let’s encode the goal field.

This will add a goal_use_vector_ field to each document in the dataset, containing a vector of 512 numbers that represents the meaning of the original value. Any vectorised field has the _use_vector_ suffix appended to its original field name.

Endpoint: https://vectorhub-api-text.westus2.azurecontainer.io/datasets/queue
API Reference: Encode dataset

cURL

curl --request POST \
     --url https://vectorhub-api-text.westus2.azurecontainer.io/datasets/queue \
     --header 'Accept: application/json' \
     --header 'Content-Type: application/json' \
     --header 'Authorization: <username>:<key>' \
     --data '
{
  "fields": ["goal"],
  "dataset_id": "gym-members"
}
'

JavaScript

const encodeDocuments = await authorizedFetch(
  'POST',
  {
    path: 'datasets/queue',
    body: {
      dataset_id: 'gym-members',
      fields: ['goal'],
    },
    authorization: '<username>:<key>',
  },
  'https://vectorhub-api-text.westus2.azurecontainer.io/'
);

Response

{
  "task_id": "434757f0-4b37-4e53-8ad6-24e9e38d1c7a"
}
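Because the encoded field name is derived from the original one, it can help to compute it instead of typing the suffix each time. A small sketch, with helper names (vectorFieldName, hasVector) that are ours and purely illustrative:

```javascript
// Suffix described in this section for encoded text fields.
const VECTOR_SUFFIX = '_use_vector_';

// Returns the name of the encoded field for a given source field.
const vectorFieldName = (field) => `${field}${VECTOR_SUFFIX}`;

// Checks whether a document already carries an encoded version of a field.
const hasVector = (document, field) =>
  Array.isArray(document[vectorFieldName(field)]);

console.log(vectorFieldName('goal'));
// goal_use_vector_
```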

Checking task status

Our example here will have finished very quickly but depending on the size of a dataset and the number of fields chosen, encoding can take some time. To check the status of a task, use the task_id returned by the encoding request.

Endpoint: https://vectorhub-api-text.westus2.azurecontainer.io/task_status
API Reference: Task Status API

cURL

curl --request GET \
     --url 'https://vectorhub-api-text.westus2.azurecontainer.io/task_status?dataset_id=gym-members&task_id=<task_id>' \
     --header 'Accept: application/json' \
     --header 'Authorization: <username>:<key>'

JavaScript

const taskStatus = await authorizedFetch(
  'GET',
  {
    path: 'task_status',
    parameters: {
      dataset_id: 'gym-members',
      task_id: '<task_id>',
    },
    authorization: '<username>:<key>',
  },
  'https://vectorhub-api-text.westus2.azurecontainer.io/'
);

Response

{
  "status": {
    "insert_date_": "2021-09-29T05:24:19.164312",
    "message": "\"FINISHED\"",
    "params": {
      "gc_mem": 681682,
      "size": 20,
      "api_key": "<key>",
      "dataset_id": "gym-members",
      "project": "<username>",
      "refresh": false,
      "task_id": "434757f0-4b37-4e53-8ad6-24e9e38d1c7a",
      "fields": [
        "goal"
      ]
    }
  }
}
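For larger datasets you’ll probably want to poll this endpoint until the task is done. The sketch below shows one way to do that. It assumes that a finished task always has FINISHED somewhere in status.message, as in the sample response. The other possible status values aren’t covered here, and waitForTask, its interval and its attempt limit are our own choices for illustration.

```javascript
// Returns true when a task status response reports completion.
// The sample response wraps the word in escaped quotes, so match on a substring.
const isTaskFinished = (response) =>
  typeof response?.status?.message === 'string' &&
  response.status.message.includes('FINISHED');

const sleep = (milliseconds) =>
  new Promise((resolve) => setTimeout(resolve, milliseconds));

// Calls getStatus repeatedly until the task finishes or the attempts run out.
// getStatus is any function that returns a promise of a task status response.
const waitForTask = async (
  getStatus,
  { intervalMs = 2000, maxAttempts = 30 } = {}
) => {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const response = await getStatus();
    if (isTaskFinished(response)) {
      return response;
    }
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`Task did not finish after ${maxAttempts} attempts`);
};

// Usage with the helper from this article:
// const finished = await waitForTask(() =>
//   authorizedFetch('GET', { path: 'task_status', parameters, authorization },
//     'https://vectorhub-api-text.westus2.azurecontainer.io/'));
```

Passing the status request in as a function keeps the polling logic independent of how you make network calls.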

Performing semantic text search 🚀

Here’s the moment you’ve been waiting for. With vectors in your dataset, you can now search the dataset using qualitative inputs! While traditional text search requires you to match the original string, semantic search removes the reliance on words and allows you to match data to relevant queries:

Text | Matching queries
“lose fat and build muscle after getting out of shape during lockdown” | “weight loss”, “bulking up”, “protein”
“build endurance for triathlon” | “cardio”, “Ironman”, “2XU”
“maintain fitness for general wellbeing” | “staying active”, “overall health”, “keep up with the kids”
Examples of matching queries from semantic search

In the example below, “cycling” is nowhere close to the words in “build endurance for triathlon” but Relevance AI knows to return the document with that goal as the first result. By default, results are sorted in descending order by their relevance score.
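To build some intuition for how a relevance score can rank documents that share no words with the query, here is a toy sketch using cosine similarity, one common way of comparing vectors. This article doesn’t state which similarity measure the API uses, so treat the measure as an assumption. The three-number vectors are invented for illustration, whereas real encoded fields are much longer.

```javascript
// Cosine similarity: close to 1 when two vectors point the same way, 0 when unrelated.
const cosineSimilarity = (a, b) => {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
};

// Scores every document against the query vector, highest score first.
const rankBySimilarity = (queryVector, documents) =>
  documents
    .map((document) => ({
      ...document,
      score: cosineSimilarity(queryVector, document.vector),
    }))
    .sort((a, b) => b.score - a.score);

// Made-up toy vectors, not real encoder output.
const toyQuery = [0.9, 0.1, 0.2]; // stands in for "cycling"
const toyDocuments = [
  { goal: 'weight loss', vector: [0.1, 0.9, 0.3] },
  { goal: 'build endurance', vector: [0.8, 0.2, 0.1] },
];

console.log(rankBySimilarity(toyQuery, toyDocuments).map((d) => d.goal));
```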

Remember that to query a field’s vectors, you need to use its encoded field, which is suffixed like so: <originalfieldname>_use_vector_.

Endpoint: https://ingest-api-dev-aueast.relevance.ai/latest/datasets/<dataset_id>/simple_search
API Reference: Simple search

cURL

curl --request POST \
     --url https://ingest-api-dev-aueast.relevance.ai/latest/datasets/gym-members/simple_search \
     --header 'Accept: application/json' \
     --header 'Content-Type: application/json' \
     --header 'Authorization: <username>:<key>' \
     --data '
{
     "vectorSearchQuery": {
          "field": "goal_use_vector_",
          "query": "cycling",
          "model": "text"
     }
}
'

JavaScript

const semanticSearch = await authorizedFetch(
  'POST',
  {
    path: 'datasets/gym-members/simple_search',
    body: {
      vectorSearchQuery: {
        field: 'goal_use_vector_',
        query: 'cycling',
        model: 'text',
      },
    },
    authorization: '<username>:<key>',
  },
  'https://ingest-api-dev-aueast.relevance.ai/latest/'
);

Response

[
  {
    "goal": "build endurance for triathlon",
    "image_url": "https://randomuser.me/api/portraits/women/17.jpg",
    "name": "Janice Ramsay",
    "insert_date_": "2021-09-29T05:23:27.870491",
    "age": 46,
    "_id": "2",
    "_relevance": 0.37001884
  },
  . . .
]
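Once you have results, you’ll often want to trim them down to the fields your app displays and drop weak matches. A minimal sketch follows, assuming the response is an array shaped like the one above. summarizeResults and the 0.3 threshold are our own illustrative choices, and the second sample entry (including its score) is hypothetical.

```javascript
// Keeps results at or above a relevance threshold and picks out display fields.
// summarizeResults is a hypothetical helper; tune the threshold to your data.
const summarizeResults = (results, minimumRelevance = 0) =>
  results
    .filter((result) => result._relevance >= minimumRelevance)
    .map(({ _id, name, goal, _relevance }) => ({
      id: _id,
      name,
      goal,
      relevance: _relevance,
    }));

const sampleResults = [
  // Taken from the sample response above.
  { _id: '2', name: 'Janice Ramsay', goal: 'build endurance for triathlon', age: 46, _relevance: 0.37001884 },
  // Hypothetical entry with a made-up score, for illustration only.
  { _id: '3', name: 'Dean Gardner', goal: 'general fitness', age: 27, _relevance: 0.21 },
];

console.log(summarizeResults(sampleResults, 0.3));
```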

Recap

To recap, we created a dataset to collect our data, inserted some documents, then encoded a text field into vectors and performed a semantic search using those vectors. Nice work.
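Put together, the whole flow is a short sequence of requests. The sketch below chains them in order and takes the request function as a parameter so you can pass in authorizedFetch or your own client. runDemo is a name we made up, and the single status check is a simplification: it assumes encoding has finished by the time we search, which held for this small demo but may not for larger datasets.

```javascript
const TEXT_ENCODER = 'https://vectorhub-api-text.westus2.azurecontainer.io/';
const SEARCH_API = 'https://ingest-api-dev-aueast.relevance.ai/latest/';

// Runs the create, insert, encode, status check and search steps in order.
// request must have the same signature as authorizedFetch from this article.
const runDemo = async (request, authorization, documents, query) => {
  await request('POST', {
    path: 'datasets/create',
    body: { id: 'gym-members' },
    authorization,
  });

  await request('POST', {
    path: 'datasets/gym-members/documents/bulk_insert',
    body: { documents },
    authorization,
  });

  const { task_id } = await request(
    'POST',
    { path: 'datasets/queue', body: { dataset_id: 'gym-members', fields: ['goal'] }, authorization },
    TEXT_ENCODER
  );

  // Simplification: one status check only. Poll here for larger datasets.
  await request(
    'GET',
    { path: 'task_status', parameters: { dataset_id: 'gym-members', task_id }, authorization },
    TEXT_ENCODER
  );

  return request(
    'POST',
    {
      path: 'datasets/gym-members/simple_search',
      body: { vectorSearchQuery: { field: 'goal_use_vector_', query, model: 'text' } },
      authorization,
    },
    SEARCH_API
  );
};
```

Keep in mind that authorizedFetch logs errors and returns undefined instead of throwing, so a failed encoding request would surface here as an error when task_id is read.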

This demo was a basic introduction to working with vectors programmatically from beginning to end, but there is a wealth of endpoints and configuration options at your disposal. We encourage you to explore the interactive API Reference to tailor vector operations exactly to your liking!
