Implement Vector Embeddings on JSON Data with SingleStore Kai™

Jun 23, 2023

SingleStore Kai™ introduces 100x faster analytics on JSON data and adds first-in-market vector embeddings for JSON data, so you can migrate to a more feature-rich and performant database without changing your existing MongoDB queries or API calls.

The recent launch of SingleStore Kai™ (with MongoDB® compatibility) introduced not only 100x faster analytics on JSON data, but also support for vector embeddings on JSON data, making SingleStore the first database on the market to do this. SingleStore Kai™ is a MongoDB-compatible API that lets you migrate to a more feature-rich and performant database with no change to your existing MongoDB queries or API calls.

Using vector embeddings along with dot_product to perform a semantic search on your data is an extremely powerful SingleStoreDB feature, one we’ve had available since 2017. It enables you to build more accurate product recommender systems and AI-powered chatbots, improve customer support and much more. With the launch of SingleStore Kai™, we are giving MongoDB users the ability to use those same features on their JSON data, through a very simple migration to SingleStoreDB.

Check out this great demo from Principal Software Engineer Jason Thorsness to see how you can utilize this first in-market functionality for vector embeddings on JSON.

In the video, Jason explains how he uses SingleStore Kai™ along with our semantic search features to find some new science fiction books that he’d like to read. It is a simple implementation that you can accomplish quickly. Here are the highlights:

Note: If you’d prefer to read through the code instead, you can find a GitHub repository with the sample code used in the video here.

First, create a script that loops through your dataset and queries OpenAI to create an embedding for each document, storing that embedding in a new field on the document. A snippet of the Node.js code used to add the embedding to each document is shown below. Each document was written to a file named output.json.

const to = {
  title: from.title,
  description: from.description.value,
  subjects: from.subjects,
  embedding: {
    $binary: {
      base64: embeddingBase64,
      subType: "0",
    },
  },
};
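The snippet above assumes an embeddingBase64 value already exists. Here is a minimal sketch of how such a value might be produced, assuming the embedding arrives as a plain array of numbers and is stored as packed 32-bit floats. The function names embeddingToBase64 and base64ToEmbedding are hypothetical and are not taken from the sample repository.

```javascript
// Assumption: the vector is stored as packed 32-bit floats in the
// platform's byte order (little-endian on common hardware).
function embeddingToBase64(values) {
  const packed = new Float32Array(values);
  return Buffer.from(
    packed.buffer,
    packed.byteOffset,
    packed.byteLength
  ).toString("base64");
}

// Inverse helper, useful for checking that nothing was lost in encoding.
function base64ToEmbedding(base64) {
  const bytes = Buffer.from(base64, "base64");
  // Copy into a fresh ArrayBuffer so the float view starts at offset 0.
  const copy = new Uint8Array(bytes);
  return Array.from(new Float32Array(copy.buffer, 0, copy.byteLength / 4));
}

// Three floats occupy 12 bytes, which encode to 16 base64 characters.
const encoded = embeddingToBase64([1, 0.5, -2]);
console.log(encoded.length, base64ToEmbedding(encoded));
```

Values that are not exactly representable as 32-bit floats will come back slightly rounded, which is normally acceptable for similarity search.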

After adding the embeddings to each document, you need to import the JSON dataset into SingleStoreDB using the MongoDB driver for the language of your choice. In our example, Jason imported output.json into SingleStoreDB using the MongoDB driver for Node.js, looping through the dataset and inserting each document as a new row in the table.
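The import loop could be sketched roughly as follows. This is an illustration rather than the code from the video: it inserts in batches instead of one document at a time, and importDocuments and chunk are hypothetical helper names. The only driver call it relies on is insertMany, which the MongoDB Node.js driver provides on a collection and which resolves to a result containing insertedCount.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Insert documents batch by batch. `collection` is any object that exposes
// insertMany(docs) with the MongoDB driver's result shape.
async function importDocuments(collection, documents, batchSize = 100) {
  let inserted = 0;
  for (const batch of chunk(documents, batchSize)) {
    const result = await collection.insertMany(batch);
    inserted += result.insertedCount;
  }
  return inserted;
}
```

With the official mongodb package, the collection would typically come from new MongoClient(uri).db(dbName).collection(collectionName), where uri, dbName and collectionName are placeholders for your own values. Documents read from a file that uses the $binary form shown earlier would first need to be parsed as MongoDB Extended JSON (for example with EJSON.parse from the bson package) so the embedding is stored as binary data rather than as a nested object.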

Once the dataset is in SingleStoreDB, you can query it using the sample application, which creates an embedding of your query using OpenAI, then does a dot_product search and returns the five closest matches.
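To make the ranking step concrete, here is a small in-memory sketch of what a dot-product search computes. It is not how SingleStoreDB executes the query, and dotProduct and topMatches are hypothetical names. It assumes each document already carries its embedding as an array of numbers.

```javascript
// Dot product of two equal-length vectors.
function dotProduct(a, b) {
  if (a.length !== b.length) {
    throw new Error("vector lengths differ");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Score every document against the query vector and keep the k best.
function topMatches(queryVector, documents, k = 5) {
  return documents
    .map((doc) => ({
      title: doc.title,
      score: dotProduct(queryVector, doc.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

When the vectors are normalized to unit length, as OpenAI embeddings generally are, a higher dot product corresponds to a higher cosine similarity, so sorting by descending score puts the closest matches first.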

Jason had read a few of the books in the resulting matches, but found one new book that he wanted to read, “Noumenon.” Utilizing the document _id, he performs another search in the Mongo Shell (mongosh) to find the five most similar books to “Noumenon” with an additional filter to constrain the price to less than $20.
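The logic of that follow-up search (start from one book, exclude it, filter by price, rank the rest by similarity) can be sketched in plain JavaScript. This is an in-memory illustration only, not the mongosh query from the video. The price field name and the findSimilarUnderPrice helper are assumptions made for the example.

```javascript
// Find the k books most similar to the book with `sourceId`,
// considering only other books priced below `maxPrice`.
function findSimilarUnderPrice(books, sourceId, maxPrice, k = 5) {
  const source = books.find((book) => book._id === sourceId);
  if (!source) {
    throw new Error("no book with _id " + sourceId);
  }
  const dot = (a, b) => a.reduce((sum, value, i) => sum + value * b[i], 0);
  return books
    .filter((book) => book._id !== sourceId && book.price < maxPrice)
    .map((book) => ({
      title: book.title,
      price: book.price,
      score: dot(source.embedding, book.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Applying the price filter before scoring keeps the ranking limited to candidates that could actually be returned.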

With SingleStore Kai™ (with MongoDB® compatibility) and some basic code, Jason is able to build an accurate book recommendation system, all without writing any SQL. There are many possible uses for vector embeddings on your JSON data, and you can implement them on SingleStoreDB using the same MongoDB queries and libraries you’re already familiar with.

Want to try this for yourself? Get started with a free trial of SingleStoreDB with Kai. 

We only have one more question: @Jason, how were the books?

