Finally, What We’ve Been Waiting For: OpenAI Acquires Rockset

First, massive congratulations to both OpenAI and Rockset — validation that the worlds of AI and databases are one.

While we speculate on what their platform might end up looking like, here are some points to note:

The bridge between model providers and an enterprise’s data estate is very, very real.

This gap is the biggest reason AI experiences have remained nascent over the last two years: there has yet to be a repeatable stack that brings together all your data (structured or unstructured) across your operational and analytical stores.

OpenAI knows that, and wants to develop this stack, one fit for the enterprise.

Bridging the gap with a simple vector DB has always felt primitive.

Over the last two years, we saw the explosion of the vector database market, with vector DBs becoming almost synonymous with building AI apps.

But most of the data you want to access with your agents and LLMs won't live in the vector store. That is why you end up with simple RAG apps when you want to do so much more with AI.

At the end of the day, the winner was a real-time, general purpose SQL database.

This is OpenAI's most prominent acquisition to date. There's a good reason why.

Notice it wasn't a vector DB.

Notice it also wasn't an execution engine. 

Notice how your vector DBs are now all trying to build a structured OLAP engine. Also, notice how entire frameworks like LlamaIndex emerged to give you a single query layer to interface across your structured, full-text and vector data.

The real winner was the database that could do all of these retrieval techniques in its engine, in real time and at massive scale. Rockset's product meant so much to OpenAI that it was willing to drop Rockset's entire customer base 🤔.
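
To make that concrete, here is a minimal sketch of "all of these retrieval techniques in one place": a structured filter and a keyword predicate run inside the SQL engine, and vector similarity re-ranks whatever survives. SQLite plus a hand-rolled cosine function stands in for a real engine like Rockset, which would run all three natively; the table, data, and query vector are invented for illustration.

```python
# Toy hybrid retrieval: relational filter + keyword predicate in SQL,
# vector re-ranking on the candidates. (SQLite is a stand-in; a real
# engine would do all three steps natively.)
import json
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, category TEXT, price REAL, body TEXT, embedding TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?, ?, ?, ?)", [
    (1, "laptop", 999.0, "lightweight laptop with long battery life", json.dumps([0.1, 0.9])),
    (2, "laptop", 1999.0, "workstation laptop for heavy compute", json.dumps([0.8, 0.2])),
    (3, "phone", 699.0, "compact phone with a great camera", json.dumps([0.4, 0.6])),
])

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

query_vec = [0.2, 0.8]  # embedding of the user's question (made up)

# structured filter + keyword predicate happen inside the SQL engine
candidates = conn.execute(
    "SELECT id, body, embedding FROM docs WHERE category = ? AND price < ? AND body LIKE ?",
    ("laptop", 1500.0, "%battery%"),
).fetchall()

# vector similarity re-ranks whatever survived the relational filters
ranked = sorted(candidates, key=lambda r: cosine(json.loads(r[2]), query_vec), reverse=True)
for doc_id, body, _ in ranked:
    print(doc_id, body)
```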

What we think is going to happen

We shared most of these ideas earlier this year; now they should be obvious to everyone.

Someone will seriously try to solve the 'memory for AI' problem, and the answer is not vector DBs.

Every vector DB is trying to market itself as 'memory for LLMs.' But no. Vector DBs are only sticky notes for LLMs, helping you find some information.

Think about our own memories. We don't just remember things; we summarize and connect them with each other, analyzing them before we use them later. A general-purpose, real-time database is the closest thing that does this.
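
As a rough illustration of that difference, here is a hypothetical sketch of memory as more than lookup: raw observations get consolidated into summaries that plain SQL can analyze later. The tables and the `consolidate` helper are invented for this example, and a real system would ask an LLM to write the summaries instead of joining strings.

```python
# Hypothetical "memory" sketch: observations are consolidated into
# summaries stored in ordinary tables that SQL can analyze later.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE observations (topic TEXT, text TEXT);
CREATE TABLE summaries (topic TEXT PRIMARY KEY, summary TEXT, n_sources INTEGER);
""")
db.executemany("INSERT INTO observations VALUES (?, ?)", [
    ("billing", "user asked about invoice #42"),
    ("billing", "user disputed a duplicate charge"),
    ("onboarding", "user finished setup in ten minutes"),
])

def consolidate(topic: str) -> None:
    """Collapse raw observations on a topic into one summary row.
    A real system would have an LLM write the summary; we just join strings."""
    rows = db.execute("SELECT text FROM observations WHERE topic = ?", (topic,)).fetchall()
    db.execute("INSERT INTO summaries VALUES (?, ?, ?)",
               (topic, "; ".join(r[0] for r in rows), len(rows)))

for (topic,) in db.execute("SELECT DISTINCT topic FROM observations"):
    consolidate(topic)

# "analyzing before we use them later": ordinary SQL over consolidated memory
for topic, summary, n in db.execute(
        "SELECT topic, summary, n_sources FROM summaries ORDER BY n_sources DESC"):
    print(f"{topic} ({n} observations): {summary}")
```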

So, what does that mean for users? Maybe ChatGPT can finally extract the structured part of someone's data, load it into a database and query it, instead of generating some lousy code that loops over all the files to answer the user's question.
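
A toy version of that flow might look like the sketch below: pull the structured rows out of uploaded files once, load them into a database, and let every follow-up question become a query rather than another pass of generated code. The file contents and schema are made up.

```python
# Extract the structured part of a user's files into a database, then
# answer questions with queries instead of ad-hoc code over every file.
import csv
import io
import sqlite3

# pretend these are files the user uploaded
uploaded_files = {
    "q1_sales.csv": "region,amount\nEMEA,120\nAMER,200\n",
    "q2_sales.csv": "region,amount\nEMEA,150\nAMER,180\n",
}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (source_file TEXT, region TEXT, amount REAL)")

# 1. extract the structured rows once...
for name, content in uploaded_files.items():
    for row in csv.DictReader(io.StringIO(content)):
        db.execute("INSERT INTO sales VALUES (?, ?, ?)", (name, row["region"], float(row["amount"])))

# 2. ...then every follow-up question becomes a query, not another code pass
answer = db.execute("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY 2 DESC").fetchall()
print(answer)  # [('AMER', 380.0), ('EMEA', 270.0)]
```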

What's even more exciting is using the cheap and efficient compute of a database to offload some of the expensive and slow compute of AI models.
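
For instance (a sketch with an invented schema and a stubbed `call_llm`), the database can reduce a hundred thousand rows to a five-row summary cheaply, so the expensive model only ever reasons over that summary.

```python
# Offload work from the model to the database: the database does the cheap
# heavy lifting (filtering and aggregating), and the model only sees the
# tiny result. `call_llm` is a stub standing in for a real model call.
import random
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user_id INTEGER, latency_ms REAL)")
db.executemany("INSERT INTO events VALUES (?, ?)",
               [(random.randint(1, 1000), random.uniform(5, 500)) for _ in range(100_000)])

# cheap and fast: the database reduces 100k rows to 5 rows
top_slow = db.execute("""
    SELECT user_id, AVG(latency_ms) AS avg_ms, COUNT(*) AS n
    FROM events GROUP BY user_id ORDER BY avg_ms DESC LIMIT 5
""").fetchall()

def call_llm(prompt: str) -> str:  # hypothetical stub; replace with a real client
    return f"(model answer based on: {prompt[:60]}...)"

# expensive and slow: the model only reasons over the 5-row summary
print(call_llm(f"Explain why these users see high latency: {top_slow}"))
```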

AI will interact with data in ways that aren't text-to-SQL.

As stated in a previous blog, text-to-SQL is an extreme anti-pattern, and hopefully someone will study the problem outside the box. For example, function calling from AI models is an order of magnitude better than text-to-SQL, which suggests the interface a database presents to AI models should change.
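
One possible shape of that interface, sketched below with a hard-coded tool call standing in for a real model response: the model never writes SQL; it picks one of a few safe, parameterized query functions and supplies arguments. The table, function names, and data are invented for illustration.

```python
# Function calling instead of text-to-SQL: the database exposes named,
# parameterized functions, and the model only chooses a function and its
# arguments. The tool-call JSON is hard-coded where a model would respond.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer TEXT, total REAL, day TEXT)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("acme", 120.0, "2024-06-01"), ("acme", 80.0, "2024-06-02"), ("globex", 500.0, "2024-06-01"),
])

# the database's interface to the model: named functions, not raw SQL
def revenue_for_customer(customer: str) -> float:
    return db.execute("SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer = ?",
                      (customer,)).fetchone()[0]

TOOLS = {"revenue_for_customer": revenue_for_customer}

# pretend the model returned this tool call for "how much has acme spent?"
model_tool_call = json.dumps({"name": "revenue_for_customer", "arguments": {"customer": "acme"}})

call = json.loads(model_tool_call)
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # 200.0
```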

Data will interact with AI in ways that aren't a simple LLM function.

Big players have introduced various LLM functions, but none of them actually need to be functions in the database. You can summarize the results, do some answer formatting and some ranking, but at the end of the day, these all happen at the very last step of data processing. They don't need to be part of the DB.

Hopefully this time we will see things like semantic GROUP BYs, or streaming LLM functions that can analyze an entire table.
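
As a rough sketch of what a semantic GROUP BY could feel like, the toy below registers a (stubbed) model-backed function as a SQL function and groups rows by meaning rather than by exact string. The keyword lookup in `semantic_label` stands in for a real LLM call, and the table is invented.

```python
# Toy "semantic GROUP BY": a model-backed function registered as a SQL
# function, so rows group by what they mean, not what they literally say.
import sqlite3

def semantic_label(text: str) -> str:
    """Stub for an LLM classifier: maps free text to a canonical label."""
    text = text.lower()
    if "crash" in text or "error" in text:
        return "bug report"
    if "love" in text or "great" in text:
        return "praise"
    return "other"

db = sqlite3.connect(":memory:")
db.create_function("semantic_label", 1, semantic_label)  # expose it to SQL
db.execute("CREATE TABLE feedback (comment TEXT)")
db.executemany("INSERT INTO feedback VALUES (?)", [
    ("the app crashes on startup",), ("love the new dashboard",),
    ("error 500 when saving",), ("great work on search",),
])

# group by what the comments mean, not by their exact text
for label, n in db.execute(
        "SELECT semantic_label(comment), COUNT(*) FROM feedback "
        "GROUP BY semantic_label(comment) ORDER BY COUNT(*) DESC"):
    print(label, n)
```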

We will see new workflows in enterprise data.

Again, this was my biggest disappointment in this wave. We know that AI can do things humans never could: read a million tokens in a second, or try a million SQL queries overnight. But we are only trying to use AI to replace junior DBAs on tasks humans already handle perfectly well (and it is still not going very well).

Hopefully, with more people starting AI/database co-innovation (like we did at SingleStore), we can explore totally different ways of working with data. I'm looking forward to a world where a business analyst works with AI to generate thousands of different hypotheses and proposals before drawing a conclusion.
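
A sketch of that workflow, with invented data: candidate hypotheses are enumerated mechanically here where a real system would have the model propose them, the database evaluates all of them cheaply, and only the strongest results surface to the analyst.

```python
# Hypothesis sweep: enumerate many candidate "segment X differs from the
# overall average" hypotheses, let the database score them all, and show
# the analyst only the strongest few.
import itertools
import random
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, channel TEXT, amount REAL)")
regions, channels = ["EMEA", "AMER", "APAC"], ["web", "retail", "partner"]
db.executemany("INSERT INTO sales VALUES (?, ?, ?)",
               [(random.choice(regions), random.choice(channels), random.uniform(10, 1000))
                for _ in range(5_000)])

overall_avg = db.execute("SELECT AVG(amount) FROM sales").fetchone()[0]

results = []
for region, channel in itertools.product(regions, channels):
    avg, n = db.execute("SELECT AVG(amount), COUNT(*) FROM sales WHERE region=? AND channel=?",
                        (region, channel)).fetchone()
    if not n:
        continue  # skip empty segments
    results.append((f"{region}/{channel}", avg - overall_avg, n))

# the analyst only reviews the hypotheses with the largest effect
for name, lift, n in sorted(results, key=lambda r: abs(r[1]), reverse=True)[:3]:
    print(f"{name}: avg differs from overall by {lift:+.1f} over {n} rows")
```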

