The CEO of OpenAI, Sam Altman, is said to have a poster hanging above his desk that reads, "No one knows what is going to happen next."
But I am no Sam Altman — and I’m happy to go out on a limb to share some opinions.
2024 has been a roller coaster ride for the database market, packed with innovation, hard lessons and dramatic shifts. Let’s look at some trends amid the advancements and consider what they mean for everyone working with database technologies.
The rise and fall of transactional databases
MySQL: From free to friction
Earlier this year, PlanetScale, one of the rising MySQL hosting providers, eliminated its free tier. Close on the heels of that announcement, social media reports suggested that employees had also lost their jobs in what some described as cost-cutting measures.
Meanwhile, MariaDB — another MySQL variant — removed its “Database as a Service” offering and quietly went private. This double whammy was a sobering reminder of the challenges inherent in turning open-source projects into profitable commercial services.
Takeaway
Cloud infrastructure is inherently expensive, especially for inefficient workloads, and it's no longer enough to merely wrap open-source database management systems with hosting services. If the underlying infrastructure technology isn't cost-efficient, you need more than just a nice UI and some developer-friendly features. Ironically, building those extras adds more cost, which keeps margins thin in the short term.
In another ironic twist for the world of open-source databases, Redis, one of developers' most-loved in-memory databases, faced competition from its own community. Valkey, a fork of Redis, emerged in response to Redis's departure from open-source licensing.
Prediction 1
Proprietary hosting services for open-source databases will continue to face significant challenges, leading to some consolidation in the market. In addition, I think open-source forks and community-supported alternatives will continue to grow in popularity, especially for databases whose vendors impose restrictions or stray from their open-source roots. (Incidentally, SQLite saw a fork too: libSQL.)
Postgres: A banner year for the open-source favorite
Postgres-based services flourished throughout 2024, especially those offering modern developer features like branching and built-in authentication. Supabase was one of the key players that captured the spotlight this year as it found its way into more AI-driven, full-stack development tools like Lovable and bolt.new.
We also saw the rise of pgvector, which lets Postgres users add vector capabilities, marking another step in merging AI and traditional databases. Postgres is evolving into an advanced relational system with AI integration built in, extending what it can do for intelligent data management.
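To make that concrete, here is a minimal sketch of what pgvector looks like in practice, assuming psycopg2 and a Postgres instance where the extension can be enabled. The connection string, table and column names are illustrative, not a recommended schema.

```python
# Minimal pgvector sketch (illustrative names; requires the pgvector extension).
import psycopg2

conn = psycopg2.connect("dbname=app user=postgres")  # adjust for your setup
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS docs (
        id bigserial PRIMARY KEY,
        body text,
        embedding vector(3)  -- toy dimension; real embeddings are much wider
    );
""")
cur.execute(
    "INSERT INTO docs (body, embedding) VALUES (%s, %s::vector)",
    ("hello world", "[0.1, 0.2, 0.3]"),
)

# Nearest-neighbor search with pgvector's <-> (Euclidean distance) operator.
cur.execute(
    "SELECT body FROM docs ORDER BY embedding <-> %s::vector LIMIT 5",
    ("[0.1, 0.2, 0.25]",),
)
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()
```

The point is less the syntax than the fact that this runs inside the same database that already holds the application's relational data.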
The Postgres example is further validation that, at the lower end of the market, developers and business users are looking for ease of use and extra functionality out of the box: APIs and storage for images, videos and vectors make it fast and easy to bootstrap new applications.
Incidentally, as I was writing this blog, AWS announced Aurora Distributed SQL (DSQL), which claims to be fast and scalable with strong consistency. The move further underscores the need and demand for better performance and scalability in hosted services built on open-source databases.
Prediction 2
Postgres will continue to be popular, and we will see more features added. However, with feature bloat come complexity and scalability issues, which leads to my next point.
The SQLite + DuckDB effect
SQLite and DuckDB proved this year that smaller, more focused databases are rising in the lower-revenue market segment. SQLite found new life on edge devices paired with smaller language models, while DuckDB stepped in as an open-source hero for in-browser analytics. Each filled a niche, but it became clear that they both had specific boundaries, as vectors and deep analytics aren’t in SQLite’s toolkit (yet).
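As a rough illustration of that pairing, the sketch below keeps transactional writes in a plain SQLite file and lets DuckDB's sqlite extension run the analytics over it. The file and table names are made up for the example.

```python
# Illustrative "SQLite for transactions, DuckDB for analytics" pairing.
import sqlite3
import duckdb

# Transactional side: the application writes to an ordinary SQLite file.
db = sqlite3.connect("app.db")
db.execute("CREATE TABLE IF NOT EXISTS events (user_id INTEGER, amount REAL)")
db.executemany("INSERT INTO events VALUES (?, ?)", [(1, 9.99), (1, 4.50), (2, 20.0)])
db.commit()
db.close()

# Analytical side: DuckDB scans the SQLite file in place and aggregates it.
con = duckdb.connect()
con.execute("INSTALL sqlite")
con.execute("LOAD sqlite")
rows = con.execute(
    "SELECT user_id, SUM(amount) AS total "
    "FROM sqlite_scan('app.db', 'events') "
    "GROUP BY user_id ORDER BY total DESC"
).fetchall()
print(rows)
```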
Data storage more broadly has been shifting from traditional, heavyweight systems to more specialized solutions, from all-flash arrays to intelligent databases, and SQLite and DuckDB exemplify this trend: scalable, high-performance engines that integrate well with modern stacks.
Similarly, MotherDuck is now offering DuckDB hosting, but it remains to be seen whether this will add significant value beyond what DuckDB already provides.
Takeaway
There’s still a need for high-performance, lightweight databases that excel in both transactional and analytical workloads — especially in environments where cloud-based databases and their associated costs are a major consideration.
Prediction 3
Atomized databases like SQLite and DuckDB will gain more traction, but we may see a merger of SQLite with DuckDB-like analytics features in a brand-new solution. This will be driven by the proliferation of smaller models that can run on personal devices without the use of expensive GPUs in the cloud. If this happens, I also see a hybrid of DuckDB and SQLite becoming a serious threat to Postgres.
Analytics and data warehouses: The great reshuffling
Databricks and Snowflake: Chasing flexibility with cloud-native databases
Last year, Databricks acquired MosaicML, and this year it continued its expansion by acquiring Tabular, signaling a clear shift from Delta Lake to Iceberg. Snowflake, in turn, announced support for Iceberg as well as a new feature: running containers in Snowpark Container Services (SPCS).
Both Databricks and Snowflake are leveraging advancements in database technology, like multi-model databases and cloud-native architectures, to enhance data management, real-time processing, data integrity and security. While adding more flexibility in managing vast amounts of data, these moves also revealed a deeper truth: enterprises want optionality. As cloud prices rise, vendor lock-in has become a major concern, pushing companies to adopt technologies that provide flexibility and compatibility across different environments.
In a further push to standardize on Apache Iceberg tables, AWS announced support for Apache Iceberg in S3 buckets, as well as deeper integration with Redshift through its SageMaker services.
Takeaway
Data is more valuable than ever, even more than the latest and greatest language models, which are now commoditized and accessible to everyone. Businesses need to interact with their data in real time to know what’s happening, learn from it and take action.
Prediction 4
We will see new niche entrants in the market providing an aggregated query layer over Iceberg data.
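To hedge my own prediction with something that exists today: DuckDB's iceberg extension already gives a taste of what such a query layer could look like, as in the sketch below. The warehouse path is a placeholder, and the extension still has real limitations.

```python
# Illustrative query over an Iceberg table via DuckDB's iceberg extension.
# The path is a placeholder for wherever your Iceberg table lives.
import duckdb

con = duckdb.connect()
con.execute("INSTALL iceberg")
con.execute("LOAD iceberg")
count = con.execute(
    "SELECT count(*) FROM iceberg_scan('warehouse/db/orders')"
).fetchone()
print(count)
```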
Key trends to watch: AI for database administrators
This year, AI made a significant impact on the database world, turning once-cutting-edge features into commodities. Vectors, once the domain of specialist databases like Milvus and Pinecone, became an add-on in every traditional database. With every platform now boasting vector capabilities, specialized vector databases no longer have such a unique allure.
Takeaway
Niche vector-only players may still exist, but it's hard to see any of them reaching the level of a MongoDB® or a Snowflake. It's clear that the boundary between AI and data is dissolving as real-time artificial intelligence combines with vast, diverse datasets. I believe that's why OpenAI acquired Rockset, and later released its real-time features and APIs.
As AI-integrated databases become more prevalent, the importance of having security measures to prevent data breaches cannot be overstated. Within these advanced systems, enhanced encryption and access controls are essential for maintaining data integrity and safeguarding sensitive information.
Real-time data analysis is now merging with AI features, and I have had several recent conversations with large companies looking to combine real-time data with AI. Case in point: Imagine a factory assembly line where AI not only reports issues but also takes proactive, agentic steps to optimize and troubleshoot them, processing and analyzing 20 trillion data points in real time during the manufacturing process. Or think about cybersecurity scenarios where zero-shot forecasting helps predict and mitigate potential threats in milliseconds. Niche databases are not built to solve these use cases on their own.
Prediction 5
Specialized vector databases will face consolidation or be forced to differentiate beyond vectors with added support for multiple data models. I believe we will see some M&A activity in 2025.
Prediction 6
With databases and AI converging, AI-integrated databases will replace many traditional reporting tools by 2025. AI-driven advisory engines capable of machine learning and reasoning in real time will become the default for business intelligence.
More key trends to watch in AI
- NLP to SQL. It's not enterprise-ready yet, but promising developments in open source indicate it may soon be (see the sketch after this list for the kind of guardrails it still needs).
- Model context protocol. If other large players pick up Anthropic's release of MCP, it will sit close to the data access layers in databases, opening up a large assortment of possibilities.
- Knowledge graphs. Serving governance, cataloging and security purposes, knowledge graphs have become vital for managing vast interconnected data landscapes, but faster real-time solutions that may use look-up tables will prevail (like SingleStore, which can do transactions, analytics, hybrid search and knowledge graphs in a few milliseconds). Graph databases are also becoming crucial for managing complex data relationships within structured and unstructured data, and providing real-time insights — especially in applications like social networks, recommendation engines and fraud detection systems.
- Retrieval-augmented generation (RAG). The future points to models that not only retrieve data but learn to understand new datasets on the fly, adding reasoning capabilities during inference. In my opinion, this is the future of RAG, which will start with a hybrid of fine-tuning and advanced agentic RAG approaches.
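On the NLP-to-SQL point above, one reason it isn't enterprise-ready is the lack of guardrails around what a model-generated query is allowed to do. The sketch below shows the shape of a minimal guardrail, with the model call stubbed out and all names hypothetical; it assumes an existing SQLite file with an orders table.

```python
# Hypothetical NLP-to-SQL guardrail sketch; the model call is a stub.
import sqlite3

SCHEMA_HINT = "CREATE TABLE orders (id INTEGER, customer TEXT, total REAL, created_at TEXT);"

def generate_sql(question: str) -> str:
    # Placeholder: in practice this would prompt an LLM with SCHEMA_HINT and the question.
    return ("SELECT customer, SUM(total) AS revenue FROM orders "
            "GROUP BY customer ORDER BY revenue DESC LIMIT 5;")

def run_readonly(db_path: str, sql: str):
    # Require the generated query to be a SELECT, then run it on a read-only
    # connection; sqlite3 also refuses to execute more than one statement per call.
    statement = sql.strip().rstrip(";")
    if not statement.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(statement).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    sql = generate_sql("Who are our top five customers by revenue?")
    print(run_readonly("app.db", sql))
```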
Prediction 7
Knowledge graphs and agentic RAG will play a central role in enterprise data strategies, enabling models that interact seamlessly with new and ever-changing datasets, most likely either using MCP or a new protocol.
Key takeaways
- Cost efficiency is key. In a cloud-heavy world, merely hosting inefficient open-source databases isn't sustainable. Companies must either add significant value or radically improve performance and scalability.
- Optionality and real time matter. The rise of Iceberg and real-time analytics speaks volumes. Businesses need flexibility that prevents lock-in, along with the ability and the controls to switch seamlessly.
- AI meets data. The line between AI and databases is blurring. Expect more AI-driven data solutions that can interpret, learn and act — all in real time. Imagine a single layer or a store that can curate all kinds of petabyte-scale data by engaging directly with agents or in a natural language interface.
- Database administrators and AI. Database administrators (DBAs) are crucial in managing modern database systems. By leveraging AI tools, DBAs can alleviate maintenance burdens, enhance efficiency and improve security. Embedding AI within databases allows DBAs and developers to utilize advanced data management and analysis tools directly, all while using familiar query languages.
- Niche players will struggle. As vectors become mainstream features in SQL and NoSQL databases, specialized databases may be squeezed out of their niche market by more prominent players supporting a multi-model approach that includes vector capabilities.
Where are we headed in 2025 and beyond?
The world of data and databases is headed toward a convergence — an era where AI isn’t an add-on but an intrinsic part of how we store, understand and utilize data. Real-time insights will be table stakes for any serious business, and the need for flexibility will drive further innovation in technologies that prevent vendor lock-in. As cloud costs continue to rise, so will the pressure on services to deliver optimal performance and game-changing capabilities.
Prediction 8
Expect hybrid solutions that seamlessly combine on-premises, edge and cloud environments to become more prominent. This will give enterprises more control over their data, performance and costs at scale.
Take all these predictions together, and my advice is for everyone who works with database technologies to buckle up. 2025 is shaping up to be even more transformative as the substrate of intelligence — data — becomes the true differentiator for companies that wield it well.