In The End We Seek Structure

In Short:
A range of assumptions led to a boom in NoSQL solutions, but in the end, SQL and relational models find their way back as a critical part of data management.


Background

By the mid-2000s, 10 years into the Netscape-inspired mainstream Internet, webscale workloads were pushing the limits of conventional databases. Traditional solutions could not keep up with a myriad of Internet users simultaneously accessing the same application and database.

At the time, many websites used relational databases like MySQL, SQL Server from Microsoft, or Oracle. Each of these databases relied on a relational model using SQL, the Structured Query Language, which emerged nearly 40 years ago and remains the lingua franca of data management.

Genesis of NoSQL

Scaling any solution is hard, and scaling a relational SQL database proved particularly challenging, which in part led to the emergence of the NoSQL movement.

FIGURE 1: Interest in NoSQL 2009 – 2015 Source: Google Trends

While numerous reasons explain this interest graph, a few stand out. Prior solutions were

- hard to scale
- hard to adapt to new performance needs
- hard to build

An Explosion of Database Options

As developers sought alternatives, an explosion of database options emerged.

- Document Datastores
  Enabled schema-less design, which made building new applications a breeze but running concurrent reads and writes a significant challenge
- Key-value Stores
  Offered simple lookups and scale based on an eventual consistency model suitable for some, but not all, workloads
- Unstructured File Systems
  Delivered nearly infinite distributed storage making it easy to store everything, but nearly impossible to quickly make use of it
- Graph Databases
  Provided a superior data model for graph-specific datasets but not enough to cover a full spectrum of data management

FIGURE 2: An Explosion of Database Options

Some SQL With Your NoSQL

As reality hit, many approaches edged back towards a relational, SQL-focused model.

Document datastore companies incorporated third-party storage engines to solve some of the most complex parts of operational and relational databases, such as multi-version concurrency control and record-level locking (as opposed to database-level locking).

Key-value companies developed entirely new custom query languages that resemble SQL but are not SQL. A query language per datastore became the norm.
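To illustrate why a query layer becomes attractive, here is a minimal sketch in Python. It assumes a plain dict as a stand-in for a key-value store and uses the standard library's sqlite3 module as a stand-in for a SQL database. The order records, table name, and function names are hypothetical and chosen only for illustration; they do not represent any particular product.

```python
import sqlite3

# Hypothetical order records, used only for illustration.
ORDERS = [
    ("o1", "alice", 30),
    ("o2", "bob", 15),
    ("o3", "alice", 20),
]

# Key-value style: a plain dict stands in for the store.
# Looking up one order by its key is trivial.
kv_store = {
    order_id: {"customer": customer, "amount": amount}
    for order_id, customer, amount in ORDERS
}


def total_by_customer_kv(store):
    """Answer an ad hoc question by scanning every value in application code."""
    totals = {}
    for record in store.values():
        customer = record["customer"]
        totals[customer] = totals.get(customer, 0) + record["amount"]
    return totals


def total_by_customer_sql(rows):
    """Ask the same question declaratively with a SQL GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
    conn.close()
    return dict(result)
```

Both functions return the same totals. The difference is where the logic lives: in the key-value version every new question means new application code, while in the SQL version it means a new query string.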

Unstructured file systems, holding troves of untapped data, quickly spurred an entire market of SQL-on-Hadoop solutions so customers could make use of everything they had been storing.

And some smaller graph database companies merged with larger NoSQL companies, perhaps because the graph-only market did not represent a large enough independent opportunity.

Strength of the Relational Model

Fortunately, the relational model has kept pace with modern workloads from webscale Internet applications, to the Internet of Things, to real-time analytics.

Two critical advances have strengthened the relational model:

- In-Memory Computing
  With DRAM footprints increasing, and memory prices dropping, it becomes economically advantageous to keep high value data in memory
- Distributed Systems
  Advances in distributed programming deliver near unlimited scale to foundational infrastructure

Coupling these technical advancements with a relational model delivers a solution to tackle large data workloads with ease, and with structure built in.

FIGURE 3: A Relational Database Model
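As a rough illustration of "structure built in", the sketch below uses Python's standard sqlite3 module with an in-memory database. This is an assumption made for convenience: SQLite runs in a single process and is not distributed, so it only stands in for the relational and in-memory aspects. The devices and readings tables, and the function names, are hypothetical.

```python
import sqlite3


def build_database():
    """Create an in-memory relational database with a declared schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE devices (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE readings (
            id        INTEGER PRIMARY KEY,
            device_id INTEGER NOT NULL REFERENCES devices(id),
            value     REAL NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO devices (id, name) VALUES (?, ?)",
        [(1, "thermostat"), (2, "doorbell")],
    )
    conn.executemany(
        "INSERT INTO readings (device_id, value) VALUES (?, ?)",
        [(1, 20.5), (1, 21.5), (2, 1.0)],
    )
    return conn


def rejects_orphan_reading(conn):
    """Return True if the schema refuses a reading for an unknown device."""
    try:
        conn.execute("INSERT INTO readings (device_id, value) VALUES (99, 5.0)")
    except sqlite3.IntegrityError:
        return True
    return False


def average_by_device(conn):
    """Join the two tables and aggregate, relying on the declared relationship."""
    return conn.execute(
        """
        SELECT d.name, AVG(r.value)
        FROM devices AS d
        JOIN readings AS r ON r.device_id = d.id
        GROUP BY d.name
        ORDER BY d.name
        """
    ).fetchall()
```

Because the relationships are declared up front, the database itself rejects a reading that points at a nonexistent device, and a join across the two tables needs no extra plumbing in application code.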

All for One, One for All

In the first round of the big data explosion, infrastructure tools became so abundant that far too quickly data practitioners were working more on data plumbing than data science. A cascading flow of infrastructure tools became as common as the data flow itself.

Now companies can store data in memory, scale with distributed systems, and maintain a relational model from the outset. This provides the operational model and required performance, along with the structure needed to understand the data immediately.