When I joined an integration company in late ’99, I didn’t think data and integration would be my focus and passion. But here I am coming up to 2021, more than 20 years later, in a market that has only gotten more exciting and more relevant to the business.
In the early 2000s, it was all about the “connected enterprise”, getting your apps working together. Then it was the “intelligent enterprise” and we saw the rise of the enterprise data warehouse and analytics. In the early 2010s, Big Data became the rage, with a game-changing distributed filesystem (Hadoop’s HDFS) that could hold virtually unlimited data and distributed processing frameworks, MapReduce and Spark, that could work through all of it. It was in this era that we really started talking about data democratization and digital transformation: empowering everyone with tools and all the data they could want to make data-driven decisions. With this in place, businesses could be on the path to digital transformation – creating new revenue streams, improving operational efficiency, spotting and stopping churn well before it happens, and so on.
As companies went down this path, many found that the promises of the data-driven enterprise, data democratization and digital transformation were actually very hard to attain. Sure, some companies were showing amazing results with their data insights, but for most it remained elusive (almost like a snipe hunt). Big Data environments were still too complex, even with the new class of ingestion, data preparation and data exploration tools, and the primary participants were still highly technical.
The common knowledge worker or business analyst continued working with traditional BI tools, going back and forth with a data gatekeeper, requesting data sets in the hope that the next one would answer their questions, complete their reports, or fill out their dashboards. The process was slow, data was not truly democratized, and individual data exploration and insight remained in the hands of the few.
The cloud changed everything. Cloud-native data warehouses like Snowflake and data science platforms like Databricks made the analysis of historical, static big data far more accessible, without requiring specialized resources to provision and maintain the hardware and software. Users with access rights to their data (Salesforce, Google AdWords, files, and so on) can use tools like Fivetran and Stitch to load it into these environments and start exploring and reporting. The drawback is that these platforms are not suited to real-time applications and, for BI, cannot deliver consistently low-latency, interactive query responses.
Business intelligence has followed the same trend, moving to the cloud and becoming more accessible. Take a cloud-native tool like Chartio, which business users can pick up and connect to their data in minutes. They’re off and running, exploring and reporting on their data the way they want to see it. Data democratization has finally arrived, the foundation for digital transformation is solidly in place, and cloud-native tools are the fuel for this engine.
But we are now seeing the next-level problem, driven partly by this accessibility and partly by the greater digital demands of the COVID-19 era. Everyone, both internal and external, is being empowered with self-service dashboards. Users want data to refresh immediately; waiting 30 seconds is painful. They want to explore the data dynamically – all of it, not a subset. And they want the most current data, e.g. operational data and streaming data. With this increased demand for self-service analytics, businesses are finding their data infrastructure can’t keep up.
Cloud data warehouses are very good at storing large amounts of historical data for exploration and deep analytics. However, when they face a high number of concurrent queries (e.g. many people refreshing their dashboards) or lots of different ad hoc queries (e.g. individuals drilling down into dashboard data), they are not able to return the individual answers quickly. The latency of each underlying query serving a dashboard compounds, and the user experience stretches to seconds and even minutes. Throwing more compute at the problem doesn’t really solve the latency or concurrency issues, but it certainly increases expenses. The dashboards remain slow and flaky in the eyes of the user.
Legacy technologies (on-premises or in the cloud) simply weren’t designed for this type of data interaction. Expanding a legacy data warehouse appliance like Teradata, Netezza or Exadata is prohibitively expensive, and these systems were not built for modern data workloads that demand real-time insight into changing customer or machine conditions. Expecting them to deliver it is like expecting a car to adapt and fly like a jet: bolting wings onto the car might work temporarily, but it’s unlikely to work long term. This is the “flying car dilemma”.
Often, people think they don’t have a data-scale problem, so they try to use a database like MySQL as their analytical database. They quickly discover the pain of trying to scale MySQL, or any other single-node OLTP database, to meet growing analytics demands.
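To make that concrete, here is a minimal sketch of the kind of dashboard query that strains a single-node row store once the underlying table grows to hundreds of millions of rows. The orders table and its columns are hypothetical, used purely for illustration.

```sql
-- Hypothetical dashboard tile: revenue and order counts by day and region
-- over the last 90 days. On a single-node, row-oriented OLTP database this
-- forces a scan and aggregation over the whole date range on one machine;
-- a distributed, column-oriented engine can spread that work across nodes
-- and keep the response interactive.
SELECT
    DATE(order_ts)    AS order_day,
    region,
    COUNT(*)          AS orders,
    SUM(order_total)  AS revenue
FROM orders
WHERE order_ts >= NOW() - INTERVAL 90 DAY
GROUP BY order_day, region
ORDER BY order_day DESC, revenue DESC;
```

Now imagine dozens of users refreshing dashboards that each fire several queries like this at once, and the scaling problem becomes obvious.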
Enter SingleStore, a modern distributed SQL database that easily handles both the data volumes and the query loads – whether that is a high number of concurrent queries, individuals running dynamic, interactive queries, or both! SingleStore is also uniquely able to make data available for queries (and dashboards) as it is being ingested into the environment. That can be operational data from a CDC tool like HVR or Qlik/Attunity, or streaming data from a tool like Confluent or StreamSets. Cloud data warehouses simply can’t do this; the data needs to land and then be curated into the various tables and data warehouses before it’s available for use.
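As a rough illustration of what ingest-and-query-at-the-same-time can look like, here is a minimal sketch using SingleStore Pipelines to pull events straight from a Kafka topic into a queryable table. The broker address, topic, table and column names are all assumptions made up for this example.

```sql
-- Hypothetical table holding clickstream events, queryable as soon as rows arrive.
CREATE TABLE click_events (
    event_ts  DATETIME,
    user_id   BIGINT,
    page_url  VARCHAR(512)
);

-- A SingleStore pipeline that continuously ingests JSON messages from Kafka.
-- 'kafka-broker:9092/clickstream' (broker and topic) is a placeholder.
CREATE PIPELINE clickstream_pipeline AS
LOAD DATA KAFKA 'kafka-broker:9092/clickstream'
INTO TABLE click_events
FORMAT JSON
(event_ts <- event_ts, user_id <- user_id, page_url <- page_url);

START PIPELINE clickstream_pipeline;

-- A dashboard can query the table immediately; there is no separate
-- landing-and-curation step before the data is usable.
SELECT page_url, COUNT(*) AS views
FROM click_events
WHERE event_ts >= NOW() - INTERVAL 5 MINUTE
GROUP BY page_url
ORDER BY views DESC
LIMIT 10;
```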
And this is why SingleStore is the right database to complement existing data stores, power your BI tools, and turn your dashboards into fastboards. Chartio, with its Visual SQL interface writing queries directly on top of the database, takes full advantage of SingleStore’s speed, and it is remarkably easy to use for business users (executives) and data professionals alike. With SingleStore underneath, dashboards refresh quickly, while drill-downs and data exploration happen at the speed of thought.
We recently did a live demonstration of this in action with Dave Fowler, Chartio’s CEO, and Sarung Tripathi, our Technical Lead for SingleStore Helios. You can see for yourself how fast you can get answers to your questions, even with a dataset 10x the size of the one previously used (on PostgreSQL). No smoke and mirrors.
So it’s time for all of us to accelerate our digital transformation efforts. The tools are ready, and the speed is there to explore your data at the speed of thought. Get started with Chartio and SingleStore today!