This blog post introduces SingleStoreDB and Redpanda and shows how the two technologies together can solve the complex challenges involved in building a modern clickstream analytics system.
We show you step by step how to connect Redpanda to SingleStore Helios on AWS, and highlight the benefits of using SingleStore and Redpanda together.
Unleashing the Power of Real Time with SingleStore and Redpanda
The ability to use real-time data to make business decisions is critical in today’s world. In the modern business landscape, data has become the new oil, fueling growth and innovation across industries. Yet as crucial as data is, its true value lies not only in the volume accumulated but also in how quickly, effectively and efficiently we can process and analyze it to drive business outcomes.
The sheer magnitude of data generated every day requires tools that can facilitate high-speed transactions, real-time analytics and robust data streaming. Two platforms stand out for these requirements: SingleStoreDB and Redpanda. While each is powerful in its own right, integrating the two can transform your data management strategy by bringing together the best of data streaming and real-time analytics.
Let's look at an example involving clickstream data. Real-time clickstream analysis is a pervasive challenge in today's world of online interactions. It involves tracking and analyzing the sequence of clicks a user makes while navigating a website or application, and organizations use it to improve user experiences, personalize content and optimize marketing strategies.
The high volume, velocity and variety of clickstream data make real-time processing especially challenging. Teams typically use an event streaming platform like Kafka to process the clickstream, but have historically struggled to find a downstream database that can handle the complex analysis they need.
Building a real-time analytics system requires both ingesting large volumes of events and serving the application's analytical queries as quickly as possible. These integrations often involve stitching together multiple tools and custom solutions, which makes them hard to manage and scale. This is where Redpanda’s integration with SingleStoreDB provides a best-in-class option for customers building analytical applications.
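To make the idea of a click sequence concrete, here is a minimal Python sketch that rebuilds each user's navigation path from a batch of click events. The `Click` type, its fields and the `click_paths` function are hypothetical, chosen only for illustration; a real system would do this continuously over a stream rather than over an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    """One clickstream event (hypothetical minimal schema)."""
    user_id: str
    page: str
    ts: float  # event time in seconds


def click_paths(events):
    """Group clicks by user and return each user's pages in timestamp order."""
    by_user = defaultdict(list)
    for event in events:
        by_user[event.user_id].append(event)
    # Events can arrive out of order, so sort each user's clicks by event time.
    return {
        user: [c.page for c in sorted(clicks, key=lambda c: c.ts)]
        for user, clicks in by_user.items()
    }


events = [
    Click("u1", "/home", 1.0),
    Click("u2", "/home", 1.5),
    Click("u1", "/product/42", 3.0),
    Click("u1", "/cart", 2.0),
]
print(click_paths(events))
# {'u1': ['/home', '/cart', '/product/42'], 'u2': ['/home']}
```

Sorting by event time rather than arrival order is the key design choice: it keeps a user's path correct even when the stream delivers clicks late.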
What Is Redpanda?
Redpanda is a modern streaming platform that acts as a drop-in replacement for Apache Kafka, which is commonly used for real-time data processing. Redpanda is designed to deliver high performance with a simpler, more developer-friendly architecture than Kafka. It is fully compatible with the Kafka API, which makes it well suited for building analytical applications and lets existing Kafka applications run as is, without changes.
This makes it easy to integrate with SingleStore Pipelines, which can ingest massive amounts of data into SingleStore and make it queryable in real time. Pipelines are configured with plain SQL statements, so developers can set them up through a familiar interface.
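Because a pipeline declared with FORMAT JSON parses each message value as a JSON object, producers only need to send well-formed JSON. The following Python sketch builds one such clickstream event as bytes. The field names and the `make_click_event` helper are hypothetical, and actually sending the bytes (with any Kafka-compatible client pointed at Redpanda) is left out.

```python
import json
import time
import uuid


def make_click_event(user_id, page, referrer=None, ts=None):
    """Build one clickstream event as UTF-8 JSON bytes for a message value.

    The keys are hypothetical; they only need to match the JSON keys that
    the SingleStore pipeline's FORMAT JSON mapping reads.
    """
    event = {
        "event_id": str(uuid.uuid4()),  # unique ID, useful as a primary key
        "user_id": user_id,
        "page": page,
        "referrer": referrer,
        "ts": ts if ts is not None else time.time(),
    }
    # Compact separators keep each message small.
    return json.dumps(event, separators=(",", ":")).encode("utf-8")


payload = make_click_event("u1", "/product/42", referrer="/home", ts=1700000000.0)
print(payload.decode("utf-8"))
```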
What Is SingleStore?
SingleStoreDB is a distributed SQL database that unifies transactions and analytics in a single engine to provide low-latency access to large datasets, simplifying the development of modern applications. SingleStoreDB delivers 10-100 millisecond performance on complex queries, eliminating performance bottlenecks and unnecessary data movement so businesses can scale effortlessly.
We will demonstrate how you can set up a Redpanda cluster that receives clickstream data from a variety of sources. This data can then be ingested into a SingleStoreDB cluster running on AWS to provide rich, real-time insights.
Challenges in Building a Modern Clickstream Analytics System
Building a modern clickstream analytics system involves collecting, processing, analyzing and storing large amounts of data in real time. Consider a user clicking through several pages online: all of that clickstream data needs to be processed to drive business value or predictive analytics. At a high level, this involves four steps:
- Collect data from a variety of sources
- Ingest data
- Store data
- Drive business insights by consuming data
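The four steps can be pictured as a chain of stages. In this minimal Python sketch, which is an in-memory stand-in rather than a real deployment, a `queue.Queue` plays the role of a Redpanda topic and a dict keyed by a hypothetical `event_id` plays the role of a SingleStoreDB table. All function and field names are illustrative assumptions.

```python
import json
import queue


def collect(sources):
    """Step 1: gather raw events from several sources (in-memory lists here)."""
    for source in sources:
        yield from source


def ingest(events, topic):
    """Step 2: publish each event as JSON to a topic (queue.Queue stands in for Redpanda)."""
    for event in events:
        topic.put(json.dumps(event))


def store(topic, table):
    """Step 3: drain the topic into a table keyed by event_id (a dict stands in for SingleStoreDB)."""
    while not topic.empty():
        row = json.loads(topic.get())
        table[row["event_id"]] = row  # a repeated event_id overwrites the earlier row


def insights(table):
    """Step 4: consume the stored data, here by counting distinct users."""
    return len({row["user_id"] for row in table.values()})


web = [
    {"event_id": "e1", "user_id": "u1", "page": "/home"},
    {"event_id": "e2", "user_id": "u1", "page": "/cart"},
]
mobile = [
    {"event_id": "e3", "user_id": "u2", "page": "/home"},
    {"event_id": "e3", "user_id": "u2", "page": "/home"},  # duplicate delivery
]

topic, table = queue.Queue(), {}
ingest(collect([web, mobile]), topic)
store(topic, table)
print(len(table), insights(table))
```

Keying the table on an event ID means a message delivered twice lands as one row, which is the same reason the pipeline later in this post uses ON DUPLICATE KEY UPDATE.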
All of these operations need to happen in real time, while keeping the architecture robust enough for enterprise use, scalable to business needs and reasonable in cost.
Real-time stream processing systems such as Apache Flink and Apache Spark can provide low-latency processing. However, ingesting data quickly into a database and processing it there is a more trusted industry approach, and this is where Redpanda’s integration with SingleStore brings together the strengths of both tools.
Typical databases struggle with ingestion speed and have to rely on external tools. SingleStoreDB, however, supports a native capability called Pipelines, which enables very fast ingestion from Kafka.
The Solution
Let’s see how this works. The following simple setup ingests a large amount of clickstream data from Redpanda into SingleStore Helios running on AWS.
Redpanda is easy to deploy in the cloud using one of two options: Dedicated (provisioned in Redpanda’s tenant, AWS in this case) or Bring Your Own Cloud (BYOC, provisioned in your tenant yet still fully managed with Redpanda’s unified control plane). The solution in this tutorial was built using Redpanda’s BYOC model, which is described in Redpanda's BYOC documentation.
To connect to SingleStore Helios, you use SingleStore's Kafka pipelines, which work with Redpanda because Redpanda is Kafka API compatible. Now that you have a SingleStoreDB cluster running, let’s create a pipeline that natively captures the incoming stream of data into SingleStoreDB. This involves three major steps:
1. Set up the pipeline using SingleStore Kafka pipelines:
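-- Create (or replace) a pipeline that consumes one Redpanda topic through the Kafka API.
-- A Kafka pipeline reads a single topic, so create one pipeline per Redpanda topic.
CREATE OR REPLACE PIPELINE <Pipeline_name>
AS LOAD DATA KAFKA '<redpanda_bootstrap_server>:<port>/<redpanda_topic>'
-- Client settings for SASL/SCRAM authentication over TLS
CONFIG '{
  "sasl.username": "<user_name>",
  "sasl.mechanism": "SCRAM-SHA-256",
  "security.protocol": "SASL_SSL"
}'
-- Secrets go in CREDENTIALS rather than CONFIG
CREDENTIALS '{
  "sasl.password": "<password>"
}'
-- Insert records in source order, which matters when upserts must apply in sequence
DISABLE OUT_OF_ORDER OPTIMIZATION
INTO TABLE <table_name>
-- Map JSON keys in each message value to table columns: column <- json_key
FORMAT JSON
(
  field_1 <- value_1,
  field_2 <- value_2,
  field_3 <- value_3,
  field_4 <- value_4,
  field_5 <- value_5
)
-- If a row with the same primary or unique key already exists, overwrite its columns
ON DUPLICATE KEY UPDATE
  field_1 = VALUES(field_1),
  field_2 = VALUES(field_2),
  field_3 = VALUES(field_3),
  field_4 = VALUES(field_4),
  field_5 = VALUES(field_5);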
2. Once the pipeline is created, check it against sample data. TEST PIPELINE extracts and transforms a sample of the source data and returns the results without inserting them into the table:
TEST PIPELINE <Pipeline_name>;
3. After you have verified the sample output, start the pipeline; data should then begin flowing into the table.
START PIPELINE <Pipeline_name>;
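To picture what the FORMAT JSON mapping and the ON DUPLICATE KEY UPDATE clause in step 1 do to each message, here is a Python emulation of their semantics. It assumes, hypothetically, that `field_1` is the table's primary key. Note that in SQL, `VALUES(col)` names a column of the incoming row, not a JSON key. This is a simplified model, not how SingleStore implements pipelines, and it simply raises `KeyError` when a mapped JSON key is missing instead of modeling SingleStore's own handling.

```python
import json

# Mirrors the pipeline's FORMAT JSON clause: table column <- JSON key.
MAPPING = {
    "field_1": "value_1",
    "field_2": "value_2",
    "field_3": "value_3",
    "field_4": "value_4",
    "field_5": "value_5",
}
PRIMARY_KEY = "field_1"  # hypothetical: assumes field_1 is the table's primary key


def map_record(message_value):
    """Turn one JSON message value into a row; raises KeyError if a mapped key is missing."""
    doc = json.loads(message_value)
    return {column: doc[key] for column, key in MAPPING.items()}


def upsert(table, row):
    """Insert the row, or on a duplicate key apply column = VALUES(column) for every column."""
    key = row[PRIMARY_KEY]
    if key in table:
        for column in MAPPING:
            # VALUES(column) is the value the incoming row would have inserted.
            table[key][column] = row[column]
    else:
        table[key] = dict(row)


table = {}
upsert(table, map_record('{"value_1": 1, "value_2": "/home", "value_3": 10, "value_4": "web", "value_5": 0.5}'))
upsert(table, map_record('{"value_1": 1, "value_2": "/cart", "value_3": 12, "value_4": "web", "value_5": 0.7}'))
print(table)
# {1: {'field_1': 1, 'field_2': '/cart', 'field_3': 12, 'field_4': 'web', 'field_5': 0.7}}
```

The second message shares the first one's key, so it updates the existing row instead of adding a new one.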
In just three steps you can start loading data into SingleStoreDB, where it is immediately available for querying and quick insights.
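Once data lands, insights are ordinary queries. As a rough illustration, this Python sketch computes the most-viewed pages within a recent time window, similar in spirit to a `GROUP BY page ... ORDER BY COUNT(*) DESC` query restricted to recent rows. The row shape (`page`, plus `ts` in seconds) and the `top_pages` function are hypothetical.

```python
from collections import Counter


def top_pages(rows, now, window_s=60, n=3):
    """Return the n most-viewed pages among rows with now - window_s < ts <= now."""
    recent = (row["page"] for row in rows if now - window_s < row["ts"] <= now)
    return Counter(recent).most_common(n)


rows = [
    {"page": "/home", "ts": 100},
    {"page": "/cart", "ts": 130},
    {"page": "/home", "ts": 150},
    {"page": "/old", "ts": 30},  # outside the 60-second window
]
print(top_pages(rows, now=150))
# [('/home', 2), ('/cart', 1)]
```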
Empower Your Real-Time Data Strategy
The combination of SingleStoreDB and Redpanda offers a best-in-class solution for organizations seeking real-time analytics and high-speed data processing. By harnessing the power of these platforms, businesses can stay ahead in today’s data-driven landscape. In this step-by-step blog, we demonstrated how to set up the connection between Redpanda and SingleStoreDB running on AWS.
Redpanda provides a real-time ingestion platform that can serve as a drop-in replacement for Kafka. SingleStore provides the unique capability to serve both the transactional and analytical needs of your application, making data available for queries as soon as it is loaded. This makes building real-time analytical applications simple and efficient, and makes SingleStore a perfect choice for storing real-time data streamed from Redpanda.
Interested in trying out SingleStoreDB and Redpanda today? Get started building with SingleStoreDB, and download Redpanda for free.