As your business grows, you might start hitting the limitations of single-node databases like MySQL — especially when dealing with large datasets and the need for real-time analytics. But migrating to a more powerful database doesn't have to be a complex, daunting task.
Enter SingleStore's Change Data Capture (CDC) through pipelines — a feature that allows you to replicate your MySQL data into SingleStore in real time. With CDC pipelines, you can seamlessly migrate your data without significant downtime or complexity.
In this post, we'll show you how to use MySQL CDC through pipelines to migrate from MySQL to SingleStore in five easy steps:
- Prepare your environment. Set up your SingleStore workspace and access to your MySQL instance.
- Configure MySQL for replication. Enable binary logging, create the necessary databases and set up a replication user in MySQL.
- Establish a connection between MySQL and SingleStore. Create a link using SingleStore's CDC pipelines to connect to your MySQL database.
- Migrate data using CDC pipelines. Create tables in SingleStore with inferred schemas and start data ingestion.
- Verify and monitor the migration. Validate data integrity and monitor real-time updates.
By the end of this guide, you'll have a fully functioning SingleStore database that's synchronized with your MySQL instance, ready to handle high-performance workloads and provide real-time analytics — all made possible by using MySQL CDC through pipelines.
Why choose SingleStore over MySQL?
You might be wondering: why should I migrate from MySQL to SingleStore? Let's break it down.
MySQL vs. SingleStore: A comparison
| Feature | MySQL | SingleStore |
| --- | --- | --- |
| Architecture | Single-node or manual sharding; limited distributed capabilities | Fully distributed, scale-out architecture with built-in sharding and rebalancing |
| Scalability | Limited horizontal scalability; complex sharding required for large datasets | High scalability; designed to handle terabytes to petabytes of data effortlessly |
| Data ingestion speed | Slower ingestion rates; not optimized for high-throughput data | High-throughput parallel data ingestion with real-time capabilities |
| Analytical performance | Optimized for OLTP; struggles with complex analytical queries | Hybrid OLTP and OLAP capabilities; optimized for real-time analytics and complex queries |
| Storage format | Row-based storage, less efficient for read-heavy analytical workloads | Supports both rowstore and columnstore formats for optimal performance based on workload |
| Real-time analytics | Limited real-time analytics; often requires additional tools or data warehouses | Built-in support for real-time analytics with rapid query execution |
| Vector storage and search | Not natively supported; requires external systems | Native support for vector data types and similarity searches, beneficial for AI/ML applications |
| Cost efficiency | Hidden costs due to additional tools, manual scaling efforts and maintaining multiple databases | Cost-effective scaling; reduces the need for multiple specialized databases, lowering total costs |
| Complexity | Increased complexity when integrating with other systems for analytics and scaling | Simplifies architecture by consolidating workloads into a single, unified platform |
| Data compression | Limited compression capabilities, leading to higher storage costs | Advanced compression techniques reduce storage costs and improve performance |
Prerequisites
Before we dive in, make sure you have the following ready:
- SingleStore workspace. Sign up for a free SingleStore Helios® account and create a workspace.
- MySQL instance. Access to your source MySQL database.
- SingleStore Notebooks. An interactive environment within SingleStore for running SQL commands.
- MySQL user with replication privileges. A user account in MySQL configured for replication.
- Network access. Ensure your SingleStore workspace can connect to your MySQL instance.
- Primary keys on all source tables. All tables in your MySQL database must have primary keys.
Step 1. Prepare your environment
Set up your SingleStore workspace
- Create a workspace. If you haven't already, sign up for SingleStore Helios and create a new workspace.
- Access Notebooks. Navigate to the Develop tab and open Notebooks. This is where we'll execute our SQL commands.
Verify access to your MySQL instance
Ensure you have the necessary credentials and network access to your MySQL database. You'll need this in the upcoming steps.
Step 2. Configure MySQL for replication
Time to get your MySQL instance ready for migration.
1. Enable binary logging. Modify your MySQL configuration file (my.cnf or my.ini) to include:
log-bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
Restart your MySQL server to apply these changes.
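To confirm the changes took effect, you can run a quick check from any MySQL client. This is a minimal sketch using standard MySQL system variables:

-- Verify that binary logging is enabled and configured for row-based CDC
SHOW VARIABLES LIKE 'log_bin';          -- expect ON
SHOW VARIABLES LIKE 'binlog_format';    -- expect ROW
SHOW VARIABLES LIKE 'binlog_row_image'; -- expect FULL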
2. Create the singlestore database. This is required for internal use by SingleStore.
CREATE DATABASE IF NOT EXISTS singlestore;
Note: Do not replicate the singlestore database; it's used for metadata and internal operations.
3. Set up a replication user
CREATE USER 'repl_user'@'%' IDENTIFIED BY 'Password@123';
GRANT CREATE, INSERT, DELETE, DROP, SELECT ON singlestore.* TO 'repl_user'@'%';
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'repl_user'@'%';
GRANT SELECT ON *.* TO 'repl_user'@'%';
Ensure the user uses the mysql_native_password authentication plugin. This user will be used by SingleStore to connect to your MySQL instance.
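If your server defaults to a different plugin (for example, caching_sha2_password on MySQL 8.0), you can switch the user explicitly. A minimal sketch, assuming a MySQL version where mysql_native_password is still available:

-- Switch the replication user to the mysql_native_password plugin
ALTER USER 'repl_user'@'%' IDENTIFIED WITH mysql_native_password BY 'Password@123';
-- Confirm which authentication plugin the user is configured with
SELECT user, host, plugin FROM mysql.user WHERE user = 'repl_user';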
4. Ensure all tables have primary keys. CDC requires primary keys to track changes accurately.
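To spot any offending tables, you can query information_schema on the MySQL side. A minimal sketch; replace <your_database> with your database name:

-- List base tables in the source database that have no primary key
SELECT t.TABLE_SCHEMA, t.TABLE_NAME
FROM information_schema.TABLES t
LEFT JOIN information_schema.TABLE_CONSTRAINTS tc
  ON tc.TABLE_SCHEMA = t.TABLE_SCHEMA
  AND tc.TABLE_NAME = t.TABLE_NAME
  AND tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
WHERE t.TABLE_TYPE = 'BASE TABLE'
  AND t.TABLE_SCHEMA = '<your_database>'
  AND tc.CONSTRAINT_NAME IS NULL;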
5. Check network accessibility. Make sure your SingleStore workspace can connect to your MySQL instance. Adjust firewalls and security groups as needed.
Step 3. Establish a connection between MySQL and SingleStore
Now, let's bridge the gap between MySQL and SingleStore using Change Data Capture (CDC) pipelines.
1. Create a database in SingleStore. In your SingleStore Notebook, run:
DROP DATABASE IF EXISTS mysql_replicate_cdc;
CREATE DATABASE IF NOT EXISTS mysql_replicate_cdc;
Select this database for subsequent operations.
2. Ensure the database is empty
SHOW TABLES;
You should see that no tables are present.
3. Create a link to MySQL
CREATE LINK mysql_replicate_link AS MYSQL
CONFIG '{"database.hostname": "<your_mysql_host>", "database.port": 3306, "database.ssl.mode": "required"}'
CREDENTIALS '{"database.user": "repl_user", "database.password": "Password@123"}';
- Replace <your_mysql_host> with your actual MySQL host
- The database.ssl.mode parameter specifies the SSL mode for the connection
- Note: Do not include table.include.list or database.exclude.list in the CREATE LINK statement; these parameters are specified later
4. Verify the link
SHOW LINKS;
You should see mysql_replicate_link listed.
Step 4. Migrate data using CDC pipelines
Here's where the magic happens.
Option 1. Replicate a single table
1. Create the your_table table with inferred schema
CREATE TABLE your_table AS
INFER PIPELINE AS
LOAD DATA LINK mysql_replicate_link "<your_database>.<your_table>"
FORMAT AVRO;
Replace <your_database> and <your_table> with your MySQL database and table names.
This command:
- Creates a your_table table in SingleStore
- Infers the schema from your MySQL table
- Sets up a CDC pipeline for real-time data ingestion
2. Inspect the created table
DESCRIBE your_table;
Ensure the schema matches your expectations.
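For example, assuming a hypothetical MySQL database named shop with an orders table, the same pattern would look like this:

-- Hypothetical example: replicate shop.orders into a SingleStore table named orders
CREATE TABLE orders AS
INFER PIPELINE AS
LOAD DATA LINK mysql_replicate_link "shop.orders"
FORMAT AVRO;

DESCRIBE orders;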
Option 2. Replicate multiple tables
If you want to replicate all tables from your MySQL database:
1. Create tables and pipelines for all tables
CREATE TABLES AS INFER PIPELINE AS
LOAD DATA LINK mysql_replicate_link "*"
FORMAT AVRO;
This command will create tables and pipelines for all tables in your MySQL database.
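Before starting ingestion, you can confirm what was created:

-- Inspect the inferred tables and their pipelines
SHOW TABLES;
SHOW PIPELINES;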
2. Filter specific tables or databases (optional)
To include only specific tables or databases, specify parameters in the CONFIG clause:
CREATE TABLES AS INFER PIPELINE AS
LOAD DATA MYSQL "*"
CONFIG '{"database.include.list": "<your_database>", "table.include.list": "<your_database>.<your_table>"}'
CREDENTIALS '{"database.user": "repl_user", "database.password": "Password@123"}'
FORMAT AVRO;
Note: The table.include.list and database.include.list parameters are supported in the CREATE TABLES AS INFER PIPELINE statement, not in the CREATE LINK statement.
Start data ingestion
1. Start the pipelines
START ALL PIPELINES;
2. Monitor the pipelines
SHOW PIPELINES;
The status should show as Running.
3. Verify initial data
SELECT COUNT(*) FROM your_table;
SELECT * FROM your_table LIMIT 5;
You should see data flowing in from MySQL.
Step 5. Verify and monitor the migration
Let's make sure everything is working as it should.
Validate data integrity
1. Insert new data into MySQL. In your MySQL client, run:
INSERT INTO <your_database>.<your_table> (columns...)
VALUES (...);
Add new records to test real-time replication.
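For instance, using the hypothetical shop.orders table from the earlier example (the column names here are purely illustrative):

-- Hypothetical example: insert a test row into the source MySQL table
INSERT INTO shop.orders (order_id, customer_id, total_amount)
VALUES (1001, 42, 99.90);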
2. Check for new data in SingleStore. Back in SingleStore, run:
SELECT COUNT(*) FROM your_table;
SELECT * FROM your_table ORDER BY <primary_key_column> DESC LIMIT 5;
The new records should appear almost instantly, thanks to CDC pipelines.
Monitor real-time data updates
1. Check CDC pipeline status
SELECT SOURCE_PARTITION_ID,
       EARLIEST_OFFSET,
       LATEST_OFFSET,
       LATEST_EXPECTED_OFFSET - LATEST_OFFSET AS DIFF,
       UPDATED_UNIX_TIMESTAMP
FROM information_schema.pipelines_cursors;
This shows whether the pipelines are keeping up with the source; a growing DIFF value indicates replication lag.
2. Look for pipeline errors
SELECT * FROM information_schema.pipelines_errors;
Ensure there are no errors.
3. Review pipeline performance
SELECT * FROM information_schema.pipelines_batches_summary;
Get insights into batch sizes and ingestion rates.
Get started with SingleStore
Migrating from MySQL to SingleStore doesn't have to be complicated. By leveraging MySQL CDC through pipelines, you can replicate your data in real time while maintaining data consistency and keeping downtime to a minimum.
Key points to remember
- All source tables must have primary keys. CDC requires primary keys to track changes accurately
- Do not alter source tables after starting CDC. Avoid running ALTER TABLE statements on the source MySQL database once the CDC operation has started
- Use supported parameters appropriately. Parameters like table.include.list are specified in the CREATE TABLES AS INFER PIPELINE statement, not in the CREATE LINK statement
Why SingleStore?
- Performance at scale. Handle massive datasets effortlessly, and experience high-throughput data ingestion and lightning-fast queries
- Real-time analytics. Get immediate insights without impacting transactional workloads
- Simplified architecture. Replace multiple specialized databases with a unified platform
- Cost efficiency. Lower your total cost of ownership by consolidating systems
- Future-proofing. Stay ahead with built-in support for modern data types, including vectors for AI and machine learning applications
Ready to take the leap? Start your migration to SingleStore today and empower your business with real-time analytics and unparalleled performance — all made possible through MySQL CDC pipelines.
For more information, visit the SingleStore Documentation or contact our support team for assistance.