Capture changes as they happen with the Mage MySQL CDC streaming source

October 29, 2025

TLDR

This article provides a comprehensive walkthrough of building a MySQL Change Data Capture (CDC) pipeline in Mage Pro. You'll learn how to set up MySQL for real-time data streaming, configure the CDC data loader, and understand exactly what's happening in each block of code. Perfect for data engineers who want to move beyond batch processing and capture database changes as they happen.

Table of contents

  1. Introduction: Why CDC matters

  2. Understanding the MySQL CDC configuration

  3. Setting up your MySQL database for CDC

  4. Creating a realistic test scenario

  5. Configuring the CDC pipeline in Mage Pro

  6. Real-world applications

  7. Conclusion

Introduction: why CDC matters

Change Data Capture (CDC) transforms how we build data pipelines. Instead of querying your database every few minutes or hours to check for changes, CDC streams database modifications to your pipeline the moment they occur. This means:

  • Real-time analytics instead of stale dashboards

  • Lower database load compared to constant polling

  • Immediate reactions to critical events (fraud detection, inventory alerts)

  • Complete audit trails of every database change

Now let’s check out and see how the MySQL CDC data loader works


Source: GIPHY

Source: GIPHY

Setting up your MySQL database for CDC

For this demo, I created a simple inventory table that tracks products across three warehouses (East, West, and Central). The table design includes:

  • quantity_in_stock - Physical units in the warehouse

  • quantity_reserved - Units reserved for pending orders

  • quantity_available - Auto-calculated (in_stock - reserved)

  • last_order_id - Tracks which order currently holds a reservation

I populated it with 15 rows: 5 products (headphones, smart watches, laptop stands, USB cables, keyboards) each available in 3 warehouse locations.

The key requirement: MySQL must have binary logging enabled with binlog_format=ROW. This allows CDC to read the change stream. You can verify this with:

SHOW VARIABLES LIKE 'log_bin';        *-- Should be ON*
SHOW VARIABLES LIKE 'binlog_format';  *-- Should be ROW

If binary logging isn't enabled, you'll need to configure it in your MySQL settings and restart the server.

Creating a realistic test scenario

To demonstrate CDC in action, I wrote a Python script that simulates realistic e-commerce inventory operations:

  • Place Orders - Reserves inventory for customer orders

  • Ship Orders - Reduces stock when products leave the warehouse

  • Cancel Orders - Releases reservations back to available stock

  • Restock Inventory - Adds new inventory when shipments arrive

Each operation updates the inventory table, triggering CDC events that Mage Pro captures instantly. The script runs continuously, generating changes every 2-5 seconds to simulate a realistic order flow.

The important part isn't the script itself, it's understanding that these UPDATE operations are happening on your database, and CDC is capturing every single change as it occurs.

Configuring the CDC pipeline in Mage Pro

Now for the magic: connecting Mage Pro to your MySQL database to capture changes in real-time. In Mage Pro, create a new streaming pipeline and add a MySQL CDC data loader block. Here's the configuration:

connector_type: mysql

# Required
host: "{{ mage_secret_var('mysqlcdcHostCole') }}"
port: 3306
user: "{{ mage_secret_var('mysqlcdcuserCole') }}"
password: "{{ mage_secret_var('mysqlcdcPasswordCole') }}"

# Optional
connect_timeout: 10
charset: utf8mb4

# SSL (optional)
# ssl:
#   ca: "/path/to/ca.pem"
#   cert: "/path/to/client-cert.pem"
#   key: "/path/to/client-key.pem"

# Filtering (optional)
databases: ["ecommerce_cdc"]
tables: ["inventory"]
ignore_databases: ["mysql", "information_schema", "performance_schema", "sys"]
# ignore_tables: ["temp_*", "log_*"]

# Replication
server_id: 54001
auto_generate_server_id: true

# Resume (choose one)
# resume_from_gtids: "uuid:1-100"
# start_log_file: "mysql-bin.000001"
# start_log_pos: 1234
# start_timestamp: 1640995200

# Events
include_ddl: false
include_heartbeat: true
include_transaction_events: true
include_gtid_events: true
include_rotate_events: false
include_intvar_events: true
include_load_query_events: true
include_table_map_events: true

# Performance
heartbeat_seconds: 1.0
blocking: true
max_batch_size: 100
flush_interval_seconds: 1.0

# Advanced
return_db_records_only: true

Key configuration decisions:

  • databases and tables - Only watch the inventory table, ignore everything else

  • max_batch_size: 10 - Small batches mean near-instant processing (< 1 second latency)

  • return_db_records_only: true - Gives us clean data structures without CDC metadata overhead

Next, all you need to do is either transform your data, or write it to your downstream use cases This could be simply a data warehouse, or many other use cases similar to the ones below.

Real-world applications

Now that you understand how CDC works, here are some impactful examples of how to apply it in production:

Top use cases for inventory CDC pipelines:

  • Prevent overselling: Sync inventory changes to your customer-facing systems in real-time. When a product is reserved or shipped, your website instantly reflects accurate availability.

  • Automated low stock alerts: Trigger notifications to warehouse managers the moment inventory drops below reorder thresholds. No more discovering stockouts after they happen.

  • Real-time analytics dashboards: Stream inventory changes to your data warehouse (BigQuery, Snowflake, Redshift) so business analysts see live stock levels, order velocity, and warehouse performance.

  • Multi-system synchronization: Keep ERP systems, warehouse management software, and e-commerce platforms synchronized. When inventory changes in one system, all others update automatically within seconds.

Conclusion

MySQL CDC in Mage Pro transforms how you build data pipelines. By understanding the configuration and event structure, you can build real-time systems that:

  • Prevent business problems before they impact customers

  • Reduce infrastructure costs by eliminating constant polling

  • Improve customer experience with accurate real-time data

  • Enable new capabilities that weren't possible with batch processing

Ready to build real-time data pipelines? Start your free trial of Mage Pro or book a demo to see how CDC can transform your data infrastructure

Your AI data engineer

Power data, streamline workflows, and scale effortlessly.