A new approach? Debezium is typically associated with Kafka Connect, but it actually offers three distinct runtimes, two of which have no Kafka dependencies at all. After finding the official docs incomplete, we ran the code and dug into the library. Kafka Connect provides the most robust implementation, but it comes with real complexity, and the other two flavors leave much to be desired. We're working on Sequin to fill this gap.
It’s not obvious, but there are three very distinct Debezium products. And you’ll want to understand the differences.
With recent releases (v3.1 is stable at the time of writing), the project offers three distinct deployment options:
- Classic Debezium on Kafka Connect
- Standalone Debezium Server
- Embeddable Debezium Engine
While all three share core database connector logic, they differ significantly in architecture, deployment approach, and operational characteristics. This technical breakdown will help developers choose the right Debezium approach for their specific needs.
Codebase and Architecture
All three Debezium variants share a common foundation: the core CDC connectors. These connectors handle the heavy lifting of reading database transaction logs, parsing changes, and producing events (here's a good deep dive with Postgres). But how each deployment option builds upon this shared foundation reveals their unique strengths and use cases.
Debezium on Kafka Connect
The classic Debezium deployment runs atop Apache Kafka Connect. In this model, Debezium connectors function as native Kafka Connect source connectors, seamlessly integrating with Kafka's infrastructure.
What makes this approach powerful is how it leverages Kafka's infrastructure for everything from deployment to scaling to management. You can manage Debezium through Kafka Connect's REST API and leverage the full suite of Kafka monitoring tools. The connectors write directly to Kafka topics, making your change events immediately available to any Kafka consumer.
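For a concrete picture of that REST API, here is a minimal Java sketch (JDK 11+ java.net.http only) that registers a Debezium Postgres connector by POSTing to a Connect worker's /connectors endpoint. The worker URL, connector name, credentials, and database name are placeholders, and the JSON is assembled by hand on the assumption that no value needs escaping.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class RegisterConnector {

    // Placeholder connection details for a Postgres source; adjust for your setup.
    static Map<String, String> postgresConfig() {
        Map<String, String> c = new LinkedHashMap<>();
        c.put("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
        c.put("plugin.name", "pgoutput");
        c.put("database.hostname", "localhost");
        c.put("database.port", "5432");
        c.put("database.user", "postgres");
        c.put("database.password", "postgres");
        c.put("database.dbname", "inventory");
        // Change events land in topics named <topic.prefix>.<schema>.<table>.
        c.put("topic.prefix", "inventory");
        return c;
    }

    // Body shape expected by POST /connectors: {"name": ..., "config": {...}}.
    // Assumes no key or value contains characters that need JSON escaping.
    static String connectorJson(String name, Map<String, String> config) {
        String fields = config.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        return "{\"name\":\"" + name + "\",\"config\":{" + fields + "}}";
    }

    static HttpRequest registerRequest(String workerUrl, String json) {
        return HttpRequest.newBuilder(URI.create(workerUrl + "/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String json = connectorJson("inventory-connector", postgresConfig());
        HttpResponse<String> resp = HttpClient.newHttpClient().send(
                registerRequest("http://localhost:8083", json),
                HttpResponse.BodyHandlers.ofString());
        // Kafka Connect replies 201 Created when it accepts the new connector.
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```

The same REST surface (for example GET /connectors/{name}/status or PUT /connectors/{name}/config) is what scripts and dashboards typically drive afterwards.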
Debezium Server
Debezium Server takes a different approach, functioning as a standalone application that wraps the Debezium Engine in a ready-to-run package. Built on the Quarkus framework, it comes as a runnable JAR or container image that you can deploy without any Kafka dependencies. Think of Debezium Server as a specialized CDC service that cuts out the middleman—it reads changes from your database and streams them directly to your target messaging system.
What sets Debezium Server apart is its collection of sink adapters. These components enable direct integration with systems like Amazon Kinesis, Apache Pulsar, Redis Streams, and Google Pub/Sub without requiring Kafka as an intermediary. Under the hood, it's still using the same connector code as the Kafka Connect deployment, but it wraps this in a simple, standalone package focused on direct delivery to non-Kafka destinations. As we’ll explore later, this simplicity comes at the expense of reliability, throughput, and observability.
Debezium Engine
At its most flexible, Debezium offers the Engine: a Java library API for embedding CDC capabilities directly in your own applications. The Engine isn't a service you deploy; it's code you integrate. Through the io.debezium.engine.DebeziumEngine interface, your application can directly capture and process database change events without any external services or processes.
The Engine gives you programmatic control over the entire CDC process. You configure connectors through the API, receive events via callbacks, and handle everything from offset management to error recovery in your code. While it uses the identical connector implementations as the other options, it provides them through a Java API rather than configuration files, making it remarkably adaptable to custom requirements.
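As a rough illustration of configuring a connector through the API, the sketch below builds the java.util.Properties an embedded Postgres connector needs, using property names from the Debezium documentation with placeholder values. The builder calls that consume them (DebeziumEngine.create, using, notifying, build) appear only as comments, because compiling them requires the debezium-embedded artifact and a connector on the classpath.

```java
import java.util.Properties;

final class EngineConfig {

    // Properties for an embedded Postgres connector. Connection values are placeholders.
    static Properties postgresEngineProps(String offsetFile) {
        Properties p = new Properties();
        p.setProperty("name", "inventory-engine");
        p.setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
        // Offset storage is yours to choose; this uses Kafka Connect's file-based store.
        p.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        p.setProperty("offset.storage.file.filename", offsetFile);
        p.setProperty("offset.flush.interval.ms", "60000");
        p.setProperty("plugin.name", "pgoutput");
        p.setProperty("database.hostname", "localhost");
        p.setProperty("database.port", "5432");
        p.setProperty("database.user", "postgres");
        p.setProperty("database.password", "postgres");
        p.setProperty("database.dbname", "inventory");
        p.setProperty("topic.prefix", "inventory");
        return p;
    }

    public static void main(String[] args) {
        Properties props = postgresEngineProps("/tmp/offsets.dat");
        // With io.debezium:debezium-embedded and the Postgres connector on the
        // classpath, these properties feed the builder roughly like this:
        //
        //   DebeziumEngine<ChangeEvent<String, String>> engine =
        //       DebeziumEngine.create(Json.class)
        //           .using(props)
        //           .notifying(event -> System.out.println(event.value()))
        //           .build();
        //   Executors.newSingleThreadExecutor().execute(engine); // the engine is a Runnable
        //   ...
        //   engine.close(); // stops capture
        props.list(System.out);
    }
}
```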
Runtime and Deployment
How you run Debezium dramatically impacts its operational characteristics. Each variant has distinct deployment patterns, infrastructure requirements, and runtime behaviors that shape its fit for different environments.
Debezium on Kafka Connect
When you deploy Debezium on Kafka Connect, you're diving headlong into the distributed world of Kafka - whether you want to or not. Debezium database connectors run as Kafka Connect tasks distributed across a cluster of worker nodes. This model brings significant advantages: your CDC processes inherit Kafka Connect's distributed nature, with automatic task distribution and rebalancing when nodes come and go.
This deployment model requires the most infrastructure—you need a Java runtime, a Kafka broker cluster, and Kafka Connect workers. But it rewards this complexity with a mature operational framework. Your connectors are managed through Connect's REST API, their configuration and offsets stored reliably in Kafka topics, and their execution distributed across the Connect cluster.
What makes this particularly powerful for production environments is the multi-tenant nature of Kafka Connect. A single Connect cluster can run dozens or hundreds of connectors simultaneously, all sharing the same management interface but with isolated execution. When you need to scale, you simply add more Connect workers, and the framework automatically redistributes the connector tasks.
Debezium Server
Debezium Server simplifies deployment significantly. It's a standalone Java application—one instance per source database—that you run like any other service. You can package it as a container, deploy it with Kubernetes, or run it as a system service. The infrastructure requirements are minimal: just a Java runtime and network access to your source database and target messaging system.
The runtime characteristics of Debezium Server reflect its standalone nature. Each instance is self-contained, running a single Debezium connector configured through properties files or environment variables. There's no built-in clustering or coordination between instances—each Server is an independent process capturing changes from a specific database.
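The properties-versus-environment choice is mostly a naming convention. Below is a hedged sketch, assuming a Postgres source and the Kinesis sink with placeholder values, that prints a Debezium Server configuration as environment variables using the MicroProfile Config mapping that Quarkus applies (dots and other non-alphanumerics become underscores, then upper-case).

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class ServerEnv {

    // A small Debezium Server configuration (Postgres source, Kinesis sink).
    // Hostnames, credentials, and region are placeholders.
    static Properties exampleConfig() {
        Properties p = new Properties();
        p.setProperty("debezium.sink.type", "kinesis");
        p.setProperty("debezium.sink.kinesis.region", "eu-central-1");
        p.setProperty("debezium.source.connector.class", "io.debezium.connector.postgresql.PostgresConnector");
        p.setProperty("debezium.source.offset.storage.file.filename", "data/offsets.dat");
        p.setProperty("debezium.source.offset.flush.interval.ms", "0");
        p.setProperty("debezium.source.plugin.name", "pgoutput");
        p.setProperty("debezium.source.database.hostname", "localhost");
        p.setProperty("debezium.source.database.port", "5432");
        p.setProperty("debezium.source.database.user", "postgres");
        p.setProperty("debezium.source.database.password", "postgres");
        p.setProperty("debezium.source.database.dbname", "inventory");
        p.setProperty("debezium.source.topic.prefix", "inventory");
        return p;
    }

    // MicroProfile Config (which Quarkus uses) also resolves a property from an
    // environment variable whose name replaces non-alphanumerics with '_' and is upper-cased.
    static String toEnvName(String property) {
        return property.replaceAll("[^A-Za-z0-9]", "_").toUpperCase(Locale.ROOT);
    }

    // Sorted so the output is stable, e.g. for a container env file.
    static Map<String, String> toEnv(Properties props) {
        Map<String, String> env = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            env.put(toEnvName(key), props.getProperty(key));
        }
        return env;
    }

    public static void main(String[] args) {
        toEnv(exampleConfig()).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

For example, debezium.sink.type becomes DEBEZIUM_SINK_TYPE, which is convenient for container env files or Kubernetes manifests.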
This simplicity makes Debezium Server particularly appealing when you want to avoid Kafka entirely. It fits naturally into container orchestration platforms, where you can deploy it alongside your database or as part of your data processing pipeline. While it doesn't offer the automatic failover of Kafka Connect, it integrates smoothly with container orchestration's own resilience features.
Debezium Engine
The Debezium Engine takes minimalism even further—it's not a service at all, but a library embedded within your application. Your infrastructure requirements are determined entirely by your application; the Engine itself needs only network access to your source database.
Running the Engine means integrating CDC directly into your application's lifecycle. You create an Engine instance, configure it with a connector, and receive change events through callbacks. Your code controls when CDC starts and stops, how events are processed, and where offsets are stored. This tight integration gives you maximum control but also means you're responsible for all operational aspects.
This approach makes sense when you want CDC to be an organic part of your application rather than a separate service. Perhaps you're building a specialized data pipeline, a custom replication tool, or a service that needs to react immediately to database changes. In these scenarios, the Engine's direct integration can provide both simplicity and power, eliminating network hops and external dependencies.
Functional Differences and Trade-offs
Beyond architecture and deployment, each Debezium variant offers distinct functional characteristics.
Performance
All three Debezium variants connect to and process changes from database sources in the same way, because they all use the same library. Their “input” performance is therefore similar and, with few exceptions, single-threaded. Because each variant delivers to different destinations, however, their end-to-end performance characteristics diverge.
Debezium on Kafka Connect
When running on Kafka Connect, Debezium's performance inherits Kafka's robust throughput characteristics. A well-tuned cluster can handle tens of thousands of events per second, though it does add a network hop through Kafka brokers.
Kafka's sophisticated batching and compression make this approach effective for high-volume scenarios. Change events are efficiently packed and buffered, allowing the system to handle activity spikes without overwhelming consumers. With Kafka Connect's distributed nature, you can scale horizontally by simply adding worker nodes, making this approach ideal for enterprise-scale deployments with proven performance characteristics.
Debezium Server
Debezium Server creates a direct path from your database to your target messaging system. Changes flow without Kafka as an intermediary, but performance is now tied directly to your target system's capabilities and limitations, and to your DevOps prowess.
For connectors that support partitioning (like MongoDB and SQL Server), Debezium Server can use multiple threads to parallelize processing within a single instance. Its predictable resource utilization and tunable batching parameters make it adaptable to varied workload patterns, with a simpler operational model than its Kafka Connect sibling. However, Debezium Server typically maxes out at 2,000 to 3,000 messages per second.
Debezium Engine
With Debezium Engine, performance is directly tied to your application. In-memory processing without network serialization can deliver extremely low latency for local event handling, with your application receiving change events directly from the database.
This approach offers flexibility to process events synchronously for immediate consistency or asynchronously for higher throughput. You control how back-pressure is applied and how memory is utilized, but this means your application's efficiency directly impacts CDC performance—which is no joke.
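Back-pressure in an embedded setup usually comes down to a bounded buffer between the thread receiving change events and the code handling them. This is a generic JDK sketch, not Debezium code: the calling thread stands in for the CDC reader, and put() blocks once capacity events are waiting. Debezium's connectors already apply a similar bound internally via max.queue.size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// The calling thread plays the CDC reader and hands events to a handler thread
// through a bounded queue. When the handler falls behind, put() blocks, so the
// reader slows down instead of buffering without limit.
final class BoundedPipeline {
    private static final int END = -1; // sentinel marking end of stream

    static List<Integer> run(int events, int capacity) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        List<Integer> handled = new ArrayList<>();
        Thread handler = new Thread(() -> {
            try {
                for (int e = queue.take(); e != END; e = queue.take()) {
                    handled.add(e); // stand-in for real event processing
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        handler.start();
        for (int i = 0; i < events; i++) {
            queue.put(i); // blocks while `capacity` events are already waiting
        }
        queue.put(END);
        handler.join(); // join() also makes `handled` safely visible here
        return handled;
    }
}
```

A small capacity trades throughput for bounded memory; a large one absorbs bursts but delays the moment the reader feels the slowdown.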
Delivery Guarantees
Out of the box, every flavor of Debezium offers at-least-once delivery guarantees. Duplicates are possible (and common) in the event of a crash. With some tuning, Debezium running on Kafka can get close to exactly-once delivery semantics.
Debezium on Kafka Connect
Kafka Connect provides the strongest delivery guarantees in the Debezium ecosystem. Its default at-least-once delivery ensures no data loss, while optional exactly-once semantics (enabled with exactly.once.source.support=enabled on the Connect workers and exactly.once.support=required on the connector, available since Kafka 3.3) prevent duplicates even after failures, a critical feature for financial and other sensitive data applications.
This strength comes from tight integration between offset management and event delivery. Offsets are stored in Kafka topics, and with exactly-once support enabled they are committed in the same Kafka transaction as the events they describe, so connectors resume precisely where they left off after failures, avoiding both gaps and duplicates in the change stream.
Debezium Server
Debezium Server provides at-least-once delivery, which means every change will be captured, but duplicates are possible after failures. It uses local offset storage that may lag slightly behind event delivery, creating a window where events might be reprocessed after a crash.
Many target systems have their own deduplication mechanisms, making this limitation manageable in practice. For applications requiring strict exactly-once processing, you'll need to implement additional deduplication logic downstream or consider the Kafka Connect deployment instead.
Debezium Engine
With Debezium Engine, you're in control of delivery guarantees. While the API provides at-least-once semantics, it's your responsibility to persist offsets reliably and handle potential duplicates after failures.
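What "persisting offsets reliably" can look like, sketched with the JDK only: a hypothetical FileOffsetStore (not a Debezium class) that saves a single long position, such as a Postgres LSN, via write-to-temp, fsync, then atomic rename. It assumes a single writer and a file system where an atomic rename replaces the target, as on common Linux and macOS file systems.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.Optional;

// Hypothetical helper, not a Debezium API: persists the last position your
// application has fully processed so a restart resumes from there.
final class FileOffsetStore {
    private final Path file;

    FileOffsetStore(Path file) {
        this.file = file;
    }

    // Call only after the events up to `position` are safely handled downstream;
    // saving earlier risks data loss, saving later only risks duplicates.
    void save(long position) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        // SYNC forces the bytes to storage before the rename below.
        Files.writeString(tmp, Long.toString(position), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE, StandardOpenOption.SYNC);
        // Atomic rename: readers see the old offset or the new one, never a partial write.
        // A production version would also fsync the parent directory.
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
    }

    Optional<Long> load() throws IOException {
        if (!Files.exists(file)) {
            return Optional.empty();
        }
        return Optional.of(Long.parseLong(Files.readString(file, StandardCharsets.UTF_8).trim()));
    }
}
```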
This flexibility allows you to integrate offset management with your business logic, potentially creating custom exactly-once implementations. However, for many applications, designing for idempotence—where occasional duplicates are harmlessly processed—is simpler than building exactly-once semantics from scratch.
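A common way to make duplicates harmless is to remember, per primary key, the highest source position already applied and skip anything at or below it. The sketch assumes each event carries a position that increases monotonically for a given key (a hypothetical stand-in for something like the source LSN) and keeps state in memory; a real sink would store the position in the same transaction as the row.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical idempotent sink keyed by primary key. Each change carries a source
// position assumed to increase monotonically per key; replays are skipped.
final class IdempotentSink {
    private final Map<String, Long> lastApplied = new HashMap<>();
    private final Map<String, String> rows = new HashMap<>();

    // Returns true if the change was applied, false if it was a duplicate or stale replay.
    // A real sink would update `rows` and `lastApplied` in one database transaction.
    boolean apply(String key, long position, String value) {
        Long seen = lastApplied.get(key);
        if (seen != null && position <= seen) {
            return false;
        }
        if (value == null) {
            rows.remove(key); // null value models a delete
        } else {
            rows.put(key, value);
        }
        lastApplied.put(key, position);
        return true;
    }

    String get(String key) {
        return rows.get(key);
    }
}
```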
Ordering Guarantees
Debezium is designed to capture every change from the database in strict order. The Kafka Connect runtime builds on this guarantee by routing changes with the same key (by default, the row's primary key) to the same Kafka partition. Server and Engine, by contrast, rely more on your destination configuration or your own code.
Debezium on Kafka Connect
With Kafka Connect, ordering follows Kafka's partition-based model. Changes for the same table primary key maintain the exact sequence they occurred in the database because they're routed to the same Kafka partition.
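The per-key guarantee follows from routing: every change with the same key is appended to the same partition, and a partition is an ordered log. This toy router shows the idea; it substitutes Arrays.hashCode for the murmur2 hash Kafka's default partitioner actually uses, so it illustrates the property rather than reproducing real partition assignments.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class KeyPartitioning {

    // Same key always maps to the same partition. Kafka's default partitioner
    // applies murmur2 to the serialized key; Arrays.hashCode is a simplified
    // stand-in here, so the partition numbers will not match a real cluster.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return (hash & 0x7fffffff) % numPartitions;
    }

    // Appends each (primaryKey, change) pair to its partition's log, in arrival order.
    static List<List<String>> route(List<Map.Entry<String, String>> changes, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Map.Entry<String, String> c : changes) {
            partitions.get(partitionFor(c.getKey(), numPartitions)).add(c.getKey() + ":" + c.getValue());
        }
        return partitions;
    }
}
```

Changes for different keys may land in different partitions and be consumed in any relative order, which is exactly the "no global ordering" caveat described next.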
However, there's no global ordering across different keys or tables—events for different records might be processed in a different order than they were committed. This clear partition-based model scales exceptionally well but requires downstream applications to understand these ordering semantics.
Debezium Server
Debezium Server preserves source database commit order per table, but end-to-end ordering ultimately depends on your sink system's capabilities. Some destinations like Google Pub/Sub allow configuring ordering keys to maintain sequence for related events.
By default, single-threaded processing preserves event order as read from the database log. If you enable multi-threading for compatible connectors, you'll need to consider potential reordering between different partition keys, just as with Kafka.
Debezium Engine
Debezium Engine gives you direct control over ordering. The synchronous engine preserves database log order precisely, while the asynchronous engine with multiple threads might reorder events across partitions.
This flexibility allows you to implement custom ordering logic beyond what's possible with other deployment options—for example, ensuring transactional consistency across related tables or synchronizing changes from different sources according to application-specific rules.
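One example of such custom ordering logic is keeping per-key order while still processing in parallel. The hypothetical KeyOrderedDispatcher below hashes each key to one of several single-threaded executors; it assumes dispatch() is called from a single thread, such as the Engine's callback thread, so submission order matches log order.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical dispatcher: each key is hashed to one of N single-threaded lanes.
// Events for the same key run in submission order; different keys run in parallel.
// Call dispatch() from one thread so submission order equals log order.
final class KeyOrderedDispatcher implements AutoCloseable {
    private final ExecutorService[] lanes;

    KeyOrderedDispatcher(int laneCount) {
        lanes = new ExecutorService[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    void dispatch(String key, Runnable handler) {
        int lane = (key.hashCode() & 0x7fffffff) % lanes.length;
        lanes[lane].execute(handler);
    }

    // Stops accepting work and waits for already-queued events to finish.
    @Override
    public void close() throws InterruptedException {
        for (ExecutorService lane : lanes) lane.shutdown();
        for (ExecutorService lane : lanes) lane.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

The trade-off mirrors Kafka partitions: a single hot key is still processed serially, and cross-key ordering is given up in exchange for parallelism.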
Observability and Management
As you might expect, each runtime provides different observability. Kafka Connect comes with metrics out of the box, Server has basic health checks and logging, and Engine has no built-in monitoring at all.
Debezium on Kafka Connect
Kafka Connect still leaves something to be desired, but its REST API lets you monitor connector and task status and adjust configurations dynamically without service restarts, essential capabilities for production environments.
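As an illustration of status monitoring over that REST API, the sketch below reads GET /connectors/{name}/status and, if the connector or any task reports FAILED, calls the restart endpoint with includeTasks=true&onlyFailed=true (available in Kafka 3.0 and later). The worker URL and connector name are placeholders, and a regex stands in for a proper JSON parser to keep it dependency-free.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ConnectorHealth {
    private static final Pattern STATE = Pattern.compile("\"state\"\\s*:\\s*\"([A-Z_]+)\"");

    // Every "state" value in a GET /connectors/{name}/status body: the connector's and each task's.
    // A regex keeps the sketch dependency-free; real code would use a JSON parser.
    static List<String> states(String statusJson) {
        List<String> out = new ArrayList<>();
        Matcher m = STATE.matcher(statusJson);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    static boolean hasFailures(String statusJson) {
        return states(statusJson).contains("FAILED");
    }

    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8083/connectors/inventory-connector"; // placeholder
        HttpClient http = HttpClient.newHttpClient();
        String status = http.send(HttpRequest.newBuilder(URI.create(base + "/status")).build(),
                HttpResponse.BodyHandlers.ofString()).body();
        if (hasFailures(status)) {
            // Restart the connector and only its failed tasks (endpoint added in Kafka 3.0).
            HttpRequest restart = HttpRequest.newBuilder(
                            URI.create(base + "/restart?includeTasks=true&onlyFailed=true"))
                    .POST(HttpRequest.BodyPublishers.noBody())
                    .build();
            System.out.println(http.send(restart, HttpResponse.BodyHandlers.ofString()).statusCode());
        }
    }
}
```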
The Kafka ecosystem provides rich metrics through JMX or Prometheus, exposing detailed information about throughput, latency, and errors. This robust tooling makes Kafka Connect well-suited for enterprise environments where operational visibility is critical to maintaining service level objectives.
Debezium Server
Debezium Server takes a minimalist approach to monitoring. It provides a basic health endpoint via Quarkus and standard logging but lacks the dynamic management capabilities of Kafka Connect. Configuration changes require restarts, and there's no built-in API for runtime control.
This simplicity integrates well with container orchestration platforms, where standard health checks and logging are handled through platform capabilities rather than application-specific interfaces. For teams already comfortable with container-based operations, this lightweight approach is workable, but it is the bare minimum.
Debezium Engine
With Debezium Engine, observability is entirely your responsibility. There are no built-in monitoring endpoints or management APIs—your application must expose metrics, handle errors, and manage the Engine's lifecycle.
This approach integrates naturally with your application's existing monitoring. You can incorporate CDC metrics into your dashboards, correlate them with business metrics, and build custom alerting based on application-specific criteria. The Engine provides the raw materials—exceptions and callbacks—for you to build exactly the observability layer you need.
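Since the Engine leaves metrics to you, a minimal approach is a small holder updated from the event callback. This hypothetical CdcMetrics class counts events and errors and records lag as receive time minus the change's database timestamp (Debezium events carry that as source.ts_ms); wiring it to Prometheus, Micrometer, or JMX is left to your existing stack.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical metrics holder: update it from your Engine callbacks and publish
// the getters through whatever metrics system the application already uses.
final class CdcMetrics {
    private final LongAdder events = new LongAdder();
    private final LongAdder errors = new LongAdder();
    private final AtomicLong lastLagMs = new AtomicLong(-1);
    private final AtomicLong lastEventAtMs = new AtomicLong(-1);

    // sourceTsMs: when the change was made in the database (Debezium reports
    // this as source.ts_ms in each change event); nowMs: when you received it.
    void onEvent(long sourceTsMs, long nowMs) {
        events.increment();
        lastLagMs.set(nowMs - sourceTsMs);
        lastEventAtMs.set(nowMs);
    }

    void onError() {
        errors.increment();
    }

    long eventCount() { return events.sum(); }

    long errorCount() { return errors.sum(); }

    // How far behind the database the most recent event was; -1 before any event.
    long lagMs() { return lastLagMs.get(); }

    // Time since the last event; a steadily growing value on a busy database hints at a stall.
    long idleMs(long nowMs) {
        long last = lastEventAtMs.get();
        return last < 0 ? -1 : nowMs - last;
    }
}
```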
Scaling and Fault Tolerance
Kafka Connect offers automatic failover and task distribution across a cluster. Debezium Server requires external orchestration for resilience. Debezium Engine is completely DIY.
Debezium on Kafka Connect
Kafka Connect's distributed architecture provides excellent scaling and resilience out-of-the-box. Connectors run across multiple nodes with automatic task rebalancing and failover. If a worker fails, tasks are transparently redistributed to other workers, resuming from stored offsets with no manual intervention.
This automatic recovery eliminates operational headaches in production environments. As your CDC needs grow, simply add more Connect workers to the cluster, and existing connectors automatically utilize the additional resources. The Connect framework handles leader election, ensuring there's always a coordinator managing the work distribution.
Debezium Server
Debezium Server takes a simpler approach where each instance runs independently. Scaling means deploying multiple server instances, typically one per database source / replication slot (depending on your setup). Fault tolerance relies on external orchestration—typically container platforms or init systems that restart failed processes.
This model is conceptually simpler but requires more external management. For high availability, you might run active-passive pairs with shared offset storage, but implementing proper failover requires careful coordination, typically through your orchestration layer's leader election mechanisms. It's also worth remembering that, with at-least-once guarantees, each failure will typically produce a batch of duplicates.
Debezium Engine
With Debezium Engine, scaling and recovery are entirely determined by your application architecture. The Engine has no built-in clustering—it's just a library in your application process, so your design decisions shape how CDC scales and recovers from failures.
Typically, you'd designate one application instance to run the Engine per database, using distributed locks for coordination across replicas. This flexibility lets you integrate CDC with your application's existing resilience patterns, but it requires significant implementation effort compared to the out-of-the-box capabilities of Kafka Connect.
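The distributed-lock pattern can be sketched as a renewable lease with a fencing token. The hypothetical LeaseLock below keeps its state in memory behind synchronized for illustration; in a real deployment the same check-and-set logic would live in a shared store such as a database row, with each replica calling tryAcquire periodically and running the Engine only while it holds the lease.

```java
import java.util.function.LongSupplier;

// Hypothetical lease lock. State lives in memory here; in production the same
// check-and-set would run against a shared store (for example a database row).
final class LeaseLock {
    private final LongSupplier clock; // milliseconds; injectable for testing
    private final long ttlMs;
    private String owner;
    private long expiresAtMs;
    private long fencingToken; // incremented on every fresh acquisition

    LeaseLock(LongSupplier clock, long ttlMs) {
        this.clock = clock;
        this.ttlMs = ttlMs;
    }

    // Acquires or renews the lease for `node`. Returns the current fencing token
    // if `node` holds the lease after the call, or -1 if another node holds it.
    synchronized long tryAcquire(String node) {
        long now = clock.getAsLong();
        boolean free = owner == null || now >= expiresAtMs;
        if (free) {
            owner = node;
            fencingToken++;
        } else if (!owner.equals(node)) {
            return -1;
        }
        expiresAtMs = now + ttlMs;
        return fencingToken;
    }
}
```

Passing the fencing token along with offset writes lets the shared store reject writes from an instance that lost its lease but has not noticed yet.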
What's Right for You?
Your CDC deployment choice comes down to a few simple trade-offs:
Need complete control and don't mind a big project? Debezium Engine lets you embed CDC directly in your application. It's flexible but requires you to build everything yourself, from error handling to scaling.
Already committed to Kafka? Debezium on Kafka Connect works, though it comes with significant operational overhead. You'll need to manage Kafka clusters, tune configurations, and reconcile yourself to the complexity of maintaining yet another distributed system.
Want simplicity but still need reliability? While Debezium Server offers a lightweight alternative to Kafka Connect, it leaves you in an awkward spot: less capable and less reliable than Kafka Connect, yet not as configurable as the Engine. It can certainly work, but this is where alternatives like Sequin really shine.
Whichever path you choose, understanding these trade-offs ensures you'll implement CDC in a way that fits your specific needs. But for teams looking to move fast without sacrificing reliability, modern managed solutions offer compelling advantages over the DIY approach.