
Announcing Sequin v0.5

Anthony Accomazzo
@accomazzo

We just released Sequin v0.5. I'm really excited about this release: many of the core building blocks are now in place.

Sequin streams data out of your Postgres database. You can use it to replicate data from your existing tables to other apps, databases, caches, materialized views, or frontend clients. Or you can use it to build event processing workflows, such as triggering side effects when data in Postgres changes.

Sequin is open source and available on GitHub.

Let me break down what this means and why it's powerful:

What is Sequin?

We all know Postgres is great for storing and querying data. But what about when you need to stream data?

Application data almost always starts in a database like Postgres. Then teams commonly run into two scenarios:

1. They need to replicate data from Postgres to other apps or clients in real-time.
2. They need to build event processing, where workers process Postgres changes concurrently and exactly once.

This forces a team to adopt a new system. For example, they might reach for Kafka, plus a tool like Debezium to copy their Postgres data into Kafka so they can access their Postgres tables via a streaming interface.

I’ve been burned by this. First, it adds immediate operational complexity. I know how to scale, monitor, and optimize Postgres. Understanding a new system and learning how to operate it is hard.

Second, you now have a syncing problem. You need to synchronize Postgres with your streaming system. What happens when you migrate a table, e.g. by adding or removing a column? Postgres applies your data model change atomically, but propagating that change to your stream can be a challenge.

Last, it seems we’ve come to accept that when data is in a streaming system, it's in a black box: challenging to query, migrate, and debug. This opacity complicates every aspect of working with streaming systems.

So, we asked: how can we augment Postgres to do this? We already know how to operate and scale Postgres. My row is already in Postgres. Why do we need to maintain a copy in another system in order to stream it?

Postgres offers some capabilities for these use cases, but falls short in key areas:

  • Postgres is not a log: Postgres tables are not append-only logs. You can mutate rows, and with MVCC, there are actually multiple simultaneous "correct" views of your database. This makes it hard to both capture and order changes and rows.
  • Logical replication slots are powerful but limited: Replication slots lack the persistence, easy accessibility, and consumption tracking needed for robust streaming. They don’t support all the features developers need. (They weren't built for all this!)
  • Postgres doesn't have the concept of consumers/delivery: To get exactly-once processing of data, you need to keep track of which messages have been delivered, which need to be re-delivered, and where in a table each consumer is.

So we built Sequin to add these features to Postgres. You connect Sequin to your Postgres database. Sequin blends source tables and the WAL to present consumers and clients with a consistently ordered stream of rows (what we call a Sequence). It also adds the business logic for consumers and message delivery on top of Postgres.

This means a consumer can traverse a table from start to finish safely, and process each row once. And when a consumer is "caught up," it will receive row changes in real-time.

It has fewer moving parts than Debezium. You don’t have to stand up Kafka. But perhaps just as important, your Postgres table remains the source of truth. Data isn't copied into a second system. To deliver a message, Sequin pulls rows directly from your Postgres table. This means you only need to run a migration or update rows in one place: your database.

How it works

Sequin has three primary components:

  1. A Sequence constructor that weaves together table rows and the WAL to order Postgres rows consistently and stream them in real-time.
  2. A state management system for consumer groups and message delivery.
  3. A WAL capturing feature that inserts WAL events into an event log table in your database.

Sequence constructor

As we discuss at length, Postgres’ multiversion concurrency control (MVCC) makes streaming Postgres challenging:

  • You can’t rely on tables alone, as columns do not provide a consistent sort over time. Datetimes and sequences can commit out-of-order.
  • But you can’t rely on the WAL alone, as it has a finite retention period.

So, Sequin uses a combination of the two to produce a consistently-ordered Sequence.

With this Sequence, a consumer group can page forward or backward through time, rewind to any row in the table, and so on, all without missing any rows or changes.

Consumer groups and message delivery

Sequin includes a state management system for consumer groups, cursors, and message delivery. This means your applications and services have several methods for reading and processing data from table streams.

Sequin supports consumer groups. Consumer groups are a common pattern in streaming systems, and allow you to process messages in a safe, concurrent way. You can have 1 or 1,000 workers that all belong to the same group. And Sequin ensures that messages are processed across the group exactly once.

At the moment, we have two interfaces for consumer groups:

1. The Consume API is an HTTP endpoint you can pull messages from.
2. Webhook subscriptions will push data to endpoints you specify.

The Consume API works a lot like SQS, and offers a much nicer interface than Kafka’s partitions/offsets. And because Sequin doesn’t partition data, you can increase or decrease the number of workers you have pulling from a consumer group at any time.
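
To make the pull model concrete, here's a minimal worker sketch in Python. The base URL, endpoint paths, and payload fields below are illustrative assumptions, not the exact Sequin API, so check the docs for the real shapes:

    import requests

    SEQUIN_URL = "https://your-sequin-host"      # hypothetical base URL
    CONSUMER = "orders-worker"                   # hypothetical consumer group name
    HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}

    def process(record):
        # Your business logic goes here.
        print("processing", record)

    while True:
        # Pull a batch of messages for this consumer group (endpoint shape assumed).
        resp = requests.get(
            f"{SEQUIN_URL}/api/consumers/{CONSUMER}/receive",
            headers=HEADERS,
            params={"batch_size": 10},
        )
        resp.raise_for_status()
        messages = resp.json().get("data", [])

        for msg in messages:
            process(msg["record"])

        # Acknowledge the batch so Sequin won't redeliver it (endpoint shape assumed).
        if messages:
            requests.post(
                f"{SEQUIN_URL}/api/consumers/{CONSUMER}/ack",
                headers=HEADERS,
                json={"ack_ids": [m["ack_id"] for m in messages]},
            ).raise_for_status()

Because delivery state lives in Sequin, you can run many copies of this loop against the same consumer group and each message is still processed exactly once.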

For webhooks, Sequin’s webhook runtime has built-in retries, back-offs, and back-pressure. Our console has a control plane for tracking in-flight webhooks as well as any error responses your system has returned. Webhooks allow you to process records concurrently with exactly-once processing guarantees.
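
If you prefer push, the receiving side can be as small as the sketch below (Python with Flask). The payload shape is an assumption; the important part is that a 2xx response acknowledges the delivery, while an error response lets Sequin's retry and back-off logic take over:

    from flask import Flask, request

    app = Flask(__name__)

    def handle_order(record):
        # Your business logic goes here.
        print("received", record)

    @app.route("/sequin-webhook", methods=["POST"])
    def sequin_webhook():
        payload = request.get_json()
        # The payload shape below is assumed; treat it as "one or more rows".
        for record in payload.get("data", []):
            handle_order(record)
        # Returning 2xx acknowledges the message; anything else will be retried.
        return "", 200

    if __name__ == "__main__":
        app.run(port=8000)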

WAL capture

Sometimes, you don’t just want to stream current records in a table. You want to capture changes to records (i.e. inserts, updates, and deletes) and stream that log of changes.

Sequin supports this as a first-class feature. You create a new table in your database (e.g. order_logs). Then, you set up a WAL Pipeline in Sequin (e.g. orders → order_logs). Sequin will capture all the changes that happen in the source table and insert them into the log table. You can optionally specify which events to capture (e.g. inserts, updates, or deletes) and filter events down to those matching certain column values.
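
Because the captured changes land in an ordinary Postgres table, you can inspect them with plain SQL before wiring up a consumer. Here's a small Python sketch using psycopg2; the column names (action, record, inserted_at) are assumptions about the log table's schema, not a documented contract:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=postgres")  # your connection string
    with conn.cursor() as cur:
        # Column names here are illustrative; check your WAL Pipeline's destination table.
        cur.execute(
            """
            select action, record, inserted_at
            from order_logs
            order by inserted_at desc
            limit 10
            """
        )
        for action, record, inserted_at in cur.fetchall():
            print(inserted_at, action, record)
    conn.close()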

Then, with your record changes in a Postgres table, it’s easy to stream with Sequin using all the tools above.

This makes Sequin a great fit for event-driven workflows where the event is a change to a record in your database.

Want to try it?

  1. You can clone and run Sequin (https://github.com/sequinstream/sequin)
  2. You can try it out for free on our cloud (http://console.sequinstream.com)
  3. You can deploy it on Railway with this template: https://railway.app/template/TXSiv2

Join our Discord if you need help!