Thread: Bottled Water: PostgreSQL to Kafka replication

From: Martin Kleppmann

Hi PostgreSQL world,

I'd like to announce a new open source project, called "Bottled Water", for getting data from PostgreSQL into Kafka:
http://blog.confluent.io/2015/04/23/bottled-water-real-time-integration-of-postgresql-and-kafka/
https://github.com/confluentinc/bottledwater-pg/

In case you're not aware of Kafka (http://kafka.apache.org/), it's an open source message broker that was originally
developed at LinkedIn and is now a lively Apache project. Unlike many other messaging systems (AMQP, JMS, etc.), it is
structured as a commit log, which makes it well suited for replicating data from one system to another.

Bottled Water uses PostgreSQL 9.4's logical decoding feature to extract a consistent snapshot of a database, plus an
ongoing stream of logical changes. Data is encoded in Avro (http://avro.apache.org/), a language-independent
serialization format, with schemas that are automatically derived from the PostgreSQL table schemas. Once the data is in
Kafka, it's easier to import into downstream systems, such as full-text search indexes, caches, data warehouses, stream
analytics systems, auditing and monitoring tools, etc.
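
To give a rough feel for what "schemas derived from the table schemas" could look like, here is a minimal Python sketch.
It is not Bottled Water's actual code: the function name avro_schema_for_table, the type mapping, and the fallback to
"string" for unknown types are all assumptions made for illustration, and the real mapping may well differ.

```python
import json

# Illustrative PostgreSQL-to-Avro type mapping (an assumption, not the
# mapping Bottled Water itself uses).
PG_TO_AVRO = {
    "smallint": "int",
    "integer": "int",
    "bigint": "long",
    "real": "float",
    "double precision": "double",
    "boolean": "boolean",
    "text": "string",
    "bytea": "bytes",
}


def avro_schema_for_table(table_name, columns):
    """Build an Avro record schema from (name, pg_type, nullable) tuples.

    Hypothetical helper: nullable columns become a union with "null",
    and unmapped types fall back to "string".
    """
    fields = []
    for name, pg_type, nullable in columns:
        avro_type = PG_TO_AVRO.get(pg_type, "string")
        if nullable:
            # Avro expresses optional values as a union with "null";
            # the default must match the first branch of the union.
            fields.append({"name": name, "type": ["null", avro_type], "default": None})
        else:
            fields.append({"name": name, "type": avro_type})
    return {"type": "record", "name": table_name, "fields": fields}


schema = avro_schema_for_table(
    "users",
    [("id", "integer", False), ("email", "text", True)],
)
print(json.dumps(schema, indent=2))
```

The point of deriving the schema rather than writing it by hand is that downstream consumers can read the records
without knowing anything about the source database beyond the schema that travels with the data.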

The blog post above has more detail on the design and the rationale behind it. This is an alpha release that is not yet
fit for production use, but it's ready for experimentation. Feedback and contributions welcome!

Martin