Logical WAL streaming & START_REPLICATION

From: Joe Hammerman
Hi pgsql-jdbc list,

We are using the JDBC driver bundled with the Debezium project, and we are trying to reset the replication stream to a WAL position in the past. This is in service of a disaster recovery plan: an LSN may have been reported as flushed while the corresponding messages were still in the Kafka broker pool, not yet consumed. The missing WAL files may still exist on disk, so it would be ideal if we could begin streaming again from a given WAL position.

To that end, we have configured the Kafka Connect daemon to begin streaming from a particular WAL file.

Consider the following logs which show the creation of a new logical replication slot:


Although START_REPLICATION is called with the xlog position we desire, streaming picks up from the current WAL position.
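For illustration, this is roughly the equivalent of what the connector does through PgJDBC's replication API (a sketch only; the connection details, slot name, and LSN below are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.postgresql.PGConnection;
import org.postgresql.PGProperty;
import org.postgresql.replication.LogSequenceNumber;
import org.postgresql.replication.PGReplicationStream;

public class StartFromPastLsn {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        PGProperty.USER.set(props, "replicator");
        PGProperty.PASSWORD.set(props, "secret");
        PGProperty.REPLICATION.set(props, "database");          // open a replication connection
        PGProperty.ASSUME_MIN_SERVER_VERSION.set(props, "9.4");
        PGProperty.PREFER_QUERY_MODE.set(props, "simple");

        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://db.example:5432/mydb", props)) {

            // Issues START_REPLICATION SLOT ... LOGICAL <lsn> on the wire.
            PGReplicationStream stream = conn.unwrap(PGConnection.class)
                    .getReplicationAPI()
                    .replicationStream()
                    .logical()
                    .withSlotName("debezium_slot")                             // placeholder
                    .withStartPosition(LogSequenceNumber.valueOf("0/1A2B3C4")) // the past LSN we want
                    .withStatusInterval(10, TimeUnit.SECONDS)
                    .start();

            // ...yet the changes that arrive begin at the slot's current
            // confirmed position, not at the LSN we passed in.
        }
    }
}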

Is there a method to achieve our ends? Our alternative plan is to add a republish_count field to the transaction, so that we can craft an SQL statement to update the rows from, say, an hour ago to the present.


Re: Logical WAL streaming & START_REPLICATION

From: Craig Ringer
On 9 March 2018 at 04:49, Joe Hammerman <jhammerman@squarespace.com> wrote:
 
Although START_REPLICATION is called with the xlog position we desire, streaming picks up from the current WAL position.

That's how replication slots work. It's necessary to allow proper client- and server-side crash recovery. They're really intended for use with a replication origin or something similar on the client side to track the client's progress.
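You can see where the server will actually resume by querying the slot over a plain JDBC connection. A minimal sketch (the slot name is a placeholder; confirmed_flush_lsn exists on 9.6 and later):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ShowSlotPosition {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://db.example:5432/mydb", "postgres", "secret");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT restart_lsn, confirmed_flush_lsn"
                 + " FROM pg_replication_slots WHERE slot_name = ?")) {
            ps.setString(1, "debezium_slot"); // placeholder
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    // confirmed_flush_lsn is where logical decoding resumes,
                    // regardless of the position passed to START_REPLICATION.
                    System.out.println("restart_lsn         = " + rs.getString(1));
                    System.out.println("confirmed_flush_lsn = " + rs.getString(2));
                }
            }
        }
    }
}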
 
Is there a method to achieve our ends? Our alternative plan is to add a republish_count field to the transaction, so that we can craft an SQL statement to update the rows from, say, an hour ago to the present.

What you'll need to do is find a way to delay sending replication slot replay confirmations until you know the change is recorded persistently in Kafka.
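In PgJDBC terms that means reporting what you've received eagerly, but holding the flushed position back. A minimal sketch, assuming an already-open PGReplicationStream and a hypothetical hand-off to Kafka:

import java.nio.ByteBuffer;
import java.util.function.BiConsumer;

import org.postgresql.replication.LogSequenceNumber;
import org.postgresql.replication.PGReplicationStream;

class FeedbackPump {
    // kafkaHandOff is a stand-in for however changes reach the broker.
    static void pumpOnce(PGReplicationStream stream,
                         BiConsumer<ByteBuffer, LogSequenceNumber> kafkaHandOff)
            throws java.sql.SQLException {
        ByteBuffer msg = stream.readPending(); // non-blocking; null if nothing pending
        if (msg == null) return;

        LogSequenceNumber lsn = stream.getLastReceiveLSN();
        kafkaHandOff.accept(msg, lsn);         // ship the change, tagged with its LSN
        stream.setAppliedLSN(lsn);             // we have it in memory...
        // ...but do NOT call stream.setFlushedLSN(lsn) here: that only
        // happens once Kafka acknowledges the write as durable.
    }
}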

> an LSN may have been reported as flushed while the corresponding messages were still in the Kafka broker pool, not yet consumed.

If your system will forget work on crash, it's not flushed, and you shouldn't report it flushed.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Logical WAL streaming & START_REPLICATION

From: Craig Ringer
On 9 March 2018 at 09:50, Craig Ringer <craig@2ndquadrant.com> wrote:
 
If your system will forget work on crash, it's not flushed, and you shouldn't report it flushed.


I haven't checked to see if PgJDBC actually exposes separate control of the reported flush position, though. If it doesn't, it really must, in order to make replication slots work properly.

How this should work is:

- You receive a txn and PgJDBC sends feedback updating the received position but NOT flush position

- You send that txn's changes on to wherever they're going and tag them with the txn commit lsn

- When the recipient of the changes confirms it has them stored persistently and is crash safe, report the change's commit lsn to PgJDBC so it can update the flush position sent in feedback on the replication connection.
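Putting those steps together, here's a sketch; it uses a synchronous Kafka send per change to keep the confirmation ordering obvious (connection details, slot, and topic names are placeholders):

import java.nio.ByteBuffer;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.postgresql.PGConnection;
import org.postgresql.PGProperty;
import org.postgresql.replication.LogSequenceNumber;
import org.postgresql.replication.PGReplicationStream;

public class ReliablePump {
    public static void main(String[] args) throws Exception {
        Properties pg = new Properties();
        PGProperty.USER.set(pg, "replicator");
        PGProperty.PASSWORD.set(pg, "secret");
        PGProperty.REPLICATION.set(pg, "database");
        PGProperty.ASSUME_MIN_SERVER_VERSION.set(pg, "9.4");
        PGProperty.PREFER_QUERY_MODE.set(pg, "simple");

        Properties kp = new Properties();
        kp.put("bootstrap.servers", "kafka.example:9092");
        kp.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        kp.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        kp.put("acks", "all");                       // durable on the broker side

        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://db.example:5432/mydb", pg);
             KafkaProducer<String, byte[]> producer = new KafkaProducer<>(kp)) {

            PGReplicationStream stream = conn.unwrap(PGConnection.class)
                    .getReplicationAPI().replicationStream().logical()
                    .withSlotName("debezium_slot")   // placeholder
                    .withStatusInterval(10, TimeUnit.SECONDS)
                    .start();

            while (true) {
                ByteBuffer msg = stream.read();      // blocking read of one change
                LogSequenceNumber lsn = stream.getLastReceiveLSN();

                byte[] payload = new byte[msg.remaining()];
                msg.get(payload);

                // Key each record with its commit LSN so recovery can find it later.
                producer.send(new ProducerRecord<>("wal_changes", lsn.asString(), payload))
                        .get();                      // block until the broker acks

                // Only now is it safe to let the slot advance past this change.
                stream.setAppliedLSN(lsn);
                stream.setFlushedLSN(lsn);
            }
        }
    }
}

Blocking on each send is only for clarity; a real pipeline would send in batches and advance the flushed LSN to the highest contiguously-acknowledged commit.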

If the upstream crashes, it'll be able to restart from the last confirmed position, and you've made sure you don't need anything older than that.

If the downstream crashes, you must either tolerate receiving duplicate transactions, or you must ensure that flushing to persistent storage on the downstream also atomically records the latest flushed upstream lsn. For example, you might put it in your Kafka messages, and in crash recovery, find the most recent / highest lsn you successfully stored with Kafka. Then you send *that* lsn when starting replication. This cannot make replication go backwards to an older position than you previously told the server you confirmed, but it *can* make the server skip over txns you didn't confirm to it but actually stored locally.
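A sketch of that recovery path, where findHighestStoredLsn() is a hypothetical helper that reads back the tail of the Kafka topic to find the highest LSN that actually made it to the broker:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.postgresql.PGConnection;
import org.postgresql.PGProperty;
import org.postgresql.replication.LogSequenceNumber;
import org.postgresql.replication.PGReplicationStream;

public class CrashRecovery {
    public static void main(String[] args) throws Exception {
        // Hypothetical: recover the highest LSN we know reached Kafka durably,
        // e.g. by reading the tail of the topic where each record carries its LSN.
        LogSequenceNumber resumeFrom = findHighestStoredLsn();

        Properties pg = new Properties();
        PGProperty.USER.set(pg, "replicator");
        PGProperty.PASSWORD.set(pg, "secret");
        PGProperty.REPLICATION.set(pg, "database");
        PGProperty.ASSUME_MIN_SERVER_VERSION.set(pg, "9.4");
        PGProperty.PREFER_QUERY_MODE.set(pg, "simple");

        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://db.example:5432/mydb", pg)) {
            PGReplicationStream stream = conn.unwrap(PGConnection.class)
                    .getReplicationAPI().replicationStream().logical()
                    .withSlotName("debezium_slot")   // placeholder
                    .withStartPosition(resumeFrom)   // skip txns we already stored...
                    .withStatusInterval(10, TimeUnit.SECONDS)
                    .start();
            // ...the server will never go backwards past its confirmed position,
            // but it can skip forward over changes we stored without confirming.
        }
    }

    // Placeholder -- the real implementation would read back from Kafka.
    static LogSequenceNumber findHighestStoredLsn() {
        return LogSequenceNumber.valueOf("0/1A2B3C4");
    }
}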

The client-side part ensures that if you commit something to local storage but crash before the next feedback message reaches the server, the server won't send you duplicate txns next time.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Logical WAL streaming & START_REPLICATION

From: Dave Cramer


On 9 March 2018 at 01:55, Craig Ringer <craig@2ndquadrant.com> wrote:

I haven't checked to see if PgJDBC actually exposes separate control of the reported flush position, though. If it doesn't, it really must, in order to make replication slots work properly.

PgJDBC does expose this: PGReplicationStream provides setFlushedLSN() and setAppliedLSN() separately from the received position, so the client decides when a change is reported as flushed.

Re: Logical WAL streaming & START_REPLICATION

From: Craig Ringer
On 10 March 2018 at 01:50, Dave Cramer <pg@fastcrypt.com> wrote:


PgJDBC does expose this: PGReplicationStream provides setFlushedLSN() and setAppliedLSN() separately from the received position, so the client decides when a change is reported as flushed.

Great!

I hope that answers your question then, Joe.

Dave, if you'd find it useful to add the explanation I wrote to the docs, please feel free to crib as desired.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Logical WAL streaming & START_REPLICATION

From: Dave Cramer



On 10 March 2018 at 04:44, Craig Ringer <craig@2ndquadrant.com> wrote:

Dave, if you'd find it useful to add the explanation I wrote to the docs, please feel free to crib as desired.


Craig, good idea.


Thanks!