Re: Stopping logical replication protocol - Mailing list pgsql-hackers

From Vladimir Gordiychuk
Subject Re: Stopping logical replication protocol
Date
Msg-id CAFgjRd369=8KJt=Qk6ec5v==NrNBDRzV=8Zk0KYxhCaHnExkBw@mail.gmail.com
In response to Re: Stopping logical replication protocol  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: Stopping logical replication protocol  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
Hi. A few months have passed, but the patch still needs review. Can I help to speed up the review, or should I wait for the commitfest?
I plan to complete the changes to the pgjdbc driver in PR https://github.com/pgjdbc/pgjdbc/pull/550, but that PR is blocked by the current problem: there is no way to stop logical replication.

2016-05-11 7:25 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:
On 11 May 2016 at 06:47, Vladimir Gordiychuk <folyga@gmail.com> wrote:
Same thread, I just think these are two somewhat separate changes. One is just in the walsender and allows return to command mode during waiting for WAL. The other is more intrusive into the reorder buffer etc and allows aborting decoding during commit processing. So two separate patches make sense here IMO, one on top of the other.

About the second part of the patch: what is the reason for decoding and sending the whole transaction? Why can't we process logical decoding via WalSndLoop LSN by LSN, as it works in physical replication? For example, if a transaction spans more than one LSN, we first decode and send "begin" and "part of the data from the current LSN", then return to WalSndLoop, and on the next iteration we send "another part of the data" and "commit". I haven't investigated this approach, because I think it would be a big change compared with a callback that stops sending.

There are two parts to that. First, why do we reorder at all, accumulating a whole transaction in a reorder buffer until we see a commit then sending it all at once? Second, when sending, why don't we return control to the walsender between messages? 

For the first: reordering xacts server-side lets the client not worry about replay order. It just applies them as it receives them. It means the server can omit uncommitted transactions from the stream entirely and clients can be kept simple(r). IIRC there are also some issues around relcache invalidation handling and time travel that make it desirable to wait until commit before building a snapshot and decoding, but I haven't dug into the details. Andres is the person who knows that area best.
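To make the reordering idea concrete, here is a toy sketch in C. It is not PostgreSQL's reorder buffer: the `Toy*` types and `toy_*` functions are hypothetical, sizes are fixed, and snapshots and invalidations are ignored. It only shows changes being buffered per xid, sent contiguously at commit, and dropped on abort.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_XACTS   8
#define MAX_CHANGES 16

/* Hypothetical toy types; not PostgreSQL's ReorderBuffer structs. */
typedef struct ToyTxn {
    int xid;                    /* 0 means the slot is free */
    int changes[MAX_CHANGES];   /* change payloads, in WAL order */
    int nchanges;
} ToyTxn;

typedef struct ToyReorderBuffer {
    ToyTxn txns[MAX_XACTS];
    int    sent[MAX_XACTS * MAX_CHANGES];  /* what the client receives */
    int    nsent;
} ToyReorderBuffer;

/* Find the slot for xid; optionally claim a free slot if there is none. */
ToyTxn *toy_lookup(ToyReorderBuffer *rb, int xid, int create)
{
    ToyTxn *free_slot = NULL;

    assert(xid != 0);
    for (int i = 0; i < MAX_XACTS; i++) {
        if (rb->txns[i].xid == xid)
            return &rb->txns[i];
        if (rb->txns[i].xid == 0 && free_slot == NULL)
            free_slot = &rb->txns[i];
    }
    if (!create || free_slot == NULL)
        return NULL;
    free_slot->xid = xid;
    free_slot->nchanges = 0;
    return free_slot;
}

/* Buffer a change as it is read from the WAL; nothing is sent yet. */
void toy_add_change(ToyReorderBuffer *rb, int xid, int change)
{
    ToyTxn *txn = toy_lookup(rb, xid, 1);

    assert(txn != NULL && txn->nchanges < MAX_CHANGES);
    txn->changes[txn->nchanges++] = change;
}

/* On commit, send the whole transaction contiguously, then free the slot. */
void toy_commit(ToyReorderBuffer *rb, int xid)
{
    ToyTxn *txn = toy_lookup(rb, xid, 0);

    if (txn == NULL)
        return;
    for (int i = 0; i < txn->nchanges; i++)
        rb->sent[rb->nsent++] = txn->changes[i];
    txn->xid = 0;
}

/* On abort, drop the buffered changes; the client never sees them. */
void toy_abort(ToyReorderBuffer *rb, int xid)
{
    ToyTxn *txn = toy_lookup(rb, xid, 0);

    if (txn != NULL)
        txn->xid = 0;
}
```

With changes from several xids interleaved in the WAL, the client in this sketch sees each committed transaction as one unbroken run, in commit order, and never sees the aborted one.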

As for why we don't return control to the walsender between change callbacks when processing a reorder buffer at commit time, I'm not really sure, but I suspect it's mostly down to keeping the API and development simple. If control returned to the walsender between each change we'd need an async API for the reorder buffer where you test to see if there's more unprocessed work and call back into the reorder buffer again if there is. So the reorder buffer has to keep state for the progress of replaying a commit in a separate struct, handle repeated calls to process work, etc. Also, since many individual changes are very small that could lead to a fair bit of overhead; it'd probably want to process a batch of changes then return. Which makes it even more complicated.

If it returned control to the caller between changes then each caller would also need to have the logic to test for more work and call back into the reorder buffer. Both the walsender and SQL interface would need it. The way it is, the loop is just in one place.
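A rough sketch of what such a resumable interface might look like, purely for illustration. `ReplayState`, `replay_begin`, `replay_step`, and `caller_loop` are hypothetical names, not anything in the reorder buffer today. Progress lives in a separate struct, work is done in batches, and each caller needs its own "more work?" loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the reorder buffer's API. */
typedef void (*apply_change_cb)(int change, void *arg);

typedef struct ReplayState {
    const int *changes;   /* changes of one committed transaction */
    size_t     nchanges;
    size_t     next;      /* index of the first unprocessed change */
} ReplayState;

void replay_begin(ReplayState *st, const int *changes, size_t nchanges)
{
    st->changes = changes;
    st->nchanges = nchanges;
    st->next = 0;
}

/*
 * Process at most batch_size changes, then return control to the caller.
 * Returns true if unprocessed changes remain.
 */
bool replay_step(ReplayState *st, size_t batch_size,
                 apply_change_cb apply, void *arg)
{
    size_t done = 0;

    assert(batch_size > 0);
    while (st->next < st->nchanges && done < batch_size) {
        apply(st->changes[st->next++], arg);
        done++;
    }
    return st->next < st->nchanges;
}

/* Example callback: add each applied change to an int accumulator. */
void sum_change(int change, void *arg)
{
    *(int *) arg += change;
}

/*
 * The loop each caller (walsender, SQL interface) would have to repeat.
 * Returns how many times replay_step() was called.
 */
size_t caller_loop(ReplayState *st, size_t batch_size,
                   apply_change_cb apply, void *arg)
{
    size_t rounds = 1;

    while (replay_step(st, batch_size, apply, arg)) {
        /* a real caller would check for client input here */
        rounds++;
    }
    return rounds;
}
```

Even in this toy form, the extra state struct and the duplicated caller loop are visible, which is the complexity being weighed against a single loop in one place.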

It probably makes more sense to have a callback that can test state and abort processing, like you've introduced. The callback could probably even periodically check to see if there's client input to process and consume it.
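For comparison, a minimal sketch of the callback approach. The names (`keep_going_cb`, `replay_commit`, `SenderState`) are made up for illustration and need not match the patch. The loop stays in one place and polls a callback before each change; the callback is where a sender could check for a stop request or consume client input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types; the patch's actual names may differ. */
typedef void (*change_cb)(int change, void *arg);
typedef bool (*keep_going_cb)(void *arg);

/*
 * Replay a committed transaction's changes in a single loop, polling
 * keep_going before each change. A NULL keep_going means "never stop".
 * Returns the number of changes applied.
 */
size_t replay_commit(const int *changes, size_t nchanges,
                     change_cb apply, keep_going_cb keep_going, void *arg)
{
    size_t i;

    for (i = 0; i < nchanges; i++) {
        if (keep_going != NULL && !keep_going(arg))
            break;              /* abort decoding mid-transaction */
        apply(changes[i], arg);
    }
    return i;
}

/* Toy sender state: pretend the client asks to stop after N changes. */
typedef struct SenderState {
    int applied;
    int stop_after;
} SenderState;

void send_change(int change, void *arg)
{
    (void) change;              /* payload is irrelevant to this sketch */
    ((SenderState *) arg)->applied++;
}

bool sender_keep_going(void *arg)
{
    SenderState *s = arg;

    return s->applied < s->stop_after;
}
```

Callers that never need to interrupt decoding can pass NULL for the callback and keep the existing behaviour.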

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
