Thread: Stopping logical replication protocol

Stopping logical replication protocol

From

Vladimir Gordiychuk

Date:

06 May 2016, 15:23:50

Hi all,

During implementing logical replication protocol for pgjdbc https://github.com/pgjdbc/pgjdbc/pull/550 I faced with strange behavior of the walcender.c:

When WAL consumer catchup master and change his state to streaming, not available normally complete replication by send CopyDone message until will not generate/create new WAL record. It occurs because logical decoding located in WalSndWaitForWal until will not available next WAL record, and it method receive message from consumer even reply on CopyDone with CopyDone but ignore exit from loop and we can wait many times waiting CommandStatus & ReadyForQuery packages on consumer.
Logical decoding ignore message from consumer during decoding and writing transaction in socket(WalSndWriteData). It affect long transactions with many changes. Because for example if we start decoding transaction that insert 1 million records and after consume 1% of it date we
decide stop replication, it will be not available until whole million record will not send to consumer. And I see the following problems because of this:

It will generate many not necessary network traffic.
Not available normally stop logical replication, because consumer will fail with timeout and it broke scenario when consumer put data to external system asynchronously and by success callback send feedback to master about flushed LSN, but by fail callback stop replication, and restart it from the last successfully sent LSN to external system, because slot will be busy and it will increase latency for streaming system.
Consumer can send keepalive message with required reply flag to master that right now decoding and sending huge transaction, and after some time disconnect master because because reply on keepalive will not get. I faced with it problem during research bottledwater-pg extension that fail each time during receive transaction in that was modify 1 million record(restart_lsn was before huge transaction so extension fail again and again) disconnect master because not keep keepalive package too long.

I prepare small patch that fix problems describe below. Now WalSndWriteData first check message from consumer and during decode transaction check that replication still active. KeppAlive message now not send if was get CopyDone package(keep alive now not necessary we preparing to complete). WalSndWaitForWal after get CopyDone exit from loop. With apply this patch I get next measurements

Before
-----
logical start and stopping: 15446ms
logical stopping: 13820ms

physical start and stopping: 462ms
physical stopping: 348

After
-----
logical start and stopping: 2424ms
logical stopping: 26ms

physical start and stopping: 458ms
physical stopping: 329ms

Where for measurements was use code from pgjdbc

For physical replicaion:

    LogSequenceNumber startLSN = getCurrentLSN();

    Statement st = sqlConnection.createStatement();
    st.execute("insert into test_logic_table\n"
        + " select id, md5(random()::text) as name from generate_series(1, 1000000) as id;");
    st.close();

    long start = System.nanoTime();

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .physical()
            .withStartPosition(startLSN)
            .start();

    //read single message
    stream.read();
    long startStopping = System.nanoTime();

    stream.close();

    long now = System.nanoTime();

    long startAndStopTime = now - start;
    long stopTime = now - startStopping;

    System.out.println(TimeUnit.NANOSECONDS.toMillis(startAndStopTime));
    System.out.println(TimeUnit.NANOSECONDS.toMillis(stopTime));

For logical replication:

    LogSequenceNumber startLSN = getCurrentLSN();

    Statement st = sqlConnection.createStatement();
    st.execute("insert into test_logic_table\n"
        + " select id, md5(random()::text) as name from generate_series(1, 1000000) as id;");
    st.close();

    long start = System.nanoTime();

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .logical()
            .withSlotName(SLOT_NAME)
            .withStartPosition(startLSN)
            .withSlotOption("include-xids", false)
            .withSlotOption("skip-empty-xacts", true)
            .start();

    //read single message
    stream.read();
    long startStopping = System.nanoTime();

    stream.close();

    long now = System.nanoTime();

    long startAndStopTime = now - start;
    long stopTime = now - startStopping;

    System.out.println(TimeUnit.NANOSECONDS.toMillis(startAndStopTime));
    System.out.println(TimeUnit.NANOSECONDS.toMillis(stopTime));

https://github.com/Gordiychuk/postgres/tree/stopping_logical_replication

Attachment

0001-Stop-logical-decoding-by-get-CopyDone.patch

Re: Stopping logical replication protocol

From

Oleksandr Shulgin

Date:

06 May 2016, 16:46:05

On Fri, May 6, 2016 at 5:23 PM, Vladimir Gordiychuk <folyga@gmail.com> wrote:

Hi all,

During implementing logical replication protocol for pgjdbc https://github.com/pgjdbc/pgjdbc/pull/550 I faced with strange behavior of the walcender.c:
When WAL consumer catchup master and change his state to streaming, not available normally complete replication by send CopyDone message until will not generate/create new WAL record. It occurs because logical decoding located in WalSndWaitForWal until will not available next WAL record, and it method receive message from consumer even reply on CopyDone with CopyDone but ignore exit from loop and we can wait many times waiting CommandStatus & ReadyForQuery packages on consumer.
Logical decoding ignore message from consumer during decoding and writing transaction in socket(WalSndWriteData). It affect long transactions with many changes. Because for example if we start decoding transaction that insert 1 million records and after consume 1% of it date we
decide stop replication, it will be not available until whole million record will not send to consumer.

How exactly are you stopping the replication? If you just stop reading you'll likely to hit some problems, but what if you also close the connection? I don't think there is any other supported way to do it.

I was working last year on adding support for replication protocol to the Python driver: https://github.com/psycopg/psycopg2/pull/322 It would be nice if you could skim through this implementation, since it didn't receive a whole lot of review.

Cheers.

Alex

Re: Stopping logical replication protocol

From

Vladimir Gordiychuk

Date:

06 May 2016, 16:54:47

Replication work via Copy API, so for stop replication walcender.c wait CopyDone. My communication seems like this

FE=> StartReplication(query: START_REPLICATION SLOT pgjdbc_logical_replication_slot LOGICAL 0/18FCFD0 ("include-xids" 'false', "skip-empty-xacts" 'true'))
FE=> Query(CopyStart)
<=BE CopyBothResponse
FE=> StandbyStatusUpdate(received: 0/18FCFD0, flushed: 0/0, applied: 0/0, clock: Tue May 03 22:13:14 MSK 2016)
FE=> CopyData(34)
<=BE CopyData
<=BE CopyData
<=BE Keepalive(lastServerWal: 0/18FCFD0, clock: Tue May 03 22:13:14 MSK 2016 needReply: false)
<=BE XLogData(currWal: 0/0, lastServerWal: 0/0, clock: 0)
<=BE CopyData
<=BE XLogData(currWal: 0/18FD0A0, lastServerWal: 0/18FD0A0, clock: 0)
<=BE CopyData
<=BE XLogData(currWal: 0/18FD1B0, lastServerWal: 0/18FD1B0, clock: 0)
FE=> StopReplication
<=BE CopyData
FE=> CopyDone
<=BE CopyDone
<=BE CopyData
... after few seconds
<=BE CommandStatus(COPY 0)
<=BE ReadyForQuery(I)

The main idea that i want work with same connection without close it.

2016-05-06 19:45 GMT+03:00 Oleksandr Shulgin <oleksandr.shulgin@zalando.de>:

On Fri, May 6, 2016 at 5:23 PM, Vladimir Gordiychuk <folyga@gmail.com> wrote:
Hi all,

During implementing logical replication protocol for pgjdbc https://github.com/pgjdbc/pgjdbc/pull/550 I faced with strange behavior of the walcender.c:
When WAL consumer catchup master and change his state to streaming, not available normally complete replication by send CopyDone message until will not generate/create new WAL record. It occurs because logical decoding located in WalSndWaitForWal until will not available next WAL record, and it method receive message from consumer even reply on CopyDone with CopyDone but ignore exit from loop and we can wait many times waiting CommandStatus & ReadyForQuery packages on consumer.
Logical decoding ignore message from consumer during decoding and writing transaction in socket(WalSndWriteData). It affect long transactions with many changes. Because for example if we start decoding transaction that insert 1 million records and after consume 1% of it date we
decide stop replication, it will be not available until whole million record will not send to consumer.
How exactly are you stopping the replication? If you just stop reading you'll likely to hit some problems, but what if you also close the connection? I don't think there is any other supported way to do it.

I was working last year on adding support for replication protocol to the Python driver: https://github.com/psycopg/psycopg2/pull/322 It would be nice if you could skim through this implementation, since it didn't receive a whole lot of review.

Cheers.
--
Alex

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

09 May 2016, 01:28:03

On 6 May 2016 at 23:23, Vladimir Gordiychuk <folyga@gmail.com> wrote:

I prepare small patch that fix problems describe below. Now WalSndWriteData first check message from consumer and during decode transaction check that replication still active. KeppAlive message now not send if was get CopyDone package(keep alive now not necessary we preparing to complete). WalSndWaitForWal after get CopyDone exit from loop. With apply this patch I get next measurements

I'll review this, but based on the description and concept I agree it's useful.

I have been frustrated in the past by the inability to terminate the logical replication stream and return to command mode without a disconnect/reconnect so I'd like to have this.

AFAIK there's no particular reason we can't do it, it's just not implemented. It does have to be a clean exit though, where the logical decoding context is destroyed, the xact it was running in is closed, etc. Like for the SQL interface.

You still won't want to do it too often because there's a cost to stopping decoding and restarting it. We have to re-read from the restart_lsn and reassemble transactions again up to the point where we can start sending them to the client again. Especially if you're most of the way through decoding a big xact, or if there's a long-running xact holding restart_lsn down this might cost a bit. But no worse than repeatedly calling the SQL logical decoding interface.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

09 May 2016, 02:39:03

On 6 May 2016 at 23:23, Vladimir Gordiychuk <folyga@gmail.com> wrote:

I prepare small patch that fix problems describe below. Now WalSndWriteData first check message from consumer and during decode transaction check that replication still active.

OK, upon looking closer I'm not sure I agree with this approach.

In WalSndLoop(...) we currently call ProcessRepliesIfAny() which handles a client-initiated CopyDone by replying with its own CopyDone. WalSndLoop then just waits until the send buffer is flushed or the client disconnects and returns. That looks to be pretty much what's needed... but only when we actually execute that code path.

XLogSendLogical calls WalSndWaitForWal(...) (via xlogreader and logical_read_xlog_page) when waiting for something to process, when the client will be blocked on a socket read. It also processes client CopyDone, but it doesn't seem to test streamingDoneSending or streamingDoneReceiving like the outer WalSndLoop does. So it looks like it continues waiting for WAL when we know the client sent CopyDone and doesn't want anything more.

I think we should be testing that in WalSndWaitForWal, like we do in WalSndLoop, and bailing out. You've done that part, and I agree that's good.

It seems like what you're also trying to allow interruption deeper than that, when we're in the middle of processing a reorder buffer commit record and streaming that to the client. You're introducing an is_active member (actually a callback, though name suggests it's a flag) in struct ReorderBuffer to check whether a CopyDone is received, and you're skipping ReorderBuffer commit processing when the callback returns false. The callback returns "!streamingDoneReceiving && !streamingDoneSending" i.e. it's false if either end has sent CopyDone. streamingDoneSending and streamingDoneSending are only set in ProcessRepliesIfAny, called by WalSndLoop and WalSndWaitForWal. So the idea is, presumably, that if we're waiting for WAL from XLogSendLogical we skip processing of any commit records and exit.

That seems overcomplicated.

When WalSndWaitForWAL is called by logical_read_xlog_page, logical_read_xlog_page can just test streamingDoneReceiving and streamingDoneSending. If they're set it can skip the page read and return -1, which will cause the xlogreader to return a null record to XLogSendLogical. That'll skip the decoding calls and return to WalSndLoop, where we'll notice it's time to exit.

That way there's no need to modify the reorder buffer structs, add callbacks, etc.

Make sense?

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

09 May 2016, 02:44:04

I've created a CF entry for this patch:

https://commitfest.postgresql.org/10/621/

set as waiting-on-author. Vladimir, I didn't find a PostgreSQL community user account for you, so I couldn't set you up as the author. What's your PostgreSQL community username?

Re: Stopping logical replication protocol

From

Vladimir Gordiychuk

Date:

09 May 2016, 08:31:53

What's your PostgreSQL community username?

gordiychuk

It seems like what you're also trying to allow interruption deeper than that, when we're in the middle of processing a reorder buffer commit record and streaming that to the client. You're introducing an is_active member (actually a callback, though name suggests it's a flag) in struct ReorderBuffer to check whether a CopyDone is received, and you're skipping ReorderBuffer commit processing when the callback returns false. The callback returns "!streamingDoneReceiving && !streamingDoneSending" i.e. it's false if either end has sent CopyDone. streamingDoneSending and streamingDoneSending are only set in ProcessRepliesIfAny, called by WalSndLoop and WalSndWaitForWal. So the idea is, presumably, that if we're waiting for WAL from XLogSendLogical we skip processing of any commit records and exit.

That seems overcomplicated.

When WalSndWaitForWAL is called by logical_read_xlog_page, logical_read_xlog_page can just test streamingDoneReceiving and streamingDoneSending. If they're set it can skip the page read and return -1, which will cause the xlogreader to return a null record to XLogSendLogical. That'll skip the decoding calls and return to WalSndLoop, where we'll notice it's time to exit.

ProcessRepliesIfAny also now executes in WalSdnWriteData. Because during send data we should also check message from client(client can send CopyDone, KeepAlive, Terminate).

@@ -1086,14 +1089,6 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,

memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],

tmpbuf.data, sizeof(int64));

- /* fast path */

- /* Try to flush pending output to the client */

- if (pq_flush_if_writable() != 0)

- WalSndShutdown();

- if (!pq_is_send_pending())

- return;

The main idea is that we can get CopyDone from client in the next functions: WalSdnLoop, WalSndWaitForWal, WalSndWriteData. All of this methods can take a long time, because WalSndWaitForWal can wait new transaction and on not active db it can take long enough, WalSndWriteData can send big transaction that also lead to ignore messages from client until long time(In my example above for 1 million object changes, walsender ignore messages 13 seconds and not allow reuse connection). When client send CopyDone they don't want receive message anymore for current LSN. For example physical replication can be interrupt in the middle of transaction that affect more than one LSN.

Maybe I not correct undestand documentation, but I want reuse same connection without reopen it, because open new connection takes too long. Is it correct use case or CopyDOne it side effect of copy protocol and for complete replication need use always Terminate package and reopen connection?

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

10 May 2016, 01:50:42

ProcessRepliesIfAny also now executes in WalSdnWriteData. Because during send data we should also check message from client(client can send CopyDone, KeepAlive, Terminate).

Ah, I didn't spot that ProcessRepliesIfAny() is already called from WalSndWriteData in the current codebase.

I agree that it would be useful to check for client-initiated CopyDone there, clear the walsender write queue, close the decoding session and return to command mode.

Nothing gets sent until we've formatted the data we want to send into a StringInfo, though. The comments on WalSndPrepareWrite even note that there's no guarantee anything will get done with the data. There should be no need to test whether we're processing a Datum, since we won't call ProcessRepliesIfAny() or WalSndWriteData() while we're doing that.

I think all you should need to do is test streamingDoneSending and streamingDoneReceiving in WalSndWriteData() and WalSndWaitForWal().

I outlined how I think WalSndWaitForWal() should be handled earlier. Test streamingDoneReceiving and streamingDoneSending in logical_read_xlog_page and return -1.

For WalSndWriteData() it's more complicated because there's data waiting to flush. I don't see any way to remove data from the send buffer in the libpq async API, so you'll have to send whatever's already been passed to libpq for sending using pq_putmessage_noblock(...). This is necessary for protocol integrity too, since you need to send a complete CopyData packet. The length is specified at the start of the CopyData, so once you start sending it you are committed to finishing it. Otherwise the client has no way to know when to stop expecting CopyData and look for the next protocol message. It can wait until its socket read blocks, but that doesn't mean there isn't more of the CopyData in the upstream server's send buffers, buffered in a router, etc.

If you want a way to abort in the middle of sending a datum (or in fact an individual logical change) you'd need a much bigger change to the replication protocol, where we chunk up big Datum values into multiple CopyData messages with continuations. That'd be quite nice to have, especially if it let us stream the Datum out without assembling the whole thing in memory in a StringInfo first, but it's a much bigger change.

If you just return from WalSndWriteData(), the data is still queued by libpq, and libpq will still finish sending it when you try to send something else. So I don't see any point taking much action in response to CopyDone in WalSndWriteData(). Maybe you could check if the message we're about to send is large and if it is, check for client input before we start sending and bail out before the pq_putmessage_noblock() call. But once we've done that we're committed to finishing sending that message and have to wait until pq_is_send_pending() return false before we can send our own CopyDone and exit.

Since WalSndWriteData() must write a whole message from the decoding plugin before it can return, and when it returns we'll return from LogicalDecodingProcessRecord(..) to XLogSendLogical(...), we can just test for streamingDoneReceiving and streamingDoneSending there.

I guess if you add a flag or callback to the logical decoding context you could have the logical decoding plugin interrupt processing of an insert/update/delete row change message between processing of individual datums within it and return without calling OutputPluginWrite(...) so the data is never queued for WalSndWriteData(...). This would make a difference if you have wide rows... but don't currently process client writes during output plugin callbacks. You'd have to add a logical decoding API function clients could call to process client replies and set the flag. (They can't call ProcessRepliesIfAny from walsender.c as it's static and not meant to be called from within a decoding plugin anyway).

> I want reuse same connection without reopen it, because open new connection takes too long.

Fair enough. Though I don't understand why you'd be doing this often enough that you'd care about reopening connections. What is the problem you are trying to solve with this? The underlying reason you need this change?

> Is it correct use case or CopyDone it side effect of copy protocol and for complete replication need use always Terminate package and reopen connection?

Client-initiated CopyDone for COPY BOTH should be just fine in protocol terms. There's partial support for it in the walsender already.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

10 May 2016, 02:49:48

On 10 May 2016 at 09:50, Craig Ringer <craig@2ndquadrant.com> wrote:

I outlined how I think WalSndWaitForWal() should be handled earlier. Test streamingDoneReceiving and streamingDoneSending in logical_read_xlog_page and return -1.

OK, so thinking about this some more, I see why you've added the callback within the reorder buffer code. You want to stop processing of a transaction after we've decoded the commit and are looping over the changes within ReorderBufferCommit(...), which doesn't know anything about the walsender. So we could loop for a long time within WalSndLoop -> XLogSendLogical -> LogicalDecodingProcessRecord if the record is a commit, as we process each change and send it to the client.

So aborting processing and returning between individual changes in ReorderBufferCommit(...) seems very reasonable. I agree that some kind of callback is needed because the walsender uses static globals to control its send/receive stop logic. I don't like the naming "is_active"; maybe reverse the sense and call it "stop_decoding_cb" ? Or "continue_decoding_cb"? Unsure.

Anyway, here's a draft patch that does the parts other than the reorder buffer processing stop. It passes 'make check' and the src/test/recovery tests, but I haven't written anything to verify the client-initiated abort handling. You have test code for this and I'd be interested to see the results.

This patch doesn't attempt to allow decoding to be aborted during processing of an xlog record, including a commit. So it'll still attempt to send whole transactions. But it should allow clients to send CopyDone when we're waiting for new WAL to be generated and return to command mode then.

I think it's worth making the next step, where you allow reorder buffer commit processing to be interrupted, into a separate patch on top of this one. They're two separate changes IMO.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Stopping logical replication protocol

From

Vladimir Gordiychuk

Date:

10 May 2016, 11:42:04

Fair enough. Though I don't understand why you'd be doing this often enough that you'd care about reopening connections. What is the problem you are trying to solve with this? The underlying reason you need this change?

First reason it clear API in pgjdc. Second reason it ability fast enough rollack to one of the previous LSN and repeat it. My use case for second reason - I have logical decoding extension that prepare only data as key-value pair without information about (insert, update, delete) for example if it delete as key I use primary key for table and as value null. Via pgjdc by replication protocol connects receiver data, consumer group changes to batch and send it to Kafka. If some problem occurs during delivery to kafka consumer, I should stop current replication, go back to success LSN and repeat all messages from success LSN. I think it operation can be quite common, but reopen connection or not stopping replication will increase latency.

Anyway, here's a draft patch that does the parts other than the reorder buffer processing stop. It passes 'make check' and the src/test/recovery tests, but I haven't written anything to verify the client-initiated abort handling. You have test code for this and I'd be interested to see the results.

What about keepAlive package when we already send/receive CopyDone? Is It really necessary?

I measure current fix without include long transaction, only start replication and stop it, when on postgresql absent active transactions and get next results:

Before:
12:35:31.403 (2) FE=> StartReplication(query: START_REPLICATION SLOT pgjdbc_logical_replication_slot LOGICAL 0/7287AC00 ("include-xids" 'false', "skip-empty-xacts" 'true'))
12:35:31.403 (2) FE=> Query(CopyStart)
12:35:31.404 (2) <=BE CopyBothResponse
12:35:31.406 (2) FE=> StopReplication
12:35:31.406 (2) FE=> CopyDone
12:35:31.406 (2) <=BE CopyData
12:35:31.407 (2) [107, 0, 0, 0, 0, 114, -121, -84, 0, 0, 1, -43, 120, 106, 53, -85, 21, 0]
12:35:31.407 (2) <=BE CopyDone
12:35:31.407 (2) <=BE CopyData
12:35:31.407 (2) [107, 0, 0, 0, 0, 114, -121, -84, 0, 0, 1, -43, 120, 106, 53, -76, 29, 0]
12:35:52.120 (2) <=BE CommandStatus(COPY 0)
12:35:52.120 (2) <=BE CommandStatus(SELECT)
12:35:52.120 (2) <=BE ReadyForQuery(I)
12:35:52.126 (2) FE=> Terminate

Timing:
Start and stopping time: 20729ms
Stopping time: 20720ms

After:
14:30:02.050 (2) FE=> StartReplication(query: START_REPLICATION SLOT pgjdbc_logical_replication_slot LOGICAL 0/728A06C0 ("include-xids" 'false', "skip-empty-xacts" 'true'))
14:30:02.050 (2) FE=> Query(CopyStart)
14:30:02.051 (2) <=BE CopyBothResponse
14:30:02.056 (2) FE=> StopReplication
14:30:02.057 (2) FE=> CopyDone
14:30:02.057 (2) <=BE CopyData
14:30:02.057 (2) [107, 0, 0, 0, 0, 114, -118, 6, -64, 0, 1, -43, 122, 3, -69, 107, 76, 0]
14:30:02.058 (2) <=BE CopyDone
14:30:02.058 (2) <=BE CommandStatus(COPY 0)
14:30:02.058 (2) <=BE CommandStatus(SELECT)
14:30:02.058 (2) <=BE ReadyForQuery(I)
14:30:02.068 (2) FE=> StandbyStatusUpdate(received: 0/728A06C0, flushed: 0/0, applied: 0/0, clock: Tue May 10 14:30:02 MSK 2016)

Timing:
Start and stopping time: 27ms
Stopping time: 11ms

So aborting processing and returning between individual changes in ReorderBufferCommit(...) seems very reasonable. I agree that some kind of callback is needed because the walsender uses static globals to control its send/receive stop logic. I don't like the naming "is_active"; maybe reverse the sense and call it "stop_decoding_cb" ? Or "continue_decoding_cb"? Unsure.

continue_decoding_cb sounds good.

I think it's worth making the next step, where you allow reorder buffer commit processing to be interrupted, into a separate patch on top of this one. They're two separate changes IMO.

We will continue in the current thread, or new? I interesting in both patch for my solution and pgjbc driver.

2016-05-10 5:49 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:

On 10 May 2016 at 09:50, Craig Ringer <craig@2ndquadrant.com> wrote:

I outlined how I think WalSndWaitForWal() should be handled earlier. Test streamingDoneReceiving and streamingDoneSending in logical_read_xlog_page and return -1.

OK, so thinking about this some more, I see why you've added the callback within the reorder buffer code. You want to stop processing of a transaction after we've decoded the commit and are looping over the changes within ReorderBufferCommit(...), which doesn't know anything about the walsender. So we could loop for a long time within WalSndLoop -> XLogSendLogical -> LogicalDecodingProcessRecord if the record is a commit, as we process each change and send it to the client.

So aborting processing and returning between individual changes in ReorderBufferCommit(...) seems very reasonable. I agree that some kind of callback is needed because the walsender uses static globals to control its send/receive stop logic. I don't like the naming "is_active"; maybe reverse the sense and call it "stop_decoding_cb" ? Or "continue_decoding_cb"? Unsure.

Anyway, here's a draft patch that does the parts other than the reorder buffer processing stop. It passes 'make check' and the src/test/recovery tests, but I haven't written anything to verify the client-initiated abort handling. You have test code for this and I'd be interested to see the results.

This patch doesn't attempt to allow decoding to be aborted during processing of an xlog record, including a commit. So it'll still attempt to send whole transactions. But it should allow clients to send CopyDone when we're waiting for new WAL to be generated and return to command mode then.

I think it's worth making the next step, where you allow reorder buffer commit processing to be interrupted, into a separate patch on top of this one. They're two separate changes IMO.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

10 May 2016, 12:15:32

On 10 May 2016 at 19:41, Vladimir Gordiychuk <folyga@gmail.com> wrote:

Fair enough. Though I don't understand why you'd be doing this often enough that you'd care about reopening connections. What is the problem you are trying to solve with this? The underlying reason you need this change?

First reason it clear API in pgjdc. Second reason it ability fast enough rollack to one of the previous LSN and repeat it. My use case for second reason - I have logical decoding extension that prepare only data as key-value pair without information about (insert, update, delete) for example if it delete as key I use primary key for table and as value null. Via pgjdc by replication protocol connects receiver data, consumer group changes to batch and send it to Kafka. If some problem occurs during delivery to kafka consumer, I should stop current replication, go back to success LSN and repeat all messages from success LSN. I think it operation can be quite common, but reopen connection or not stopping replication will increase latency.

It will, but not tons. The biggest cost (at least if there are any long running xacts) will be replaying since the restart_lsn when you restart the decoding session, which will happen whether you reconnect or just stop decoding and return to command mode.

Anyway, here's a draft patch that does the parts other than the reorder buffer processing stop. It passes 'make check' and the src/test/recovery tests, but I haven't written anything to verify the client-initiated abort handling. You have test code for this and I'd be interested to see the results.

What about keepAlive package when we already send/receive CopyDone? Is It really necessary?

No, I don't think it is, and I applied the change you made to suppress it.

I think it's worth making the next step, where you allow reorder buffer commit processing to be interrupted, into a separate patch on top of this one. They're two separate changes IMO.

We will continue in the current thread, or new? I interesting in both patch for my solution and pgjbc driver.

Same thread, I just think these are two somewhat separate changes. One is just in the walsender and allows return to command mode during waiting for WAL. The other is more intrusive into the reorder buffer etc and allows aborting decoding during commit processing. So two separate patches make sense here IMO, one on top of the other.

I use git and just "git format-patch -2" to prepare a stack of two patches from two separate commits.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Vladimir Gordiychuk

Date:

10 May 2016, 17:15:39

in which release can be include first part?

2016-05-10 15:15 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:

On 10 May 2016 at 19:41, Vladimir Gordiychuk <folyga@gmail.com> wrote:

Fair enough. Though I don't understand why you'd be doing this often enough that you'd care about reopening connections. What is the problem you are trying to solve with this? The underlying reason you need this change?

First reason it clear API in pgjdc. Second reason it ability fast enough rollack to one of the previous LSN and repeat it. My use case for second reason - I have logical decoding extension that prepare only data as key-value pair without information about (insert, update, delete) for example if it delete as key I use primary key for table and as value null. Via pgjdc by replication protocol connects receiver data, consumer group changes to batch and send it to Kafka. If some problem occurs during delivery to kafka consumer, I should stop current replication, go back to success LSN and repeat all messages from success LSN. I think it operation can be quite common, but reopen connection or not stopping replication will increase latency.

It will, but not tons. The biggest cost (at least if there are any long running xacts) will be replaying since the restart_lsn when you restart the decoding session, which will happen whether you reconnect or just stop decoding and return to command mode.

Anyway, here's a draft patch that does the parts other than the reorder buffer processing stop. It passes 'make check' and the src/test/recovery tests, but I haven't written anything to verify the client-initiated abort handling. You have test code for this and I'd be interested to see the results.

What about keepAlive package when we already send/receive CopyDone? Is It really necessary?

No, I don't think it is, and I applied the change you made to suppress it.

I think it's worth making the next step, where you allow reorder buffer commit processing to be interrupted, into a separate patch on top of this one. They're two separate changes IMO.

We will continue in the current thread, or new? I interesting in both patch for my solution and pgjbc driver.

Same thread, I just think these are two somewhat separate changes. One is just in the walsender and allows return to command mode during waiting for WAL. The other is more intrusive into the reorder buffer etc and allows aborting decoding during commit processing. So two separate patches make sense here IMO, one on top of the other.

I use git and just "git format-patch -2" to prepare a stack of two patches from two separate commits.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

10 May 2016, 17:22:47

On 11 May 2016 at 01:15, Vladimir Gordiychuk <folyga@gmail.com> wrote:

in which release can be include first part?

Since it's not a bug fix, I don't think it can go in before 9.7.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Vladimir Gordiychuk

Date:

10 May 2016, 22:47:21

Same thread, I just think these are two somewhat separate changes. One is just in the walsender and allows return to command mode during waiting for WAL. The other is more intrusive into the reorder buffer etc and allows aborting decoding during commit processing. So two separate patches make sense here IMO, one on top of the other.

About the second part of the patch. What the reason decode and send whole transaction? Why we can't process logical decoding via WalSndLoop LSN by LSN as it work in physycal replication? For example if transaction contains in more them one LSN, first we decode and send "begin", "part data from current LSN" and then returns to WalSndLoop on the next iteration we send "another part data", "commit". I don't research in this way, because I think it will be big changes in comparison callback that stop sending.

2016-05-10 20:22 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:

On 11 May 2016 at 01:15, Vladimir Gordiychuk <folyga@gmail.com> wrote:
in which release can be include first part?

Since it's not a bug fix, I don't think it can go in before 9.7.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

11 May 2016, 04:25:33

On 11 May 2016 at 06:47, Vladimir Gordiychuk <folyga@gmail.com> wrote:

Same thread, I just think these are two somewhat separate changes. One is just in the walsender and allows return to command mode during waiting for WAL. The other is more intrusive into the reorder buffer etc and allows aborting decoding during commit processing. So two separate patches make sense here IMO, one on top of the other.

About the second part of the patch. What the reason decode and send whole transaction? Why we can't process logical decoding via WalSndLoop LSN by LSN as it work in physycal replication? For example if transaction contains in more them one LSN, first we decode and send "begin", "part data from current LSN" and then returns to WalSndLoop on the next iteration we send "another part data", "commit". I don't research in this way, because I think it will be big changes in comparison callback that stop sending.

There are two parts to that. First, why do we reorder at all, accumulating a whole transaction in a reorder buffer until we see a commit then sending it all at once? Second, when sending, why don't we return control to the walsender between messages?

For the first: reordering xacts server-side lets the client not worry about replay order. It just applies them as it receives them. It means the server can omit uncommitted transactions from the stream entirely and clients can be kept simple(r). IIRC there are also some issues around relcache invalidation handling and time travel that make it desirable to wait until commit before building a snapshot and decoding, but I haven't dug into the details. Andres is the person who knows that area best.

As for why we don't return control to the walsender between change callbacks when processing a reorder buffer at commit time, I'm not really sure but suspect it's mostly down to easy API and development. If control returned to the walsender between each change we'd need an async api for the reorder buffer where you test to see if there's more unprocessed work and call back into the reorder buffer again if there is. So the reorder buffer has to keep state for the progress of replaying a commit in a separate struct, handle repeated calls to process work, etc. Also, since many individual changes are very small that could lead to a fair bit of overhead; it'd probably want to process a batch of changes then return. Which makes it even more complicated.

If it returned control to the caller between changes then each caller would also need to have the logic to test for more work and call back into the reorder buffer. Both the walsender and SQL interface would need it. The way it is, the loop is just in one place.

It probably makes more sense to have a callback that can test state and abort processing, like you've introduced. The callback could probably even periodically check to see if there's client input to process and consume it.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Vladimir Gordiychuk

Date:

24 August 2016, 19:26:46

Hi. It has already passed a few months but patch still have required review state. Can I help to speed up the review, or should i wait commitfest?

I plane complete changes in pgjdbc drive inside PR https://github.com/pgjdbc/pgjdbc/pull/550 but PR blocked current problem with not available stop logical replication.

2016-05-11 7:25 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:

On 11 May 2016 at 06:47, Vladimir Gordiychuk <folyga@gmail.com> wrote:
Same thread, I just think these are two somewhat separate changes. One is just in the walsender and allows return to command mode during waiting for WAL. The other is more intrusive into the reorder buffer etc and allows aborting decoding during commit processing. So two separate patches make sense here IMO, one on top of the other.

About the second part of the patch. What the reason decode and send whole transaction? Why we can't process logical decoding via WalSndLoop LSN by LSN as it work in physycal replication? For example if transaction contains in more them one LSN, first we decode and send "begin", "part data from current LSN" and then returns to WalSndLoop on the next iteration we send "another part data", "commit". I don't research in this way, because I think it will be big changes in comparison callback that stop sending.

There are two parts to that. First, why do we reorder at all, accumulating a whole transaction in a reorder buffer until we see a commit then sending it all at once? Second, when sending, why don't we return control to the walsender between messages?

For the first: reordering xacts server-side lets the client not worry about replay order. It just applies them as it receives them. It means the server can omit uncommitted transactions from the stream entirely and clients can be kept simple(r). IIRC there are also some issues around relcache invalidation handling and time travel that make it desirable to wait until commit before building a snapshot and decoding, but I haven't dug into the details. Andres is the person who knows that area best.

As for why we don't return control to the walsender between change callbacks when processing a reorder buffer at commit time, I'm not really sure but suspect it's mostly down to easy API and development. If control returned to the walsender between each change we'd need an async api for the reorder buffer where you test to see if there's more unprocessed work and call back into the reorder buffer again if there is. So the reorder buffer has to keep state for the progress of replaying a commit in a separate struct, handle repeated calls to process work, etc. Also, since many individual changes are very small that could lead to a fair bit of overhead; it'd probably want to process a batch of changes then return. Which makes it even more complicated.

If it returned control to the caller between changes then each caller would also need to have the logic to test for more work and call back into the reorder buffer. Both the walsender and SQL interface would need it. The way it is, the loop is just in one place.

It probably makes more sense to have a callback that can test state and abort processing, like you've introduced. The callback could probably even periodically check to see if there's client input to process and consume it.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

25 August 2016, 01:22:16

On 25 August 2016 at 03:26, Vladimir Gordiychuk <folyga@gmail.com> wrote:
> Hi. It has already passed a few months but patch still have required review
> state. Can I help to speed up the review, or should i wait commitfest?
> I plane complete changes in pgjdbc drive inside PR
> https://github.com/pgjdbc/pgjdbc/pull/550 but PR blocked current problem
> with not available stop logical replication.

The latest patch is the updated one I posted, which was the part of
yours that allowed return to command mode when logical decoding is
idle.

If you want to submit an updated version of the second patch, with the
callback to allow logical decoding cancellation during
ReorderBufferCommit, that'd be handy, but I don't think it's as
important as the first one.

As far as I'm concerned the first patch is ready. Please take a look
and verify that you're happy with my updates and test the updated
patch. I'll mark it ready to go if you are. It's linked to at
https://commitfest.postgresql.org/10/621/ .

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

25 August 2016, 05:04:30

On 25 August 2016 at 09:22, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 25 August 2016 at 03:26, Vladimir Gordiychuk <folyga@gmail.com> wrote:
>> Hi. It has already passed a few months but patch still have required review
>> state. Can I help to speed up the review, or should i wait commitfest?
>> I plane complete changes in pgjdbc drive inside PR
>> https://github.com/pgjdbc/pgjdbc/pull/550 but PR blocked current problem
>> with not available stop logical replication.
>
> The latest patch is the updated one I posted, which was the part of
> yours that allowed return to command mode when logical decoding is
> idle.
>
> If you want to submit an updated version of the second patch, with the
> callback to allow logical decoding cancellation during
> ReorderBufferCommit, that'd be handy, but I don't think it's as
> important as the first one.
>
> As far as I'm concerned the first patch is ready. Please take a look
> and verify that you're happy with my updates and test the updated
> patch. I'll mark it ready to go if you are. It's linked to at
> https://commitfest.postgresql.org/10/621/ .

By the way, I now think that the second part of your patch, to allow
interruption during ReorderBufferCommit processing, is also very
desirable.

Not just for that feature, but because we should be processing client
messages during long ReorderBufferCommit executions so that we can
respond to feedback from the client. Right now, long running
ReorderBufferCommit runs can trigger walsender_timeout because there's
no feedback processed.

Alternately, ReorderBufferCommit() could call back into the walsender
with a progress update, without requiring the client to send feedback.
It knows the client is making progress because it's consuming data on
the socket. But I'd rather have client feedback processed, since it
could be useful later for client=->server messages to output plugin
callbacks too.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

06 September 2016, 01:10:27

On 25 August 2016 at 13:04, Craig Ringer <craig@2ndquadrant.com> wrote:

> By the way, I now think that the second part of your patch, to allow
> interruption during ReorderBufferCommit processing, is also very
> desirable.

I've updated your patch, rebasing it on top of 10.0 master and
splitting it into two pieces. I thought you were planning on following
up with an update to the second part, but maybe that was unclear from
our prior discussion, so I'm doing this to help get this moving again.

The first patch is what I posted earlier - the part of your patch that
allows the walsender to exit COPY BOTH mode and return to command mode
in response to a client-initiated CopyDone when it's in the
WalSndLoop.  For logical decoding this means that it will notice a
client CopyDone when it's between transactions or while it's ingesting
transaction changes. It won't react to CopyDone while it's in
ReorderBufferCommit and sending a transaction to an output plugin
though.

The second piece is the other half of your original patch. Currently
unmodified from what you wrote. It adds a hook in ReorderBufferCommit
that calls back into the walsender to check for new client input and
interrupt processing. I haven't yet investigated this in detail.

Review comments on the 2nd patch, i.e. the 2nd half of your original patch:

* Other places in logical decoding use the CB suffix for callback
types. This should do the same.

* I'm not too keen on the name is_active  for the callback. We
discussed the name continue_decoding_cb in our prior conversation.

* Instead of having LogicalContextAlwaysActive, would it be better to
test if the callback pointer is null and skip it if so?

* Lots of typos in LogicalContextAlwaysActive comments, and wording is unclear

* comment added to arguments of pg_logical_slot_get_changes_guts is
not in PostgreSQL style - it goes way wide on the line. Probably
better to put the comment above the call and mention which argument it
refers to.

* A comment somewhere is needed - probably on the IsStreamingActive()
callback - to point readers at the fact that WalSndWriteData() calls
ProcessRepliesIfAny() and that's how IsStreamingActive can get set. I
had to look at the email discussion history to remind myself how this
worked when I was re-reading the code.

* I'm not keen on

    typedef ReorderBufferIsActive LogicalDecondingContextIsActive;

  I think instead a single callback name that encompasses both should
be used. ContinueDecodingCB ?

* There are no tests. I don't see any really simple way to test this, though.

I suggest that you apply these updated/split patches to a fresh copy
of master then see if you can update it based on this feedback.
Something like:

git checkout master
git pull
git checkout -b stop-decoding-v10
git am /path/to/0001-Respect-client-initiated-CopyDone-in-walsender.patch
git am /path/to/0002-Second-part-of-Vladimir-Gordiychuk-s-patch.patch

then modify the code per the above, and "git commit --amend" the
changes and send an updated set of patches with "git format-patch -2"
.

I am setting this patch as "waiting on author" in the commitfest.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Stopping logical replication protocol

From

Vladimir Gordiychuk

Date:

07 September 2016, 07:44:44

>Review comments on the 2nd patch, i.e. the 2nd half of your original patch:
>
>* Other places in logical decoding use the CB suffix for callback
>types. This should do the same.
>
>* I'm not too keen on the name is_active for the callback. We
>discussed the name continue_decoding_cb in our prior conversation.
>
>* Instead of having LogicalContextAlwaysActive, would it be better to
>test if the callback pointer is null and skip it if so?
>
>* Lots of typos in LogicalContextAlwaysActive comments, and wording is unclear
>
>* comment added to arguments of pg_logical_slot_get_changes_guts is
>not in PostgreSQL style - it goes way wide on the line. Probably
>better to put the comment above the call and mention which argument it
>refers to.
>
>* A comment somewhere is needed - probably on the IsStreamingActive()
>callback - to point readers at the fact that WalSndWriteData() calls
>ProcessRepliesIfAny() and that's how IsStreamingActive can get set. I
>had to look at the email discussion history to remind myself how this
>worked when I was re-reading the code.
>
>* I'm not keen on
>
> typedef ReorderBufferIsActive LogicalDecondingContextIsActive;
>
> I think instead a single callback name that encompasses both should
>be used. ContinueDecodingCB ?

Fixed patch in attach.

> * There are no tests. I don't see any really simple way to test this, though.

I will be grateful if you specify the best way how to do it. I tests this patches by pgjdbc driver tests that also build on head of postgres. You can check test scenario for both patches in PRhttps://github.com/pgjdbc/pgjdbc/pull/550

org.postgresql.replication.LogicalReplicationTest#testDuringSendBigTransactionReplicationStreamCloseNotActive
org.postgresql.replication.LogicalReplicationTest#testAfterCloseReplicationStreamDBSlotStatusNotActive

2016-09-06 4:10 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:

On 25 August 2016 at 13:04, Craig Ringer <craig@2ndquadrant.com> wrote:

> By the way, I now think that the second part of your patch, to allow
> interruption during ReorderBufferCommit processing, is also very
> desirable.

I've updated your patch, rebasing it on top of 10.0 master and
splitting it into two pieces. I thought you were planning on following
up with an update to the second part, but maybe that was unclear from
our prior discussion, so I'm doing this to help get this moving again.

The first patch is what I posted earlier - the part of your patch that
allows the walsender to exit COPY BOTH mode and return to command mode
in response to a client-initiated CopyDone when it's in the
WalSndLoop. For logical decoding this means that it will notice a
client CopyDone when it's between transactions or while it's ingesting
transaction changes. It won't react to CopyDone while it's in
ReorderBufferCommit and sending a transaction to an output plugin
though.

The second piece is the other half of your original patch. Currently
unmodified from what you wrote. It adds a hook in ReorderBufferCommit
that calls back into the walsender to check for new client input and
interrupt processing. I haven't yet investigated this in detail.

Review comments on the 2nd patch, i.e. the 2nd half of your original patch:

* Other places in logical decoding use the CB suffix for callback
types. This should do the same.

* I'm not too keen on the name is_active for the callback. We
discussed the name continue_decoding_cb in our prior conversation.

* Instead of having LogicalContextAlwaysActive, would it be better to
test if the callback pointer is null and skip it if so?

* Lots of typos in LogicalContextAlwaysActive comments, and wording is unclear

* comment added to arguments of pg_logical_slot_get_changes_guts is
not in PostgreSQL style - it goes way wide on the line. Probably
better to put the comment above the call and mention which argument it
refers to.

* A comment somewhere is needed - probably on the IsStreamingActive()
callback - to point readers at the fact that WalSndWriteData() calls
ProcessRepliesIfAny() and that's how IsStreamingActive can get set. I
had to look at the email discussion history to remind myself how this
worked when I was re-reading the code.

* I'm not keen on

typedef ReorderBufferIsActive LogicalDecondingContextIsActive;

I think instead a single callback name that encompasses both should
be used. ContinueDecodingCB ?

* There are no tests. I don't see any really simple way to test this, though.

I suggest that you apply these updated/split patches to a fresh copy
of master then see if you can update it based on this feedback.
Something like:

git checkout master
git pull
git checkout -b stop-decoding-v10
git am /path/to/0001-Respect-client-initiated-CopyDone-in-walsender.patch
git am /path/to/0002-Second-part-of-Vladimir-Gordiychuk-s-patch.patch

then modify the code per the above, and "git commit --amend" the
changes and send an updated set of patches with "git format-patch -2"
.

I am setting this patch as "waiting on author" in the commitfest.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

08 September 2016, 09:29:26

<p dir="ltr">On 7 September 2016 at 15:44, Vladimir Gordiychuk <<a
href="mailto:folyga@gmail.com">folyga@gmail.com</a>>wrote:<p dir="ltr">> Fixed patch in attach.<p dir="ltr">It'd
helpfulif you summarize the changes made when posting revisions.<p dir="ltr">>> * There are no tests. I don't see
anyreally simple way to test this,<br /> >> though.<br /> ><br /> > I will be grateful if you specify the
bestway how to do it. I tests this<br /> > patches by pgjdbc driver tests that also build on head of postgres. You
can<br/> > check test scenario for both patches in<br /> > PRhttps://<a
href="http://github.com/pgjdbc/pgjdbc/pull/550">github.com/pgjdbc/pgjdbc/pull/550</a><br/> ><br /> >
org.postgresql.replication.LogicalReplicationTest#testDuringSendBigTransactionReplicationStreamCloseNotActive<br/> >
org.postgresql.replication.LogicalReplicationTest#testAfterCloseReplicationStreamDBSlotStatusNotActive<pdir="ltr">Yeah,
testingit isn't trivial in postgres core, since of course none of the tools know to send a client-initiated CopyDone.<p
dir="ltr">Mysuggestion is to teach pg_recvlogical to send CopyDone on the first SIGINT (control-C) and wait until the
serverends the data-stream and returns to command mode. A second control-C should unilaterally close the connection and
itshould close after a timeout too. This will be backward compatible with 9.4/5/6 (just with a delay on exit by
sigint).<pdir="ltr">Then in a TAP test in src/test/recovery set up a logical decoding session with test_decoding and
pg_recvlogical.Do a test xact then sigint pg_recvlogical and verify that it exits. To make sure it exited by mutual
agreementwith server, probably run pg_recvlogival at a higher debug level and text search for a message you print from
pg_recvlogicalwhen it gets server CopyDone in the response to client CopyDone. I don't think a different pg_recvlogical
numericexit code could be justified for this.<p dir="ltr">It sounds like more work than I think it would actually be.<p
dir="ltr">--<br /> Craig Ringer <a href="http://www.2ndQuadrant.com/">http://www.2ndQuadrant.com/</a><br /> PostgreSQL
Development,24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Vladimir Gordiychuk

Date:

08 September 2016, 19:56:10

>It'd helpful if you summarize the changes made when posting revisions.

Can we use as summary about changes first message? If not, summary can be something like that:

This parches fix scenarios interrupt logical replication from client side and allow the client to end a logical decoding session between transactions without just unilaterally closing its connection. The client send CopyDone package and wait stop replication as fast as possible. But in two cases walsender interrupt replication too slow. First scenario it's interrupt during waiting new WAL, when replication in streaming state, walsender ignores messages from Client until new WAL record will not generate. The second scenario it is interrupt replication during decoding long transaction that contain many changes.

>Yeah, testing it isn't trivial in postgres core, since of course none of the tools know to send a client-initiated CopyDone.

>My suggestion is to teach pg_recvlogical to send CopyDone on the first SIGINT (control-C) and wait until the server ends the data-stream and returns to command mode. A second control-C should unilaterally close the connection and it should close after a timeout too. This >will be backward compatible with 9.4/5/6 (just with a delay on exit by sigint).

>Then in a TAP test in src/test/recovery set up a logical decoding session with test_decoding and pg_recvlogical. Do a test xact then sigint pg_recvlogical and verify that it exits. To make sure it exited by mutual agreement with server, probably run pg_recvlogival at a higher debug level and text search for a message you print from pg_recvlogical when it gets server CopyDone in the response to client CopyDone. I don't think a different pg_recvlogical numeric exit code could be justified for this.

>It sounds like more work than I think it would actually be.

Sounds good. Is it changes should be include in this PR? I'm not ensure that I do this before complete this commitefest, I need some time to understand how this tests works, and how can i write new one.

2016-09-08 12:29 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:

On 7 September 2016 at 15:44, Vladimir Gordiychuk <folyga@gmail.com> wrote:
> Fixed patch in attach.
It'd helpful if you summarize the changes made when posting revisions.
>> * There are no tests. I don't see any really simple way to test this,
>> though.
>
> I will be grateful if you specify the best way how to do it. I tests this
> patches by pgjdbc driver tests that also build on head of postgres. You can
> check test scenario for both patches in
> PRhttps://github.com/pgjdbc/pgjdbc/pull/550
>
> org.postgresql.replication.LogicalReplicationTest#testDuringSendBigTransactionReplicationStreamCloseNotActive
> org.postgresql.replication.LogicalReplicationTest#testAfterCloseReplicationStreamDBSlotStatusNotActive
Yeah, testing it isn't trivial in postgres core, since of course none of the tools know to send a client-initiated CopyDone.
My suggestion is to teach pg_recvlogical to send CopyDone on the first SIGINT (control-C) and wait until the server ends the data-stream and returns to command mode. A second control-C should unilaterally close the connection and it should close after a timeout too. This will be backward compatible with 9.4/5/6 (just with a delay on exit by sigint).
Then in a TAP test in src/test/recovery set up a logical decoding session with test_decoding and pg_recvlogical. Do a test xact then sigint pg_recvlogical and verify that it exits. To make sure it exited by mutual agreement with server, probably run pg_recvlogival at a higher debug level and text search for a message you print from pg_recvlogical when it gets server CopyDone in the response to client CopyDone. I don't think a different pg_recvlogical numeric exit code could be justified for this.
It sounds like more work than I think it would actually be.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

09 September 2016, 02:37:20

On 9 September 2016 at 03:56, Vladimir Gordiychuk <folyga@gmail.com> wrote:
>>It'd helpful if you summarize the changes made when posting revisions.
>
> Can we use as summary about changes first message?

I meant "what did you change between the last patch I posted and the
one you just posted" ?

I'm looking at the revised patch now.

> Sounds good. Is it changes should be include in this PR?

Yes, probably as a 3rd patch on top of these two that adds the
pg_recvlogical tests.

I suggest:

1 and 2: same patches as currently
3: send CopyDone on SIGINT in pg_recvlogical, second SIGINT just
closes per current behaviour
4: add tests using the code in 1-3

When you're editing such a patch stack, sometimes you'll find you want
to change something in patch 1 or 2 or 3 while working on patch 4. The
trick to doing this is to make sure to commit your changes in small
pieces and avoid mixing up changes for different patches. So you'll
land up with lots of small commits like

* "fix typo in ReorderBufferIsActive declaration"
* "forgot to test streamingDoneReceiving in SomeFunction"

etc.

Then when you're ready to send a new patch series, you tag your
current working branch (optional, but makes cleaning up after mistakes
easier), rebase the changes, tag the result, git format-patch it and
push it, e.g.:

git checkout my-branch
git tag my-branch-v4-before-rebase
# non-interactive rebase on top of current Pg git master
git rebase master
# Interactive rebase of all patches on top of master.
git rebase -i master

This will produce an editor window. Here you choose patches to reorder
and squash into existing patches, e.g if you start with:

pick 81b3f61 Respect client-initiated CopyDone in walsender
pick 43de23d Second part of Vladimir Gordiychuk's patch
pick 1231134 fix a typo in part 1
pick 1231111 forgot to check something in part 2
pick a213111 another typo

you might edit it to:

pick 81b3f61 Respect client-initiated CopyDone in walsender
fixup 1231134 fix a typo in part 1
fixup a213111 another typo
pick 43de23d Second part of Vladimir Gordiychuk's patch
fixup 1231111 forgot to check something in part 2

When you save and quit, git will reorder the patches and squash them
for you. If there are merge conflicts you fix them at each stage then
"git rebase --continue".

It takes some getting used to but isn't really hard at all. If you run
into trouble, just push your current working branch to a public tree
(github or whatever) and ask for help here.

Once you're used to it, git has a few helper features like "fixup!"
commits and --auto-squash that make this a bit quicker, but it's worth
doing manually first since it's easier to figure out problems.

There's lots of good reference material on this out there so I won't
go on about it endlessly.

> I'm not ensure that
> I do this before complete this commitefest, I need some time to understand
> how this tests works, and how can i write new one.

No problem. I don't think it matters much if the patch bumps to the
next commitfest. It's not a big intrusive change that needs to get in
early in the development cycle.

I'm not a committer and can't commit this myself. If I were I don't
think I'd do so without a working test anyway.

A good starting point for the tests is src/test/perl/README . You can
add a test to src/test/recovery/t . Follow the model used in the other
tests there.

I wrote some tests using logical decoding in the logical decoding
timeline following patch; see
https://commitfest.postgresql.org/10/779/ for the patch, or
https://github.com/2ndQuadrant/postgres/tree/dev/logical-decoding-timeline-following-pg10-v1
. They use the SQL interface not pg_recvlogical though.

There's also https://github.com/2ndQuadrant/postgres/blob/dev/failover-slots/src/test/recovery/t/008_failover_slots.pl
which has another example of controlling an external process, in this
case pg_receivexlog, using IPC::Run. See start_pg_receivexlog(...) and
its callers.

When writing TAP tests for Perl you must ensure you use only Perl
5.8.8-compatible code and only the core modules plus IPC::Run . You'll
usually be fine if you just avoid importing things you don't see other
modules already using. Don't bother actually installing Perl 5.8.8, we
can run tests in an ancient-perl Docker container before committing
and fix any issues. I don't see any reason you'd need anything special
for this test.

If you think some of the code you add would be very reusable for other
test modules in future, feel free to propose new methods for
PostgresNode.pm . Ask if you're not sure.

Ping me if you're stuck or need a hand. I really want more people to
have a clue about how the TAP tests work. I don't use IRC (wrong
timezone and life is busy) so email is best.

-- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Michael Paquier

Date:

09 September 2016, 02:51:43

On Fri, Sep 9, 2016 at 11:37 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> When writing TAP tests for Perl you must ensure you use only Perl
> 5.8.8-compatible code and only the core modules plus IPC::Run . You'll
> usually be fine if you just avoid importing things you don't see other
> modules already using. Don't bother actually installing Perl 5.8.8, we
> can run tests in an ancient-perl Docker container before committing
> and fix any issues. I don't see any reason you'd need anything special
> for this test.

On top of that, the perl documentation is your friend if you have a
doubt about the versioning of a method:
http://perldoc.perl.org/
The important thing is that it is easy to navigate through perl
versions down to 5.8.8 so you can easily detect if what you are using
would be fit or not for the buildfarm.

> If you think some of the code you add would be very reusable for other
> test modules in future, feel free to propose new methods for
> PostgresNode.pm . Ask if you're not sure.

That would be great if things generic enough could be extracted, but I
haven't read the patch to make an opinion regarding what could be
done.
-- 
Michael

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

09 September 2016, 04:03:13

On 9 September 2016 at 10:37, Craig Ringer <craig@2ndquadrant.com> wrote:

> I'm looking at the revised patch now.

Considerably improved.


I fixed a typo from decondig to decoding that's throughout all the
callback names.


This test is wrong:

+        while ((change = ReorderBufferIterTXNNext(rb, iterstate)) !=
NULL && rb->continue_decoding_cb != NULL &&
rb->continue_decoding_cb())

If continue_decoding_cb is non-null, it'll never be called since the
and will short-circuit. If it's null the and will be false so we'll
call rb->continue_decoding_cb().

It also violates PostgreSQL source formatting coding conventions; it
should be wrapped to 80 lines. (Yes, that's archaic, but it's kind of
useful when you've got multiple files open side-by-side, and at least
it's consistent across the codebase).

so it should be:

        while ((change = ReorderBufferIterTXNNext(rb, iterstate)) != NULL &&
               (rb->continue_decoding_cb == NULL ||
                rb->continue_decoding_cb()))

I'd prefer the continue_decoding_cb tests to be on the same line, but
it's slightly too wide. Eh, whatever.


Same logic issue later;

-         if(rb->continue_decoding_cb != NULL && rb->continue_decoding_cb())
+        if (rb->continue_decoding_cb == NULL || rb->continue_decoding_cb())


Commit message is slightly inaccurate. The walsender DID look for
client messages during ReorderBufferCommit decoding, but it never
reacted to CopyDone sent by a client when it saw it.


Rewrote comment on IsStreamingActive() per my original review advice.



I'm not sure why exactly you removed

-    /* fast path */
-    /* Try to flush pending output to the client */
-    if (pq_flush_if_writable() != 0)
-        WalSndShutdown();
-
-    if (!pq_is_send_pending())
-        return;
-

It should be OK if we flush pending output even after we receive
CopyDone. In fact, we have to, since we can't unqueue it and have to
send it before we can send our own CopyDone reply.
pq_flush_if_writable() should only return EOF if the socket is closed,
in which case fast bailout is the right thing to do.

Can you explain your thinking here and what the intended outcome is?



I've attached updated patches with a number of typo fixes,
copy-editing changes, a fix to the test logic, etc as noted above.

Setting "waiting on author" in CF per discussion of the need for tests.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

16 September 2016, 01:37:30

On 9 September 2016 at 12:03, Craig Ringer <craig@2ndquadrant.com> wrote:

> Setting "waiting on author" in CF per discussion of the need for tests.

Have you had a chance to look at adding the tests we discussed?

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Vladimir Gordiychuk

Date:

16 September 2016, 07:12:01

>Have you had a chance to look at adding the tests we discussed?

Not yet. I plane do it on this weekends

2016-09-16 4:37 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:

On 9 September 2016 at 12:03, Craig Ringer <craig@2ndquadrant.com> wrote:

> Setting "waiting on author" in CF per discussion of the need for tests.

Have you had a chance to look at adding the tests we discussed?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Vladimir Gordiychuk

Date:

18 September 2016, 23:12:24

New patch in attach. 0001 and 0002 without changes.

0003 - patch contain improvements for pg_recvloginca, now pg_recvloginca after receive SIGINT will send CopyDone package to postgresql and wait from server CopyDone. For backward compatible after repeat send SIGINT pg_recvloginca will continue immediately without wait CopyDone from server.

0004 patch contain regression tests on scenarios that fix 0001 and 0002 patches.

2016-09-16 10:11 GMT+03:00 Vladimir Gordiychuk <folyga@gmail.com>:

>Have you had a chance to look at adding the tests we discussed?

Not yet. I plane do it on this weekends

2016-09-16 4:37 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:
On 9 September 2016 at 12:03, Craig Ringer <craig@2ndquadrant.com> wrote:

> Setting "waiting on author" in CF per discussion of the need for tests.

Have you had a chance to look at adding the tests we discussed?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

23 September 2016, 05:01:34

On 19 September 2016 at 07:12, Vladimir Gordiychuk <folyga@gmail.com> wrote:
> New patch in attach. 0001 and 0002 without changes.
> 0003 - patch contain improvements for pg_recvloginca, now pg_recvloginca
> after receive SIGINT will send CopyDone package to postgresql and wait from
> server CopyDone. For backward compatible after repeat send SIGINT
> pg_recvloginca will continue immediately without wait CopyDone from server.
> 0004 patch contain regression tests on scenarios that fix 0001 and 0002
> patches.

Great.

Thanks for this.

First observation (which I'll fix in updated patch):

It looks like you didn't import my updated patches, so I've rebased
your new patches on top of them.

Whitespace issues:

$ git am ~/Downloads/0003-Add-ability-for-pg_recvlogical-safe-stop-replication.patch
Applying: Add ability for pg_recvlogical safe stop replication
/home/craig/projects/2Q/postgres/.git/rebase-apply/patch:140: indent
with spaces.
   msgBuf + hdr_len + bytes_written,
/home/craig/projects/2Q/postgres/.git/rebase-apply/patch:550: indent
with spaces.
    /* Backward compatible, allow force interrupt logical replication
/home/craig/projects/2Q/postgres/.git/rebase-apply/patch:551: indent
with spaces.
     * after second SIGINT without wait CopyDone from server
/home/craig/projects/2Q/postgres/.git/rebase-apply/patch:552: indent
with spaces.
     */
warning: 4 lines add whitespace errors.

Remember to use "git log --check" before sending patches.

Also, comment style, please do

/* this */

or

/*
 * this
 */

not

/* this
 */

I did't see you explain why this was removed:

-    /* fast path */
-    /* Try to flush pending output to the client */
-    if (pq_flush_if_writable() != 0)
-        WalSndShutdown();
-
-    if (!pq_is_send_pending())
-        return;

I fixed a warning introduced here:

pg_recvlogical.c: In function ‘ProcessXLogData’:
pg_recvlogical.c:289:2: warning: ISO C90 forbids mixed declarations
and code [-Wdeclaration-after-statement]
  int bytes_left = msgLength - hdr_len;
  ^

Some of the changes to pg_recvlogical look to be copied from
receivelog.c, most notably the functions ProcessKeepalive and
ProcessXLogData . I thought that rather than copying them, shouldn't
the existing ones be modified to meet your needs? But it looks like
the issue is that a fair chunk of the rest of pg_recvlogical doesn't
re-use code from receivelog.c either; for example, pg_recvlogical's
sendFeedback differs from receivelog.c's sendFeedback. So
pg_recvlogical doesn't share the globals that receivelog.c assumes are
used. Also, what I thought were copied from receivelog.c were actually
extracted from code previously inline in StreamLogicalLog(...) in
pg_recvlogical.c .

I'm reluctant to try to untangle this in the same patch for risk that
it'll get stalled by issues with refactoring. The change you've
already made is a useful cleanup without messing with unnecessary
code.

Your patch 3 does something ... odd:

 src/test/recovery/t/001_stream_rep.pl                |  63
----------------------
 src/test/recovery/t/002_archiving.pl                 |  53 -------------------
 src/test/recovery/t/003_recovery_targets.pl          | 146
---------------------------------------------------
 src/test/recovery/t/004_timeline_switch.pl           |  75
--------------------------
 src/test/recovery/t/005_replay_delay.pl              |  69
------------------------
 src/test/recovery/t/006_logical_decoding.pl          |  40 --------------
 src/test/recovery/t/007_sync_rep.pl                  | 174
------------------------------------------------------------
 src/test/recovery/t/previous/001_stream_rep.pl       |  63
++++++++++++++++++++++
 src/test/recovery/t/previous/002_archiving.pl        |  53 +++++++++++++++++++
 src/test/recovery/t/previous/003_recovery_targets.pl | 146
+++++++++++++++++++++++++++++++++++++++++++++++++++
 src/test/recovery/t/previous/004_timeline_switch.pl  |  75
++++++++++++++++++++++++++
 src/test/recovery/t/previous/005_replay_delay.pl     |  69
++++++++++++++++++++++++
 src/test/recovery/t/previous/006_logical_decoding.pl |  40 ++++++++++++++
 src/test/recovery/t/previous/007_sync_rep.pl         | 174
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

so I've revised it to remove that whole change, which I think you
probably did not mean to commit.

Reworded commit message.

Ensured that we send feedback just before a client-initiated CopyDone
so server knows what position we saved up to. We don't discard already
buffered data

I've modified patch 3 so that it also flushes to disk and sends
feedback before sending CopyDone, then discards any xlog data received
after it sends CopyDone. This is helpful in ensuring that we get
predictable behaviour when cancelling pg_recvxlog and restarting it
because the client and server agree on the point at which replay
stopped.

I was evaluating the tests and I don't think they actually demonstrate
that the patch works. There's nothing done to check that
pg_recvlogical exited because of client-initiated CopyDone. While
looking into that I found that it also never actually hits
ProcessCopyDone(...) because libpq handles the CopyDone reply from the
server its self and treats it as end-of-stream. So the loop in
StreamLogicalLog will just end and ProcessCopyDone() is dead code.

Initialization of copyDoneSent and copyDoneReceived were also in the
wrong place and would've caused issues with retry looping.

I modified pg_recvlogical so that when in verbose mode it prints a
message when it exits by client request. Then modified the tests to
look for that message.

An updated patch series is attached. Please re-test and review my
changes. At that point I'll mark it ready to go unless someone else
wants to take a look. I'm pretty much out of time for this anyway.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Stopping logical replication protocol

From

Vladimir Gordiychuk

Date:

25 September 2016, 18:16:09

>It looks like you didn't import my updated patches, so I've rebased your new patches on top of them.

Yes, i forgot it, sorry. Thanks for you fixes.

>I did't see you explain why this was removed:

- /* fast path */
- /* Try to flush pending output to the client */
- if (pq_flush_if_writable() != 0)
- WalSndShutdown();
-
- if (!pq_is_send_pending())
- return;

I remove it, because during decode long transaction code, we always exist from function by fast path and not receive messages from client. Now we always include in 'for' loop and executes

/* Check for input from the client */ ProcessRepliesIfAny();

>Some of the changes to pg_recvlogical look to be copied from
>receivelog.c, most notably the functions ProcessKeepalive and
>ProcessXLogData .

During refactoring huge function I also notice It, but decide not include additional changes in already huge patch. I only split method that do everything to few small functions.

>I was evaluating the tests and I don't think they actually demonstrate

>that the patch works. There's nothing done to check that
>pg_recvlogical exited because of client-initiated CopyDone. While
>looking into that I found that it also never actually hits
>ProcessCopyDone(...) because libpq handles the CopyDone reply from the
>server its self and treats it as end-of-stream. So the loop in
>StreamLogicalLog will just end and ProcessCopyDone() is dead code.

The main idea was about fast stop logical replication as it available, because if stop replication gets more them 1 seconds it takes more pain.

That's why my tests not contained check pg_reclogincal output, only time spend on stop replication(CopyDone exchange).

Increase this timeout to 3 or 5 second hide problems that this PR try fix, that's why I think this lines should be restore to state from previous patch.

```

ok($spendTime <= 5, # allow extra time for slow machines

"pg_recvlogical exited promptly on signal when decoding");

ok((time() - $cancelTime) <= 3, # allow extra time for slow machines

"pg_recvlogical exited promptly on sigint when idle"

);

```

Also I do not understand why we do

$proc->pump while $proc->pumpable;

after send signal to process, why we can not just wait stop replication slot?

2016-09-23 8:01 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:

On 19 September 2016 at 07:12, Vladimir Gordiychuk <folyga@gmail.com> wrote:
> New patch in attach. 0001 and 0002 without changes.
> 0003 - patch contain improvements for pg_recvloginca, now pg_recvloginca
> after receive SIGINT will send CopyDone package to postgresql and wait from
> server CopyDone. For backward compatible after repeat send SIGINT
> pg_recvloginca will continue immediately without wait CopyDone from server.
> 0004 patch contain regression tests on scenarios that fix 0001 and 0002
> patches.

Great.

Thanks for this.

First observation (which I'll fix in updated patch):

It looks like you didn't import my updated patches, so I've rebased
your new patches on top of them.

Whitespace issues:

$ git am ~/Downloads/0003-Add-ability-for-pg_recvlogical-safe-stop-replication.patch
Applying: Add ability for pg_recvlogical safe stop replication
/home/craig/projects/2Q/postgres/.git/rebase-apply/patch:140: indent
with spaces.
msgBuf + hdr_len + bytes_written,
/home/craig/projects/2Q/postgres/.git/rebase-apply/patch:550: indent
with spaces.
/* Backward compatible, allow force interrupt logical replication
/home/craig/projects/2Q/postgres/.git/rebase-apply/patch:551: indent
with spaces.
* after second SIGINT without wait CopyDone from server
/home/craig/projects/2Q/postgres/.git/rebase-apply/patch:552: indent
with spaces.
*/
warning: 4 lines add whitespace errors.

Remember to use "git log --check" before sending patches.

Also, comment style, please do

/* this */

or

/*
* this
*/

not

/* this
*/

I did't see you explain why this was removed:

- /* fast path */
- /* Try to flush pending output to the client */
- if (pq_flush_if_writable() != 0)
- WalSndShutdown();
-
- if (!pq_is_send_pending())
- return;

I fixed a warning introduced here:

pg_recvlogical.c: In function ‘ProcessXLogData’:
pg_recvlogical.c:289:2: warning: ISO C90 forbids mixed declarations
and code [-Wdeclaration-after-statement]
int bytes_left = msgLength - hdr_len;
^

Some of the changes to pg_recvlogical look to be copied from
receivelog.c, most notably the functions ProcessKeepalive and
ProcessXLogData . I thought that rather than copying them, shouldn't
the existing ones be modified to meet your needs? But it looks like
the issue is that a fair chunk of the rest of pg_recvlogical doesn't
re-use code from receivelog.c either; for example, pg_recvlogical's
sendFeedback differs from receivelog.c's sendFeedback. So
pg_recvlogical doesn't share the globals that receivelog.c assumes are
used. Also, what I thought were copied from receivelog.c were actually
extracted from code previously inline in StreamLogicalLog(...) in
pg_recvlogical.c .

I'm reluctant to try to untangle this in the same patch for risk that
it'll get stalled by issues with refactoring. The change you've
already made is a useful cleanup without messing with unnecessary
code.

Your patch 3 does something ... odd:

src/test/recovery/t/001_stream_rep.pl | 63
----------------------
src/test/recovery/t/002_archiving.pl | 53 -------------------
src/test/recovery/t/003_recovery_targets.pl | 146
---------------------------------------------------
src/test/recovery/t/004_timeline_switch.pl | 75
--------------------------
src/test/recovery/t/005_replay_delay.pl | 69
------------------------
src/test/recovery/t/006_logical_decoding.pl | 40 --------------
src/test/recovery/t/007_sync_rep.pl | 174
------------------------------------------------------------
src/test/recovery/t/previous/001_stream_rep.pl | 63
++++++++++++++++++++++
src/test/recovery/t/previous/002_archiving.pl | 53 +++++++++++++++++++
src/test/recovery/t/previous/003_recovery_targets.pl | 146
+++++++++++++++++++++++++++++++++++++++++++++++++++
src/test/recovery/t/previous/004_timeline_switch.pl | 75
++++++++++++++++++++++++++
src/test/recovery/t/previous/005_replay_delay.pl | 69
++++++++++++++++++++++++
src/test/recovery/t/previous/006_logical_decoding.pl | 40 ++++++++++++++
src/test/recovery/t/previous/007_sync_rep.pl | 174
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

so I've revised it to remove that whole change, which I think you
probably did not mean to commit.

Reworded commit message.

Ensured that we send feedback just before a client-initiated CopyDone
so server knows what position we saved up to. We don't discard already
buffered data

I've modified patch 3 so that it also flushes to disk and sends
feedback before sending CopyDone, then discards any xlog data received
after it sends CopyDone. This is helpful in ensuring that we get
predictable behaviour when cancelling pg_recvxlog and restarting it
because the client and server agree on the point at which replay
stopped.

I was evaluating the tests and I don't think they actually demonstrate
that the patch works. There's nothing done to check that
pg_recvlogical exited because of client-initiated CopyDone. While
looking into that I found that it also never actually hits
ProcessCopyDone(...) because libpq handles the CopyDone reply from the
server its self and treats it as end-of-stream. So the loop in
StreamLogicalLog will just end and ProcessCopyDone() is dead code.

Initialization of copyDoneSent and copyDoneReceived were also in the
wrong place and would've caused issues with retry looping.

I modified pg_recvlogical so that when in verbose mode it prints a
message when it exits by client request. Then modified the tests to
look for that message.

An updated patch series is attached. Please re-test and review my
changes. At that point I'll mark it ready to go unless someone else
wants to take a look. I'm pretty much out of time for this anyway.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

26 September 2016, 02:32:25

<p dir="ltr"><p dir="ltr">On 26 Sep. 2016 02:16, "Vladimir Gordiychuk" <<a
href="mailto:folyga@gmail.com">folyga@gmail.com</a>>wrote:<br /> ><br /> > >It looks like you didn't import
myupdated patches, so I've rebased your new patches on top of them.<br /> > Yes, i forgot it, sorry. Thanks for you
fixes. <br/> ><br /> > >I did't see you explain why this was removed:<br /> ><br /> > -    /* fast path
*/<br/> > -    /* Try to flush pending output to the client */<br /> > -    if (pq_flush_if_writable() != 0)<br
/>> -        WalSndShutdown();<br /> > -<br /> > -    if (!pq_is_send_pending())<br /> > -       
return;<br/> ><br /> > I remove it, because during decode long transaction code, we always exist from function by
fastpath and not receive messages from client. Now we always include in 'for' loop and executes<br /> > /* Check for
inputfrom the client */ ProcessRepliesIfAny();<p dir="ltr">Makes sense.<br /> I<br /> > >Some of the changes to
pg_recvlogicallook to be copied from<br /> > >receivelog.c, most notably the functions ProcessKeepalive and<br />
>>ProcessXLogData .<br /> ><br /> > During refactoring huge function I also notice It, but decide not
includeadditional changes in already huge patch. I only split method that do everything to few small functions. <p
dir="ltr">Gooddecision. This improves things already and makes future refactoring easier.<p dir="ltr">> >I was
evaluatingthe tests and I don't think they actually demonstrate<br /> > >that the patch works. There's nothing
doneto check that<br /> > >pg_recvlogical exited because of client-initiated CopyDone. While<br /> >
>lookinginto that I found that it also never actually hits<br /> > >ProcessCopyDone(...) because libpq handles
theCopyDone reply from the<br /> > >server its self and treats it as end-of-stream. So the loop in<br /> >
>StreamLogicalLogwill just end and ProcessCopyDone() is dead code.<br /> ><br /> > The main idea was about
faststop logical replication as it available, because if stop replication gets more them 1 seconds it takes more
pain.<pdir="ltr">You should rely on time I tests as little as possible. Some of the test hosts are VERY slow. If
possiblesomething deterministic should be used.<p dir="ltr">> Increase this timeout to 3 or 5 second hide problems
thatthis PR try fix, that's why I think this lines should be restore to state from previous patch.<p dir="ltr">Per
aboveI don't agree.<p dir="ltr">><br /> > ```<br /> >     ok($spendTime <= 5, # allow extra time for slow
machines<br/> >         "pg_recvlogical exited promptly on signal when decoding");<br /> ><br /> >    
ok((time()- $cancelTime) <= 3, # allow extra time for slow machines<br /> >         "pg_recvlogical exited
promptlyon sigint when idle"<br /> >     );<br /> ><br /> > ```<br /> ><br /> > Also I do not understand
whywe do <br /> ><br /> > $proc->pump while $proc->pumpable;<br /> ><br /> > after send signal to
process, whywe can not just wait stop replication slot? <p dir="ltr">Because verbose output can produce a lot of
writes.If we don't pump the buffer pg_recvlogical could block writing on its socket.   Also it lets us capture stderr
whichwe need to observe that pg_recvlogical actually ended copy on user command rather than just receiving all the
inputavailable.

Re: Stopping logical replication protocol

From

Vladimir Gordiychuk

Date:

26 September 2016, 13:52:06

>You should rely on time I tests as little as possible. Some of the test hosts are VERY slow. If possible something deterministic should be used.

That's why this changes difficult to verify. Maybe in that case we should write benchmark but not integration test?
In that case we can say, before this changes stopping logical replication gets N ms but after apply changes it gets NN ms where NN ms less than N ms.
Is it available add this kind of benchmark to postgresql? I will be grateful for links.

2016-09-26 5:32 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:

On 26 Sep. 2016 02:16, "Vladimir Gordiychuk" <folyga@gmail.com> wrote:
>
> >It looks like you didn't import my updated patches, so I've rebased your new patches on top of them.
> Yes, i forgot it, sorry. Thanks for you fixes.
>
> >I did't see you explain why this was removed:
>
> - /* fast path */
> - /* Try to flush pending output to the client */
> - if (pq_flush_if_writable() != 0)
> - WalSndShutdown();
> -
> - if (!pq_is_send_pending())
> - return;
>
> I remove it, because during decode long transaction code, we always exist from function by fast path and not receive messages from client. Now we always include in 'for' loop and executes
> /* Check for input from the client */ ProcessRepliesIfAny();
Makes sense.
I
> >Some of the changes to pg_recvlogical look to be copied from
> >receivelog.c, most notably the functions ProcessKeepalive and
> >ProcessXLogData .
>
> During refactoring huge function I also notice It, but decide not include additional changes in already huge patch. I only split method that do everything to few small functions.
Good decision. This improves things already and makes future refactoring easier.
> >I was evaluating the tests and I don't think they actually demonstrate
> >that the patch works. There's nothing done to check that
> >pg_recvlogical exited because of client-initiated CopyDone. While
> >looking into that I found that it also never actually hits
> >ProcessCopyDone(...) because libpq handles the CopyDone reply from the
> >server its self and treats it as end-of-stream. So the loop in
> >StreamLogicalLog will just end and ProcessCopyDone() is dead code.
>
> The main idea was about fast stop logical replication as it available, because if stop replication gets more them 1 seconds it takes more pain.
You should rely on time I tests as little as possible. Some of the test hosts are VERY slow. If possible something deterministic should be used.
> Increase this timeout to 3 or 5 second hide problems that this PR try fix, that's why I think this lines should be restore to state from previous patch.
Per above I don't agree.
>
> ```
> ok($spendTime <= 5, # allow extra time for slow machines
> "pg_recvlogical exited promptly on signal when decoding");
>
> ok((time() - $cancelTime) <= 3, # allow extra time for slow machines
> "pg_recvlogical exited promptly on sigint when idle"
> );
>
> ```
>
> Also I do not understand why we do
>
> $proc->pump while $proc->pumpable;
>
> after send signal to process, why we can not just wait stop replication slot?
Because verbose output can produce a lot of writes. If we don't pump the buffer pg_recvlogical could block writing on its socket. Also it lets us capture stderr which we need to observe that pg_recvlogical actually ended copy on user command rather than just receiving all the input available.

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

27 September 2016, 02:05:25

On 26 September 2016 at 21:52, Vladimir Gordiychuk <folyga@gmail.com> wrote:
>>You should rely on time I tests as little as possible. Some of the test
>> hosts are VERY slow. If possible something deterministic should be used.
>
> That's why this changes difficult to verify. Maybe in that case we should
> write benchmark but not integration test?
> In that case we can say, before this changes stopping logical replication
> gets N ms but after apply changes it gets NN ms where NN ms less than N ms.
> Is it available add this kind of benchmark to postgresql? I will be grateful
> for links.

It's for that reason that I added a message printed only in verbose
mode that pg_recvlogical emits when it's exiting after a
client-initiated copydone.

You can use the TAP tests, print diag messages, etc. But we generally
want them to run fairly quickly, so big benchmark runs aren't
desirable. You'll notice that I left diag messages in to report the
timing for the results in your tests, I just changed the tests so they
didn't depend on very tight timing for success/failure anymore.

We don't currently have any automated benchmarking infrastructure.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Michael Paquier

Date:

03 October 2016, 02:15:10

On Tue, Sep 27, 2016 at 11:05 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 26 September 2016 at 21:52, Vladimir Gordiychuk <folyga@gmail.com> wrote:
>>>You should rely on time I tests as little as possible. Some of the test
>>> hosts are VERY slow. If possible something deterministic should be used.
>>
>> That's why this changes difficult to verify. Maybe in that case we should
>> write benchmark but not integration test?
>> In that case we can say, before this changes stopping logical replication
>> gets N ms but after apply changes it gets NN ms where NN ms less than N ms.
>> Is it available add this kind of benchmark to postgresql? I will be grateful
>> for links.
>
> It's for that reason that I added a message printed only in verbose
> mode that pg_recvlogical emits when it's exiting after a
> client-initiated copydone.
>
> You can use the TAP tests, print diag messages, etc. But we generally
> want them to run fairly quickly, so big benchmark runs aren't
> desirable. You'll notice that I left diag messages in to report the
> timing for the results in your tests, I just changed the tests so they
> didn't depend on very tight timing for success/failure anymore.
>
> We don't currently have any automated benchmarking infrastructure.

Which seems like this patch is not complete yet. I am marking it as
returned with feedback, but it may be a better idea to move it to next
CF once a new version with updated tests shows up.
-- 
Michael

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

03 October 2016, 03:52:00

<div dir="ltr"><p dir="ltr"><p dir="ltr">On 3 Oct. 2016 10:15, "Michael Paquier" <<a
href="mailto:michael.paquier@gmail.com"target="_blank">michael.paquier@gmail.com</a>> wrote:<br /> ><br /> >
OnTue, Sep 27, 2016 at 11:05 AM, Craig Ringer <<a href="mailto:craig@2ndquadrant.com"
target="_blank">craig@2ndquadrant.com</a>>wrote:<br /> > > On 26 September 2016 at 21:52, Vladimir Gordiychuk
<<ahref="mailto:folyga@gmail.com" target="_blank">folyga@gmail.com</a>> wrote:<br /> > >>>You should
relyon time I tests as little as possible. Some of the test<br /> > >>> hosts are VERY slow. If possible
somethingdeterministic should be used.<br /> > >><br /> > >> That's why this changes difficult to
verify.Maybe in that case we should<br /> > >> write benchmark but not integration test?<br /> > >>
Inthat case we can say, before this changes stopping logical replication<br /> > >> gets N ms but after apply
changesit gets NN ms where NN ms less than N ms.<br /> > >> Is it available add this kind of benchmark to
postgresql?I will be grateful<br /> > >> for links.<br /> > ><br /> > > It's for that reason that
Iadded a message printed only in verbose<br /> > > mode that pg_recvlogical emits when it's exiting after a<br />
>> client-initiated copydone.<br /> > ><br /> > > You can use the TAP tests, print diag messages,
etc.But we generally<br /> > > want them to run fairly quickly, so big benchmark runs aren't<br /> > >
desirable.You'll notice that I left diag messages in to report the<br /> > > timing for the results in your
tests,I just changed the tests so they<br /> > > didn't depend on very tight timing for success/failure
anymore.<br/> > ><br /> > > We don't currently have any automated benchmarking infrastructure.<br />
><br/> > Which seems like this patch is not complete yet.<p dir="ltr">Personally I think it is. I'm just
explainingwhy I adjusted Vladimir's tests to be less timing sensitive and rely on a qualitative property instead.<p
dir="ltr">Thatsaid, it's had recent change and it isn't a big intrusive change that really need attention this cf.<br
/><pdir="ltr">>  I am marking it as<br /> > returned with feedback, but it may be a better idea to move it to
next<br/> > CF once a new version with updated tests shows up.<p dir="ltr">I'll move it now. I think the tests are
fine,if Vladimir agrees, so IMO it's ready to go. More eyes wouldn't hurt though.<br /><p dir="ltr">If Vladimir wants
benchmarkingbased tests that's a whole separate project IMO. Something that will work robustly on the weird slow
machineswe have isn't simple. Probably a new buildfarm option etc since we won't want to clutter the really slow old
nicheboxes with it.<p dir="ltr">Vladimir, can I have your opinion on the latest patch or if you want to make changes,
anupdated patch?<br /><p dir="ltr"><br /></div>

Re: Stopping logical replication protocol

From

Vladimir Gordiychuk

Date:

04 October 2016, 21:01:13

<div dir="ltr">> <span style="font-size:12.8px">Vladimir, can I have your opinion on the latest patch or if you want
tomake changes, an updated patch?</span><p style="font-size:12.8px">I am satisfied with all our changes and I thinks it
enoughto complete this PR.</div><div class="gmail_extra"><br /><div class="gmail_quote">2016-10-03 6:51 GMT+03:00 Craig
Ringer<span dir="ltr"><<a href="mailto:craig@2ndquadrant.com"
target="_blank">craig@2ndquadrant.com</a>></span>:<br/><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex"><div dir="ltr"><span class=""><p dir="ltr"><p dir="ltr">On 3 Oct. 2016
10:15,"Michael Paquier" <<a href="mailto:michael.paquier@gmail.com"
target="_blank">michael.paquier@gmail.com</a>>wrote:<br /> ><br /> > On Tue, Sep 27, 2016 at 11:05 AM, Craig
Ringer<<a href="mailto:craig@2ndquadrant.com" target="_blank">craig@2ndquadrant.com</a>> wrote:<br /> > >
On26 September 2016 at 21:52, Vladimir Gordiychuk <<a href="mailto:folyga@gmail.com"
target="_blank">folyga@gmail.com</a>>wrote:<br /> > >>>You should rely on time I tests as little as
possible.Some of the test<br /> > >>> hosts are VERY slow. If possible something deterministic should be
used.<br/> > >><br /> > >> That's why this changes difficult to verify. Maybe in that case we
should<br/> > >> write benchmark but not integration test?<br /> > >> In that case we can say, before
thischanges stopping logical replication<br /> > >> gets N ms but after apply changes it gets NN ms where NN
msless than N ms.<br /> > >> Is it available add this kind of benchmark to postgresql? I will be grateful<br
/>> >> for links.<br /> > ><br /> > > It's for that reason that I added a message printed only in
verbose<br/> > > mode that pg_recvlogical emits when it's exiting after a<br /> > > client-initiated
copydone.<br/> > ><br /> > > You can use the TAP tests, print diag messages, etc. But we generally<br />
>> want them to run fairly quickly, so big benchmark runs aren't<br /> > > desirable. You'll notice that I
leftdiag messages in to report the<br /> > > timing for the results in your tests, I just changed the tests so
they<br/> > > didn't depend on very tight timing for success/failure anymore.<br /> > ><br /> > > We
don'tcurrently have any automated benchmarking infrastructure.<br /> ><br /> > Which seems like this patch is not
completeyet.</span><p dir="ltr">Personally I think it is. I'm just explaining why I adjusted Vladimir's tests to be
lesstiming sensitive and rely on a qualitative property instead.<p dir="ltr">That said, it's had recent change and it
isn'ta big intrusive change that really need attention this cf.<br /><span class=""><p dir="ltr">>  I am marking it
as<br/> > returned with feedback, but it may be a better idea to move it to next<br /> > CF once a new version
withupdated tests shows up.</span><p dir="ltr">I'll move it now. I think the tests are fine, if Vladimir agrees, so IMO
it'sready to go. More eyes wouldn't hurt though.<br /><p dir="ltr">If Vladimir wants benchmarking based tests that's a
wholeseparate project IMO. Something that will work robustly on the weird slow machines we have isn't simple. Probably
anew buildfarm option etc since we won't want to clutter the really slow old niche boxes with it.<p dir="ltr">Vladimir,
canI have your opinion on the latest patch or if you want to make changes, an updated patch?<br /><p dir="ltr"><br
/></div></blockquote></div><br/></div>

Re: Stopping logical replication protocol

From

Andres Freund

Date:

04 October 2016, 21:15:49

Hi,

On 2016-09-23 13:01:27 +0800, Craig Ringer wrote:
> From f98f2388c57d938ebbe07ccd2dbe02138312858f Mon Sep 17 00:00:00 2001
> From: Vladimir Gordiychuk <folyga@gmail.com>
> Date: Wed, 7 Sep 2016 00:39:18 +0300
> Subject: [PATCH 2/4] Client-initiated CopyDone during transaction decoding in
>  walsender
> 
> The prior patch caused the walsender to react to a client-initiated
> CopyDone while it's in the WalSndLoop. That's not enough to let clients
> end logical decoding mid-transaction because we loop in ReorderBufferCommit
> during decoding of a transaction without ever returning to WalSndLoop.
> 
> Allow breaking out of ReorderBufferCommit by detecting that client
> input has requested an end to COPY BOTH mode, so clients can abort
> decoding mid-transaction.

Hm, I'm *VERY* doubtful about doing this via yet another callback. Why
don't we just error out in the prepare write callback?

I don't think it's a good idea to return a non-error state if a command
is interrupted in the middle. We might already have sent a partial
transaction or such.   Also compare this to interrupting a query - we
don't just returning rows, we error out.

Andres

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

05 October 2016, 10:06:26

On 5 October 2016 at 05:14, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2016-09-23 13:01:27 +0800, Craig Ringer wrote:
>> From f98f2388c57d938ebbe07ccd2dbe02138312858f Mon Sep 17 00:00:00 2001
>> From: Vladimir Gordiychuk <folyga@gmail.com>
>> Date: Wed, 7 Sep 2016 00:39:18 +0300
>> Subject: [PATCH 2/4] Client-initiated CopyDone during transaction decoding in
>>  walsender
>>
>> The prior patch caused the walsender to react to a client-initiated
>> CopyDone while it's in the WalSndLoop. That's not enough to let clients
>> end logical decoding mid-transaction because we loop in ReorderBufferCommit
>> during decoding of a transaction without ever returning to WalSndLoop.
>>
>> Allow breaking out of ReorderBufferCommit by detecting that client
>> input has requested an end to COPY BOTH mode, so clients can abort
>> decoding mid-transaction.
>
> Hm, I'm *VERY* doubtful about doing this via yet another callback. Why
> don't we just error out in the prepare write callback?

That's sensible.

> I don't think it's a good idea to return a non-error state if a command
> is interrupted in the middle. We might already have sent a partial
> transaction or such.   Also compare this to interrupting a query - we
> don't just returning rows, we error out.

Good point. It's not usually visible to the user because if it's a
client-initiated cancel the client will generally consume the error,
but there's still an error generated.

Terminating COPY BOTH with ERRCODE_QUERY_CANCELED seems fine. If it
was expecting the error the client can Sync and do whatever it next
wants to do on the walsender protocol, or bail out nicely.

Vladimir? I'm running out of time to spend on this at least until the
next CF. Think you can make these changes?


-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Vladimir Gordiychuk

Date:

05 October 2016, 21:35:26

> Vladimir? I'm running out of time to spend on this at least until the next CF. Think you can make these changes?

Yes, I can. But I thinks It should be discuss first.

> Terminating COPY BOTH with ERRCODE_QUERY_CANCELED seems fine. If it

> was expecting the error the client can Sync and do whatever it next
> wants to do on the walsender protocol, or bail out nicely.

Do you want return CopyFail with ERRCODE_QUERY_CANCELED instead of CopyDone on client initialized stop replication?

> Hm, I'm *VERY* doubtful about doing this via yet another callback. Why

> don't we just error out in the prepare write callback?

But what about scenario when output plugin collect changes in memory with some transformation and send it only inside commit_cb?

It means that OutputPluginPrepareWrite will not execute until end transaction and we face with too long stops replication when decodes huge transaction.

2016-10-05 13:06 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:

On 5 October 2016 at 05:14, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2016-09-23 13:01:27 +0800, Craig Ringer wrote:
>> From f98f2388c57d938ebbe07ccd2dbe02138312858f Mon Sep 17 00:00:00 2001
>> From: Vladimir Gordiychuk <folyga@gmail.com>
>> Date: Wed, 7 Sep 2016 00:39:18 +0300
>> Subject: [PATCH 2/4] Client-initiated CopyDone during transaction decoding in
>> walsender
>>
>> The prior patch caused the walsender to react to a client-initiated
>> CopyDone while it's in the WalSndLoop. That's not enough to let clients
>> end logical decoding mid-transaction because we loop in ReorderBufferCommit
>> during decoding of a transaction without ever returning to WalSndLoop.
>>
>> Allow breaking out of ReorderBufferCommit by detecting that client
>> input has requested an end to COPY BOTH mode, so clients can abort
>> decoding mid-transaction.
>
> Hm, I'm *VERY* doubtful about doing this via yet another callback. Why
> don't we just error out in the prepare write callback?

That's sensible.

> I don't think it's a good idea to return a non-error state if a command
> is interrupted in the middle. We might already have sent a partial
> transaction or such. Also compare this to interrupting a query - we
> don't just returning rows, we error out.

Good point. It's not usually visible to the user because if it's a
client-initiated cancel the client will generally consume the error,
but there's still an error generated.

Terminating COPY BOTH with ERRCODE_QUERY_CANCELED seems fine. If it
was expecting the error the client can Sync and do whatever it next
wants to do on the walsender protocol, or bail out nicely.

Vladimir? I'm running out of time to spend on this at least until the
next CF. Think you can make these changes?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Vladimir Gordiychuk

Date:

21 October 2016, 11:38:11

Craig, Andres what do you thinks about previous message?

2016-10-06 0:35 GMT+03:00 Vladimir Gordiychuk <folyga@gmail.com>:

> Vladimir? I'm running out of time to spend on this at least until the next CF. Think you can make these changes?

Yes, I can. But I thinks It should be discuss first.

> Terminating COPY BOTH with ERRCODE_QUERY_CANCELED seems fine. If it
> was expecting the error the client can Sync and do whatever it next
> wants to do on the walsender protocol, or bail out nicely.

Do you want return CopyFail with ERRCODE_QUERY_CANCELED instead of CopyDone on client initialized stop replication?

> Hm, I'm *VERY* doubtful about doing this via yet another callback. Why
> don't we just error out in the prepare write callback?

But what about scenario when output plugin collect changes in memory with some transformation and send it only inside commit_cb?
It means that OutputPluginPrepareWrite will not execute until end transaction and we face with too long stops replication when decodes huge transaction.

2016-10-05 13:06 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:
On 5 October 2016 at 05:14, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2016-09-23 13:01:27 +0800, Craig Ringer wrote:
>> From f98f2388c57d938ebbe07ccd2dbe02138312858f Mon Sep 17 00:00:00 2001
>> From: Vladimir Gordiychuk <folyga@gmail.com>
>> Date: Wed, 7 Sep 2016 00:39:18 +0300
>> Subject: [PATCH 2/4] Client-initiated CopyDone during transaction decoding in
>> walsender
>>
>> The prior patch caused the walsender to react to a client-initiated
>> CopyDone while it's in the WalSndLoop. That's not enough to let clients
>> end logical decoding mid-transaction because we loop in ReorderBufferCommit
>> during decoding of a transaction without ever returning to WalSndLoop.
>>
>> Allow breaking out of ReorderBufferCommit by detecting that client
>> input has requested an end to COPY BOTH mode, so clients can abort
>> decoding mid-transaction.
>
> Hm, I'm *VERY* doubtful about doing this via yet another callback. Why
> don't we just error out in the prepare write callback?

That's sensible.

> I don't think it's a good idea to return a non-error state if a command
> is interrupted in the middle. We might already have sent a partial
> transaction or such. Also compare this to interrupting a query - we
> don't just returning rows, we error out.

Good point. It's not usually visible to the user because if it's a
client-initiated cancel the client will generally consume the error,
but there's still an error generated.

Terminating COPY BOTH with ERRCODE_QUERY_CANCELED seems fine. If it
was expecting the error the client can Sync and do whatever it next
wants to do on the walsender protocol, or bail out nicely.

Vladimir? I'm running out of time to spend on this at least until the
next CF. Think you can make these changes?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Craig Ringer

Date:

03 November 2016, 13:44:40

On 21 October 2016 at 19:38, Vladimir Gordiychuk <folyga@gmail.com> wrote:
> Craig, Andres what do you thinks about previous message?

I haven't had a chance to look further to be honest.

Since a downstream disconnect works, though it's ugly, it's not
something I can justify spending a lot of time on, and I already did
spend a lot on it in patch review/updating/testing etc.

I don't know what Andres wants, but I think CopyFail with
ERRCODE_QUERY_CANCELED is fine.

As for plugins that collect changes in memory and only send them on
commit, I'd call them "broken". Not an interesting case IMO. Don't do
that.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Stopping logical replication protocol

From

Haribabu Kommi

Date:

02 December 2016, 02:25:54

On Fri, Nov 4, 2016 at 12:44 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 21 October 2016 at 19:38, Vladimir Gordiychuk <folyga@gmail.com> wrote:
> Craig, Andres what do you thinks about previous message?

I haven't had a chance to look further to be honest.

Since a downstream disconnect works, though it's ugly, it's not
something I can justify spending a lot of time on, and I already did
spend a lot on it in patch review/updating/testing etc.

I don't know what Andres wants, but I think CopyFail with
ERRCODE_QUERY_CANCELED is fine.

As for plugins that collect changes in memory and only send them on
commit, I'd call them "broken". Not an interesting case IMO. Don't do
that.

Currently the patch is marked as "returned with feedback" as there are some

comments from reviewer.

Please free to submit an update patch to the next commitfest.

Regards,

Hari Babu

Fujitsu Australia