Thread: [PATCH] add concurrent_abort callback for output plugin

[PATCH] add concurrent_abort callback for output plugin

From

Markus Wanner

Date:

25 March 2021, 09:07:28

Hi,

here is another tidbit from our experience with using logical decoding. 
The attached patch adds a callback to notify the output plugin of a 
concurrent abort.  I'll continue to describe the problem in more detail 
and how this additional callback solves it.

Streamed transactions as well as two-phase commit transactions may get 
decoded before they finish.  At the point the begin_cb is invoked and 
first changes are delivered to the output plugin, it is not necessarily 
known whether the transaction will commit or abort.

This leads to the possibility of the transaction getting aborted 
concurrent to logical decoding.  In that case, it is likely for the 
decoder to error on a catalog scan that conflicts with the abort of the 
transaction.  The reorderbuffer sports a PG_CATCH block to cleanup. 
However, it does not currently inform the output plugin.  From its point 
of view, the transaction is left dangling until another one comes along 
or until the final ROLLBACK or ROLLBACK PREPARED record from WAL gets 
decoded.  Therefore, what the output plugin might see in this case is:

* filter_prepare_cb (txn A)
* begin_prepare_cb  (txn A)
* apply_change      (txn A)
* apply_change      (txn A)
* apply_change      (txn A)
* begin_cb          (txn B)

In other words, in this example, only the begin_cb of the following 
transaction implicitly tells the output plugin that txn A could not be 
fully decoded.  And there's no upper time boundary on when that may 
happen.  (It could also be another filter_prepare_cb, if the subsequent 
transaction happens to be a two-phase transaction as well.  Or an 
explicit rollback_prepared_cb or stream_abort if there's no other 
transaction in between.)

An alternative and arguably cleaner approach for streamed transactions 
may be to directly invoke stream_abort.  However, the lsn argument 
passed could not be that of the abort record, as that's not known at the 
point in time of the concurrent abort.  Plus, this seems like a bad fit 
for two-phase commit transactions.

Again, this callback is especially important for output plugins that 
invoke further actions on downstream nodes that delay the COMMIT 
PREPARED of a transaction upstream, e.g. until prepared on other nodes. 
Up until now, the output plugin has no way to learn about a concurrent 
abort of the currently decoded (2PC or streamed) transaction (perhaps 
short of continued polling on the transaction status).

I also think it generally improves the API by allowing the output plugin 
to rely on such a callback, rather than having to implicitly deduce this 
from other callbacks.

Thoughts or comments?  If this is agreed on, I can look into adding 
tests (concurrent aborts are not currently covered, it seems).

Regards

Markus

Attachment

0001-add-concurrent_abort-callback_v1.patch

Re: [PATCH] add concurrent_abort callback for output plugin

From

Andres Freund

Date:

25 March 2021, 20:21:55

Hi,

On 2021-03-25 10:07:28 +0100, Markus Wanner wrote:
> This leads to the possibility of the transaction getting aborted concurrent
> to logical decoding.  In that case, it is likely for the decoder to error on
> a catalog scan that conflicts with the abort of the transaction.  The
> reorderbuffer sports a PG_CATCH block to cleanup.

FWIW, I am seriously suspicuous of the code added as part of
7259736a6e5b7c75 and plan to look at it after the code freeze. I can't
really see this code surviving as is. The tableam.h changes, the bsyscan
stuff, ... Leaving correctness aside, the code bloat and performance
affects alone seems problematic.


> Again, this callback is especially important for output plugins that invoke
> further actions on downstream nodes that delay the COMMIT PREPARED of a
> transaction upstream, e.g. until prepared on other nodes. Up until now, the
> output plugin has no way to learn about a concurrent abort of the currently
> decoded (2PC or streamed) transaction (perhaps short of continued polling on
> the transaction status).

You may have only meant it as a shorthand: But imo output plugins have
absolutely no business "invoking actions downstream".


> diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
> index c291b05a423..a6d044b870b 100644
> --- a/src/backend/replication/logical/reorderbuffer.c
> +++ b/src/backend/replication/logical/reorderbuffer.c
> @@ -2488,6 +2488,12 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
>              errdata = NULL;
>              curtxn->concurrent_abort = true;
>  
> +            /*
> +             * Call the cleanup hook to inform the output plugin that the
> +             * transaction just started had to be aborted.
> +             */
> +            rb->concurrent_abort(rb, txn, streaming, commit_lsn);
> +
>              /* Reset the TXN so that it is allowed to stream remaining data. */
>              ReorderBufferResetTXN(rb, txn, snapshot_now,
>                                    command_id, prev_lsn,

I don't think this would be ok, errors thrown in the callback wouldn't
be handled as they would be in other callbacks.

Greetings,

Andres Freund

Re: [PATCH] add concurrent_abort callback for output plugin

From

Markus Wanner

Date:

25 March 2021, 22:15:50

On 25.03.21 21:21, Andres Freund wrote:
> ... the code added as part of 7259736a6e5b7c75 ...

That's the streaming part, which can be thought of as a more general 
variant of the two-phase decoding in that it allows multiple "flush 
points" (invoking ReorderBufferProcessTXN).  Unlike the PREPARE of a 
two-phase commit, where the reorderbuffer can be sure there's no further 
change to be processed after the PREPARE.  Nor is there any invocation 
of ReorderBufferProcessTXN before that fist one at PREPARE time.  With 
that in mind, I'm surprised support for streaming got committed before 
2PC.  It clearly has different use cases, though.

However, I'm sure your inputs on how to improve and cleanup the 
implementation will be appreciated.  The single tiny problem this patch 
addresses is the same for 2PC and streaming decoding: the output plugin 
currently has no way to learn about a concurrent abort of a transaction 
still being decoded, at the time this happens.

Both, 2PC and streaming do require the reorderbuffer to forward changes 
(possibly) prior to the transaction's commit.  That's the whole point of 
these two features.  Therefore, I don't think we can get around 
concurrent aborts.

> You may have only meant it as a shorthand: But imo output plugins have
> absolutely no business "invoking actions downstream".

 From my point of view, that's the raison d'être for an output plugin. 
Even if it does so merely by forwarding messages.  But yeah, of course a 
whole bunch of other components and changes are needed to implement the 
kind of global two-phase commit system I tried to describe.

I'm open to suggestions on how to reference that use case.

>> diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
>> index c291b05a423..a6d044b870b 100644
>> --- a/src/backend/replication/logical/reorderbuffer.c
>> +++ b/src/backend/replication/logical/reorderbuffer.c
>> @@ -2488,6 +2488,12 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
>>               errdata = NULL;
>>               curtxn->concurrent_abort = true;
>>   
>> +            /*
>> +             * Call the cleanup hook to inform the output plugin that the
>> +             * transaction just started had to be aborted.
>> +             */
>> +            rb->concurrent_abort(rb, txn, streaming, commit_lsn);
>> +
>>               /* Reset the TXN so that it is allowed to stream remaining data. */
>>               ReorderBufferResetTXN(rb, txn, snapshot_now,
>>                                     command_id, prev_lsn,
> 
> I don't think this would be ok, errors thrown in the callback wouldn't
> be handled as they would be in other callbacks.

That's a good point.  Maybe the CATCH block should only set a flag, 
allowing for the callback to be invoked outside of it.

Regards

Markus my-callbacks-do-not-throw-error Wanner

Re: [PATCH] add concurrent_abort callback for output plugin

From

Amit Kapila

Date:

26 March 2021, 03:28:27

On Thu, Mar 25, 2021 at 2:37 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> here is another tidbit from our experience with using logical decoding.
> The attached patch adds a callback to notify the output plugin of a
> concurrent abort.  I'll continue to describe the problem in more detail
> and how this additional callback solves it.
>
> Streamed transactions as well as two-phase commit transactions may get
> decoded before they finish.  At the point the begin_cb is invoked and
> first changes are delivered to the output plugin, it is not necessarily
> known whether the transaction will commit or abort.
>
> This leads to the possibility of the transaction getting aborted
> concurrent to logical decoding.  In that case, it is likely for the
> decoder to error on a catalog scan that conflicts with the abort of the
> transaction.  The reorderbuffer sports a PG_CATCH block to cleanup.
> However, it does not currently inform the output plugin.  From its point
> of view, the transaction is left dangling until another one comes along
> or until the final ROLLBACK or ROLLBACK PREPARED record from WAL gets
> decoded.  Therefore, what the output plugin might see in this case is:
>
> * filter_prepare_cb (txn A)
> * begin_prepare_cb  (txn A)
> * apply_change      (txn A)
> * apply_change      (txn A)
> * apply_change      (txn A)
> * begin_cb          (txn B)
>
> In other words, in this example, only the begin_cb of the following
> transaction implicitly tells the output plugin that txn A could not be
> fully decoded.  And there's no upper time boundary on when that may
> happen.  (It could also be another filter_prepare_cb, if the subsequent
> transaction happens to be a two-phase transaction as well.  Or an
> explicit rollback_prepared_cb or stream_abort if there's no other
> transaction in between.)
>
> An alternative and arguably cleaner approach for streamed transactions
> may be to directly invoke stream_abort.  However, the lsn argument
> passed could not be that of the abort record, as that's not known at the
> point in time of the concurrent abort.  Plus, this seems like a bad fit
> for two-phase commit transactions.
>
> Again, this callback is especially important for output plugins that
> invoke further actions on downstream nodes that delay the COMMIT
> PREPARED of a transaction upstream, e.g. until prepared on other nodes.
> Up until now, the output plugin has no way to learn about a concurrent
> abort of the currently decoded (2PC or streamed) transaction (perhaps
> short of continued polling on the transaction status).
>

I think as you have noted that stream_abort or rollback_prepared will
be sent (the remaining changes in-between will be skipped) as we
decode them from WAL so it is not clear to me how it causes any delays
as opposed to where we don't detect concurrent abort say because after
that we didn't access catalog table.

-- 
With Regards,
Amit Kapila.

Re: [PATCH] add concurrent_abort callback for output plugin

From

Markus Wanner

Date:

26 March 2021, 09:12:54

On 26.03.21 04:28, Amit Kapila wrote:
> I think as you have noted that stream_abort or rollback_prepared will
> be sent (the remaining changes in-between will be skipped) as we
> decode them from WAL

Yes, but as outlined, too late.  Multiple other transactions may get 
decoded until the decoder reaches the ROLLBACK PREPARED.  Thus, 
effectively, the output plugin currently needs to deduce that a 
transaction got aborted concurrently from one out of half a dozen other 
callbacks that may trigger right after that transaction, because it will 
only get closed properly much later.

> so it is not clear to me how it causes any delays
> as opposed to where we don't detect concurrent abort say because after
> that we didn't access catalog table.

You're assuming very little traffic, where the ROLLBACK ABORT follows 
the PREPARE immediately in WAL.  On a busy system, chances for that to 
happen are rather low.

(I think the same is true for streaming and stream_abort being sent only 
at the time the decoder reaches the ROLLBACK record in WAL.  However, I 
did not try.  Unlike 2PC, where this actually bit me.)

Regards

Markus

Re: [PATCH] add concurrent_abort callback for output plugin

From

Amit Kapila

Date:

26 March 2021, 10:19:27

On Fri, Mar 26, 2021 at 2:42 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> On 26.03.21 04:28, Amit Kapila wrote:
> > I think as you have noted that stream_abort or rollback_prepared will
> > be sent (the remaining changes in-between will be skipped) as we
> > decode them from WAL
>
> Yes, but as outlined, too late.  Multiple other transactions may get
> decoded until the decoder reaches the ROLLBACK PREPARED.  Thus,
> effectively, the output plugin currently needs to deduce that a
> transaction got aborted concurrently from one out of half a dozen other
> callbacks that may trigger right after that transaction, because it will
> only get closed properly much later.
>
> > so it is not clear to me how it causes any delays
> > as opposed to where we don't detect concurrent abort say because after
> > that we didn't access catalog table.
>
> You're assuming very little traffic, where the ROLLBACK ABORT follows
> the PREPARE immediately in WAL.  On a busy system, chances for that to
> happen are rather low.
>

No, I am not assuming that. I am just trying to describe you that it
is not necessary that we will be able to detect concurrent abort in
each and every case. Say if any transaction operates on one relation
and concurrent abort happens after first access of relation then we
won't access catalog and hence won't detect abort. In such cases, you
will get the abort only when it happens in WAL. So, why try to get
earlier in some cases when it is not guaranteed in every case. Also,
what will you do when you receive actual Rollback, may be the plugin
can throw it by checking in some way that it has already aborted the
transaction, if so, that sounds a bit awkward to me.

The other related thing is it may not be a good idea to finish the
transaction before we see its actual WAL record because after the
client (say subscriber) finishes xact, it sends the updated LSN
location based on which we update the slot LSNs from where it will
start decoding next time after restart, so by bad timing it might try
to decode the contents of same transaction but may be for
concurrent_aborts the plugin might arrange such that client won't send
updated LSN.

> (I think the same is true for streaming and stream_abort being sent only
> at the time the decoder reaches the ROLLBACK record in WAL.  However, I
> did not try.
>

Yes, both streaming and 2PC behaves in a similar way in this regard.

-- 
With Regards,
Amit Kapila.

Re: [PATCH] add concurrent_abort callback for output plugin

From

Markus Wanner

Date:

26 March 2021, 12:20:24

On 26.03.21 11:19, Amit Kapila wrote:
> No, I am not assuming that. I am just trying to describe you that it
> is not necessary that we will be able to detect concurrent abort in
> each and every case.

Sure.  Nor am I claiming that would be necessary or that the patch 
changed anything about it.

As it stands, assuming the the output plugin basically just forwards the 
events and the subscriber tries to replicate them as is, the following 
would happen on the subscriber for a concurrently aborted two-phase 
transaction:

  * start a transaction (begin_prepare_cb)
  * apply changes for it (change_cb)
  * digress to other, unrelated transactions (leaving unspecified what
    exactly happens to the opened transaction)
  * attempt to rollback a transaction that has not ever been prepared
    (rollback_prepared_cb)

The point of the patch is for the output plugin to get proper 
transaction entry and exit callbacks.  Even in the unfortunate case of a 
concurrent abort.  It offers the output plugin a clean way to learn that 
the decoder stopped decoding for the current transaction and it won't 
possibly see a prepare_cb for it (despite the decoder having passed the 
PREPARE record in WAL).

> The other related thing is it may not be a good idea to finish the
> transaction

You're speaking subscriber side here.  And yes, I agree, the subscriber 
should not abort the transaction at a concurrent_abort.  I never claimed 
it should.

If you are curious, in our case I made the subscriber PREPARE the 
transaction at its end when receiving a concurrent_abort notification, 
so that the subscriber:

  * can hop out of that started transaction and safely proceed
    to process events for other transactions, and
  * has the transaction in the appropriate state for processing the
    subsequent rollback_prepared_cb, once that gets through

That's probably not ideal in the sense that subscribers do unnecessary 
work.  However, it pretty closely replicates the transaction's state as 
it was on the origin at any given point in time (by LSN).

Regards

Markus

Re: [PATCH] add concurrent_abort callback for output plugin

From

Amit Kapila

Date:

27 March 2021, 06:37:36

On Fri, Mar 26, 2021 at 5:50 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> On 26.03.21 11:19, Amit Kapila wrote:
> > No, I am not assuming that. I am just trying to describe you that it
> > is not necessary that we will be able to detect concurrent abort in
> > each and every case.
>
> Sure.  Nor am I claiming that would be necessary or that the patch
> changed anything about it.
>
> As it stands, assuming the the output plugin basically just forwards the
> events and the subscriber tries to replicate them as is, the following
> would happen on the subscriber for a concurrently aborted two-phase
> transaction:
>
>   * start a transaction (begin_prepare_cb)
>   * apply changes for it (change_cb)
>   * digress to other, unrelated transactions (leaving unspecified what
>     exactly happens to the opened transaction)
>   * attempt to rollback a transaction that has not ever been prepared
>     (rollback_prepared_cb)
>
> The point of the patch is for the output plugin to get proper
> transaction entry and exit callbacks.  Even in the unfortunate case of a
> concurrent abort.  It offers the output plugin a clean way to learn that
> the decoder stopped decoding for the current transaction and it won't
> possibly see a prepare_cb for it (despite the decoder having passed the
> PREPARE record in WAL).
>
> > The other related thing is it may not be a good idea to finish the
> > transaction
>
> You're speaking subscriber side here.  And yes, I agree, the subscriber
> should not abort the transaction at a concurrent_abort.  I never claimed
> it should.
>
> If you are curious, in our case I made the subscriber PREPARE the
> transaction at its end when receiving a concurrent_abort notification,
> so that the subscriber:
>
>   * can hop out of that started transaction and safely proceed
>     to process events for other transactions, and
>   * has the transaction in the appropriate state for processing the
>     subsequent rollback_prepared_cb, once that gets through
>
> That's probably not ideal in the sense that subscribers do unnecessary
> work.
>

Isn't it better to send prepare from the publisher in such a case so
that subscribers can know about it when rollback prepared arrives? I
think we have already done the same (sent prepare, exactly to handle
the case you have described above) for *streamed* transactions.

-- 
With Regards,
Amit Kapila.

Re: [PATCH] add concurrent_abort callback for output plugin

From

Markus Wanner

Date:

29 March 2021, 07:06:50

On 27.03.21 07:37, Amit Kapila wrote:
> Isn't it better to send prepare from the publisher in such a case so
> that subscribers can know about it when rollback prepared arrives?

That's exactly what this callback allows (among other options).  It 
provides a way for the output plugin to react to a transaction aborting 
while it is being decoded.  This would not be possible without this 
additional callback.

Also note that I would like to retain the option to do some basic 
protocol validity checks.  Certain messages only make sense within a 
transaction ('U'pdate, 'C'ommit).  Others are only valid outside of a 
transaction ('B'egin, begin_prepare_cb).  This is only possible if the 
output plugin has a callback for every entry into and exit out of a 
transaction (being decoded).  This used to be the case prior to 2PC 
decoding and this patch re-establishes that.

> I think we have already done the same (sent prepare, exactly to handle
> the case you have described above) for *streamed* transactions.

Where can I find that?  ISTM streaming transactions have the same issue: 
the output plugin does not (or only implicitly) learn about a concurrent 
abort of the transaction.

Regards

Markus

Re: [PATCH] add concurrent_abort callback for output plugin

From

Amit Kapila

Date:

29 March 2021, 09:33:37

On Mon, Mar 29, 2021 at 12:36 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> On 27.03.21 07:37, Amit Kapila wrote:
> > Isn't it better to send prepare from the publisher in such a case so
> > that subscribers can know about it when rollback prepared arrives?
>
> That's exactly what this callback allows (among other options).  It
> provides a way for the output plugin to react to a transaction aborting
> while it is being decoded.  This would not be possible without this
> additional callback.
>

You don't need an additional callback for that if we do what I am
suggesting above.

> Also note that I would like to retain the option to do some basic
> protocol validity checks.  Certain messages only make sense within a
> transaction ('U'pdate, 'C'ommit).  Others are only valid outside of a
> transaction ('B'egin, begin_prepare_cb).  This is only possible if the
> output plugin has a callback for every entry into and exit out of a
> transaction (being decoded).  This used to be the case prior to 2PC
> decoding and this patch re-establishes that.
>
> > I think we have already done the same (sent prepare, exactly to handle
> > the case you have described above) for *streamed* transactions.
>
> Where can I find that?  ISTM streaming transactions have the same issue:
> the output plugin does not (or only implicitly) learn about a concurrent
> abort of the transaction.
>

One is you can try to test it, otherwise, there are comments atop
DecodePrepare() ("Note that we don't skip prepare even if have
detected concurrent abort because it is quite possible that ....")
which explains this.

-- 
With Regards,
Amit Kapila.

Re: [PATCH] add concurrent_abort callback for output plugin

From

Markus Wanner

Date:

29 March 2021, 09:53:20

On 29.03.21 11:33, Amit Kapila wrote:
> You don't need an additional callback for that if we do what I am
> suggesting above.

Ah, are you suggesting a different change, then?  To make two-phase 
transactions always send PREPARE even if concurrently aborted?  In that 
case, sorry, I misunderstood.

I'm perfectly fine with that approach as well (even though it removes 
flexibility compared to the concurrent abort callback, as the comment 
above DecodePrepare indicates, i.e. "not impossible to optimize the 
concurrent abort case").

> One is you can try to test it, otherwise, there are comments atop
> DecodePrepare() ("Note that we don't skip prepare even if have
> detected concurrent abort because it is quite possible that ....")
> which explains this.

Thanks for this pointer, very helpful.

Regards

Markus

Re: [PATCH] add concurrent_abort callback for output plugin

From

Ajin Cherian

Date:

29 March 2021, 11:02:34

On Mon, Mar 29, 2021 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Mar 29, 2021 at 12:36 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> On 27.03.21 07:37, Amit Kapila wrote:
> > Isn't it better to send prepare from the publisher in such a case so
> > that subscribers can know about it when rollback prepared arrives?

Nice catch, Markus.

Interesting suggestion Amit. Let me try and code this.

regards,

Ajin Cherian

Fujitsu Australia

Re: [PATCH] add concurrent_abort callback for output plugin

From

Markus Wanner

Date:

29 March 2021, 11:09:45

On 29.03.21 13:02, Ajin Cherian wrote:
> Nice catch, Markus.
> Interesting suggestion Amit. Let me try and code this.

Thanks, Ajin.  Please consider this concurrent_abort callback as well. 
I think it provides more flexibility for the output plugin and I would 
therefore prefer it over a solution that hides this.  It clearly makes 
all potential optimizations impossible, as it means the output plugin 
cannot distinguish between a proper PREAPRE and a bail-out PREPARE (that 
does not fully replicate the PREPARE as on the origin node, either, 
which I think is dangerous).

Regards

Markus

Re: [PATCH] add concurrent_abort callback for output plugin

From

Ajin Cherian

Date:

30 March 2021, 04:48:30

On Mon, Mar 29, 2021 at 10:09 PM Markus Wanner <markus.wanner@enterprisedb.com> wrote:

On 29.03.21 13:02, Ajin Cherian wrote:
> Nice catch, Markus.
> Interesting suggestion Amit. Let me try and code this.

Thanks, Ajin. Please consider this concurrent_abort callback as well.
I think it provides more flexibility for the output plugin and I would
therefore prefer it over a solution that hides this. It clearly makes
all potential optimizations impossible, as it means the output plugin
cannot distinguish between a proper PREAPRE and a bail-out PREPARE (that
does not fully replicate the PREPARE as on the origin node, either,
which I think is dangerous).

I understand your concern Markus, but I will leave it to one of the committers to decide on the new callback.

For now, I've created a patch that addresses the problem reported using the existing callbacks.

Do have a look if this fixes the problem reported.

regards,

Ajin Cherian

Fujitsu Australia

Attachment

v1-0001-Make-sure-a-prepare-is-sent-when-decoder-detects-.patch

Re: [PATCH] add concurrent_abort callback for output plugin

From

Markus Wanner

Date:

30 March 2021, 06:30:34

Hello Ajin,

On 30.03.21 06:48, Ajin Cherian wrote:
> For now, I've created a patch that addresses the problem reported using 
> the existing callbacks.

Thanks.

> Do have a look if this fixes the problem reported.

Yes, this replaces the PREPARE I would do from the concurrent_abort 
callback in a direct call to rb->prepare.  However, it misses the most 
important part: documentation.  Because this clearly is a surprising 
behavior for a transaction that's not fully decoded and guaranteed to 
get aborted.

Regards

Markus

Re: [PATCH] add concurrent_abort callback for output plugin

From

Ajin Cherian

Date:

30 March 2021, 07:39:31

On Tue, Mar 30, 2021 at 5:30 PM Markus Wanner <markus.wanner@enterprisedb.com> wrote:

Hello Ajin,

On 30.03.21 06:48, Ajin Cherian wrote:
> For now, I've created a patch that addresses the problem reported using
> the existing callbacks.

Thanks.

> Do have a look if this fixes the problem reported.

Yes, this replaces the PREPARE I would do from the concurrent_abort
callback in a direct call to rb->prepare. However, it misses the most
important part: documentation. Because this clearly is a surprising
behavior for a transaction that's not fully decoded and guaranteed to
get aborted.

Where do you suggest this be documented? From an externally visible point of view, I dont see much of a surprise.

A transaction that was prepared and rolled back can be decoded to be prepared and rolled back with incomplete changes.

Are you suggesting more comments in code?

regards,

Ajin Cherian

Fujitsu Australia

Re: [PATCH] add concurrent_abort callback for output plugin

From

Markus Wanner

Date:

30 March 2021, 08:10:54

On 30.03.21 09:39, Ajin Cherian wrote:
> Where do you suggest this be documented? From an externally visible 
> point of view, I dont see much of a surprise.

If you start to think about the option of committing a prepared 
transaction from a different node, the danger becomes immediately 
apparent:  A subscriber doesn't even know that the transaction is not 
complete.  How could it possibly know it's futile to COMMIT PREPARE it? 
  I think it's not just surprising, but outright dangerous to pretend 
having prepared the transaction, but potentially miss some of the changes.

(Essentially: do not assume the ROLLBACK PREPARED will make it to the 
subscriber.  There's no such guarantee.  The provider may crash, burn, 
and vanish before that happens.)

So I suggest to document this as a caveat for the prepare callback, 
because with this patch that's the callback which may be invoked for an 
incomplete transaction without the output plugin knowing.

Regards

Markus

Re: [PATCH] add concurrent_abort callback for output plugin

From

Amit Kapila

Date:

30 March 2021, 09:02:55

On Tue, Mar 30, 2021 at 12:00 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> Hello Ajin,
>
> On 30.03.21 06:48, Ajin Cherian wrote:
> > For now, I've created a patch that addresses the problem reported using
> > the existing callbacks.
>
> Thanks.
>
> > Do have a look if this fixes the problem reported.
>
> Yes, this replaces the PREPARE I would do from the concurrent_abort
> callback in a direct call to rb->prepare.
>

That sounds clearly a better choice. Because concurrent_abort()
internally trying to prepare transaction seems a bit ugly and not only
that if we want to go via that route, it needs to distinguish between
rollback to savepoint and rollback cases as well.

Now, we can try to find a way where for such cases we don't send
prepare/rollback prepare, but somehow detect it and send rollback
instead. And also some more checks will be required so that if we have
streamed the transaction then send stream_abort. I am not telling that
all this is not possible but I don't find worth making all such
checks.

>  However, it misses the most
> important part: documentation.  Because this clearly is a surprising
> behavior for a transaction that's not fully decoded and guaranteed to
> get aborted.
>

Yeah, I guess that makes sense to me. I think we can document it in
the docs [1] where we explained two-phase decoding. I think we can add
a point about concurrent aborts at the end of points mentioned in the
following paragraph: "The users that want to decode prepared
transactions need to be careful ....."

[1] - https://www.postgresql.org/docs/devel/logicaldecoding-two-phase-commits.html

-- 
With Regards,
Amit Kapila.

Re: [PATCH] add concurrent_abort callback for output plugin

From

Ajin Cherian

Date:

30 March 2021, 09:12:57

On Tue, Mar 30, 2021 at 7:10 PM Markus Wanner <markus.wanner@enterprisedb.com> wrote:

On 30.03.21 09:39, Ajin Cherian wrote:
> Where do you suggest this be documented? From an externally visible
> point of view, I dont see much of a surprise.

So I suggest to document this as a caveat for the prepare callback,
because with this patch that's the callback which may be invoked for an
incomplete transaction without the output plugin knowing.

I found some documentation that already was talking about concurrent aborts and updated that.

Patch attached.

regards,

Ajin Cherian

Fujitsu Australia

Attachment

v2-0001-Make-sure-a-prepare-is-sent-when-decoder-detects-.patch

Re: [PATCH] add concurrent_abort callback for output plugin

From

Markus Wanner

Date:

30 March 2021, 09:54:31

On 30.03.21 11:02, Amit Kapila wrote:
> On Tue, Mar 30, 2021 at 12:00 PM Markus Wanner
>> Yes, this replaces the PREPARE I would do from the concurrent_abort
>> callback in a direct call to rb->prepare.
> 
> Because concurrent_abort()
> internally trying to prepare transaction seems a bit ugly and not only
> that if we want to go via that route, it needs to distinguish between
> rollback to savepoint and rollback cases as well.

Just to clarify: of course, the concurrent_abort callback only sends a 
message to the subscriber, which then (in our current implementation) 
upon reception of the concurrent_abort message opts to prepare the 
transaction.  Different implementations would be possible.

I would recommend this more explicit API and communication over hiding 
the concurrent abort in a prepare callback.

Regards

Markus

Re: [PATCH] add concurrent_abort callback for output plugin

From

Markus Wanner

Date:

30 March 2021, 11:29:43

On 30.03.21 11:12, Ajin Cherian wrote:
> I found some documentation that already was talking about concurrent 
> aborts and updated that.

Thanks.

I just noticed as of PG13, concurrent_abort is part of ReorderBufferTXN, 
so it seems the prepare_cb (or stream_prepare_cb) can actually figure a 
concurrent abort happened (and the transaction may be incomplete). 
That's good and indeed makes an additional callback unnecessary.

I recommend giving a hint to that field in the documentation as well.

> diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
> index 80eb96d..d2f8d39 100644
> --- a/doc/src/sgml/logicaldecoding.sgml
> +++ b/doc/src/sgml/logicaldecoding.sgml
> @@ -545,12 +545,14 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
>       executed within that transaction. A transaction that is prepared for
>       a two-phase commit using <command>PREPARE TRANSACTION</command> will
>       also be decoded if the output plugin callbacks needed for decoding
> -     them are provided. It is possible that the current transaction which
> +     them are provided. It is possible that the current prepared transaction which
>       is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
>       command. In that case, the logical decoding of this transaction will
> -     be aborted too. We will skip all the changes of such a transaction once
> -     the abort is detected and abort the transaction when we read WAL for
> -     <command>ROLLBACK PREPARED</command>.
> +     be aborted too. All the changes of such a transaction is skipped once

typo: changes [..] *are* skipped, plural.

> +     the abort is detected and the <function>prepare_cb</function> callback is invoked.
> +     This could result in a prepared transaction with incomplete changes.

... "in which case the <literal>concurrent_abort</literal> field of the 
passed <literal>ReorderBufferTXN</literal> struct is set.", as a proposal?

> +     This is done so that eventually when the <command>ROLLBACK PREPARED</command>
> +     is decoded, there is a corresponding prepared transaction with a matching gid.
>      </para>
>  
>      <note>

Everything else sounds good to me.

Regards

Markus

Re: [PATCH] add concurrent_abort callback for output plugin

From

Markus Wanner

Date:

30 March 2021, 11:31:36

On 30.03.21 11:54, Markus Wanner wrote:
> I would recommend this more explicit API and communication over hiding 
> the concurrent abort in a prepare callback.

I figured we already have the ReorderBufferTXN's concurrent_abort flag, 
thus I agree the prepare_cb is sufficient and revoke this recommendation 
(and the concurrent_abort callback patch).

Regards

Markus

Re: [PATCH] add concurrent_abort callback for output plugin

From

Ajin Cherian

Date:

31 March 2021, 01:12:46

On Tue, Mar 30, 2021 at 10:29 PM Markus Wanner <markus@bluegap.ch> wrote:

I just noticed as of PG13, concurrent_abort is part of ReorderBufferTXN,
so it seems the prepare_cb (or stream_prepare_cb) can actually figure a
concurrent abort happened (and the transaction may be incomplete).
That's good and indeed makes an additional callback unnecessary.

I recommend giving a hint to that field in the documentation as well.

> diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
> index 80eb96d..d2f8d39 100644
> --- a/doc/src/sgml/logicaldecoding.sgml
> +++ b/doc/src/sgml/logicaldecoding.sgml
> @@ -545,12 +545,14 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
> executed within that transaction. A transaction that is prepared for
> a two-phase commit using <command>PREPARE TRANSACTION</command> will
> also be decoded if the output plugin callbacks needed for decoding
> - them are provided. It is possible that the current transaction which
> + them are provided. It is possible that the current prepared transaction which
> is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
> command. In that case, the logical decoding of this transaction will
> - be aborted too. We will skip all the changes of such a transaction once
> - the abort is detected and abort the transaction when we read WAL for
> - <command>ROLLBACK PREPARED</command>.
> + be aborted too. All the changes of such a transaction is skipped once

typo: changes [..] *are* skipped, plural.

Updated.

> + the abort is detected and the <function>prepare_cb</function> callback is invoked.
> + This could result in a prepared transaction with incomplete changes.

... "in which case the <literal>concurrent_abort</literal> field of the
passed <literal>ReorderBufferTXN</literal> struct is set.", as a proposal?

> + This is done so that eventually when the <command>ROLLBACK PREPARED</command>
> + is decoded, there is a corresponding prepared transaction with a matching gid.
> </para>
>
> <note>

Everything else sounds good to me.

Updated.

regards,

Ajin Cherian

Fujitsu Australia

Attachment

v3-0001-Make-sure-a-prepare-is-sent-when-decoder-detects-.patch

Re: [PATCH] add concurrent_abort callback for output plugin

From

Amit Kapila

Date:

31 March 2021, 04:39:17

On Wed, Mar 31, 2021 at 6:42 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Updated.
>

I have slightly adjusted the comments, docs, and commit message. What
do you think about the attached?

-- 
With Regards,
Amit Kapila.

Attachment

v4-0001-Ensure-to-send-a-prepare-after-we-detect-concurre.patch

Re: [PATCH] add concurrent_abort callback for output plugin

From

Markus Wanner

Date:

31 March 2021, 06:25:36

On 31.03.21 06:39, Amit Kapila wrote:
> I have slightly adjusted the comments, docs, and commit message. What
> do you think about the attached?

Thank you both, Amit and Ajin.  This looks good to me.

Only one minor gripe:

> +     a prepared transaction with incomplete changes, in which case the
> +     <literal>concurrent_abort</literal> field of the passed
> +     <literal>ReorderBufferTXN</literal> struct is set. This is done so that
> +     eventually when the <command>ROLLBACK PREPARED</command> is decoded, there
> +     is a corresponding prepared transaction with a matching gid.

The last sentences there now seems to relate to just the setting of 
"concurrent_abort", rather than the whole reason to invoke the 
prepare_cb.  And the reference to the "gid" is a bit lost.  Maybe:

    "Thus even in case of a concurrent abort, enough information is
     provided to the output plugin for it to properly deal with the
     <command>ROLLBACK PREPARED</command> once that is decoded."

Alternatively, state that the gid is otherwise missing earlier in the 
docs (similar to how the commit message describes it).

Regards

Markus

Re: [PATCH] add concurrent_abort callback for output plugin

From

Ajin Cherian

Date:

31 March 2021, 09:09:58

On Wed, Mar 31, 2021 at 5:25 PM Markus Wanner <markus.wanner@enterprisedb.com> wrote:

The last sentences there now seems to relate to just the setting of
"concurrent_abort", rather than the whole reason to invoke the
prepare_cb. And the reference to the "gid" is a bit lost. Maybe:

"Thus even in case of a concurrent abort, enough information is
provided to the output plugin for it to properly deal with the
<command>ROLLBACK PREPARED</command> once that is decoded."

Alternatively, state that the gid is otherwise missing earlier in the
docs (similar to how the commit message describes it).

I'm fine with Amit's changes and like Markus's last suggestion as well.

regards,

Ajin Cherian

Fujitsu Australia

Re: [PATCH] add concurrent_abort callback for output plugin

From

Amit Kapila

Date:

31 March 2021, 13:18:49

On Wed, Mar 31, 2021 at 11:55 AM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> On 31.03.21 06:39, Amit Kapila wrote:
> > I have slightly adjusted the comments, docs, and commit message. What
> > do you think about the attached?
>
> Thank you both, Amit and Ajin.  This looks good to me.
>
> Only one minor gripe:
>
> > +     a prepared transaction with incomplete changes, in which case the
> > +     <literal>concurrent_abort</literal> field of the passed
> > +     <literal>ReorderBufferTXN</literal> struct is set. This is done so that
> > +     eventually when the <command>ROLLBACK PREPARED</command> is decoded, there
> > +     is a corresponding prepared transaction with a matching gid.
>
> The last sentences there now seems to relate to just the setting of
> "concurrent_abort", rather than the whole reason to invoke the
> prepare_cb.  And the reference to the "gid" is a bit lost.  Maybe:
>
>     "Thus even in case of a concurrent abort, enough information is
>      provided to the output plugin for it to properly deal with the
>      <command>ROLLBACK PREPARED</command> once that is decoded."
>

Okay, Changed the patch accordingly.

-- 
With Regards,
Amit Kapila.

Attachment

v5-0001-Ensure-to-send-a-prepare-after-we-detect-concurre.patch

Re: [PATCH] add concurrent_abort callback for output plugin

From

Markus Wanner

Date:

31 March 2021, 13:50:53

On 31.03.21 15:18, Amit Kapila wrote:
> On Wed, Mar 31, 2021 at 11:55 AM Markus Wanner
>> The last sentences there now seems to relate to just the setting of
>> "concurrent_abort", rather than the whole reason to invoke the
>> prepare_cb.  And the reference to the "gid" is a bit lost.  Maybe:
>>
>>      "Thus even in case of a concurrent abort, enough information is
>>       provided to the output plugin for it to properly deal with the
>>       <command>ROLLBACK PREPARED</command> once that is decoded."
> 
> Okay, Changed the patch accordingly.

That's fine with me.  I didn't necessarily mean to eliminate the hint to 
the concurrent_abort field, but it's more concise that way.  Thank you.

Regards

Markus

Re: [PATCH] add concurrent_abort callback for output plugin

From

Amit Kapila

Date:

01 April 2021, 03:30:09

On Wed, Mar 31, 2021 at 7:20 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> On 31.03.21 15:18, Amit Kapila wrote:
> > On Wed, Mar 31, 2021 at 11:55 AM Markus Wanner
> >> The last sentences there now seems to relate to just the setting of
> >> "concurrent_abort", rather than the whole reason to invoke the
> >> prepare_cb.  And the reference to the "gid" is a bit lost.  Maybe:
> >>
> >>      "Thus even in case of a concurrent abort, enough information is
> >>       provided to the output plugin for it to properly deal with the
> >>       <command>ROLLBACK PREPARED</command> once that is decoded."
> >
> > Okay, Changed the patch accordingly.
>
> That's fine with me.
>

Pushed.

-- 
With Regards,
Amit Kapila.

Re: [PATCH] add concurrent_abort callback for output plugin

From

Peter Smith

Date:

26 April 2021, 01:35:30

Hi,

While testing another WIP patch [1] a clashing GID problem was found,
which gives us apply worker errors like:

2021-04-26 10:07:12.883 AEST [22055] ERROR:  transaction identifier
"pg_gid_16403_608" is already in use
2021-04-26 10:08:05.149 AEST [22124] ERROR:  transaction identifier
"pg_gid_16403_757" is already in use

These GID clashes were traced back to a problem of the
concurrent-abort logic: when "streaming" is enabled the
concurrent-abort logic was always sending "prepare" even though a
"stream_prepare" had already been sent.

PSA a patch to correct this.

------
[1] https://www.postgresql.org/message-id/CAHut%2BPuB07xOgJLnDhvbtp0t_qMDhjDD%2BkO%2B2yB%2Br6tgfaR-5Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

v1-0001-Fix-concurrent-abort-for-when-streaming.patch

Re: [PATCH] add concurrent_abort callback for output plugin

From

Amit Kapila

Date:

26 April 2021, 08:12:32

On Mon, Apr 26, 2021 at 7:05 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi,
>
> While testing another WIP patch [1] a clashing GID problem was found,
> which gives us apply worker errors like:
>
> 2021-04-26 10:07:12.883 AEST [22055] ERROR:  transaction identifier
> "pg_gid_16403_608" is already in use
> 2021-04-26 10:08:05.149 AEST [22124] ERROR:  transaction identifier
> "pg_gid_16403_757" is already in use
>
> These GID clashes were traced back to a problem of the
> concurrent-abort logic: when "streaming" is enabled the
> concurrent-abort logic was always sending "prepare" even though a
> "stream_prepare" had already been sent.
>
> PSA a patch to correct this.
>

Your patch looks good to me, so pushed! Thanks for the report and patch.

-- 
With Regards,
Amit Kapila.