Thread: [PATCH] add concurrent_abort callback for output plugin
Hi,

here is another tidbit from our experience with using logical decoding. The attached patch adds a callback to notify the output plugin of a concurrent abort. I'll describe the problem in more detail and how this additional callback solves it.

Streamed transactions as well as two-phase commit transactions may get decoded before they finish. At the point the begin_cb is invoked and the first changes are delivered to the output plugin, it is not necessarily known whether the transaction will commit or abort. This leads to the possibility of the transaction getting aborted concurrently with logical decoding. In that case, it is likely for the decoder to error on a catalog scan that conflicts with the abort of the transaction. The reorderbuffer sports a PG_CATCH block to clean up. However, it does not currently inform the output plugin. From its point of view, the transaction is left dangling until another one comes along or until the final ROLLBACK or ROLLBACK PREPARED record from WAL gets decoded. Therefore, what the output plugin might see in this case is:

* filter_prepare_cb (txn A)
* begin_prepare_cb (txn A)
* apply_change (txn A)
* apply_change (txn A)
* apply_change (txn A)
* begin_cb (txn B)

In other words, in this example, only the begin_cb of the following transaction implicitly tells the output plugin that txn A could not be fully decoded. And there's no upper time bound on when that may happen. (It could also be another filter_prepare_cb, if the subsequent transaction happens to be a two-phase transaction as well. Or an explicit rollback_prepared_cb or stream_abort if there's no other transaction in between.)

An alternative and arguably cleaner approach for streamed transactions may be to directly invoke stream_abort. However, the lsn argument passed could not be that of the abort record, as that's not known at the point in time of the concurrent abort. Plus, this seems like a bad fit for two-phase commit transactions.
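For illustration only, the implicit-detection problem described above can be modeled with a toy state tracker. All names here are invented for the sketch; this is not the actual output plugin API:

```c
#include <stdbool.h>

/* Toy model of an output plugin that has no concurrent_abort callback:
 * it can only *deduce* that the previous transaction was dropped when
 * the begin of the next one arrives while a transaction is still open. */
typedef struct PluginState
{
    bool in_txn;            /* inside a decoded transaction? */
    int  implicit_aborts;   /* aborts we had to deduce implicitly */
} PluginState;

static void begin_cb(PluginState *s)
{
    if (s->in_txn)
        s->implicit_aborts++;   /* previous txn was never properly closed */
    s->in_txn = true;
}

static void change_cb(PluginState *s)
{
    (void) s;                   /* apply one change */
}

static void commit_cb(PluginState *s)
{
    s->in_txn = false;          /* normal transaction end */
}
```

Running the event sequence from the mail (txn A's changes, then txn B's begin) against this model shows the abort of A becoming visible only at B's begin, with no upper bound on how long that takes on a quiet system.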
Again, this callback is especially important for output plugins that invoke further actions on downstream nodes that delay the COMMIT PREPARED of a transaction upstream, e.g. until prepared on other nodes. Up until now, the output plugin has had no way to learn about a concurrent abort of the currently decoded (2PC or streamed) transaction (short, perhaps, of continuously polling the transaction status).

I also think it generally improves the API by allowing the output plugin to rely on such a callback, rather than having to implicitly deduce this from other callbacks.

Thoughts or comments? If this is agreed on, I can look into adding tests (concurrent aborts are not currently covered, it seems).

Regards

Markus
Attachment
Hi,

On 2021-03-25 10:07:28 +0100, Markus Wanner wrote:
> This leads to the possibility of the transaction getting aborted concurrent
> to logical decoding. In that case, it is likely for the decoder to error on
> a catalog scan that conflicts with the abort of the transaction. The
> reorderbuffer sports a PG_CATCH block to cleanup.

FWIW, I am seriously suspicious of the code added as part of 7259736a6e5b7c75 and plan to look at it after the code freeze. I can't really see this code surviving as is. The tableam.h changes, the bsysscan stuff, ... Leaving correctness aside, the code bloat and performance effects alone seem problematic.

> Again, this callback is especially important for output plugins that invoke
> further actions on downstream nodes that delay the COMMIT PREPARED of a
> transaction upstream, e.g. until prepared on other nodes. Up until now, the
> output plugin has no way to learn about a concurrent abort of the currently
> decoded (2PC or streamed) transaction (perhaps short of continued polling on
> the transaction status).

You may have only meant it as a shorthand: but imo output plugins have absolutely no business "invoking actions downstream".

> diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
> index c291b05a423..a6d044b870b 100644
> --- a/src/backend/replication/logical/reorderbuffer.c
> +++ b/src/backend/replication/logical/reorderbuffer.c
> @@ -2488,6 +2488,12 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
>  			errdata = NULL;
>  			curtxn->concurrent_abort = true;
>  
> +			/*
> +			 * Call the cleanup hook to inform the output plugin that the
> +			 * transaction just started had to be aborted.
> +			 */
> +			rb->concurrent_abort(rb, txn, streaming, commit_lsn);
> +
>  			/* Reset the TXN so that it is allowed to stream remaining data. */
>  			ReorderBufferResetTXN(rb, txn, snapshot_now,
>  								  command_id, prev_lsn,

I don't think this would be ok, errors thrown in the callback wouldn't be handled as they would be in other callbacks.

Greetings,

Andres Freund
On 25.03.21 21:21, Andres Freund wrote:
> ... the code added as part of 7259736a6e5b7c75 ...

That's the streaming part, which can be thought of as a more general variant of the two-phase decoding, in that it allows multiple "flush points" (invoking ReorderBufferProcessTXN). That is unlike the PREPARE of a two-phase commit, where the reorderbuffer can be sure there's no further change to be processed after the PREPARE, nor any invocation of ReorderBufferProcessTXN before that first one at PREPARE time. With that in mind, I'm surprised support for streaming got committed before 2PC. It clearly has different use cases, though.

However, I'm sure your input on how to improve and clean up the implementation will be appreciated.

The single tiny problem this patch addresses is the same for 2PC and streaming decoding: the output plugin currently has no way to learn about a concurrent abort of a transaction still being decoded, at the time this happens. Both 2PC and streaming require the reorderbuffer to forward changes (possibly) prior to the transaction's commit. That's the whole point of these two features. Therefore, I don't think we can get around concurrent aborts.

> You may have only meant it as a shorthand: But imo output plugins have
> absolutely no business "invoking actions downstream".

From my point of view, that's the raison d'être for an output plugin, even if it does so merely by forwarding messages. But yeah, of course a whole bunch of other components and changes are needed to implement the kind of global two-phase commit system I tried to describe. I'm open to suggestions on how to reference that use case.
>> diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
>> index c291b05a423..a6d044b870b 100644
>> --- a/src/backend/replication/logical/reorderbuffer.c
>> +++ b/src/backend/replication/logical/reorderbuffer.c
>> @@ -2488,6 +2488,12 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
>>  			errdata = NULL;
>>  			curtxn->concurrent_abort = true;
>>  
>> +			/*
>> +			 * Call the cleanup hook to inform the output plugin that the
>> +			 * transaction just started had to be aborted.
>> +			 */
>> +			rb->concurrent_abort(rb, txn, streaming, commit_lsn);
>> +
>>  			/* Reset the TXN so that it is allowed to stream remaining data. */
>>  			ReorderBufferResetTXN(rb, txn, snapshot_now,
>>  								  command_id, prev_lsn,
>
> I don't think this would be ok, errors thrown in the callback wouldn't
> be handled as they would be in other callbacks.

That's a good point. Maybe the CATCH block should only set a flag, allowing for the callback to be invoked outside of it.

Regards

Markus my-callbacks-do-not-throw-error Wanner
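The "set a flag, call back outside the CATCH block" idea could look roughly like this. This is a minimal standalone sketch using setjmp/longjmp as a stand-in for PostgreSQL's PG_TRY/PG_CATCH machinery; the real code would go through the actual elog/ereport error handling, and all names here are invented:

```c
#include <setjmp.h>
#include <stdbool.h>

/* Sketch: the error handler only records the concurrent abort; the
 * notification callback runs after unwinding, so errors it throws would
 * get the normal treatment instead of escaping from inside the CATCH
 * block. */
static jmp_buf decode_catch;
static int concurrent_abort_notified = 0;

static void concurrent_abort_cb(void)
{
    concurrent_abort_notified++;    /* inform the output plugin */
}

static void process_txn(bool aborts_concurrently)
{
    bool concurrent_abort = false;

    if (setjmp(decode_catch) == 0)
    {
        /* ... decode changes; a conflicting catalog scan errors out ... */
        if (aborts_concurrently)
            longjmp(decode_catch, 1);
    }
    else
    {
        concurrent_abort = true;    /* CATCH: just set the flag */
    }

    if (concurrent_abort)
        concurrent_abort_cb();      /* invoked outside the CATCH block */
}
```

The flag is only written after control returns to setjmp, so the usual caveat about non-volatile locals modified between setjmp and longjmp does not apply here.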
On Thu, Mar 25, 2021 at 2:37 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> here is another tidbit from our experience with using logical decoding.
> The attached patch adds a callback to notify the output plugin of a
> concurrent abort. I'll continue to describe the problem in more detail
> and how this additional callback solves it.
>
> Streamed transactions as well as two-phase commit transactions may get
> decoded before they finish. At the point the begin_cb is invoked and
> first changes are delivered to the output plugin, it is not necessarily
> known whether the transaction will commit or abort.
>
> This leads to the possibility of the transaction getting aborted
> concurrent to logical decoding. In that case, it is likely for the
> decoder to error on a catalog scan that conflicts with the abort of the
> transaction. The reorderbuffer sports a PG_CATCH block to cleanup.
> However, it does not currently inform the output plugin. From its point
> of view, the transaction is left dangling until another one comes along
> or until the final ROLLBACK or ROLLBACK PREPARED record from WAL gets
> decoded. Therefore, what the output plugin might see in this case is:
>
> * filter_prepare_cb (txn A)
> * begin_prepare_cb (txn A)
> * apply_change (txn A)
> * apply_change (txn A)
> * apply_change (txn A)
> * begin_cb (txn B)
>
> In other words, in this example, only the begin_cb of the following
> transaction implicitly tells the output plugin that txn A could not be
> fully decoded. And there's no upper time boundary on when that may
> happen. (It could also be another filter_prepare_cb, if the subsequent
> transaction happens to be a two-phase transaction as well. Or an
> explicit rollback_prepared_cb or stream_abort if there's no other
> transaction in between.)
>
> An alternative and arguably cleaner approach for streamed transactions
> may be to directly invoke stream_abort. However, the lsn argument
> passed could not be that of the abort record, as that's not known at the
> point in time of the concurrent abort. Plus, this seems like a bad fit
> for two-phase commit transactions.
>
> Again, this callback is especially important for output plugins that
> invoke further actions on downstream nodes that delay the COMMIT
> PREPARED of a transaction upstream, e.g. until prepared on other nodes.
> Up until now, the output plugin has no way to learn about a concurrent
> abort of the currently decoded (2PC or streamed) transaction (perhaps
> short of continued polling on the transaction status).
>

I think as you have noted that stream_abort or rollback_prepared will be sent (the remaining changes in-between will be skipped) as we decode them from WAL, so it is not clear to me how it causes any delays, as opposed to the case where we don't detect the concurrent abort, say because we didn't access a catalog table after that.

--
With Regards,
Amit Kapila.
On 26.03.21 04:28, Amit Kapila wrote:
> I think as you have noted that stream_abort or rollback_prepared will
> be sent (the remaining changes in-between will be skipped) as we
> decode them from WAL

Yes, but as outlined, too late. Multiple other transactions may get decoded until the decoder reaches the ROLLBACK PREPARED. Thus, effectively, the output plugin currently needs to deduce that a transaction got aborted concurrently from one out of half a dozen other callbacks that may trigger right after that transaction, because it will only get closed properly much later.

> so it is not clear to me how it causes any delays
> as opposed to where we don't detect concurrent abort say because after
> that we didn't access catalog table.

You're assuming very little traffic, where the ROLLBACK PREPARED follows the PREPARE immediately in WAL. On a busy system, chances for that to happen are rather low.

(I think the same is true for streaming and stream_abort being sent only at the time the decoder reaches the ROLLBACK record in WAL. However, I did not try. Unlike 2PC, where this actually bit me.)

Regards

Markus
On Fri, Mar 26, 2021 at 2:42 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> On 26.03.21 04:28, Amit Kapila wrote:
> > I think as you have noted that stream_abort or rollback_prepared will
> > be sent (the remaining changes in-between will be skipped) as we
> > decode them from WAL
>
> Yes, but as outlined, too late. Multiple other transactions may get
> decoded until the decoder reaches the ROLLBACK PREPARED. Thus,
> effectively, the output plugin currently needs to deduce that a
> transaction got aborted concurrently from one out of half a dozen other
> callbacks that may trigger right after that transaction, because it will
> only get closed properly much later.
>
> > so it is not clear to me how it causes any delays
> > as opposed to where we don't detect concurrent abort say because after
> > that we didn't access catalog table.
>
> You're assuming very little traffic, where the ROLLBACK PREPARED follows
> the PREPARE immediately in WAL. On a busy system, chances for that to
> happen are rather low.
>

No, I am not assuming that. I am just trying to describe to you that it is not necessary that we will be able to detect a concurrent abort in each and every case. Say a transaction operates on only one relation and the concurrent abort happens after the first access of that relation; then we won't access the catalog anymore and hence won't detect the abort. In such cases, you will get the abort only when it appears in WAL. So, why try to get it earlier in some cases when it is not guaranteed in every case? Also, what will you do when you receive the actual rollback? Maybe the plugin can throw it away by checking in some way that it has already aborted the transaction; if so, that sounds a bit awkward to me.
The other related thing is that it may not be a good idea to finish the transaction before we see its actual WAL record: after the client (say, a subscriber) finishes the xact, it sends the updated LSN location, based on which we update the slot LSNs from where decoding will start next time after a restart. So, with bad timing, it might try to decode the contents of the same transaction again; though maybe for concurrent aborts the plugin could arrange for the client not to send an updated LSN.

> (I think the same is true for streaming and stream_abort being sent only
> at the time the decoder reaches the ROLLBACK record in WAL. However, I
> did not try.

Yes, both streaming and 2PC behave in a similar way in this regard.

--
With Regards,
Amit Kapila.
On 26.03.21 11:19, Amit Kapila wrote:
> No, I am not assuming that. I am just trying to describe to you that it
> is not necessary that we will be able to detect a concurrent abort in
> each and every case.

Sure. Nor am I claiming that would be necessary, or that the patch changed anything about it.

As it stands, assuming that the output plugin basically just forwards the events and the subscriber tries to replicate them as is, the following would happen on the subscriber for a concurrently aborted two-phase transaction:

* start a transaction (begin_prepare_cb)
* apply changes for it (change_cb)
* digress to other, unrelated transactions (leaving unspecified what exactly happens to the opened transaction)
* attempt to rollback a transaction that has not ever been prepared (rollback_prepared_cb)

The point of the patch is for the output plugin to get proper transaction entry and exit callbacks, even in the unfortunate case of a concurrent abort. It offers the output plugin a clean way to learn that the decoder stopped decoding for the current transaction and that it won't possibly see a prepare_cb for it (despite the decoder having passed the PREPARE record in WAL).

> The other related thing is it may not be a good idea to finish the
> transaction

You're speaking subscriber side here. And yes, I agree, the subscriber should not abort the transaction at a concurrent_abort. I never claimed it should.

If you are curious, in our case I made the subscriber PREPARE the transaction at its end when receiving a concurrent_abort notification, so that the subscriber:

* can hop out of that started transaction and safely proceed to process events for other transactions, and
* has the transaction in the appropriate state for processing the subsequent rollback_prepared_cb, once that gets through

That's probably not ideal in the sense that subscribers do unnecessary work. However, it pretty closely replicates the transaction's state as it was on the origin at any given point in time (by LSN).
Regards Markus
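The subscriber-side strategy described in the mail above (prepare the partial transaction on a concurrent-abort notification, so the later ROLLBACK PREPARED still applies cleanly) can be sketched as a toy state machine. States and transitions are an invented model, not the actual subscriber code:

```c
/* Toy subscriber states for a single remote transaction. */
typedef enum SubTxnState
{
    SUB_IDLE,       /* no transaction open */
    SUB_APPLYING,   /* begin_prepare seen, applying changes */
    SUB_PREPARED    /* PREPARE'd locally, awaiting commit/rollback */
} SubTxnState;

/* On concurrent_abort: prepare the (partial) transaction as-is, so the
 * subscriber can hop out and continue with other transactions. */
static SubTxnState on_concurrent_abort(SubTxnState s)
{
    return (s == SUB_APPLYING) ? SUB_PREPARED : s;
}

/* The eventual ROLLBACK PREPARED then finds a matching prepared txn. */
static SubTxnState on_rollback_prepared(SubTxnState s)
{
    return (s == SUB_PREPARED) ? SUB_IDLE : s;
}
```

The design choice here is that the subscriber never lingers in SUB_APPLYING for a dead transaction, which is what allows it to process unrelated transactions in between.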
On Fri, Mar 26, 2021 at 5:50 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> On 26.03.21 11:19, Amit Kapila wrote:
> > No, I am not assuming that. I am just trying to describe you that it
> > is not necessary that we will be able to detect concurrent abort in
> > each and every case.
>
> Sure. Nor am I claiming that would be necessary or that the patch
> changed anything about it.
>
> As it stands, assuming that the output plugin basically just forwards the
> events and the subscriber tries to replicate them as is, the following
> would happen on the subscriber for a concurrently aborted two-phase
> transaction:
>
> * start a transaction (begin_prepare_cb)
> * apply changes for it (change_cb)
> * digress to other, unrelated transactions (leaving unspecified what
> exactly happens to the opened transaction)
> * attempt to rollback a transaction that has not ever been prepared
> (rollback_prepared_cb)
>
> The point of the patch is for the output plugin to get proper
> transaction entry and exit callbacks. Even in the unfortunate case of a
> concurrent abort. It offers the output plugin a clean way to learn that
> the decoder stopped decoding for the current transaction and it won't
> possibly see a prepare_cb for it (despite the decoder having passed the
> PREPARE record in WAL).
>
> > The other related thing is it may not be a good idea to finish the
> > transaction
>
> You're speaking subscriber side here. And yes, I agree, the subscriber
> should not abort the transaction at a concurrent_abort. I never claimed
> it should.
>
> If you are curious, in our case I made the subscriber PREPARE the
> transaction at its end when receiving a concurrent_abort notification,
> so that the subscriber:
>
> * can hop out of that started transaction and safely proceed
> to process events for other transactions, and
> * has the transaction in the appropriate state for processing the
> subsequent rollback_prepared_cb, once that gets through
>
> That's probably not ideal in the sense that subscribers do unnecessary
> work.
>

Isn't it better to send prepare from the publisher in such a case so that subscribers can know about it when rollback prepared arrives? I think we have already done the same (sent prepare, exactly to handle the case you have described above) for *streamed* transactions.

--
With Regards,
Amit Kapila.
On 27.03.21 07:37, Amit Kapila wrote:
> Isn't it better to send prepare from the publisher in such a case so
> that subscribers can know about it when rollback prepared arrives?

That's exactly what this callback allows (among other options). It provides a way for the output plugin to react to a transaction aborting while it is being decoded. This would not be possible without this additional callback.

Also note that I would like to retain the option to do some basic protocol validity checks. Certain messages only make sense within a transaction ('U'pdate, 'C'ommit). Others are only valid outside of a transaction ('B'egin, begin_prepare_cb). This is only possible if the output plugin has a callback for every entry into and exit out of a transaction (being decoded). This used to be the case prior to 2PC decoding, and this patch re-establishes that.

> I think we have already done the same (sent prepare, exactly to handle
> the case you have described above) for *streamed* transactions.

Where can I find that? ISTM streaming transactions have the same issue: the output plugin does not (or only implicitly) learn about a concurrent abort of the transaction.

Regards

Markus
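The kind of protocol validity check meant in the mail above can be sketched as follows. The message tags 'B', 'U', 'C' follow the letters mentioned there; the function itself is an invented illustration, not pgoutput code:

```c
#include <stdbool.h>

/* Reject messages that are invalid for the current transaction state. */
typedef struct ProtoState
{
    bool in_txn;
} ProtoState;

static bool proto_accept(ProtoState *s, char tag)
{
    switch (tag)
    {
        case 'B':               /* Begin: only valid outside a txn */
            if (s->in_txn)
                return false;
            s->in_txn = true;
            return true;
        case 'U':               /* Update: only valid inside a txn */
            return s->in_txn;
        case 'C':               /* Commit: only valid inside a txn */
            if (!s->in_txn)
                return false;
            s->in_txn = false;
            return true;
        default:
            return false;
    }
}
```

Without a guaranteed exit callback for a concurrently aborted transaction, `in_txn` would stay set and the next 'B' would falsely be rejected — which is exactly why an entry/exit callback for every transaction matters for such checks.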
On Mon, Mar 29, 2021 at 12:36 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> On 27.03.21 07:37, Amit Kapila wrote:
> > Isn't it better to send prepare from the publisher in such a case so
> > that subscribers can know about it when rollback prepared arrives?
>
> That's exactly what this callback allows (among other options). It
> provides a way for the output plugin to react to a transaction aborting
> while it is being decoded. This would not be possible without this
> additional callback.
>

You don't need an additional callback for that if we do what I am suggesting above.

> Also note that I would like to retain the option to do some basic
> protocol validity checks. Certain messages only make sense within a
> transaction ('U'pdate, 'C'ommit). Others are only valid outside of a
> transaction ('B'egin, begin_prepare_cb). This is only possible if the
> output plugin has a callback for every entry into and exit out of a
> transaction (being decoded). This used to be the case prior to 2PC
> decoding and this patch re-establishes that.
>
> > I think we have already done the same (sent prepare, exactly to handle
> > the case you have described above) for *streamed* transactions.
>
> Where can I find that? ISTM streaming transactions have the same issue:
> the output plugin does not (or only implicitly) learn about a concurrent
> abort of the transaction.
>

One option is to try to test it; otherwise, there are comments atop DecodePrepare() ("Note that we don't skip prepare even if have detected concurrent abort because it is quite possible that ....") which explain this.

--
With Regards,
Amit Kapila.
On 29.03.21 11:33, Amit Kapila wrote:
> You don't need an additional callback for that if we do what I am
> suggesting above.

Ah, are you suggesting a different change, then? To make two-phase transactions always send PREPARE, even if concurrently aborted? In that case, sorry, I misunderstood. I'm perfectly fine with that approach as well (even though it removes flexibility compared to the concurrent abort callback, as the comment above DecodePrepare indicates, i.e. it is "not impossible to optimize the concurrent abort case").

> One is you can try to test it, otherwise, there are comments atop
> DecodePrepare() ("Note that we don't skip prepare even if have
> detected concurrent abort because it is quite possible that ....")
> which explains this.

Thanks for this pointer, very helpful.

Regards

Markus
On Mon, Mar 29, 2021 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Mar 29, 2021 at 12:36 PM Markus Wanner
> <markus.wanner@enterprisedb.com> wrote:
> >
> > On 27.03.21 07:37, Amit Kapila wrote:
> > > Isn't it better to send prepare from the publisher in such a case so
> > > that subscribers can know about it when rollback prepared arrives?
Nice catch, Markus.
Interesting suggestion Amit. Let me try and code this.
regards,
Ajin Cherian
Fujitsu Australia
On 29.03.21 13:02, Ajin Cherian wrote:
> Nice catch, Markus.
> Interesting suggestion Amit. Let me try and code this.

Thanks, Ajin. Please consider this concurrent_abort callback as well. I think it provides more flexibility for the output plugin and I would therefore prefer it over a solution that hides this. The latter clearly makes all potential optimizations impossible, as it means the output plugin cannot distinguish between a proper PREPARE and a bail-out PREPARE (one that does not fully replicate the PREPARE as on the origin node, either, which I think is dangerous).

Regards

Markus
On Mon, Mar 29, 2021 at 10:09 PM Markus Wanner <markus.wanner@enterprisedb.com> wrote:
> On 29.03.21 13:02, Ajin Cherian wrote:
> > Nice catch, Markus.
> > Interesting suggestion Amit. Let me try and code this.
>
> Thanks, Ajin. Please consider this concurrent_abort callback as well.
> I think it provides more flexibility for the output plugin and I would
> therefore prefer it over a solution that hides this. It clearly makes
> all potential optimizations impossible, as it means the output plugin
> cannot distinguish between a proper PREPARE and a bail-out PREPARE (that
> does not fully replicate the PREPARE as on the origin node, either,
> which I think is dangerous).
I understand your concern, Markus, but I will leave it to one of the committers to decide on the new callback.
For now, I've created a patch that addresses the problem reported using the existing callbacks.
Do have a look if this fixes the problem reported.
regards,
Ajin Cherian
Fujitsu Australia
Attachment
Hello Ajin,

On 30.03.21 06:48, Ajin Cherian wrote:
> For now, I've created a patch that addresses the problem reported using
> the existing callbacks.

Thanks.

> Do have a look if this fixes the problem reported.

Yes, this replaces the PREPARE I would do from the concurrent_abort callback with a direct call to rb->prepare. However, it misses the most important part: documentation. Because this clearly is a surprising behavior for a transaction that's not fully decoded and guaranteed to get aborted.

Regards

Markus
On Tue, Mar 30, 2021 at 5:30 PM Markus Wanner <markus.wanner@enterprisedb.com> wrote:
> Hello Ajin,
>
> On 30.03.21 06:48, Ajin Cherian wrote:
> > For now, I've created a patch that addresses the problem reported using
> > the existing callbacks.
>
> Thanks.
>
> > Do have a look if this fixes the problem reported.
>
> Yes, this replaces the PREPARE I would do from the concurrent_abort
> callback with a direct call to rb->prepare. However, it misses the most
> important part: documentation. Because this clearly is a surprising
> behavior for a transaction that's not fully decoded and guaranteed to
> get aborted.
Where do you suggest this be documented? From an externally visible point of view, I don't see much of a surprise.
A transaction that was prepared and rolled back can be decoded to be prepared and rolled back with incomplete changes.
Are you suggesting more comments in code?
regards,
Ajin Cherian
Fujitsu Australia
On 30.03.21 09:39, Ajin Cherian wrote:
> Where do you suggest this be documented? From an externally visible
> point of view, I don't see much of a surprise.

If you start to think about the option of committing a prepared transaction from a different node, the danger becomes immediately apparent: a subscriber doesn't even know that the transaction is not complete. How could it possibly know it's futile to COMMIT PREPARED it?

I think it's not just surprising, but outright dangerous to pretend having prepared the transaction, but potentially miss some of the changes. (Essentially: do not assume the ROLLBACK PREPARED will make it to the subscriber. There's no such guarantee. The provider may crash, burn, and vanish before that happens.)

So I suggest documenting this as a caveat for the prepare callback, because with this patch that's the callback which may be invoked for an incomplete transaction without the output plugin knowing.

Regards

Markus
On Tue, Mar 30, 2021 at 12:00 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> Hello Ajin,
>
> On 30.03.21 06:48, Ajin Cherian wrote:
> > For now, I've created a patch that addresses the problem reported using
> > the existing callbacks.
>
> Thanks.
>
> > Do have a look if this fixes the problem reported.
>
> Yes, this replaces the PREPARE I would do from the concurrent_abort
> callback with a direct call to rb->prepare.
>

That clearly sounds like a better choice, because concurrent_abort() internally trying to prepare the transaction seems a bit ugly; and not only that, if we want to go via that route, it needs to distinguish between rollback-to-savepoint and rollback cases as well. Now, we could try to find a way where for such cases we don't send prepare/rollback prepared, but somehow detect it and send rollback instead. And some more checks would also be required so that, if we have streamed the transaction, we send stream_abort. I am not saying that all this is impossible, but I don't find it worth making all such checks.

> However, it misses the most
> important part: documentation. Because this clearly is a surprising
> behavior for a transaction that's not fully decoded and guaranteed to
> get aborted.
>

Yeah, I guess that makes sense to me. I think we can document it in the docs [1] where we explained two-phase decoding. I think we can add a point about concurrent aborts at the end of the points mentioned in the following paragraph: "The users that want to decode prepared transactions need to be careful ....."

[1] - https://www.postgresql.org/docs/devel/logicaldecoding-two-phase-commits.html

--
With Regards,
Amit Kapila.
On Tue, Mar 30, 2021 at 7:10 PM Markus Wanner <markus.wanner@enterprisedb.com> wrote:
> On 30.03.21 09:39, Ajin Cherian wrote:
> > Where do you suggest this be documented? From an externally visible
> > point of view, I don't see much of a surprise.
>
> So I suggest documenting this as a caveat for the prepare callback,
> because with this patch that's the callback which may be invoked for an
> incomplete transaction without the output plugin knowing.
I found some documentation that already was talking about concurrent aborts and updated that.
Patch attached.
regards,
Ajin Cherian
Fujitsu Australia
Attachment
On 30.03.21 11:02, Amit Kapila wrote:
> On Tue, Mar 30, 2021 at 12:00 PM Markus Wanner
>> Yes, this replaces the PREPARE I would do from the concurrent_abort
>> callback with a direct call to rb->prepare.
>
> Because concurrent_abort() internally trying to prepare the transaction
> seems a bit ugly and not only that, if we want to go via that route, it
> needs to distinguish between rollback to savepoint and rollback cases
> as well.

Just to clarify: of course, the concurrent_abort callback only sends a message to the subscriber, which then (in our current implementation) opts to prepare the transaction upon reception of the concurrent_abort message. Different implementations would be possible.

I would recommend this more explicit API and communication over hiding the concurrent abort in a prepare callback.

Regards

Markus
On 30.03.21 11:12, Ajin Cherian wrote:
> I found some documentation that already was talking about concurrent
> aborts and updated that.

Thanks. I just noticed that, as of PG13, concurrent_abort is part of ReorderBufferTXN, so the prepare_cb (or stream_prepare_cb) can actually figure out that a concurrent abort happened (and that the transaction may be incomplete). That's good and indeed makes an additional callback unnecessary.

I recommend giving a hint to that field in the documentation as well.

> diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
> index 80eb96d..d2f8d39 100644
> --- a/doc/src/sgml/logicaldecoding.sgml
> +++ b/doc/src/sgml/logicaldecoding.sgml
> @@ -545,12 +545,14 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
>      executed within that transaction. A transaction that is prepared for
>      a two-phase commit using <command>PREPARE TRANSACTION</command> will
>      also be decoded if the output plugin callbacks needed for decoding
> -    them are provided. It is possible that the current transaction which
> +    them are provided. It is possible that the current prepared transaction which
>      is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
>      command. In that case, the logical decoding of this transaction will
> -    be aborted too. We will skip all the changes of such a transaction once
> -    the abort is detected and abort the transaction when we read WAL for
> -    <command>ROLLBACK PREPARED</command>.
> +    be aborted too. All the changes of such a transaction is skipped once

typo: changes [..] *are* skipped, plural.

> +    the abort is detected and the <function>prepare_cb</function> callback is invoked.
> +    This could result in a prepared transaction with incomplete changes.

... "in which case the <literal>concurrent_abort</literal> field of the passed <literal>ReorderBufferTXN</literal> struct is set.", as a proposal?

> +    This is done so that eventually when the <command>ROLLBACK PREPARED</command>
> +    is decoded, there is a corresponding prepared transaction with a matching gid.
>      </para>
>  
>      <note>

Everything else sounds good to me.

Regards

Markus
On 30.03.21 11:54, Markus Wanner wrote:
> I would recommend this more explicit API and communication over hiding
> the concurrent abort in a prepare callback.

I figured we already have the ReorderBufferTXN's concurrent_abort flag, thus I agree the prepare_cb is sufficient and revoke this recommendation (and the concurrent_abort callback patch).

Regards

Markus
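A prepare callback can thus branch on that flag. Here is a sketch with a stand-in struct: the real field lives in ReorderBufferTXN (reorderbuffer.h), while the struct name and action codes below are invented for illustration:

```c
#include <stdbool.h>

/* Stand-in for the relevant bit of ReorderBufferTXN. */
typedef struct TxnInfo
{
    bool concurrent_abort;  /* decoding stopped early; changes incomplete */
} TxnInfo;

#define PREPARE_FULL        0   /* all changes decoded; safe to prepare */
#define PREPARE_INCOMPLETE  1   /* prepared only to match the coming
                                 * ROLLBACK PREPARED; never commit this */

/* Decide how a plugin's prepare_cb should treat the transaction. */
static int prepare_action(const TxnInfo *txn)
{
    return txn->concurrent_abort ? PREPARE_INCOMPLETE : PREPARE_FULL;
}
```

A plugin using this distinction can, for example, flag the downstream prepared transaction so a COMMIT PREPARED of an incomplete transaction is refused.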
On Tue, Mar 30, 2021 at 10:29 PM Markus Wanner <markus@bluegap.ch> wrote:
> I just noticed that as of PG13, concurrent_abort is part of ReorderBufferTXN,
> so it seems the prepare_cb (or stream_prepare_cb) can actually figure out
> that a concurrent abort happened (and that the transaction may be incomplete).
> That's good and indeed makes an additional callback unnecessary.
>
> I recommend giving a hint to that field in the documentation as well.
>
> > diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
> > index 80eb96d..d2f8d39 100644
> > --- a/doc/src/sgml/logicaldecoding.sgml
> > +++ b/doc/src/sgml/logicaldecoding.sgml
> > @@ -545,12 +545,14 @@ CREATE TABLE another_catalog_table(data text) WITH (user_catalog_table = true);
> >  executed within that transaction. A transaction that is prepared for
> >  a two-phase commit using <command>PREPARE TRANSACTION</command> will
> >  also be decoded if the output plugin callbacks needed for decoding
> > - them are provided. It is possible that the current transaction which
> > + them are provided. It is possible that the current prepared transaction which
> >  is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
> >  command. In that case, the logical decoding of this transaction will
> > - be aborted too. We will skip all the changes of such a transaction once
> > - the abort is detected and abort the transaction when we read WAL for
> > - <command>ROLLBACK PREPARED</command>.
> > + be aborted too. All the changes of such a transaction is skipped once
>
> Typo: changes [..] *are* skipped, plural.

Updated.

> > + the abort is detected and the <function>prepare_cb</function> callback is invoked.
> > + This could result in a prepared transaction with incomplete changes.
>
> ... "in which case the <literal>concurrent_abort</literal> field of the
> passed <literal>ReorderBufferTXN</literal> struct is set.", as a proposal?
>
> > + This is done so that eventually when the <command>ROLLBACK PREPARED</command>
> > + is decoded, there is a corresponding prepared transaction with a matching gid.
> >  </para>
> >
> >  <note>
>
> Everything else sounds good to me.

Updated.
regards,
Ajin Cherian
Fujitsu Australia
On Wed, Mar 31, 2021 at 6:42 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Updated.
>

I have slightly adjusted the comments, docs, and commit message. What do you think about the attached?

--
With Regards,
Amit Kapila.
On 31.03.21 06:39, Amit Kapila wrote:
> I have slightly adjusted the comments, docs, and commit message. What
> do you think about the attached?

Thank you both, Amit and Ajin. This looks good to me.

Only one minor gripe:

> + a prepared transaction with incomplete changes, in which case the
> + <literal>concurrent_abort</literal> field of the passed
> + <literal>ReorderBufferTXN</literal> struct is set. This is done so that
> + eventually when the <command>ROLLBACK PREPARED</command> is decoded, there
> + is a corresponding prepared transaction with a matching gid.

The last sentence there now seems to relate to just the setting of "concurrent_abort", rather than the whole reason to invoke the prepare_cb. And the reference to the "gid" is a bit lost. Maybe:

"Thus even in case of a concurrent abort, enough information is provided to the output plugin for it to properly deal with the <command>ROLLBACK PREPARED</command> once that is decoded."

Alternatively, state that the gid is otherwise missing earlier in the docs (similar to how the commit message describes it).

Regards

Markus
On Wed, Mar 31, 2021 at 5:25 PM Markus Wanner <markus.wanner@enterprisedb.com> wrote:
> The last sentence there now seems to relate to just the setting of
> "concurrent_abort", rather than the whole reason to invoke the
> prepare_cb. And the reference to the "gid" is a bit lost. Maybe:
>
> "Thus even in case of a concurrent abort, enough information is
> provided to the output plugin for it to properly deal with the
> <command>ROLLBACK PREPARED</command> once that is decoded."
>
> Alternatively, state that the gid is otherwise missing earlier in the
> docs (similar to how the commit message describes it).
I'm fine with Amit's changes and like Markus's last suggestion as well.
regards,
Ajin Cherian
Fujitsu Australia
On Wed, Mar 31, 2021 at 11:55 AM Markus Wanner <markus.wanner@enterprisedb.com> wrote:
>
> On 31.03.21 06:39, Amit Kapila wrote:
> > I have slightly adjusted the comments, docs, and commit message. What
> > do you think about the attached?
>
> Thank you both, Amit and Ajin. This looks good to me.
>
> Only one minor gripe:
>
> > + a prepared transaction with incomplete changes, in which case the
> > + <literal>concurrent_abort</literal> field of the passed
> > + <literal>ReorderBufferTXN</literal> struct is set. This is done so that
> > + eventually when the <command>ROLLBACK PREPARED</command> is decoded, there
> > + is a corresponding prepared transaction with a matching gid.
>
> The last sentence there now seems to relate to just the setting of
> "concurrent_abort", rather than the whole reason to invoke the
> prepare_cb. And the reference to the "gid" is a bit lost. Maybe:
>
> "Thus even in case of a concurrent abort, enough information is
> provided to the output plugin for it to properly deal with the
> <command>ROLLBACK PREPARED</command> once that is decoded."
>

Okay, changed the patch accordingly.

--
With Regards,
Amit Kapila.
On 31.03.21 15:18, Amit Kapila wrote:
> On Wed, Mar 31, 2021 at 11:55 AM Markus Wanner
>> The last sentence there now seems to relate to just the setting of
>> "concurrent_abort", rather than the whole reason to invoke the
>> prepare_cb. And the reference to the "gid" is a bit lost. Maybe:
>>
>> "Thus even in case of a concurrent abort, enough information is
>> provided to the output plugin for it to properly deal with the
>> <command>ROLLBACK PREPARED</command> once that is decoded."
>
> Okay, changed the patch accordingly.

That's fine with me. I didn't necessarily mean to eliminate the hint to the concurrent_abort field, but it's more concise that way. Thank you.

Regards

Markus
On Wed, Mar 31, 2021 at 7:20 PM Markus Wanner <markus.wanner@enterprisedb.com> wrote:
>
> On 31.03.21 15:18, Amit Kapila wrote:
> > On Wed, Mar 31, 2021 at 11:55 AM Markus Wanner
> >> The last sentence there now seems to relate to just the setting of
> >> "concurrent_abort", rather than the whole reason to invoke the
> >> prepare_cb. And the reference to the "gid" is a bit lost. Maybe:
> >>
> >> "Thus even in case of a concurrent abort, enough information is
> >> provided to the output plugin for it to properly deal with the
> >> <command>ROLLBACK PREPARED</command> once that is decoded."
> >
> > Okay, changed the patch accordingly.
>
> That's fine with me.
>

Pushed.

--
With Regards,
Amit Kapila.
Hi,

While testing another WIP patch [1], a clashing GID problem was found, which gives us apply worker errors like:

2021-04-26 10:07:12.883 AEST [22055] ERROR: transaction identifier "pg_gid_16403_608" is already in use
2021-04-26 10:08:05.149 AEST [22124] ERROR: transaction identifier "pg_gid_16403_757" is already in use

These GID clashes were traced back to a problem in the concurrent-abort logic: when "streaming" is enabled, the concurrent-abort logic was always sending "prepare" even though a "stream_prepare" had already been sent.

PSA a patch to correct this.

------
[1] https://www.postgresql.org/message-id/CAHut%2BPuB07xOgJLnDhvbtp0t_qMDhjDD%2BkO%2B2yB%2Br6tgfaR-5Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia
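The bug described above boils down to a dispatch decision. Here is a minimal, self-contained sketch of the corrected logic; the real code reads flags via the rbtxn_is_streamed() and rbtxn_prepared() macros on ReorderBufferTXN, whereas the StreamTXN type and function names here are illustrative stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative stand-in for the transaction state the decoder tracks. */
typedef struct StreamTXN
{
    bool streamed;  /* were changes already sent via stream_* callbacks? */
    bool prepared;  /* is this a two-phase (PREPARE TRANSACTION) txn?    */
} StreamTXN;

/*
 * On a concurrent abort, finish delivery with the callback that matches
 * how the changes were shipped so far.  Always answering "prepare_cb"
 * for an already-streamed transaction is the bug described above: the
 * downstream has already seen stream_prepare for that gid, so a second
 * prepare clashes ("transaction identifier ... is already in use").
 */
static const char *
concurrent_abort_finish(const StreamTXN *txn)
{
    if (!txn->prepared)
        return "no-prepare-needed";
    return txn->streamed ? "stream_prepare_cb" : "prepare_cb";
}
```

With the fix, a streamed two-phase transaction ends in stream_prepare_cb, a non-streamed one in prepare_cb, and the downstream prepares each gid exactly once.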
On Mon, Apr 26, 2021 at 7:05 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi,
>
> While testing another WIP patch [1], a clashing GID problem was found,
> which gives us apply worker errors like:
>
> 2021-04-26 10:07:12.883 AEST [22055] ERROR: transaction identifier
> "pg_gid_16403_608" is already in use
> 2021-04-26 10:08:05.149 AEST [22124] ERROR: transaction identifier
> "pg_gid_16403_757" is already in use
>
> These GID clashes were traced back to a problem in the
> concurrent-abort logic: when "streaming" is enabled, the
> concurrent-abort logic was always sending "prepare" even though a
> "stream_prepare" had already been sent.
>
> PSA a patch to correct this.
>

Your patch looks good to me, so pushed! Thanks for the report and patch.

--
With Regards,
Amit Kapila.