Re: repeated decoding of prepared transactions - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: repeated decoding of prepared transactions |
Date | |
Msg-id | CAA4eK1JCf+UQGf3XEMKR=H1SWhxN+bcXF=FgY1dix1gkJE1Vaw@mail.gmail.com |
In response to | Re: repeated decoding of prepared transactions (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Tue, Feb 16, 2021 at 9:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Feb 11, 2021 at 4:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Feb 8, 2021 at 2:01 PM Markus Wanner
> > <markus.wanner@enterprisedb.com> wrote:
> > >
> > Now, coming back to the restart case where the prepared transaction
> > can be sent again by the publisher. I understand yours and others
> > point that we should not send prepared transaction if there is a
> > restart between prepare and commit but there are reasons why we have
> > done that way and I am open to your suggestions. I'll once again try
> > to explain the exact case to you which is not very apparent. The basic
> > idea is that we ship/replay all transactions where commit happens
> > after the snapshot has a consistent state (SNAPBUILD_CONSISTENT), see
> > atop snapbuild.c for details. Now, for transactions where prepare is
> > before snapshot state SNAPBUILD_CONSISTENT and commit prepared is
> > after SNAPBUILD_CONSISTENT, we need to send the entire transaction
> > including prepare at the commit time. One might think it is quite easy
> > to detect that, basically if we skip prepare when the snapshot state
> > was not SNAPBUILD_CONSISTENT, then mark a flag in ReorderBufferTxn and
> > use the same to detect during commit and accordingly take the decision
> > to send prepare but unfortunately it is not that easy. There is always
> > a chance that on restart we reuse the snapshot serialized by some
> > other Walsender at a location prior to Prepare and if that happens
> > then this time the prepare won't be skipped due to snapshot state
> > (SNAPBUILD_CONSISTENT) but due to start_decoding_at point (considering
> > we have already shipped some of the later commits but not prepare).
> > Now, this will actually become the same situation where the restart
> > has happened after we have sent the prepare but not commit. This is
> > the reason we have to resend the prepare when the subscriber restarts
> > between prepare and commit.
> >
>
> After further thinking on this problem and some off-list discussions
> with Ajin, there appears to be another way to solve the above problem
> by which we can avoid resending the prepare after restart if it has
> already been processed by the subscriber. The main reason why we were
> not able to distinguish between the two cases ((a) prepare happened
> before SNAPBUILD_CONSISTENT state but commit prepared happened after
> we reach SNAPBUILD_CONSISTENT state and (b) prepare is already
> decoded, successfully processed by the subscriber and we have
> restarted the decoding) is that we can re-use the serialized snapshot
> at LSN location prior to Prepare of some concurrent WALSender after
> the restart. Now, if we ensure that we don't use serialized snapshots
> for decoding via slots where two_phase decoding option is enabled then
> we won't have that problem. The drawback is that in some cases it can
> take a bit more time for initial snapshot building but maybe that is
> better than the current solution.
>

I see another thing that we need to address if we go with the above
solution. Suppose the two-pc option for the subscription is initially
off, so we skip the prepare. Then some unrelated commit happens, which
allows the start_decoding_at point to move ahead. If the user now
enables the two-pc option for the subscription, we will skip the
prepare again because it is behind the start_decoding_at point, which
looks exactly like the case where the prepare has already been sent.
So, in such a situation, with the above solution we would miss sending
the prepared transaction and its data, and hence risk leaving the
replica out of sync.

This can be avoided if we don't allow users to alter the two-pc option
once the subscription is created. I am not sure, but for the first
version of this feature that might be okay, and we can improve it later
if we have better ideas. It would definitely allow us to avoid checks
in the plugins and/or the apply worker, which seems like a good
trade-off, and it would address the concern most people have raised in
this thread.

Any thoughts?

--
With Regards,
Amit Kapila.
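The ambiguity discussed in this message can be illustrated with a toy model. This is only a minimal sketch in Python, not PostgreSQL code: the names `SkipReason`, `DecodingState`, and `classify_prepare`, the order of the checks, and all LSN values are hypothetical and chosen purely for illustration. It assumes that the only things the decoder can observe when it reaches a PREPARE are whether two-phase decoding is enabled, whether the snapshot is already consistent, and whether the PREPARE lies behind start_decoding_at.

```python
from dataclasses import dataclass
from enum import Enum


class SkipReason(Enum):
    """Observed outcome for a PREPARE in this toy model (hypothetical names)."""
    NOT_SKIPPED = "decoded and sent"
    TWO_PHASE_OFF = "two-pc option disabled"
    SNAPSHOT_NOT_CONSISTENT = "snapshot not yet SNAPBUILD_CONSISTENT"
    BEHIND_START_DECODING_AT = "LSN is behind start_decoding_at"


@dataclass(frozen=True)
class DecodingState:
    consistent_at: int      # LSN at which the snapshot became consistent
    start_decoding_at: int  # changes before this LSN are assumed already sent
    two_phase: bool = True  # whether two-phase decoding is enabled


def classify_prepare(prepare_lsn: int, state: DecodingState) -> SkipReason:
    """Return the reason the decoder would observe for skipping a PREPARE."""
    if not state.two_phase:
        return SkipReason.TWO_PHASE_OFF
    if prepare_lsn < state.consistent_at:
        return SkipReason.SNAPSHOT_NOT_CONSISTENT
    if prepare_lsn < state.start_decoding_at:
        return SkipReason.BEHIND_START_DECODING_AT
    return SkipReason.NOT_SKIPPED


# Hypothetical LSNs: another walsender serialized a snapshot at 50, the
# PREPARE is at 100, this slot's own snapshot became consistent at 150, and
# an unrelated later commit moved start_decoding_at to 200.
PREPARE_LSN = 100

# Case (a), snapshot built from scratch: the prepare predates consistency.
case_a_own = classify_prepare(PREPARE_LSN, DecodingState(150, 150))

# Case (a) after a restart that reuses the serialized snapshot at LSN 50.
case_a_reused = classify_prepare(PREPARE_LSN, DecodingState(50, 200))

# Case (b): the prepare was already sent and processed before the restart.
case_b = classify_prepare(PREPARE_LSN, DecodingState(50, 200))

# two-pc initially off, then enabled after start_decoding_at has advanced.
off_then_on = [
    classify_prepare(PREPARE_LSN, DecodingState(50, 100, two_phase=False)),
    classify_prepare(PREPARE_LSN, DecodingState(50, 200, two_phase=True)),
]
```

In this model `case_a_reused` and `case_b` produce the same reason, which is the ambiguity that forces the prepare to be resent, while `case_a_own` stays distinguishable; that is what not reusing serialized snapshots for two_phase slots is meant to preserve. The second entry of `off_then_on` also matches `case_b`, which mirrors why altering the two-pc option after the subscription is created would need to be disallowed.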