Re: repeated decoding of prepared transactions - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: repeated decoding of prepared transactions |
Date | |
Msg-id | CAA4eK1L5aX1BL9Xg-wSULbFeB417G0v9qk5qZ6NbYCkCo6JUGQ@mail.gmail.com |
In response to | repeated decoding of prepared transactions (Markus Wanner <markus.wanner@enterprisedb.com>) |
Responses |
Re: repeated decoding of prepared transactions
Re: repeated decoding of prepared transactions
Re: repeated decoding of prepared transactions |
List | pgsql-hackers |
On Mon, Feb 8, 2021 at 2:01 PM Markus Wanner <markus.wanner@enterprisedb.com> wrote:
>
> I did not expect this, as any receiver that wants to have decoded 2PC is
> likely supporting some kind of two-phase commits itself. And would
> therefore prepare the transaction upon its first reception. Potentially
> receiving it a second time would require complicated filtering on every
> prepared transaction.
>

I would like to bring one other scenario to your notice where you might want to handle prepared transactions differently on the plugin side. Assume we have multiple publications (for simplicity, say 2) on the publisher, with corresponding subscriptions (say 2, each corresponding to one publication on the publisher). When a user performs a transaction on the publisher that involves tables from both publications, the subscriber side does that work via two different transactions, one per subscription. But we need some way to deal with prepared xacts, because they need a GID and we can't use the same GID for both subscriptions. Please see the detailed example and one idea to deal with this in the main thread [1]. It would be really helpful if you or others working on the plugin side could share your opinion on it.

Now, coming back to the restart case where the prepared transaction can be sent again by the publisher. I understand your and others' point that we should not send the prepared transaction again if there is a restart between prepare and commit, but there are reasons why we have done it that way, and I am open to your suggestions. I'll once again try to explain the exact case, which is not very apparent. The basic idea is that we ship/replay all transactions whose commit happens after the snapshot has reached a consistent state (SNAPBUILD_CONSISTENT); see the comments atop snapbuild.c for details.
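To make the ambiguity discussed in the following paragraphs concrete, here is a minimal C sketch of the decision the decoder faces when it reaches a PREPARE record. It is a simplified model, not PostgreSQL's actual code: the types, the two-valued snapshot state (the real snapshot builder has more states than just SNAPBUILD_CONSISTENT), the LSN values 100 and 200, and every function name are hypothetical. It assumes only what this message describes: a prepare is skipped either because the snapshot is not yet consistent or because it lies before start_decoding_at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position; 0 models "invalid". */
typedef uint64_t XLogRecPtr;
#define INVALID_LSN ((XLogRecPtr) 0)

/* Simplified: only "consistent or not" matters for this model. */
typedef enum { SNAP_NOT_CONSISTENT, SNAP_CONSISTENT } SnapState;

/* Why the decoder did or did not send a PREPARE. */
typedef enum { PREPARE_SENT, SKIPPED_INCONSISTENT, SKIPPED_BEFORE_START } PrepareOutcome;

/* Model of what happens when decoding reaches a PREPARE record. */
static PrepareOutcome
decode_prepare(SnapState state_at_prepare, XLogRecPtr prepare_lsn,
               XLogRecPtr start_decoding_at)
{
    assert(prepare_lsn != INVALID_LSN);     /* model precondition */
    if (state_at_prepare != SNAP_CONSISTENT)
        return SKIPPED_INCONSISTENT;
    if (prepare_lsn < start_decoding_at)
        return SKIPPED_BEFORE_START;
    return PREPARE_SENT;
}

/* The "flag" idea: resend at COMMIT PREPARED only after an inconsistent-snapshot skip. */
static bool
naive_resend_at_commit(PrepareOutcome outcome)
{
    return outcome == SKIPPED_INCONSISTENT;
}

/* Conservative rule: resend whenever the PREPARE was skipped for any reason. */
static bool
resend_at_commit(PrepareOutcome outcome)
{
    return outcome != PREPARE_SENT;
}

/* Case A: PREPARE at 100 was already sent, later commits up to 200 were
 * confirmed, then a restart happened before COMMIT PREPARED. */
static PrepareOutcome
case_prepare_already_sent(void)
{
    return decode_prepare(SNAP_CONSISTENT, 100, 200);
}

/* Case B: PREPARE at 100 was originally skipped (snapshot not consistent),
 * later commits up to 200 were confirmed; on restart a snapshot serialized
 * by another walsender before 100 is reused, so the state is now consistent.
 * The decoder sees exactly the same inputs as in case A. */
static PrepareOutcome
case_prepare_never_sent(void)
{
    return decode_prepare(SNAP_CONSISTENT, 100, 200);
}
```

In both restart cases the decoder gets identical inputs and reaches the same outcome, so the flag-based rule would never resend the prepare in case B, leaving the subscriber with a COMMIT PREPARED for a transaction it never prepared. Resending whenever the prepare was skipped avoids that, at the cost of the duplicate PREPARE in case A that the receiver has to tolerate.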
Now, for transactions where the prepare happens before the snapshot reaches SNAPBUILD_CONSISTENT and the commit prepared happens after it, we need to send the entire transaction, including the prepare, at commit time. One might think this is easy to detect: if we skip a prepare because the snapshot state was not SNAPBUILD_CONSISTENT, set a flag in ReorderBufferTXN and check it at commit time to decide whether to send the prepare. Unfortunately, it is not that easy. On restart, we may reuse a snapshot serialized by some other walsender at a location prior to the prepare. If that happens, the prepare is skipped not because of the snapshot state (which is now SNAPBUILD_CONSISTENT) but because of the start_decoding_at point (considering we have already shipped some of the later commits but not the prepare). This is exactly the same situation as a restart that happens after we have sent the prepare but not the commit. That is why we have to resend the prepare when the subscriber restarts between prepare and commit.

You can reproduce the case where we can't distinguish between the two situations by using the test case in twophase_snapshot.spec and additionally starting a separate session via the debugger. The steps in the test case are as below:

"s2b" "s2txid" "s1init" "s3b" "s3txid" "s2c" "s2b" "s2insert" "s2p" "s3c" "s1insert" "s1start" "s2cp" "s1start"

Define new steps as:

"s4init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot_1', 'test_decoding');}
"s4start" {SELECT data FROM pg_logical_slot_get_changes('isolation_slot_1', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');}

The first thing to do is to run s4init with a debugger breakpoint in SnapBuildProcessRunningXacts, so that it stops there. Now perform the steps from 's2b' through the first 's1start' in twophase_snapshot.spec. Then continue in the s4 session and perform s4start.
After this, if you debug (or add logging to) the second s1start, you will notice that we skip the prepare not because of an inconsistent snapshot but because start_decoding_at is at a later location. If you don't involve session 4, it will always skip the prepare due to an inconsistent snapshot state. This requires a debugger, so it is not easy to write an automated test for it. I have used a somewhat tricky scenario to explain this, but I am not sure there is a simpler way.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BLvkeX%3DB3xon7RcBwD4CVaFSryPj3pTBAALrDxQVPDwA%40mail.gmail.com

--
With Regards,
Amit Kapila.