Thread: Question about initial logical decoding snapshot
Hi hackers.
I'm studying the source code that creates the initial logical decoding snapshot. What confuses me is why we must process 3 xl_running_xacts records before we reach the consistent state. I think we only need 2.
I think we can reach the consistent state when we meet the 2nd xl_running_xacts record whose oldestRunningXid > the 1st xl_running_xacts record's nextXid. This means the transactions that were active in the 1st xl_running_xacts record have all committed, and we have all the logs of the transactions that will commit afterwards, so a consistent state exists at this point and we can export a snapshot.
I have read the discussion in [0] and the comment in commit '955a684', but I haven't found a detailed explanation of why we need 4 stages, rather than 3, during creation of the initial logical decoding snapshot.
My recent work is related to logical decoding, so I want to figure this out. I would be very grateful if you could answer. Thanks.
[0] https://www.postgresql.org/message-id/flat/f37e975c-908f-858e-707f-058d3b1eb214%402ndquadrant.com
--
Best regards
Chong Wang
Greenplum DataFlow team
On Fri, Dec 30, 2022 at 11:57 PM Chong Wang <chongwa@vmware.com> wrote:
>
> I'm studying the source code about creation of initial logical decoding snapshot. What confused me is that why must we process 3 xl_running_xacts before we get to the consistent state. I think we only need 2 xl_running_xacts.
>
> I think we can get to consistent state when we meet the 2nd xl_running_xact with its oldestRunningXid > 1st xl_running_xact's nextXid, this means the active transactions in 1st xl_running_xact all had committed, and we have all the logs of transactions who will commit afterwards, so there is consistent state in this time point and we can export a snapshot.
>

Yeah, we will have logs for all transactions in such a case, but I think we won't have a valid snapshot by that time. Consider a case where there are two transactions, 723 and 724, in the 2nd xl_running_xact record that we have waited for to finish, and we then treat that point as a consistent point and export the snapshot. It is quite possible that by that time the commit record of one or more of those xacts (say 724) has not yet been encountered by the decoding process, which means it won't be recorded in the xip list of the snapshot (we do that in DecodeCommit->SnapBuildCommitTxn). So, during export in SnapBuildInitialSnapshot(), we will consider 723 as committed and 724 as running. This could lead to inconsistent data on the client side that imports such a snapshot and uses it for the copy and for further replicating the other xacts.

OTOH, currently, before marking the snapshot state as consistent, we wait for these xacts to finish and for another xl_running_xact where oldestRunningXid >= builder->next_phase_at to appear, which means the commits for both 723 and 724 would have appeared in the snapshot.

Does that make sense to you, or am I missing something here?

--
With Regards,
Amit Kapila.
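The 723/724 scenario can be illustrated with a small model of the export step. This is only a sketch under stated assumptions: the struct, the function names, and the xmin/xmax range are hypothetical, XID wraparound is ignored, and the real SnapBuildInitialSnapshot() is more involved. The one idea taken from the explanation above is that any XID whose commit record the decoder has not yet seen is exported as still running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical, reduced view of what the snapshot builder knows. */
typedef struct BuilderSketch
{
    TransactionId xmin;             /* lowest XID of interest */
    TransactionId xmax;             /* first XID beyond the snapshot */
    const TransactionId *committed; /* commits seen while decoding */
    size_t ncommitted;
} BuilderSketch;

/* True if the decoder has already processed xid's commit record. */
static bool
commit_seen(const BuilderSketch *b, TransactionId xid)
{
    for (size_t i = 0; i < b->ncommitted; i++)
    {
        if (b->committed[i] == xid)
            return true;
    }
    return false;
}

/*
 * Collect the XIDs the exported snapshot would treat as running:
 * everything in [xmin, xmax) whose commit was not seen. Returns the
 * number of XIDs written to out (at most cap).
 */
size_t
export_running_xids(const BuilderSketch *b, TransactionId *out, size_t cap)
{
    size_t n = 0;

    assert(b != NULL && out != NULL);
    for (TransactionId xid = b->xmin; xid < b->xmax; xid++)
    {
        if (!commit_seen(b, xid) && n < cap)
            out[n++] = xid;
    }
    return n;
}
```

If only the commit of 723 has been decoded, the range [723, 725) exports 724 as running even though it has already committed on the server. Once both commits have been decoded, nothing in that range is exported as running.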
On Tue, Jan 3, 2023 at 4:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Does that make sense to you, or am I missing something here?
>

You can also refer to the discussion in thread [1], which is related to your question.

[1] - https://www.postgresql.org/message-id/c94be044-818f-15e3-1ad3-7a7ae2dfed0a%40iki.fi

--
With Regards,
Amit Kapila.
Hello,

I was curious why we need the 3rd running_xact and wanted to learn more about it, so I made a few changes and came up with a patch that builds the snapshot in 2 running_xacts. The motive is to run the tests and see which failures/issues this approach produces, in order to understand the need for reading a 3rd running_xact to build a consistent snapshot. With this patch, I got one test failure: test_decoding/twophase_snapshot.

Approach:
When we start building a snapshot, on the occurrence of the first running_xact, move the state from START to BUILDING and wait for all in-progress transactions to finish. On the second running_xact where we find oldestRunningXid >= 1st xl_running_xact's nextXid, move to the CONSISTENT state. This means that all transactions started before the BUILDING state have now finished, and all transactions currently in progress started after the BUILDING state and thus have enough info to be decoded.

Failure analysis for twophase_snapshot test:
After the patch is applied, the test case fails because the slot is created sooner and 'PREPARE TRANSACTION test1' is already available in the result of the first 'pg_logical_slot_get_changes'. The intent of this test case is to see how a two-phase txn is handled when the snapshot build completes in 3 stages (BUILDING-->FULL-->CONSISTENT). Originally, the PREPARED txn is started between the FULL and CONSISTENT stages, and thus, as per the current code logic, 'DecodePrepare' will skip it. Please see the code in DecodePrepare:

    /* We can't start streaming unless a consistent state is reached. */
    if (SnapBuildCurrentState(builder) < SNAPBUILD_CONSISTENT)
    {
        ReorderBufferSkipPrepare(ctx->reorder, xid);
        return;
    }

So the first 'pg_logical_slot_get_changes' will not show these changes. Once we do 'commit prepared' after the CONSISTENT state is reached, they will be available for the next 'pg_logical_slot_get_changes' to consume.

On the other hand, with the current patch, since we reach the consistent state sooner, the PREPARED transaction in the same test case now ends up starting after the CONSISTENT state and thus is available to be consumed by the first 'pg_logical_slot_get_changes' itself. This makes the test case fail.

Please note that in the patch I have kept the 'WAIT for all running transactions to end' even after reaching the CONSISTENT state. I also tried running the tests after removing that WAIT after CONSISTENT; with that, we get one more test failure, which is test_decoding/ondisk_startup. The reason for the failure is the same as in the previous case: since we reach the CONSISTENT state earlier, slot creation finishes faster and thus we see a slight change in the result for this test ('step s1init completed' is seen earlier in the log file).

Both failing tests are written in such a way that they align with the 3-phase snapshot build process. Otherwise, I do not see any logical issues yet with this approach, based on the test cases available so far.

So, I still have not gotten clarity on why we need the 3rd running_xact here. In the code, I see a comment in SnapBuildFindSnapshot() which says "c) ...But for older running transactions no viable snapshot exists yet, so CONSISTENT will only be reached once all of those have finished." This comment refers to txns started between the BUILDING and FULL states. I do not understand it fully. I am not sure which tests I need to run on the patch to reproduce the issue where we do not have a viable snapshot when we go by two running_xacts only.

Any thoughts/comments are most welcome. Attached the patch for review.

Thanks
Shveta