Thread: Found issues related with logical replication and 2PC
Hi hackers, While creating a patch which allows ALTER SUBSCRIPTION SET (two_phase) [1], we found some issues related with logical replication and two_phase. I think this can happen not only HEAD but PG14+, but for now I shared patches for HEAD. Issue #1 When handling a PREPARE message, the subscriber mistook the wrong lsn position (the end position of the last commit) as the end position of the current prepare. This can be fixed by adding a new global variable to record the end position of the last prepare. 0001 patch fixes the issue. Issue #2 When the subscriber enables two-phase commit but doesn't set max_prepared_transaction >0 and a transaction is prepared on the publisher, the apply worker reports an ERROR on the subscriber. After that, the prepared transaction is not replayed, which means it's lost forever. Attached script can emulate the situation. -- ERROR: prepared transactions are disabled HINT: Set "max_prepared_transactions" to a nonzero value. -- The reason is that we advanced the origin progress when aborting the transaction as well (RecordTransactionAbort->replorigin_session_advance). So, after setting replorigin_session_origin_lsn, if any ERROR happens when preparing the transaction, the transaction aborts which incorrectly advances the origin lsn. An easiest fix is to reset session replication origin before calling the RecordTransactionAbort(). I think this can happen when 1) LogicalRepApplyLoop() raises an ERROR or 2) apply worker exits. 0002 patch fixes the issue. How do you think? [1]: https://www.postgresql.org/message-id/flat/8fab8-65d74c80-1-2f28e880@39088166 Best regards, Hayato Kuroda FUJITSU LIMITED
Attachment
On Wed, Jul 24, 2024 at 12:25 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Hi hackers, > > While creating a patch which allows ALTER SUBSCRIPTION SET (two_phase) [1], > we found some issues related with logical replication and two_phase. I think this > can happen not only HEAD but PG14+, but for now I shared patches for HEAD. > > Issue #1 > > When handling a PREPARE message, the subscriber mistook the wrong lsn position > (the end position of the last commit) as the end position of the current prepare. > This can be fixed by adding a new global variable to record the end position of > the last prepare. 0001 patch fixes the issue. Thanks for the patches. I have started reviewing this. I reviewed and tested patch001 alone. I have a query, shouldn't the local-lsn stored in apply_handle_commit_prepared() be the end position of 'COMMIT_PREPARED' instead of 'PREPARE'? I put additional logging on sub and got this: LOG: apply_handle_prepare - prepare_data.end_lsn: 0/15892E0 , XactLastPrepareEnd: 0/1537FD8. LOG: apply_handle_commit_prepared - prepare_data.end_lsn: 0/1589318 , XactLastPrepareEnd: 0/1537FD8. In apply_handle_prepare(), remote-lsn ('0/15892E0') is end position of 'PREPARE' and in apply_handle_commit_prepared(), remote-lsn ('0/1589318') is end position of 'COMMIT_PREPARED', while local-lsn in both cases is end-lsn of 'PREPARE'. Details at [1]. Shouldn't we use 'XactLastCommitEnd' in apply_handle_commit_prepared() which is the end position of last COMMIT_PREPARED? It is assigned in the below flow: apply_handle_commit_prepared-->CommitTransactionCommand...->RecordTransactionCommit? Please let me know if I have misunderstood. [1]: Pub: ------ SELECT * FROM pg_get_wal_record_info('0/15892E0'); start_lsn | end_lsn | prev_lsn | record_type ---------+-----------+-----------+----------------- 0/15892E0 | 0/1589318 | 0/15891E8 | COMMIT_PREPARED --see prev_lsn SELECT * FROM pg_get_wal_record_info('0/15891E8'); start_lsn | end_lsn | prev_lsn | record_type ---------+-----------+-----------+------------- 0/15891E8 | 0/15892E0 | 0/15891A8 | PREPARE SELECT * FROM pg_get_wal_record_info('0/1589318'); start_lsn | end_lsn | prev_lsn | record_type ---------+-----------+-----------+--------------- 0/1589318 | 0/1589350 | 0/15892E0 | RUNNING_XACTS --see prev_lsn SELECT * FROM pg_get_wal_record_info('0/15892E0'); start_lsn | end_lsn | prev_lsn | record_type ---------+-----------+-----------+----------------- 0/15892E0 | 0/1589318 | 0/15891E8 | COMMIT_PREPARED Sub: ------ SELECT * FROM pg_get_wal_record_info('0/1537FD8'); start_lsn | end_lsn | prev_lsn | record_type ---------+-----------+-----------+------------------ 0/1537FD8 | 0/1538030 | 0/1537ED0 | COMMIT_PREPARED --see prev_lsn: SELECT * FROM pg_get_wal_record_info('0/1537ED0'); start_lsn | end_lsn | prev_lsn |record_type ---------+-----------+-----------+------------ 0/1537ED0 | 0/1537FD8 | 0/1537E90 |PREPARE thanks Shveta
On Wed, Aug 7, 2024 at 12:38 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Jul 24, 2024 at 12:25 PM Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > > > Issue #1 > > > > When handling a PREPARE message, the subscriber mistook the wrong lsn position > > (the end position of the last commit) as the end position of the current prepare. > > This can be fixed by adding a new global variable to record the end position of > > the last prepare. 0001 patch fixes the issue. > > Thanks for the patches. I have started reviewing this. I reviewed and > tested patch001 alone. > It makes sense. As both are different bugs we should discuss them separately. > I have a query, shouldn't the local-lsn stored in > apply_handle_commit_prepared() be the end position of > 'COMMIT_PREPARED' instead of 'PREPARE'? I put additional logging on > sub and got this: > > LOG: apply_handle_prepare - prepare_data.end_lsn: 0/15892E0 , > XactLastPrepareEnd: 0/1537FD8. > LOG: apply_handle_commit_prepared - prepare_data.end_lsn: 0/1589318 > , XactLastPrepareEnd: 0/1537FD8. > > In apply_handle_prepare(), remote-lsn ('0/15892E0') is end position of > 'PREPARE' and in apply_handle_commit_prepared(), remote-lsn > ('0/1589318') is end position of 'COMMIT_PREPARED', while local-lsn in > both cases is end-lsn of 'PREPARE'. Details at [1]. > > Shouldn't we use 'XactLastCommitEnd' in apply_handle_commit_prepared() > which is the end position of last COMMIT_PREPARED? It is assigned in > the below flow: > apply_handle_commit_prepared-->CommitTransactionCommand...->RecordTransactionCommit? > I also think so. Additionally, I feel a test case (or some description of the bug that can arise) should be provided for issue-1. -- With Regards, Amit Kapila.
On Wed, Aug 7, 2024 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > I also think so. Additionally, I feel a test case (or some description > of the bug that can arise) should be provided for issue-1. > IIUC, the problem could be that we would end up updating the wrong local_end LSN in lsn_mappings via store_flush_position(). Then get_flush_position() could end up computing the wrong flush position and send the confirmation of flush to the publisher even when it is not flushed. This ideally could lead to a situation where the prepared transaction is not flushed to disk on the subscriber and because publisher would have gotten the confirmation earlier than required, it won't send the prepared transaction again. I think this theory is not true for prepare transactions because we always flush WAL of prepare even for asynchronous commit mode. See EndPrepare(). So, if my analysis is correct, this shouldn't be a bug and ideally, we should update local_end LSN as InvalidXLogRecPtr and add appropriate comments. -- With Regards, Amit Kapila.
On Wed, Aug 7, 2024 at 5:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Aug 7, 2024 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > I also think so. Additionally, I feel a test case (or some description > > of the bug that can arise) should be provided for issue-1. > > > > IIUC, the problem could be that we would end up updating the wrong > local_end LSN in lsn_mappings via store_flush_position(). Then > get_flush_position() could end up computing the wrong flush position > and send the confirmation of flush to the publisher even when it is > not flushed. This ideally could lead to a situation where the prepared > transaction is not flushed to disk on the subscriber and because > publisher would have gotten the confirmation earlier than required, it > won't send the prepared transaction again. Yes, that is what my understanding was. > I think this theory is not > true for prepare transactions because we always flush WAL of prepare > even for asynchronous commit mode. See EndPrepare(). Okay, I was not aware of this. Thanks for explaining. > So, if my > analysis is correct, this shouldn't be a bug and ideally, we should > update local_end LSN as InvalidXLogRecPtr and add appropriate > comments. Okay, we can do that. Then get_flush_position() can also be changed to *explicitly* deal with the case where local_end is InvalidXLogRecPtr. Having said that, even though it is not a bug, shouldn't we still have the correct mapping updated in lsn_mapping? When remote_end is PREPARE Or COMMIT_PREPARED, local_end should also point to the same? thanks Shveta
On Thu, Aug 8, 2024 at 8:54 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Aug 7, 2024 at 5:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > So, if my > > analysis is correct, this shouldn't be a bug and ideally, we should > > update local_end LSN as InvalidXLogRecPtr and add appropriate > > comments. > > Okay, we can do that. Then get_flush_position() can also be changed to > *explicitly* deal with the case where local_end is InvalidXLogRecPtr. > AFAICS, it should be handled without any change as the value of InvalidXLogRecPtr is 0. So, it should be less than equal to the local_flush position. > Having said that, even though it is not a bug, shouldn't we still have > the correct mapping updated in lsn_mapping? When remote_end is PREPARE > Or COMMIT_PREPARED, local_end should also point to the same? > Ideally yes, but introducing a new global variable just for this purpose doesn't sound advisable. We can add in comments that in the future, if adding such a variable serves some purpose then we can surely extend the functionality. -- With Regards, Amit Kapila.
On Wed, Jul 24, 2024 at 12:25 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > While creating a patch which allows ALTER SUBSCRIPTION SET (two_phase) [1], > we found some issues related with logical replication and two_phase. I think this > can happen not only HEAD but PG14+, but for now I shared patches for HEAD. > > Issue #1 > > When handling a PREPARE message, the subscriber mistook the wrong lsn position > (the end position of the last commit) as the end position of the current prepare. > This can be fixed by adding a new global variable to record the end position of > the last prepare. 0001 patch fixes the issue. > > Issue #2 > > When the subscriber enables two-phase commit but doesn't set max_prepared_transaction >0 > and a transaction is prepared on the publisher, the apply worker reports an ERROR > on the subscriber. After that, the prepared transaction is not replayed, which > means it's lost forever. Attached script can emulate the situation. > > -- > ERROR: prepared transactions are disabled > HINT: Set "max_prepared_transactions" to a nonzero value. > -- > > The reason is that we advanced the origin progress when aborting the > transaction as well (RecordTransactionAbort->replorigin_session_advance). So, > after setting replorigin_session_origin_lsn, if any ERROR happens when preparing > the transaction, the transaction aborts which incorrectly advances the origin lsn. > > An easiest fix is to reset session replication origin before calling the > RecordTransactionAbort(). I think this can happen when 1) LogicalRepApplyLoop() > raises an ERROR or 2) apply worker exits. 0002 patch fixes the issue. > Can we start a separate thread to issue 2? I understand that this one is also related to two_phase but since both are different issues it is better to discuss in separate threads. This will also help us refer to the discussion in future if required. BTW, why did the 0002 patch change the below code: --- a/src/include/replication/worker_internal.h +++ b/src/include/replication/worker_internal.h @@ -164,7 +164,8 @@ typedef struct ParallelApplyWorkerShared /* * XactLastCommitEnd or XactLastPrepareEnd from the parallel apply worker. - * This is required by the leader worker so it can update the lsn_mappings. + * This is required by the leader worker so it can update the + * lsn_mappings. */ XLogRecPtr last_commit_end; -- With Regards, Amit Kapila.
Dear Amit, > Can we start a separate thread to issue 2? I understand that this one > is also related to two_phase but since both are different issues it is > better to discuss in separate threads. This will also help us refer to > the discussion in future if required. You are right, we should discuss one topic per thread. Forked: [1]. > BTW, why did the 0002 patch change the below code: > --- a/src/include/replication/worker_internal.h > +++ b/src/include/replication/worker_internal.h > @@ -164,7 +164,8 @@ typedef struct ParallelApplyWorkerShared > > /* > * XactLastCommitEnd or XactLastPrepareEnd from the parallel apply worker. > - * This is required by the leader worker so it can update the lsn_mappings. > + * This is required by the leader worker so it can update the > + * lsn_mappings. > */ > XLogRecPtr last_commit_end; > Opps. Fixed version is posted in [1]. [1]: https://www.postgresql.org/message-id/TYAPR01MB5692FAC23BE40C69DA8ED4AFF5B92%40TYAPR01MB5692.jpnprd01.prod.outlook.com Best regards, Hayato Kuroda FUJITSU LIMITED
On Thu, Aug 8, 2024 at 9:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Aug 8, 2024 at 8:54 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Wed, Aug 7, 2024 at 5:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > So, if my > > > analysis is correct, this shouldn't be a bug and ideally, we should > > > update local_end LSN as InvalidXLogRecPtr and add appropriate > > > comments. > > > > Okay, we can do that. Then get_flush_position() can also be changed to > > *explicitly* deal with the case where local_end is InvalidXLogRecPtr. > > > > AFAICS, it should be handled without any change as the value of > InvalidXLogRecPtr is 0. So, it should be less than equal to the > local_flush position. Yes, existing code will work, no doubt about that. But generally we explictly use XLogRecPtrIsInvalid if we need to include or exclude lsn=0 in some logic. We do not consider 0 lsn for comparisons like this which we currently have in get_flush_position. Thus stated for an explicit check. But, yes, the current code will work. > > Having said that, even though it is not a bug, shouldn't we still have > > the correct mapping updated in lsn_mapping? When remote_end is PREPARE > > Or COMMIT_PREPARED, local_end should also point to the same? > > > > Ideally yes, but introducing a new global variable just for this > purpose doesn't sound advisable. We can add in comments that in the > future, if adding such a variable serves some purpose then we can > surely extend the functionality. Okay. Sounds reasonable. thanks Shveta
Dear Amit, Shveta, Thanks for discussing! I reported the issue because 1) I feared the risk of data loss and 2) simply because the coding looked incorrect. However, per discussion, I understood that it wouldn't lead to loss, and adding a global variable was unacceptable in this case. I modified the patch completely. The attached patch avoids using the LastCommitLSN as the local_lsn while applying PREPARE. get_flush_position() was not changed. Also, it contains changes that have not been discussed yet: - Set last_commit_end to InvaldXLogPtr in the PREPARE case. This causes the same result as when the stream option is not "parallel." - XactLastCommitEnd was replaced even ROLLBACK PREPARED case. Since the COMMIT PREPARED record is flushed in RecordTransactionAbortPrepared(), there is no need to ensure the WAL must be sent. Best regards, Hayato Kuroda FUJITSU LIMITED
Attachment
On Thu, Aug 8, 2024 at 2:37 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Thanks for discussing! > > I reported the issue because 1) I feared the risk of data loss and 2) simply > because the coding looked incorrect. However, per discussion, I understood that > it wouldn't lead to loss, and adding a global variable was unacceptable in this > case. I modified the patch completely. > > The attached patch avoids using the LastCommitLSN as the local_lsn while applying > PREPARE. get_flush_position() was not changed. Also, it contains changes that > have not been discussed yet: > > - Set last_commit_end to InvaldXLogPtr in the PREPARE case. > This causes the same result as when the stream option is not "parallel." > - XactLastCommitEnd was replaced even ROLLBACK PREPARED case. > Since the COMMIT PREPARED record is flushed in RecordTransactionAbortPrepared(), > there is no need to ensure the WAL must be sent. > The code changes look mostly good to me. I have changed/added a few comments in the attached modified version. -- With Regards, Amit Kapila.
Attachment
On Thu, Aug 8, 2024 at 5:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Aug 8, 2024 at 2:37 PM Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > > > Thanks for discussing! > > > > I reported the issue because 1) I feared the risk of data loss and 2) simply > > because the coding looked incorrect. However, per discussion, I understood that > > it wouldn't lead to loss, and adding a global variable was unacceptable in this > > case. I modified the patch completely. > > > > The attached patch avoids using the LastCommitLSN as the local_lsn while applying > > PREPARE. get_flush_position() was not changed. Also, it contains changes that > > have not been discussed yet: > > > > - Set last_commit_end to InvaldXLogPtr in the PREPARE case. > > This causes the same result as when the stream option is not "parallel." > > - XactLastCommitEnd was replaced even ROLLBACK PREPARED case. > > Since the COMMIT PREPARED record is flushed in RecordTransactionAbortPrepared(), > > there is no need to ensure the WAL must be sent. > > > > The code changes look mostly good to me. I have changed/added a few > comments in the attached modified version. > Code changes with Amit's correction patch look good to me. thanks Shveta
Dear Amit, > > The code changes look mostly good to me. I have changed/added a few > comments in the attached modified version. > Thanks for updating the patch! It LGTM. I've tested your patch and confirmed it did not cause the data loss. I used the source which was applied v3 and additional fix to visualize the replication command [1]. Method ====== 1. Construct a logical replication system with two_phase = true and synchronous_commit = false 2. attach a walwriter of the subscriber to stop the process 3. Start a transaction and prepare it for the publisher. 4. Wait until the worker replies to the publisher. 5. Stop the subscriber 6. Restart subscriber. 7. Do COMMIT PREPARED Attached script can construct the same situation. Result ====== After the step 5, I ran pg_waldump and confirmed PREPARE record existed on the subscriber. ``` $ pg_waldump data_sub/pg_wal/000000010000000000000001 ... rmgr: Transaction len..., desc: PREPARE gid pg_gid_16389_741: ... rmgr: XLOG len..., desc: CHECKPOINT_SHUTDOWN ... ``` Also, after the step 7, I confirmed that only the COMMIT PREPARED record was sent because log output the below line. "75" means the ASCII character 'K'; this indicated that the replication message corresponded to COMMIT PREPARED. ``` LOG: XXX got message 75 ``` Additionally, I did another test, which is basically same as above but 1) XLogFlush() in EndPrepare() was commented out and 2) kill -9 was used at step 5 to emulate a crash. Since the PREPAREd transaction cannot survive on the subscriber in this case, so COMMIT PREPARED command on publisher causes an ERROR on the subscriber. ``` ERROR: prepared transaction with identifier "pg_gid_16389_741" does not exist CONTEXT: processing remote data for replication origin "pg_16389" during message type "COMMIT PREPARED" in transaction 741, finished at 0/15463C0 ``` I think this shows that the backend process can ensure the WAL is persisted so data loss won't occur. [1]: ``` @@ -3297,6 +3297,8 @@ apply_dispatch(StringInfo s) saved_command = apply_error_callback_arg.command; apply_error_callback_arg.command = action; + elog(LOG, "XXX got message %d", action); ``` Best regards, Hayato Kuroda FUJITSU LIMITED
Attachment
On Fri, Aug 9, 2024 at 10:34 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > > > > The code changes look mostly good to me. I have changed/added a few > > comments in the attached modified version. > > > > Thanks for updating the patch! It LGTM. I've tested your patch and confirmed > it did not cause the data loss. > Thanks for the additional testing. I have pushed this patch. -- With Regards, Amit Kapila.