Thread: Found issues related with logical replication and 2PC

Found issues related with logical replication and 2PC

From
"Hayato Kuroda (Fujitsu)"
Date:
Hi hackers,

While creating a patch which allows ALTER SUBSCRIPTION SET (two_phase) [1],
we found some issues related with logical replication and two_phase. I think this
can happen not only HEAD but PG14+, but for now I shared patches for HEAD.

Issue #1

When handling a PREPARE message, the subscriber mistook the wrong lsn position
(the end position of the last commit) as the end position of the current prepare.
This can be fixed by adding a new global variable to record the end position of
the last prepare. 0001 patch fixes the issue.

Issue #2

When the subscriber enables two-phase commit but doesn't set max_prepared_transaction >0
and a transaction is prepared on the publisher, the apply worker reports an ERROR
on the subscriber. After that, the prepared transaction is not replayed, which
means it's lost forever. Attached script can emulate the situation.

--
ERROR:  prepared transactions are disabled
HINT:  Set "max_prepared_transactions" to a nonzero value.
--

The reason is that we advanced the origin progress when aborting the
transaction as well (RecordTransactionAbort->replorigin_session_advance). So,
after setting replorigin_session_origin_lsn, if any ERROR happens when preparing
the transaction, the transaction aborts which incorrectly advances the origin lsn.

An easiest fix is to reset session replication origin before calling the
RecordTransactionAbort(). I think this can happen when 1) LogicalRepApplyLoop()
raises an ERROR or 2) apply worker exits. 0002 patch fixes the issue.


How do you think?

[1]: https://www.postgresql.org/message-id/flat/8fab8-65d74c80-1-2f28e880@39088166

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

Re: Found issues related with logical replication and 2PC

From
shveta malik
Date:
On Wed, Jul 24, 2024 at 12:25 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Hi hackers,
>
> While creating a patch which allows ALTER SUBSCRIPTION SET (two_phase) [1],
> we found some issues related with logical replication and two_phase. I think this
> can happen not only HEAD but PG14+, but for now I shared patches for HEAD.
>
> Issue #1
>
> When handling a PREPARE message, the subscriber mistook the wrong lsn position
> (the end position of the last commit) as the end position of the current prepare.
> This can be fixed by adding a new global variable to record the end position of
> the last prepare. 0001 patch fixes the issue.

Thanks for the patches. I have started reviewing this. I reviewed and
tested patch001 alone.

I have a query, shouldn't the local-lsn stored in
apply_handle_commit_prepared() be the end position of
'COMMIT_PREPARED' instead of 'PREPARE'? I put additional logging on
sub and got this:

LOG:  apply_handle_prepare - prepare_data.end_lsn:  0/15892E0 ,
XactLastPrepareEnd: 0/1537FD8.
LOG:  apply_handle_commit_prepared - prepare_data.end_lsn:  0/1589318
, XactLastPrepareEnd: 0/1537FD8.

In apply_handle_prepare(), remote-lsn ('0/15892E0') is end position of
'PREPARE' and in apply_handle_commit_prepared(), remote-lsn
('0/1589318') is end position of 'COMMIT_PREPARED', while local-lsn in
both cases is end-lsn of 'PREPARE'. Details at [1].

Shouldn't we use 'XactLastCommitEnd' in apply_handle_commit_prepared()
which is the end position of last COMMIT_PREPARED? It is assigned in
the below flow:
apply_handle_commit_prepared-->CommitTransactionCommand...->RecordTransactionCommit?

Please let me know if I have misunderstood.

[1]:

Pub:
------
SELECT * FROM pg_get_wal_record_info('0/15892E0');
start_lsn |  end_lsn  | prev_lsn  |   record_type
---------+-----------+-----------+-----------------
0/15892E0 | 0/1589318 | 0/15891E8 | COMMIT_PREPARED

--see prev_lsn
SELECT * FROM pg_get_wal_record_info('0/15891E8');
start_lsn |  end_lsn  | prev_lsn  | record_type
---------+-----------+-----------+-------------
0/15891E8 | 0/15892E0 | 0/15891A8 | PREPARE

SELECT * FROM pg_get_wal_record_info('0/1589318');
start_lsn |  end_lsn  | prev_lsn  |  record_type
---------+-----------+-----------+---------------
0/1589318 | 0/1589350 | 0/15892E0 | RUNNING_XACTS

--see prev_lsn
SELECT * FROM pg_get_wal_record_info('0/15892E0');
start_lsn |  end_lsn  | prev_lsn  |   record_type
---------+-----------+-----------+-----------------
0/15892E0 | 0/1589318 | 0/15891E8 | COMMIT_PREPARED


Sub:
------
SELECT * FROM pg_get_wal_record_info('0/1537FD8');
start_lsn |  end_lsn  | prev_lsn  |    record_type
---------+-----------+-----------+------------------
0/1537FD8 | 0/1538030 | 0/1537ED0 |  COMMIT_PREPARED

--see prev_lsn:
SELECT * FROM pg_get_wal_record_info('0/1537ED0');
start_lsn |  end_lsn  | prev_lsn  |record_type
---------+-----------+-----------+------------
0/1537ED0 | 0/1537FD8 | 0/1537E90 |PREPARE

thanks
Shveta



Re: Found issues related with logical replication and 2PC

From
Amit Kapila
Date:
On Wed, Aug 7, 2024 at 12:38 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Jul 24, 2024 at 12:25 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Issue #1
> >
> > When handling a PREPARE message, the subscriber mistook the wrong lsn position
> > (the end position of the last commit) as the end position of the current prepare.
> > This can be fixed by adding a new global variable to record the end position of
> > the last prepare. 0001 patch fixes the issue.
>
> Thanks for the patches. I have started reviewing this. I reviewed and
> tested patch001 alone.
>

It makes sense. As both are different bugs we should discuss them separately.

> I have a query, shouldn't the local-lsn stored in
> apply_handle_commit_prepared() be the end position of
> 'COMMIT_PREPARED' instead of 'PREPARE'? I put additional logging on
> sub and got this:
>
> LOG:  apply_handle_prepare - prepare_data.end_lsn:  0/15892E0 ,
> XactLastPrepareEnd: 0/1537FD8.
> LOG:  apply_handle_commit_prepared - prepare_data.end_lsn:  0/1589318
> , XactLastPrepareEnd: 0/1537FD8.
>
> In apply_handle_prepare(), remote-lsn ('0/15892E0') is end position of
> 'PREPARE' and in apply_handle_commit_prepared(), remote-lsn
> ('0/1589318') is end position of 'COMMIT_PREPARED', while local-lsn in
> both cases is end-lsn of 'PREPARE'. Details at [1].
>
> Shouldn't we use 'XactLastCommitEnd' in apply_handle_commit_prepared()
> which is the end position of last COMMIT_PREPARED? It is assigned in
> the below flow:
> apply_handle_commit_prepared-->CommitTransactionCommand...->RecordTransactionCommit?
>

I also think so. Additionally, I feel a test case (or some description
of the bug that can arise) should be provided for issue-1.

--
With Regards,
Amit Kapila.



Re: Found issues related with logical replication and 2PC

From
Amit Kapila
Date:
On Wed, Aug 7, 2024 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I also think so. Additionally, I feel a test case (or some description
> of the bug that can arise) should be provided for issue-1.
>

IIUC, the problem could be that we would end up updating the wrong
local_end LSN in lsn_mappings via store_flush_position(). Then
get_flush_position() could end up computing the wrong flush position
and send the confirmation of flush to the publisher even when it is
not flushed. This ideally could lead to a situation where the prepared
transaction is not flushed to disk on the subscriber and because
publisher would have gotten the confirmation earlier than required, it
won't send the prepared transaction again. I think this theory is not
true for prepare transactions because we always flush WAL of prepare
even for asynchronous commit mode. See EndPrepare(). So, if my
analysis is correct, this shouldn't be a bug and ideally, we should
update local_end LSN as InvalidXLogRecPtr and add appropriate
comments.

--
With Regards,
Amit Kapila.



Re: Found issues related with logical replication and 2PC

From
shveta malik
Date:
On Wed, Aug 7, 2024 at 5:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Aug 7, 2024 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I also think so. Additionally, I feel a test case (or some description
> > of the bug that can arise) should be provided for issue-1.
> >
>
> IIUC, the problem could be that we would end up updating the wrong
> local_end LSN in lsn_mappings via store_flush_position(). Then
> get_flush_position() could end up computing the wrong flush position
> and send the confirmation of flush to the publisher even when it is
> not flushed. This ideally could lead to a situation where the prepared
> transaction is not flushed to disk on the subscriber and because
> publisher would have gotten the confirmation earlier than required, it
> won't send the prepared transaction again.

Yes, that is what my understanding was.

> I think this theory is not
> true for prepare transactions because we always flush WAL of prepare
> even for asynchronous commit mode. See EndPrepare().

Okay, I was not aware of this. Thanks for explaining.

> So, if my
> analysis is correct, this shouldn't be a bug and ideally, we should
> update local_end LSN as InvalidXLogRecPtr and add appropriate
> comments.

Okay, we can do that. Then get_flush_position() can also be changed to
*explicitly* deal with the case where local_end is InvalidXLogRecPtr.
Having said that, even though it is not a bug, shouldn't we still have
the correct mapping updated in lsn_mapping? When remote_end is PREPARE
Or COMMIT_PREPARED, local_end should also point to the same?

thanks
Shveta



Re: Found issues related with logical replication and 2PC

From
Amit Kapila
Date:
On Thu, Aug 8, 2024 at 8:54 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Aug 7, 2024 at 5:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > So, if my
> > analysis is correct, this shouldn't be a bug and ideally, we should
> > update local_end LSN as InvalidXLogRecPtr and add appropriate
> > comments.
>
> Okay, we can do that. Then get_flush_position() can also be changed to
> *explicitly* deal with the case where local_end is InvalidXLogRecPtr.
>

AFAICS, it should be handled without any change as the value of
InvalidXLogRecPtr is 0. So, it should be less than equal to the
local_flush position.

> Having said that, even though it is not a bug, shouldn't we still have
> the correct mapping updated in lsn_mapping? When remote_end is PREPARE
> Or COMMIT_PREPARED, local_end should also point to the same?
>

Ideally yes, but introducing a new global variable just for this
purpose doesn't sound advisable. We can add in comments that in the
future, if adding such a variable serves some purpose then we can
surely extend the functionality.

--
With Regards,
Amit Kapila.



Re: Found issues related with logical replication and 2PC

From
Amit Kapila
Date:
On Wed, Jul 24, 2024 at 12:25 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> While creating a patch which allows ALTER SUBSCRIPTION SET (two_phase) [1],
> we found some issues related with logical replication and two_phase. I think this
> can happen not only HEAD but PG14+, but for now I shared patches for HEAD.
>
> Issue #1
>
> When handling a PREPARE message, the subscriber mistook the wrong lsn position
> (the end position of the last commit) as the end position of the current prepare.
> This can be fixed by adding a new global variable to record the end position of
> the last prepare. 0001 patch fixes the issue.
>
> Issue #2
>
> When the subscriber enables two-phase commit but doesn't set max_prepared_transaction >0
> and a transaction is prepared on the publisher, the apply worker reports an ERROR
> on the subscriber. After that, the prepared transaction is not replayed, which
> means it's lost forever. Attached script can emulate the situation.
>
> --
> ERROR:  prepared transactions are disabled
> HINT:  Set "max_prepared_transactions" to a nonzero value.
> --
>
> The reason is that we advanced the origin progress when aborting the
> transaction as well (RecordTransactionAbort->replorigin_session_advance). So,
> after setting replorigin_session_origin_lsn, if any ERROR happens when preparing
> the transaction, the transaction aborts which incorrectly advances the origin lsn.
>
> An easiest fix is to reset session replication origin before calling the
> RecordTransactionAbort(). I think this can happen when 1) LogicalRepApplyLoop()
> raises an ERROR or 2) apply worker exits. 0002 patch fixes the issue.
>

Can we start a separate thread to issue 2? I understand that this one
is also related to two_phase but since both are different issues it is
better to discuss in separate threads. This will also help us refer to
the discussion in future if required.

BTW, why did the 0002 patch change the below code:
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -164,7 +164,8 @@ typedef struct ParallelApplyWorkerShared

  /*
  * XactLastCommitEnd or XactLastPrepareEnd from the parallel apply worker.
- * This is required by the leader worker so it can update the lsn_mappings.
+ * This is required by the leader worker so it can update the
+ * lsn_mappings.
  */
  XLogRecPtr last_commit_end;

--
With Regards,
Amit Kapila.



RE: Found issues related with logical replication and 2PC

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit,

> Can we start a separate thread to issue 2? I understand that this one
> is also related to two_phase but since both are different issues it is
> better to discuss in separate threads. This will also help us refer to
> the discussion in future if required.

You are right, we should discuss one topic per thread. Forked: [1].

> BTW, why did the 0002 patch change the below code:
> --- a/src/include/replication/worker_internal.h
> +++ b/src/include/replication/worker_internal.h
> @@ -164,7 +164,8 @@ typedef struct ParallelApplyWorkerShared
> 
>   /*
>   * XactLastCommitEnd or XactLastPrepareEnd from the parallel apply worker.
> - * This is required by the leader worker so it can update the lsn_mappings.
> + * This is required by the leader worker so it can update the
> + * lsn_mappings.
>   */
>   XLogRecPtr last_commit_end;
>

Opps. Fixed version is posted in [1].

[1]:
https://www.postgresql.org/message-id/TYAPR01MB5692FAC23BE40C69DA8ED4AFF5B92%40TYAPR01MB5692.jpnprd01.prod.outlook.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Re: Found issues related with logical replication and 2PC

From
shveta malik
Date:
On Thu, Aug 8, 2024 at 9:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Aug 8, 2024 at 8:54 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Wed, Aug 7, 2024 at 5:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > So, if my
> > > analysis is correct, this shouldn't be a bug and ideally, we should
> > > update local_end LSN as InvalidXLogRecPtr and add appropriate
> > > comments.
> >
> > Okay, we can do that. Then get_flush_position() can also be changed to
> > *explicitly* deal with the case where local_end is InvalidXLogRecPtr.
> >
>
> AFAICS, it should be handled without any change as the value of
> InvalidXLogRecPtr is 0. So, it should be less than equal to the
> local_flush position.

Yes, existing code will work, no doubt about that. But generally we
explictly use XLogRecPtrIsInvalid if we need to include or exclude
lsn=0 in some logic. We do not consider 0 lsn for comparisons like
this which we currently have in get_flush_position. Thus stated for an
explicit check. But, yes, the current code will work.

> > Having said that, even though it is not a bug, shouldn't we still have
> > the correct mapping updated in lsn_mapping? When remote_end is PREPARE
> > Or COMMIT_PREPARED, local_end should also point to the same?
> >
>
> Ideally yes, but introducing a new global variable just for this
> purpose doesn't sound advisable. We can add in comments that in the
> future, if adding such a variable serves some purpose then we can
> surely extend the functionality.

Okay. Sounds reasonable.

thanks
Shveta



RE: Found issues related with logical replication and 2PC

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit, Shveta,

Thanks for discussing!

I reported the issue because 1) I feared the risk of data loss and 2) simply
because the coding looked incorrect. However, per discussion, I understood that
it wouldn't lead to loss, and adding a global variable was unacceptable in this
case. I modified the patch completely.

The attached patch avoids using the LastCommitLSN as the local_lsn while applying
PREPARE. get_flush_position() was not changed. Also, it contains changes that
have not been discussed yet:

- Set last_commit_end to InvaldXLogPtr in the PREPARE case.
  This causes the same result as when the stream option is not "parallel."
- XactLastCommitEnd was replaced even ROLLBACK PREPARED case.
  Since the COMMIT PREPARED record is flushed in RecordTransactionAbortPrepared(),
  there is no need to ensure the WAL must be sent.


Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

Re: Found issues related with logical replication and 2PC

From
Amit Kapila
Date:
On Thu, Aug 8, 2024 at 2:37 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Thanks for discussing!
>
> I reported the issue because 1) I feared the risk of data loss and 2) simply
> because the coding looked incorrect. However, per discussion, I understood that
> it wouldn't lead to loss, and adding a global variable was unacceptable in this
> case. I modified the patch completely.
>
> The attached patch avoids using the LastCommitLSN as the local_lsn while applying
> PREPARE. get_flush_position() was not changed. Also, it contains changes that
> have not been discussed yet:
>
> - Set last_commit_end to InvaldXLogPtr in the PREPARE case.
>   This causes the same result as when the stream option is not "parallel."
> - XactLastCommitEnd was replaced even ROLLBACK PREPARED case.
>   Since the COMMIT PREPARED record is flushed in RecordTransactionAbortPrepared(),
>   there is no need to ensure the WAL must be sent.
>

The code changes look mostly good to me. I have changed/added a few
comments in the attached modified version.

--
With Regards,
Amit Kapila.

Attachment

Re: Found issues related with logical replication and 2PC

From
shveta malik
Date:
On Thu, Aug 8, 2024 at 5:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Aug 8, 2024 at 2:37 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Thanks for discussing!
> >
> > I reported the issue because 1) I feared the risk of data loss and 2) simply
> > because the coding looked incorrect. However, per discussion, I understood that
> > it wouldn't lead to loss, and adding a global variable was unacceptable in this
> > case. I modified the patch completely.
> >
> > The attached patch avoids using the LastCommitLSN as the local_lsn while applying
> > PREPARE. get_flush_position() was not changed. Also, it contains changes that
> > have not been discussed yet:
> >
> > - Set last_commit_end to InvaldXLogPtr in the PREPARE case.
> >   This causes the same result as when the stream option is not "parallel."
> > - XactLastCommitEnd was replaced even ROLLBACK PREPARED case.
> >   Since the COMMIT PREPARED record is flushed in RecordTransactionAbortPrepared(),
> >   there is no need to ensure the WAL must be sent.
> >
>
> The code changes look mostly good to me. I have changed/added a few
> comments in the attached modified version.
>

Code changes with Amit's correction patch look good to me.

thanks
Shveta



RE: Found issues related with logical replication and 2PC

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Amit,

>
> The code changes look mostly good to me. I have changed/added a few
> comments in the attached modified version.
>

Thanks for updating the patch! It LGTM. I've tested your patch and confirmed
it did not cause the data loss. I used the source which was applied v3 and additional
fix to visualize the replication command [1].

Method
======

1. Construct a logical replication system with two_phase = true and
   synchronous_commit = false
2. attach a walwriter of the subscriber to stop the process
3. Start a transaction and prepare it for the publisher.
4. Wait until the worker replies to the publisher.
5. Stop the subscriber
6. Restart subscriber.
7. Do COMMIT PREPARED

Attached script can construct the same situation.

Result
======

After the step 5, I ran pg_waldump and confirmed PREPARE record existed on
the subscriber.

```
$ pg_waldump data_sub/pg_wal/000000010000000000000001
...
rmgr: Transaction len..., desc: PREPARE gid pg_gid_16389_741: ...
rmgr: XLOG        len..., desc: CHECKPOINT_SHUTDOWN ...
```

Also, after the step 7, I confirmed that only the COMMIT PREPARED record
was sent because log output the below line. "75" means the ASCII character 'K';
this indicated that the replication message corresponded to COMMIT PREPARED.
```
LOG:  XXX got message 75
```



Additionally, I did another test, which is basically same as above but 1) XLogFlush()
in EndPrepare() was commented out and 2) kill -9 was used at step 5 to emulate a
crash. Since the PREPAREd transaction cannot survive on the subscriber in this case,
so COMMIT PREPARED command on publisher causes an ERROR on the subscriber.
```
ERROR:  prepared transaction with identifier "pg_gid_16389_741" does not exist
CONTEXT:  processing remote data for replication origin "pg_16389" during message
            type "COMMIT PREPARED" in transaction 741, finished at 0/15463C0
```
I think this shows that the backend process can ensure the WAL is persisted so data loss
won't occur.


[1]:
```
@@ -3297,6 +3297,8 @@ apply_dispatch(StringInfo s)
     saved_command = apply_error_callback_arg.command;
     apply_error_callback_arg.command = action;
 
+    elog(LOG, "XXX got message %d", action);
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

Re: Found issues related with logical replication and 2PC

From
Amit Kapila
Date:
On Fri, Aug 9, 2024 at 10:34 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> >
> > The code changes look mostly good to me. I have changed/added a few
> > comments in the attached modified version.
> >
>
> Thanks for updating the patch! It LGTM. I've tested your patch and confirmed
> it did not cause the data loss.
>

Thanks for the additional testing. I have pushed this patch.

--
With Regards,
Amit Kapila.