Thread: Assertion failure with replication origins and PREPARE TRANSACTIOn

Assertion failure with replication origins and PREPARE TRANSACTIOn

From
Michael Paquier
Date:
Hi all,
(CCing some folks who worked on this area lately)

The following sequence of commands generates an assertion failure, as
of $subject:
select pg_replication_origin_create('popo');
select pg_replication_origin_session_setup('popo');
begin;
select txid_current();
prepare transaction 'popo'; -- assertion fails

The problem originates from 1eb6d65, down to 11, where we finish by
triggering this assertion before replorigin_session_origin_lsn is not
valid:
+   if (replorigin)
+   {
+       Assert(replorigin_session_origin_lsn != InvalidXLogRecPtr);
+       hdr->origin_lsn = replorigin_session_origin_lsn;
+       hdr->origin_timestamp = replorigin_session_origin_timestamp;
+   }

As far as I understand this code and based on the docs,
pg_replication_origin_xact_setup(), that would set of the session
origin LSN and timestamp, is an optional choise.
replorigin_session_advance() would be a no-op for remote_lsn, and
local_lsn requires an update.  Now please note that I am really fluent
with this stuff, so feel free to correct me.  The intention of the
code also seems that XACT_XINFO_HAS_ORIGIN should still be set, but
with no data.

At the end, it seems to me that the assertion could just be dropped as
per the attached, as we'd finish by setting up origin_lsn and
origin_timestamp in the 2PC file header with some invalid data.  The
else block could just be removed, but then one would need to guess
from origin.c that both replorigin_session_* may not be set.

I am not completely sure that this is the right move, though, as
pg_replication_origin_status has a *very* limited amount of tests.
Adding a test to replorigin.sql with 2PC transactions able to trigger
the problem is easy once we rely on a different slot to not influence
the existing tests, as the expected contents of
pg_replication_origin_status could be messed up in the follow-up
tests.

Thoughts?
--
Michael

Attachment

Re: Assertion failure with replication origins and PREPARE TRANSACTIOn

From
Masahiko Sawada
Date:
On Mon, Dec 13, 2021 at 12:44 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> Hi all,
> (CCing some folks who worked on this area lately)
>
> The following sequence of commands generates an assertion failure, as
> of $subject:
> select pg_replication_origin_create('popo');
> select pg_replication_origin_session_setup('popo');
> begin;
> select txid_current();
> prepare transaction 'popo'; -- assertion fails
>
> The problem originates from 1eb6d65, down to 11, where we finish by
> triggering this assertion before replorigin_session_origin_lsn is not
> valid:
> +   if (replorigin)
> +   {
> +       Assert(replorigin_session_origin_lsn != InvalidXLogRecPtr);
> +       hdr->origin_lsn = replorigin_session_origin_lsn;
> +       hdr->origin_timestamp = replorigin_session_origin_timestamp;
> +   }
>
> As far as I understand this code and based on the docs,
> pg_replication_origin_xact_setup(), that would set of the session
> origin LSN and timestamp, is an optional choise.
> replorigin_session_advance() would be a no-op for remote_lsn, and
> local_lsn requires an update.  Now please note that I am really fluent
> with this stuff, so feel free to correct me.  The intention of the
> code also seems that XACT_XINFO_HAS_ORIGIN should still be set, but
> with no data.
>
> At the end, it seems to me that the assertion could just be dropped as
> per the attached, as we'd finish by setting up origin_lsn and
> origin_timestamp in the 2PC file header with some invalid data.

Why do we check if replorigin_session_origin_lsn is not invalid data
only when PREPARE TRANSACTION? Looking at commit and rollback code, we
don't have assertions or checks that check
replorigin_session_origin_lsn/timestamp is valid data. So it looks
like we accept also invalid data in those cases since
replorigin_advance doesn’t move LSN backward while applying a commit
or rollback record even if LSN is invalid. The same is true for
PREPARE records, i.g., even if replication origin LSN is invalid, it
doesn't go backward. If replication origin LSN and timestamp in commit
record and rollback record must be valid data too, I think we should
similar checks for commit and rollback code and I think the assertions
will fail in the case I reported before[1].

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoAMuTrXezGqaV1Q5H-Hf%2BzKqGbU8XuCZk9iQMYueF3%2B8w%40mail.gmail.com

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



Re: Assertion failure with replication origins and PREPARE TRANSACTIOn

From
Michael Paquier
Date:
On Mon, Dec 13, 2021 at 04:30:36PM +0900, Masahiko Sawada wrote:
> Why do we check if replorigin_session_origin_lsn is not invalid data
> only when PREPARE TRANSACTION?

Well, it does not matter for the case of PREPARE TRANSACTION, does it?
we would include values for the the origin LSN and timestamp in
any case as these are fixed in the 2PC file header.

> Looking at commit and rollback code, we
> don't have assertions or checks that check
> replorigin_session_origin_lsn/timestamp is valid data. So it looks
> like we accept also invalid data in those cases since
> replorigin_advance doesn’t move LSN backward while applying a commit
> or rollback record even if LSN is invalid. The same is true for
> PREPARE records, i.g., even if replication origin LSN is invalid, it
> doesn't go backward. If replication origin LSN and timestamp in commit
> record and rollback record must be valid data too, I think we should
> similar checks for commit and rollback code and I think the assertions
> will fail in the case I reported before[1].

It seems to me that the origin LSN and timestamp are optional, so as
one may choose to not call pg_replication_origin_xact_setup() (as said
in my first message), and we would not require more sanity checks when
advancing the replication origin in the commit and rollback code
paths.  Let's see what others think here.
--
Michael

Attachment

Re: Assertion failure with replication origins and PREPARE TRANSACTIOn

From
Amit Kapila
Date:
On Mon, Dec 13, 2021 at 2:00 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Dec 13, 2021 at 04:30:36PM +0900, Masahiko Sawada wrote:
> > Why do we check if replorigin_session_origin_lsn is not invalid data
> > only when PREPARE TRANSACTION?
>
> Well, it does not matter for the case of PREPARE TRANSACTION, does it?
> we would include values for the the origin LSN and timestamp in
> any case as these are fixed in the 2PC file header.
>
> > Looking at commit and rollback code, we
> > don't have assertions or checks that check
> > replorigin_session_origin_lsn/timestamp is valid data. So it looks
> > like we accept also invalid data in those cases since
> > replorigin_advance doesn’t move LSN backward while applying a commit
> > or rollback record even if LSN is invalid. The same is true for
> > PREPARE records, i.g., even if replication origin LSN is invalid, it
> > doesn't go backward. If replication origin LSN and timestamp in commit
> > record and rollback record must be valid data too, I think we should
> > similar checks for commit and rollback code and I think the assertions
> > will fail in the case I reported before[1].
>
> It seems to me that the origin LSN and timestamp are optional, so as
> one may choose to not call pg_replication_origin_xact_setup() (as said
> in my first message), and we would not require more sanity checks when
> advancing the replication origin in the commit and rollback code
> paths.
>

This is my understanding as well. I think here the point of Sawada-San
is why to have additional for replorigin_session_origin_lsn in prepare
code path? I think the way you have it in your patch is correct as
well but it is probably better to keep the check only based on
replorigin so as to keep this check consistent in all code paths.

--
With Regards,
Amit Kapila.



Re: Assertion failure with replication origins and PREPARE TRANSACTIOn

From
Michael Paquier
Date:
On Mon, Dec 13, 2021 at 03:46:55PM +0530, Amit Kapila wrote:
> This is my understanding as well. I think here the point of Sawada-San
> is why to have additional for replorigin_session_origin_lsn in prepare
> code path? I think the way you have it in your patch is correct as
> well but it is probably better to keep the check only based on
> replorigin so as to keep this check consistent in all code paths.

Well, I don't think that it is a big deal one way or the other, as
we'd finish with InvalidXLogRecPtr for the LSN and 0 for the timestamp
anyway.  If both of you feel that just removing the assertion rather
than adding an extra check is better, that's fine by me :)
--
Michael

Attachment

Re: Assertion failure with replication origins and PREPARE TRANSACTIOn

From
Michael Paquier
Date:
On Mon, Dec 13, 2021 at 07:53:43PM +0900, Michael Paquier wrote:
> Well, I don't think that it is a big deal one way or the other, as
> we'd finish with InvalidXLogRecPtr for the LSN and 0 for the timestamp
> anyway.  If both of you feel that just removing the assertion rather
> than adding an extra check is better, that's fine by me :)

Looked at that today, and done this way.  The tests have been extended
a bit more with one ROLLBACK and one ROLLBACK PREPARED, while checking
for the contents decoded.
--
Michael

Attachment