Assertion failure with replication origins and PREPARE TRANSACTIOn - Mailing list pgsql-hackers

From Michael Paquier
Subject Assertion failure with replication origins and PREPARE TRANSACTIOn
Date
Msg-id YbbBfNSvMm5nIINV@paquier.xyz
Whole thread Raw
Responses Re: Assertion failure with replication origins and PREPARE TRANSACTIOn  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Hi all,
(CCing some folks who worked on this area lately)

The following sequence of commands generates an assertion failure, as
of $subject:
select pg_replication_origin_create('popo');
select pg_replication_origin_session_setup('popo');
begin;
select txid_current();
prepare transaction 'popo'; -- assertion fails

The problem originates from 1eb6d65, down to 11, where we finish by
triggering this assertion before replorigin_session_origin_lsn is not
valid:
+   if (replorigin)
+   {
+       Assert(replorigin_session_origin_lsn != InvalidXLogRecPtr);
+       hdr->origin_lsn = replorigin_session_origin_lsn;
+       hdr->origin_timestamp = replorigin_session_origin_timestamp;
+   }

As far as I understand this code and based on the docs,
pg_replication_origin_xact_setup(), that would set of the session
origin LSN and timestamp, is an optional choise.
replorigin_session_advance() would be a no-op for remote_lsn, and
local_lsn requires an update.  Now please note that I am really fluent
with this stuff, so feel free to correct me.  The intention of the
code also seems that XACT_XINFO_HAS_ORIGIN should still be set, but
with no data.

At the end, it seems to me that the assertion could just be dropped as
per the attached, as we'd finish by setting up origin_lsn and
origin_timestamp in the 2PC file header with some invalid data.  The
else block could just be removed, but then one would need to guess
from origin.c that both replorigin_session_* may not be set.

I am not completely sure that this is the right move, though, as
pg_replication_origin_status has a *very* limited amount of tests.
Adding a test to replorigin.sql with 2PC transactions able to trigger
the problem is easy once we rely on a different slot to not influence
the existing tests, as the expected contents of
pg_replication_origin_status could be messed up in the follow-up
tests.

Thoughts?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Greg Nancarrow
Date:
Subject: Re: Skipping logical replication transactions on subscriber side
Next
From: Amit Kapila
Date:
Subject: Re: Skipping logical replication transactions on subscriber side