Re: logical decoding and replication of sequences - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: logical decoding and replication of sequences
Date
Msg-id CAA4eK1LSUY0XF5r8jZRVB1Ug=kfnFWt9f5eEThRzWD0tFZi5HA@mail.gmail.com
Whole thread Raw
In response to Re: logical decoding and replication of sequences  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Sat, Apr 2, 2022 at 5:47 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 4/1/22 17:02, Tomas Vondra wrote:
>
> So, I investigated this a bit more, and I wrote a couple test_decoding
> isolation tests (patch attached) demonstrating the issue. Actually, I
> should say "issues" because it's a bit worse than you described ...
>
> The whole problem is in this chunk of code in sequence_decode():
>
>
>   /* Skip the change if already processed (per the snapshot). */
>   if (transactional &&
>       !SnapBuildProcessChange(builder, xid, buf->origptr))
>       return;
>   else if (!transactional &&
>            (SnapBuildCurrentState(builder) != SNAPBUILD_CONSISTENT ||
>             SnapBuildXactNeedsSkip(builder, buf->origptr)))
>       return;
>
>   /* Queue the increment (or send immediately if not transactional). */
>   snapshot = SnapBuildGetOrBuildSnapshot(builder, xid);
>   ReorderBufferQueueSequence(ctx->reorder, xid, snapshot, buf->endptr,
>                              origin_id, target_node, transactional,
>                              xlrec->created, tuplebuf);
>
> With the script you described, the increment is non-transactional, so we
> end up in the second branch, return and thus discard the increment.
>
> But it's also possible the change is transactional, which can only
> trigger the first branch. But it does not, so we start building the
> snapshot. But the first thing SnapBuildGetOrBuildSnapshot does is
>
>    Assert(builder->state == SNAPBUILD_CONSISTENT);
>
> and we're still not in a consistent snapshot, so it just crashes and
> burn :-(
>
> The sequences.spec file has two definitions of s2restart step, one empty
> (resulting in non-transactional change), one with ALTER SEQUENCE (which
> means the change will be transactional).
>
>
> The really "funny" thing is this is not new code - this is an exact copy
> from logicalmsg_decode(), and logical messages have all those issues
> too. We may discard some messages, trigger the same Assert, etc. There's
> a messages2.spec demonstrating this (s2message step defines whether the
> message is transactional or not).
>

It seems to me that the Assert in SnapBuildGetOrBuildSnapshot() is
wrong. It is required only for non-transactional logical messages. For
transactional message(s), we decide at the commit time whether the
snapshot has reached a consistent state and then decide whether to
skip the entire transaction or not. So, the possible fix for Assert
could be that we pass an additional parameter 'transactional' to
SnapBuildGetOrBuildSnapshot() and then assert only when it is false. I
have also checked the development thread for this work and it appears
to be introduced for non-transactional cases only. See email[1], this
new function and Assert was for the non-transactional case and later
while rearranging the code, this problem got introduced.

Now, for the non-transactional cases, I am not sure if there is a
one-to-one mapping with the sequence case. The way sequences are dealt
with on the subscriber-side (first we copy initial data and then
replicate the incremental changes) appears more as we deal with the
table and its incremental changes. There is some commonality with
non-transactional messages w.r.t the case where we want sequence
changes to be sent even on rollbacks unless some DDL has happened for
them but if we see the overall solution it doesn't appear that we can
use it similar to messages. I think this is the reason we are facing
the other problems w.r.t to syncing sequences to subscribers including
the problem reported by Sawada-San yesterday.

Now, the particular case where we won't send a non-transactional
logical message unless the snapshot is consistent could be considered
as its behavior and may be documented better. I am not very sure about
this as there is no example of the way sync for these messages happens
in the core but if someone outside the core wants a different behavior
and presents the case then we can probably try to enhance it. I feel
the same is not true for sequences because it can cause the replica
(subscriber) to go out of sync with the master (publisher).

> So I guess we need to fix both places, perhaps in a similar way.
>

It depends but I think for logical messages we should do the minimal
fix required for Asserts and probably document the current behavior
bit better unless we think there is a case to make it behave similar
to sequences.


[1] - https://www.postgresql.org/message-id/56D4B3AD.5000207%402ndquadrant.com

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Tablesync early exit
Next
From: Amit Kapila
Date:
Subject: Re: Tablesync early exit