Re: logical decoding and replication of sequences, take 2 - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: logical decoding and replication of sequences, take 2
Date
Msg-id 9386d80a-ca2d-2d10-9617-d83341b7d3c3@enterprisedb.com
Whole thread Raw
In response to Re: logical decoding and replication of sequences, take 2  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: logical decoding and replication of sequences, take 2  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers

On 3/20/23 04:42, Amit Kapila wrote:
> On Sat, Mar 18, 2023 at 8:49 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 3/18/23 06:35, Amit Kapila wrote:
>>> On Sat, Mar 18, 2023 at 3:13 AM Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>>>
>>>> ...
>>>>
>>>> Clearly, for sequences we can't quite rely on snapshots/slots, we need
>>>> to get the LSN to decide what changes to apply/skip from somewhere else.
>>>> I wonder if we can just ignore the queued changes in tablesync, but I
>>>> guess not - there can be queued increments after reading the sequence
>>>> state, and we need to apply those. But maybe we could use the page LSN
>>>> from the relfilenode - that should be the LSN of the last WAL record.
>>>>
>>>> Or maybe we could simply add pg_current_wal_insert_lsn() into the SQL we
>>>> use to read the sequence state ...
>>>>
>>>
>>> What if some Alter Sequence is performed before the copy starts and
>>> after the copy is finished, the containing transaction rolled back?
>>> Won't it copy something which shouldn't have been copied?
>>>
>>
>> That shouldn't be possible - the alter creates a new relfilenode and
>> it's invisible until commit. So either it gets committed (and then
>> replicated), or it remains invisible to the SELECT during sync.
>>
> 
> Okay, however, we need to ensure that such a change will later be
> replicated and also need to ensure that the required WAL doesn't get
> removed.
> 
> Say, if we use your first idea of page LSN from the relfilenode, then
> how do we ensure that the corresponding WAL doesn't get removed when
> later the sync worker tries to start replication from that LSN? I am
> imagining here the sync_sequence_slot will be created before
> copy_sequence but even then it is possible that the sequence has not
> been updated for a long time and the LSN location will be in the past
> (as compared to the slot's LSN) which means the corresponding WAL
> could be removed. Now, here we can't directly start using the slot's
> LSN to stream changes because there is no correlation of it with the
> LSN (page LSN of sequence's relfilnode) where we want to start
> streaming.
> 

I don't understand why we'd need WAL from before the slot is created,
which happens before copy_sequence so the sync will see a more recent
state (reflecting all changes up to the slot LSN).

I think the only "issue" are the WAL records after the slot LSN, or more
precisely deciding which of the decoded changes to apply.


> Now, for the second idea which is to directly use
> pg_current_wal_insert_lsn(), I think we won't be able to ensure that
> the changes covered by in-progress transactions like the one with
> Alter Sequence I have given example would be streamed later after the
> initial copy. Because the LSN returned by pg_current_wal_insert_lsn()
> could be an LSN after the LSN associated with Alter Sequence but
> before the corresponding xact's commit.

Yeah, I think you're right - the locking itself is not sufficient to
prevent this ordering of operations. copy_sequence would have to lock
the sequence exclusively, which seems bit disruptive.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Timeline ID hexadecimal format
Next
From: Ronan Dunklau
Date:
Subject: Re: Exclusion constraints on partitioned tables