Re: logical decoding and replication of sequences, take 2 - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: logical decoding and replication of sequences, take 2
Date
Msg-id 68b4942c-be04-dbd5-c630-b22564151ca8@enterprisedb.com
Whole thread Raw
In response to Re: logical decoding and replication of sequences, take 2  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: logical decoding and replication of sequences, take 2  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 3/20/23 12:00, Amit Kapila wrote:
> On Mon, Mar 20, 2023 at 1:49 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>>
>> On 3/20/23 04:42, Amit Kapila wrote:
>>> On Sat, Mar 18, 2023 at 8:49 PM Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>>>
>>>> On 3/18/23 06:35, Amit Kapila wrote:
>>>>> On Sat, Mar 18, 2023 at 3:13 AM Tomas Vondra
>>>>> <tomas.vondra@enterprisedb.com> wrote:
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> Clearly, for sequences we can't quite rely on snapshots/slots, we need
>>>>>> to get the LSN to decide what changes to apply/skip from somewhere else.
>>>>>> I wonder if we can just ignore the queued changes in tablesync, but I
>>>>>> guess not - there can be queued increments after reading the sequence
>>>>>> state, and we need to apply those. But maybe we could use the page LSN
>>>>>> from the relfilenode - that should be the LSN of the last WAL record.
>>>>>>
>>>>>> Or maybe we could simply add pg_current_wal_insert_lsn() into the SQL we
>>>>>> use to read the sequence state ...
>>>>>>
>>>>>
>>>>> What if some Alter Sequence is performed before the copy starts and
>>>>> after the copy is finished, the containing transaction rolled back?
>>>>> Won't it copy something which shouldn't have been copied?
>>>>>
>>>>
>>>> That shouldn't be possible - the alter creates a new relfilenode and
>>>> it's invisible until commit. So either it gets committed (and then
>>>> replicated), or it remains invisible to the SELECT during sync.
>>>>
>>>
>>> Okay, however, we need to ensure that such a change will later be
>>> replicated and also need to ensure that the required WAL doesn't get
>>> removed.
>>>
>>> Say, if we use your first idea of page LSN from the relfilenode, then
>>> how do we ensure that the corresponding WAL doesn't get removed when
>>> later the sync worker tries to start replication from that LSN? I am
>>> imagining here the sync_sequence_slot will be created before
>>> copy_sequence but even then it is possible that the sequence has not
>>> been updated for a long time and the LSN location will be in the past
>>> (as compared to the slot's LSN) which means the corresponding WAL
>>> could be removed. Now, here we can't directly start using the slot's
>>> LSN to stream changes because there is no correlation of it with the
>>> LSN (page LSN of sequence's relfilnode) where we want to start
>>> streaming.
>>>
>>
>> I don't understand why we'd need WAL from before the slot is created,
>> which happens before copy_sequence so the sync will see a more recent
>> state (reflecting all changes up to the slot LSN).
>>
> 
> Imagine the following sequence of events:
> 1. Operation on a sequence seq-1 which requires WAL. Say, this is done
> at LSN 1000.
> 2. Some other random operations on unrelated objects. This would
> increase LSN to 2000.
> 3. Create a slot that uses current LSN 2000.
> 4. Copy sequence seq-1 where you will get the LSN value as 1000. Then
> you will use LSN 1000 as a starting point to start replication in
> sequence sync worker.
> 
> It is quite possible that WAL from LSN 1000 may not be present. Now,
> it may be possible that we use the slot's LSN in this case but
> currently, it may not be possible without some changes in the slot
> machinery. Even, if we somehow solve this, we have the below problem
> where we can miss some concurrent activity.
> 

I think the question is what would be the WAL-requiring operation at LSN
1000. If it's just regular nextval(), then we *will* see it during
copy_sequence - sequences are not transactional in the MVCC sense.

If it's an ALTER SEQUENCE, I guess it might create a new relfilenode,
and then we might fail to apply this - that'd be bad.

I wonder if we'd allow actually discarding the WAL while building the
consistent snapshot, though. You're however right we can't just decide
this based on LSN, we'd probably need to compare the relfilenodes too or
something like that ...

>> I think the only "issue" are the WAL records after the slot LSN, or more
>> precisely deciding which of the decoded changes to apply.
>>
>>
>>> Now, for the second idea which is to directly use
>>> pg_current_wal_insert_lsn(), I think we won't be able to ensure that
>>> the changes covered by in-progress transactions like the one with
>>> Alter Sequence I have given example would be streamed later after the
>>> initial copy. Because the LSN returned by pg_current_wal_insert_lsn()
>>> could be an LSN after the LSN associated with Alter Sequence but
>>> before the corresponding xact's commit.
>>
>> Yeah, I think you're right - the locking itself is not sufficient to
>> prevent this ordering of operations. copy_sequence would have to lock
>> the sequence exclusively, which seems bit disruptive.
>>
> 
> Right, that doesn't sound like a good idea.
> 

Although, maybe we could use a less strict lock level? I mean, one that
allows nextval() to continue, but would conflict with ALTER SEQUENCE.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Jeevan Ladhe
Date:
Subject: Re: server log inflates due to pg_logical_slot_peek_changes/pg_logical_slot_get_changes calls
Next
From: Robert Haas
Date:
Subject: Re: allow_in_place_tablespaces vs. pg_basebackup