Re: Restrict copying of invalidated replication slots - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Re: Restrict copying of invalidated replication slots
Date:
Msg-id: CAD21AoD6qSTx_H6XRZVNkSLX0Bh6yeQEVM6GoTGFqbQP-xAwGQ@mail.gmail.com
In response to: Re: Restrict copying of invalidated replication slots (Shlok Kyal <shlok.kyal.oss@gmail.com>)
List: pgsql-hackers
On Mon, Feb 24, 2025 at 3:06 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Sat, 22 Feb 2025 at 04:49, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Feb 21, 2025 at 4:30 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > >
> > > On Fri, 21 Feb 2025 at 01:14, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Wed, Feb 19, 2025 at 3:46 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > > >
> > > > > On Tue, 18 Feb 2025 at 15:26, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> > > > > >
> > > > > > On Monday, February 17, 2025 7:31 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, 13 Feb 2025 at 15:54, vignesh C <vignesh21@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, 4 Feb 2025 at 15:27, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Currently, we can copy an invalidated slot using the function
> > > > > > > > > 'pg_copy_logical_replication_slot'. As per the suggestion in the
> > > > > > > > > thread [1], we should prohibit copying of such slots.
> > > > > > > > >
> > > > > > > > > I have created a patch to address the issue.
> > > > > > > >
> > > > > > > > This patch does not fix all the copy_replication_slot scenarios
> > > > > > > > completely; there is a corner concurrency case where an
> > > > > > > > invalidated slot still gets copied:
> > > > > > > > + /* We should not copy invalidated replication slots */
> > > > > > > > + if (src_isinvalidated)
> > > > > > > > +     ereport(ERROR,
> > > > > > > > +             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > > > > > > +              errmsg("cannot copy an invalidated replication slot")));
> > > > > > > >
> > > > > > > > Consider the following scenario:
> > > > > > > > step 1) Set up streaming replication between the primary and standby nodes.
> > > > > > > > step 2) Create a logical replication slot (test1) on the standby node.
> > > > > > > > step 3) Add a breakpoint in InvalidatePossiblyObsoleteSlot that is hit only
> > > > > > > > when the cause is RS_INVAL_WAL_LEVEL (no need to hold the other invalidation
> > > > > > > > causes), or add a sleep in the InvalidatePossiblyObsoleteSlot function like below:
> > > > > > > > if (cause == RS_INVAL_WAL_LEVEL)
> > > > > > > > {
> > > > > > > >     while (bsleep)
> > > > > > > >         sleep(1);
> > > > > > > > }
> > > > > > > > step 4) Reduce wal_level on the primary to replica and restart the primary node.
> > > > > > > > step 5) SELECT 'copy' FROM pg_copy_logical_replication_slot('test1', 'test2');
> > > > > > > > -- While trying to create the slot, it will wait until the lock held by
> > > > > > > > InvalidatePossiblyObsoleteSlot is released.
> > > > > > > > step 6) Increase wal_level back to logical on the primary node and restart the primary.
> > > > > > > > step 7) Now allow the invalidation to happen (continue from the breakpoint held at
> > > > > > > > step 3); the replication control lock will be released and the invalidated slot
> > > > > > > > will be copied.
> > > > > > > >
> > > > > > > > After this:
> > > > > > > > postgres=# SELECT 'copy' FROM pg_copy_logical_replication_slot('test1', 'test2');
> > > > > > > >  ?column?
> > > > > > > > ----------
> > > > > > > >  copy
> > > > > > > > (1 row)
> > > > > > > >
> > > > > > > > -- The invalidated slot (test1) is copied successfully:
> > > > > > > > postgres=# select * from pg_replication_slots ;
> > > > > > > >  slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase | inactive_since | conflicting | invalidation_reason | failover | synced
> > > > > > > > -----------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------+----------------------------------+-------------+------------------------+----------+--------
> > > > > > > >  test1 | test_decoding | logical | 5 | postgres | f | f | | | 745 | 0/4029060 | 0/4029098 | lost | | f | 2025-02-13 15:26:54.666725+05:30 | t | wal_level_insufficient | f | f
> > > > > > > >  test2 | test_decoding | logical | 5 | postgres | f | f | | | 745 | 0/4029060 | 0/4029098 | reserved | | f | 2025-02-13 15:30:30.477836+05:30 | f | | f | f
> > > > > > > > (2 rows)
> > > > > > > >
> > > > > > > > -- A subsequent attempt to decode changes from the invalidated slot (test2) fails:
> > > > > > > > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL, NULL);
> > > > > > > > WARNING:  detected write past chunk end in TXN 0x5e77e6c6f300
> > > > > > > > ERROR:  logical decoding on standby requires "wal_level" >= "logical" on the primary
> > > > > > > >
> > > > > > > > -- Alternatively, the following error may occur:
> > > > > > > > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL, NULL);
> > > > > > > > WARNING:  detected write past chunk end in TXN 0x582d1b2d6ef0
> > > > > > > >     data
> > > > > > > > ------------
> > > > > > > >  BEGIN 744
> > > > > > > >  COMMIT 744
> > > > > > > > (2 rows)
> > > > > > > >
> > > > > > > > This is an edge case that can occur under specific conditions involving
> > > > > > > > replication slot invalidation when there is a huge lag between primary and standby.
> > > > > > > > There might be a similar concurrency case for wal_removed too.
> > > > > > >
> > > > > > > Hi Vignesh,
> > > > > > >
> > > > > > > Thanks for reviewing the patch.
> > > > > >
> > > > > > Thanks for updating the patch. I have a question related to it.
> > > > > >
> > > > > > > I have tested the above scenario and was able to reproduce it. I have
> > > > > > > fixed it in the v2 patch.
> > > > > > > Currently we are taking a shared lock on ReplicationSlotControlLock.
> > > > > > > This issue can be resolved if we take an exclusive lock instead.
> > > > > > > Thoughts?
> > > > > >
> > > > > > It's not clear to me why increasing the lock level can solve it; could you
> > > > > > elaborate a bit more on this?
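(For readers following along: the lock-level change being asked about amounts to roughly the sketch below, using a hypothetical helper name for the source-slot lookup done inside copy_replication_slot(); the authoritative version is the attached v2 patch, not this sketch.)

    #include "postgres.h"
    #include "replication/slot.h"
    #include "storage/lwlock.h"
    #include "storage/spin.h"

    /*
     * Sketch of the v2 idea: look up the source slot while holding
     * ReplicationSlotControlLock exclusively, so the lookup cannot interleave
     * with InvalidateObsoleteReplicationSlots(), which holds the same lock in
     * shared mode while marking slots invalidated.
     */
    static ReplicationSlot *
    find_source_slot_exclusive(const char *slot_name, ReplicationSlot *contents_out)
    {
        ReplicationSlot *src = NULL;

        LWLockAcquire(ReplicationSlotControlLock, LW_EXCLUSIVE);    /* was LW_SHARED */

        for (int i = 0; i < max_replication_slots; i++)
        {
            ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

            if (s->in_use && strcmp(NameStr(s->data.name), slot_name) == 0)
            {
                /* Copy the slot contents while holding its spinlock. */
                SpinLockAcquire(&s->mutex);
                *contents_out = *s;
                SpinLockRelease(&s->mutex);
                src = s;
                break;
            }
        }

        LWLockRelease(ReplicationSlotControlLock);
        return src;
    }

Because LW_EXCLUSIVE conflicts with the LW_SHARED mode used by the invalidation path, the lookup cannot observe a slot that is halfway through being invalidated.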
> > > > >
> > > > > In HEAD, InvalidateObsoleteReplicationSlots acquires a SHARED lock on
> > > > > 'ReplicationSlotControlLock'.
> > > > > Also, in the function 'copy_replication_slot' we take a SHARED lock on
> > > > > 'ReplicationSlotControlLock' while fetching the source slot.
> > > > >
> > > > > So, for the case described by Vignesh in [1], first
> > > > > InvalidateObsoleteReplicationSlots is called and we hold a SHARED lock
> > > > > on 'ReplicationSlotControlLock'. We are now holding the function using the sleep:
> > > > > if (cause == RS_INVAL_WAL_LEVEL)
> > > > > {
> > > > >     while (bsleep)
> > > > >         sleep(1);
> > > > > }
> > > > >
> > > > > Now, when we create a copy of the slot, 'copy_replication_slot' takes a
> > > > > SHARED lock on 'ReplicationSlotControlLock', so it acquires the lock and
> > > > > fetches the info of the source slot (the slot is not invalidated yet).
> > > > > Then 'copy_replication_slot' calls 'create_logical_replication_slot',
> > > > > which takes an EXCLUSIVE lock on ReplicationSlotControlLock and hence
> > > > > waits for InvalidateObsoleteReplicationSlots to release its lock. Once
> > > > > 'InvalidateObsoleteReplicationSlots' releases the lock, the execution of
> > > > > 'create_logical_replication_slot' continues and creates a copy of the
> > > > > source slot.
> > > > >
> > > > > Now, with the patch, 'copy_replication_slot' takes an EXCLUSIVE lock on
> > > > > 'ReplicationSlotControlLock' to fetch the slot info. Hence, it waits for
> > > > > 'InvalidateObsoleteReplicationSlots' to release the lock, and only then
> > > > > fetches the source slot info and tries to create the copied slot (which
> > > > > fails, as the source slot is invalidated before we fetch its info).
> > > > >
> > > > > > Besides, do we need one more invalidated check in the following codes after
> > > > > > creating the slot ?
> > > > > >
> > > > > > /*
> > > > > >  * Check if the source slot still exists and is valid. We regard it as
> > > > > >  * invalid if the type of replication slot or name has been changed,
> > > > > >  * or the restart_lsn either is invalid or has gone backward. (The
> > > > > > ...
> > > > >
> > > > > This approach seems more feasible to me. It also resolves the issue
> > > > > reported by Vignesh in [1]. I have made changes for the same in the v3 patch.
> > > >
> > > > I agree to check if the source slot got invalidated during the copy.
> > > > But why do we need to search for the slot by name again, as follows?
> > > >
> > > > + /* Check if source slot was invalidated while copying of slot */
> > > > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> > > > +
> > > > + for (int i = 0; i < max_replication_slots; i++)
> > > > + {
> > > > +     ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
> > > > +
> > > > +     if (s->in_use && strcmp(NameStr(s->data.name), NameStr(*src_name)) == 0)
> > > > +     {
> > > > +         /* Copy the slot contents while holding spinlock */
> > > > +         SpinLockAcquire(&s->mutex);
> > > > +         first_slot_contents = *s;
> > > > +         SpinLockRelease(&s->mutex);
> > > > +         src = s;
> > > > +         break;
> > > > +     }
> > > > + }
> > > > +
> > > > + LWLockRelease(ReplicationSlotControlLock);
> > > >
> > > > I think 'src' already points to the source slot.
> > >
> > > Hi Sawada san,
> > >
> > > Thanks for reviewing the patch.
> > > I have used 'src' instead of iterating again. I have attached the
> > > updated v4 patch.
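(Side note: whichever way the contents are re-read, the "was it invalidated?" test that src_isinvalidated stood for boils down to inspecting the persistent slot data. A minimal sketch, assuming the invalidated member of ReplicationSlotPersistentData on the branches discussed here and a hypothetical helper name:)

    /*
     * Sketch: decide whether a copied ReplicationSlot struct represents an
     * invalidated slot.  RS_INVAL_NONE means the slot is still usable; any
     * other value is one of the invalidation causes mentioned in this thread
     * (e.g. wal_removed, wal_level_insufficient, idle_timeout).
     */
    static bool
    slot_contents_invalidated(const ReplicationSlot *contents)
    {
        return contents->data.invalidated != RS_INVAL_NONE;
    }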
> >
> > Thank you for updating the patch! I have one comment:
> >
> > + /* Check if source slot was invalidated while copying of slot */
> > + SpinLockAcquire(&src->mutex);
> > + first_slot_contents = *src;
> > + SpinLockRelease(&src->mutex);
> >
> > We don't need to copy the source slot contents again, since we already
> > do so as follows:
> >
> > /* Copy data of source slot again */
> > SpinLockAcquire(&src->mutex);
> > second_slot_contents = *src;
> > SpinLockRelease(&src->mutex);
> >
> > I think we can use second_slot_contents for that check.
>
> I agree. I have updated the v5 patch to use second_slot_contents.
>
> > I've investigated the slot invalidation and slot copying behaviors.
> > We cannot copy a slot if it doesn't reserve WAL, but IIUC the slot's
> > restart_lsn is not reset when the slot is invalidated for causes other
> > than RS_INVAL_WAL_REMOVED. Therefore, it's possible that we copy a slot
> > invalidated by, for example, RS_INVAL_IDLE_TIMEOUT, and the copied
> > slot's restart_lsn might have already been removed, which ultimately
> > causes an assertion failure in copy_replication_slot():
> >
> > #ifdef USE_ASSERT_CHECKING
> >     /* Check that the restart_lsn is available */
> >     {
> >         XLogSegNo   segno;
> >
> >         XLByteToSeg(copy_restart_lsn, segno, wal_segment_size);
> >         Assert(XLogGetLastRemovedSegno() < segno);
> >     }
> > #endif
> >
> > I think this issue exists in v16 and later; I've not tested it yet, though.
> > If my understanding is right, this patch has to be backpatched.
>
> I have tested the above on HEAD, PG 17, and PG 16 and found that we can
> hit the above ASSERT condition in all three branches, with the following steps:
> 1. Create a physical replication setup.
> 2. On the standby, create a logical replication slot.
> 3. Change wal_level of the primary to 'replica' and restart the primary.
>    The slot is invalidated with 'wal_level_insufficient'.
> 4. Change wal_level of the primary back to 'logical' and restart the primary.
> 5. On the primary, insert some records and run a checkpoint. Also run a
>    checkpoint on the standby, so that some initial WAL files are removed.
> 6. Now copy the logical replication slot created in step 2. Then we can
>    hit the assert.
>
> I agree that backpatching the patch can resolve this, as it prevents
> copying of invalidated slots.
>
> I have attached the following patches:
> v5-0001 : for HEAD
> v5_PG_17_PG_16-0001 : for PG17 and PG16

I've checked whether this issue also exists on v15 or older, but IIUC it
doesn't, fortunately. Here is the summary:

Scenario-1: the source gets invalidated before the copy function fetches
its contents for the first time. In this case, since the source slot's
restart_lsn is already an invalid LSN, it raises an error appropriately.
In v15, we have only one slot invalidation reason, WAL removal, so we
always reset the slot's restart_lsn to InvalidXLogRecPtr.
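(For reference, the existing guard that makes Scenario-1 fail cleanly is roughly the following check in copy_replication_slot(), shown here as a simplified sketch rather than verbatim source:)

    /*
     * Sketch of the pre-existing guard: a slot whose restart_lsn is invalid
     * does not reserve WAL and cannot be copied.  On v15, WAL removal is the
     * only invalidation cause and it resets restart_lsn, so an invalidated
     * source is always rejected here.
     */
    if (XLogRecPtrIsInvalid(src_restart_lsn))
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("cannot copy a replication slot that doesn't reserve WAL")));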
Scenario-2: the source gets invalidated before the copied slot is created
(i.e., between the first content copy and
create_logical/physical_replication_slot()). In this case, the copied slot
could have a valid restart_lsn value that nevertheless points to a WAL
segment that might have already been removed. However, since
copy_restart_lsn will be an invalid LSN (= 0), it's caught by the following
if condition:

    if (copy_restart_lsn < src_restart_lsn ||
        src_islogical != copy_islogical ||
        strcmp(copy_name, NameStr(*src_name)) != 0)
        ereport(ERROR,
                (errmsg("could not copy replication slot \"%s\"",
                        NameStr(*src_name)),
                 errdetail("The source replication slot was modified incompatibly during the copy operation.")));

Scenario-3: the source gets invalidated after the copied slot is created
(i.e., after create_logical/physical_replication_slot()). In this case,
since the newly copied slot has the same restart_lsn as the source slot,
both slots are invalidated.

If the above analysis is right, I think the patches are mostly ready. I've
made some changes to the patches:

- removed the src_isinvalidated variable as it's not strictly necessary.
- updated the commit message.

Please review them. If there are no further comments on these patches, I'm
going to push them.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com