Thread: Re: Restrict copying of invalidated replication slots

Re: Restrict copying of invalidated replication slots

From
vignesh C
Date:
On Tue, 4 Feb 2025 at 15:27, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> Hi,
>
> Currently, we can copy an invalidated slot using the function
> 'pg_copy_logical_replication_slot'. As per the suggestion in the
> thread [1], we should prohibit copying of such slots.
>
> I have created a patch to address the issue.

This patch does not handle all the copy_replication_slot scenarios;
there is still a corner-case concurrency window in which an
invalidated slot gets copied:
+       /* We should not copy invalidated replication slots */
+       if (src_isinvalidated)
+               ereport(ERROR,
+                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                errmsg("cannot copy an invalidated replication slot")));

Consider the following scenario:
step 1) Set up streaming replication between the primary and standby nodes.
step 2) Create a logical replication slot (test1) on the standby node.
step 3) Set a breakpoint in InvalidatePossiblyObsoleteSlot that fires only
when cause is RS_INVAL_WAL_LEVEL (there is no need to hold up the other
invalidation causes), or add a sleep to InvalidatePossiblyObsoleteSlot like below:
if (cause == RS_INVAL_WAL_LEVEL)
{
    while (bsleep)
        sleep(1);
}
step 4) Reduce wal_level on the primary to replica and restart the primary node.
step 5) SELECT 'copy' FROM pg_copy_logical_replication_slot('test1',
'test2');  -- while trying to create the slot, it will wait until the lock
held by InvalidatePossiblyObsoleteSlot is released.
step 6) Increase wal_level back to logical on the primary node and
restart the primary.
step 7) Now allow the invalidation to proceed (continue from the breakpoint
set in step 3); the replication slot control lock will be released and the
invalidated slot will be copied.

After this:
postgres=# SELECT 'copy' FROM
pg_copy_logical_replication_slot('test1', 'test2');
 ?column?
----------
 copy
(1 row)

-- The invalidated slot (test1) is copied successfully:
postgres=# select * from pg_replication_slots ;
 slot_name |    plugin     | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase |          inactive_since          | conflicting |  invalidation_reason   | failover | synced
-----------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------+----------------------------------+-------------+------------------------+----------+--------
 test1     | test_decoding | logical   |      5 | postgres | f         | f      |            |      |          745 | 0/4029060   | 0/4029098           | lost       |               | f         | 2025-02-13 15:26:54.666725+05:30 | t           | wal_level_insufficient | f        | f
 test2     | test_decoding | logical   |      5 | postgres | f         | f      |            |      |          745 | 0/4029060   | 0/4029098           | reserved   |               | f         | 2025-02-13 15:30:30.477836+05:30 | f           |                        | f        | f
(2 rows)

-- A subsequent attempt to decode changes from the copied slot
(test2) fails:
postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL, NULL);
WARNING:  detected write past chunk end in TXN 0x5e77e6c6f300
ERROR:  logical decoding on standby requires "wal_level" >= "logical"
on the primary

-- Alternatively, the call may return data with only a warning:
postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL, NULL);
WARNING:  detected write past chunk end in TXN 0x582d1b2d6ef0
    data
------------
 BEGIN 744
 COMMIT 744
(2 rows)

This is an edge case that can occur under specific conditions
involving replication slot invalidation when there is a huge lag
between primary and standby.
There might be a similar concurrency case for wal_removed too.

Regards,
Vignesh



RE: Restrict copying of invalidated replication slots

From
"Zhijie Hou (Fujitsu)"
Date:
On Monday, February 17, 2025 7:31 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> 
> On Thu, 13 Feb 2025 at 15:54, vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Tue, 4 Feb 2025 at 15:27, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > Currently, we can copy an invalidated slot using the function
> > > 'pg_copy_logical_replication_slot'. As per the suggestion in the
> > > thread [1], we should prohibit copying of such slots.
> > >
> > > I have created a patch to address the issue.
> >
> > This patch does not fix all the copy_replication_slot scenarios
> > completely, there is a very corner concurrency case where an
> > invalidated slot still gets copied:
> > +       /* We should not copy invalidated replication slots */
> > +       if (src_isinvalidated)
> > +               ereport(ERROR,
> > +
> > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > +                                errmsg("cannot copy an invalidated
> > replication slot")));
> >
> > Consider the following scenario:
> > step 1) Set up streaming replication between the primary and standby nodes.
> > step 2) Create a logical replication slot (test1) on the standby node.
> > step 3) Have a breakpoint in InvalidatePossiblyObsoleteSlot if cause
> > is RS_INVAL_WAL_LEVEL, no need to hold other invalidation causes or
> > add a sleep in InvalidatePossiblyObsoleteSlot function like below:
> > if (cause == RS_INVAL_WAL_LEVEL)
> > {
> > while (bsleep)
> > sleep(1);
> > }
> > step 4) Reduce wal_level on the primary to replica and restart the primary
> node.
> > step 5) SELECT 'copy' FROM pg_copy_logical_replication_slot('test1',
> > 'test2');  -- It will wait till the lock held by
> > InvalidatePossiblyObsoleteSlot is released while trying to create a
> > slot.
> > step 6) Increase wal_level back to logical on the primary node and
> > restart the primary.
> > step 7) Now allow the invalidation to happen (continue the breakpoint
> > held at step 3), the replication control lock will be released and the
> > invalidated slot will be copied
> >
> > After this:
> > postgres=# SELECT 'copy' FROM
> > pg_copy_logical_replication_slot('test1', 'test2');  ?column?
> > ----------
> >  copy
> > (1 row)
> >
> > -- The invalidated slot (test1) is copied successfully:
> > postgres=# select * from pg_replication_slots ;
> >  slot_name |    plugin     | slot_type | datoid | database | temporary
> > | active | active_pid | xmin | catalog_xmin | restart_lsn |
> > confirmed_flush_lsn | wal_status | safe_wal_size | two_phas
> > e |          inactive_since          | conflicting |
> > invalidation_reason   | failover | synced
> >
> -----------+---------------+-----------+--------+----------+-----------+
> --------+------------+------+--------------+-------------+---------------
> ------+------------+---------------+---------
> >
> --+----------------------------------+-------------+----------------------
> --+----------+--------
> >  test1     | test_decoding | logical   |      5 | postgres | f
> > | f      |            |      |          745 | 0/4029060   | 0/4029098
> >          | lost       |               | f
> >   | 2025-02-13 15:26:54.666725+05:30 | t           |
> > wal_level_insufficient | f        | f
> >  test2     | test_decoding | logical   |      5 | postgres | f
> > | f      |            |      |          745 | 0/4029060   | 0/4029098
> >          | reserved   |               | f
> >   | 2025-02-13 15:30:30.477836+05:30 | f           |
> >      | f        | f
> > (2 rows)
> >
> > -- A subsequent attempt to decode changes from the invalidated slot
> > (test2) fails:
> > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL,
> > NULL);
> > WARNING:  detected write past chunk end in TXN 0x5e77e6c6f300
> > ERROR:  logical decoding on standby requires "wal_level" >= "logical"
> > on the primary
> >
> > -- Alternatively, the following error may occur:
> > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL,
> > NULL);
> > WARNING:  detected write past chunk end in TXN 0x582d1b2d6ef0
> >     data
> > ------------
> >  BEGIN 744
> >  COMMIT 744
> > (2 rows)
> >
> > This is an edge case that can occur under specific conditions
> > involving replication slot invalidation when there is a huge lag
> > between primary and standby.
> > There might be a similar concurrency case for wal_removed too.
> >
> 
> Hi Vignesh,
> 
> Thanks for reviewing the patch.

Thanks for updating the patch. I have a question related to it.

> 
> I have tested the above scenario and was able to reproduce it. I have fixed it in
> the v2 patch.
> Currently we are taking a shared lock on ReplicationSlotControlLock.
> This issue can be resolved if we take an exclusive lock instead.
> Thoughts?

It's not clear to me why increasing the lock level solves it; could you
elaborate a bit more on this?

Besides, do we need one more invalidated check in the following code after
creating the slot?

        /*
         * Check if the source slot still exists and is valid. We regard it as
         * invalid if the type of replication slot or name has been changed,
         * or the restart_lsn either is invalid or has gone backward. (The
         ...
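
Something along these lines is what I have in mind -- an untested sketch
only, reusing the 'src' and 'src_name' variables that copy_replication_slot()
already has, not an exact patch:

        bool            invalidated;

        /* Re-read the invalidation flag under the slot's spinlock */
        SpinLockAcquire(&src->mutex);
        invalidated = src->data.invalidated != RS_INVAL_NONE;
        SpinLockRelease(&src->mutex);

        if (invalidated)
                ereport(ERROR,
                                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                                 errmsg("cannot copy invalidated replication slot \"%s\"",
                                                NameStr(*src_name))));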

Best Regards,
Hou zj


Re: Restrict copying of invalidated replication slots

From
Shlok Kyal
Date:
On Tue, 18 Feb 2025 at 15:26, Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Monday, February 17, 2025 7:31 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > On Thu, 13 Feb 2025 at 15:54, vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Tue, 4 Feb 2025 at 15:27, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Currently, we can copy an invalidated slot using the function
> > > > 'pg_copy_logical_replication_slot'. As per the suggestion in the
> > > > thread [1], we should prohibit copying of such slots.
> > > >
> > > > I have created a patch to address the issue.
> > >
> > > This patch does not fix all the copy_replication_slot scenarios
> > > completely, there is a very corner concurrency case where an
> > > invalidated slot still gets copied:
> > > +       /* We should not copy invalidated replication slots */
> > > +       if (src_isinvalidated)
> > > +               ereport(ERROR,
> > > +
> > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > +                                errmsg("cannot copy an invalidated
> > > replication slot")));
> > >
> > > Consider the following scenario:
> > > step 1) Set up streaming replication between the primary and standby nodes.
> > > step 2) Create a logical replication slot (test1) on the standby node.
> > > step 3) Have a breakpoint in InvalidatePossiblyObsoleteSlot if cause
> > > is RS_INVAL_WAL_LEVEL, no need to hold other invalidation causes or
> > > add a sleep in InvalidatePossiblyObsoleteSlot function like below:
> > > if (cause == RS_INVAL_WAL_LEVEL)
> > > {
> > > while (bsleep)
> > > sleep(1);
> > > }
> > > step 4) Reduce wal_level on the primary to replica and restart the primary
> > node.
> > > step 5) SELECT 'copy' FROM pg_copy_logical_replication_slot('test1',
> > > 'test2');  -- It will wait till the lock held by
> > > InvalidatePossiblyObsoleteSlot is released while trying to create a
> > > slot.
> > > step 6) Increase wal_level back to logical on the primary node and
> > > restart the primary.
> > > step 7) Now allow the invalidation to happen (continue the breakpoint
> > > held at step 3), the replication control lock will be released and the
> > > invalidated slot will be copied
> > >
> > > After this:
> > > postgres=# SELECT 'copy' FROM
> > > pg_copy_logical_replication_slot('test1', 'test2');  ?column?
> > > ----------
> > >  copy
> > > (1 row)
> > >
> > > -- The invalidated slot (test1) is copied successfully:
> > > postgres=# select * from pg_replication_slots ;
> > >  slot_name |    plugin     | slot_type | datoid | database | temporary
> > > | active | active_pid | xmin | catalog_xmin | restart_lsn |
> > > confirmed_flush_lsn | wal_status | safe_wal_size | two_phas
> > > e |          inactive_since          | conflicting |
> > > invalidation_reason   | failover | synced
> > >
> > -----------+---------------+-----------+--------+----------+-----------+
> > --------+------------+------+--------------+-------------+---------------
> > ------+------------+---------------+---------
> > >
> > --+----------------------------------+-------------+----------------------
> > --+----------+--------
> > >  test1     | test_decoding | logical   |      5 | postgres | f
> > > | f      |            |      |          745 | 0/4029060   | 0/4029098
> > >          | lost       |               | f
> > >   | 2025-02-13 15:26:54.666725+05:30 | t           |
> > > wal_level_insufficient | f        | f
> > >  test2     | test_decoding | logical   |      5 | postgres | f
> > > | f      |            |      |          745 | 0/4029060   | 0/4029098
> > >          | reserved   |               | f
> > >   | 2025-02-13 15:30:30.477836+05:30 | f           |
> > >      | f        | f
> > > (2 rows)
> > >
> > > -- A subsequent attempt to decode changes from the invalidated slot
> > > (test2) fails:
> > > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL,
> > > NULL);
> > > WARNING:  detected write past chunk end in TXN 0x5e77e6c6f300
> > > ERROR:  logical decoding on standby requires "wal_level" >= "logical"
> > > on the primary
> > >
> > > -- Alternatively, the following error may occur:
> > > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL,
> > > NULL);
> > > WARNING:  detected write past chunk end in TXN 0x582d1b2d6ef0
> > >     data
> > > ------------
> > >  BEGIN 744
> > >  COMMIT 744
> > > (2 rows)
> > >
> > > This is an edge case that can occur under specific conditions
> > > involving replication slot invalidation when there is a huge lag
> > > between primary and standby.
> > > There might be a similar concurrency case for wal_removed too.
> > >
> >
> > Hi Vignesh,
> >
> > Thanks for reviewing the patch.
>
> Thanks for updating the patch. I have a question related to it.
>
> >
> > I have tested the above scenario and was able to reproduce it. I have fixed it in
> > the v2 patch.
> > Currently we are taking a shared lock on ReplicationSlotControlLock.
> > This issue can be resolved if we take an exclusive lock instead.
> > Thoughts?
>
> It's not clear to me why increasing the lock level can solve it, could you
> elaborate a bit more on this ?
>
In HEAD, InvalidateObsoleteReplicationSlots acquires a SHARED lock on
'ReplicationSlotControlLock'. The function 'copy_replication_slot' also
takes a SHARED lock on 'ReplicationSlotControlLock' while fetching the
source slot.

So, for the case described by Vignesh in [1],
InvalidateObsoleteReplicationSlots is called first and holds a SHARED lock
on 'ReplicationSlotControlLock'. We then hold the function there using the
sleep:
if (cause == RS_INVAL_WAL_LEVEL)
{
    while (bsleep)
        sleep(1);
}

Now we start copying the slot. Since 'copy_replication_slot' only takes a
SHARED lock on 'ReplicationSlotControlLock', it can acquire the lock and
fetch the info of the source slot (which has not been invalidated yet).
'copy_replication_slot' then calls 'create_logical_replication_slot', which
takes an EXCLUSIVE lock on ReplicationSlotControlLock and therefore waits
for InvalidateObsoleteReplicationSlots to release its lock. Once
InvalidateObsoleteReplicationSlots releases the lock,
'create_logical_replication_slot' continues and creates a copy of the (now
invalidated) source slot.

With the patch, 'copy_replication_slot' takes an EXCLUSIVE lock on
'ReplicationSlotControlLock' to fetch the slot info. Hence, it waits for
InvalidateObsoleteReplicationSlots to release the lock, and only then
fetches the source slot info and tries to create the copied slot (which
fails, as the source slot has been invalidated before we fetch its info).
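
Roughly, the v2 change amounts to the following in copy_replication_slot()
(a sketch of the idea only, not the exact hunk from the patch):

-       LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+       /*
+        * Take the control lock exclusively while reading the source slot,
+        * so that we wait for a concurrent
+        * InvalidateObsoleteReplicationSlots() (which holds the lock in
+        * shared mode) to finish marking the slot before we look at its
+        * contents.
+        */
+       LWLockAcquire(ReplicationSlotControlLock, LW_EXCLUSIVE);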

> Besides, do we need one more invalidated check in the following codes after
> creating the slot ?
>
>                 /*
>                  * Check if the source slot still exists and is valid. We regard it as
>                  * invalid if the type of replication slot or name has been changed,
>                  * or the restart_lsn either is invalid or has gone backward. (The
>                  ...
>

This approach seems more feasible to me. It also resolves the issue
reported by Vignesh in [1]. I have made the changes for this in the v3
patch.

[1]: https://www.postgresql.org/message-id/CALDaNm2rrxO5mg6OKoScw84K5P1Tw_cbjniHm%2BGeyxme8Ei-nQ%40mail.gmail.com

Thanks and Regards,
Shlok Kyal

Attachment

Re: Restrict copying of invalidated replication slots

From
Masahiko Sawada
Date:
On Wed, Feb 19, 2025 at 3:46 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Tue, 18 Feb 2025 at 15:26, Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Monday, February 17, 2025 7:31 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > >
> > > On Thu, 13 Feb 2025 at 15:54, vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > > > On Tue, 4 Feb 2025 at 15:27, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > Currently, we can copy an invalidated slot using the function
> > > > > 'pg_copy_logical_replication_slot'. As per the suggestion in the
> > > > > thread [1], we should prohibit copying of such slots.
> > > > >
> > > > > I have created a patch to address the issue.
> > > >
> > > > This patch does not fix all the copy_replication_slot scenarios
> > > > completely, there is a very corner concurrency case where an
> > > > invalidated slot still gets copied:
> > > > +       /* We should not copy invalidated replication slots */
> > > > +       if (src_isinvalidated)
> > > > +               ereport(ERROR,
> > > > +
> > > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > > +                                errmsg("cannot copy an invalidated
> > > > replication slot")));
> > > >
> > > > Consider the following scenario:
> > > > step 1) Set up streaming replication between the primary and standby nodes.
> > > > step 2) Create a logical replication slot (test1) on the standby node.
> > > > step 3) Have a breakpoint in InvalidatePossiblyObsoleteSlot if cause
> > > > is RS_INVAL_WAL_LEVEL, no need to hold other invalidation causes or
> > > > add a sleep in InvalidatePossiblyObsoleteSlot function like below:
> > > > if (cause == RS_INVAL_WAL_LEVEL)
> > > > {
> > > > while (bsleep)
> > > > sleep(1);
> > > > }
> > > > step 4) Reduce wal_level on the primary to replica and restart the primary
> > > node.
> > > > step 5) SELECT 'copy' FROM pg_copy_logical_replication_slot('test1',
> > > > 'test2');  -- It will wait till the lock held by
> > > > InvalidatePossiblyObsoleteSlot is released while trying to create a
> > > > slot.
> > > > step 6) Increase wal_level back to logical on the primary node and
> > > > restart the primary.
> > > > step 7) Now allow the invalidation to happen (continue the breakpoint
> > > > held at step 3), the replication control lock will be released and the
> > > > invalidated slot will be copied
> > > >
> > > > After this:
> > > > postgres=# SELECT 'copy' FROM
> > > > pg_copy_logical_replication_slot('test1', 'test2');  ?column?
> > > > ----------
> > > >  copy
> > > > (1 row)
> > > >
> > > > -- The invalidated slot (test1) is copied successfully:
> > > > postgres=# select * from pg_replication_slots ;
> > > >  slot_name |    plugin     | slot_type | datoid | database | temporary
> > > > | active | active_pid | xmin | catalog_xmin | restart_lsn |
> > > > confirmed_flush_lsn | wal_status | safe_wal_size | two_phas
> > > > e |          inactive_since          | conflicting |
> > > > invalidation_reason   | failover | synced
> > > >
> > > -----------+---------------+-----------+--------+----------+-----------+
> > > --------+------------+------+--------------+-------------+---------------
> > > ------+------------+---------------+---------
> > > >
> > > --+----------------------------------+-------------+----------------------
> > > --+----------+--------
> > > >  test1     | test_decoding | logical   |      5 | postgres | f
> > > > | f      |            |      |          745 | 0/4029060   | 0/4029098
> > > >          | lost       |               | f
> > > >   | 2025-02-13 15:26:54.666725+05:30 | t           |
> > > > wal_level_insufficient | f        | f
> > > >  test2     | test_decoding | logical   |      5 | postgres | f
> > > > | f      |            |      |          745 | 0/4029060   | 0/4029098
> > > >          | reserved   |               | f
> > > >   | 2025-02-13 15:30:30.477836+05:30 | f           |
> > > >      | f        | f
> > > > (2 rows)
> > > >
> > > > -- A subsequent attempt to decode changes from the invalidated slot
> > > > (test2) fails:
> > > > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL,
> > > > NULL);
> > > > WARNING:  detected write past chunk end in TXN 0x5e77e6c6f300
> > > > ERROR:  logical decoding on standby requires "wal_level" >= "logical"
> > > > on the primary
> > > >
> > > > -- Alternatively, the following error may occur:
> > > > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL,
> > > > NULL);
> > > > WARNING:  detected write past chunk end in TXN 0x582d1b2d6ef0
> > > >     data
> > > > ------------
> > > >  BEGIN 744
> > > >  COMMIT 744
> > > > (2 rows)
> > > >
> > > > This is an edge case that can occur under specific conditions
> > > > involving replication slot invalidation when there is a huge lag
> > > > between primary and standby.
> > > > There might be a similar concurrency case for wal_removed too.
> > > >
> > >
> > > Hi Vignesh,
> > >
> > > Thanks for reviewing the patch.
> >
> > Thanks for updating the patch. I have a question related to it.
> >
> > >
> > > I have tested the above scenario and was able to reproduce it. I have fixed it in
> > > the v2 patch.
> > > Currently we are taking a shared lock on ReplicationSlotControlLock.
> > > This issue can be resolved if we take an exclusive lock instead.
> > > Thoughts?
> >
> > It's not clear to me why increasing the lock level can solve it, could you
> > elaborate a bit more on this ?
> >
> In HEAD, InvalidateObsoleteReplicationSlots acquires a SHARED lock on
> 'ReplicationSlotControlLock'
> Also in function 'copy_replication_slot' we take a  SHARED lock on
> 'ReplicationSlotControlLock' during fetching of source slot.
>
> So, for the case described by Vignesh in [1], first
> InvalidateObsoleteReplicationSlot is called and we hold a SHARED lock
> on 'ReplicationSlotControlLock'. We are now holding the function using
> the sleep
> if (cause == RS_INVAL_WAL_LEVEL)
>  {
> while (bsleep)
> sleep(1);
> }
>
> Now we create a copy of the slot since 'copy_replication_slot'  takes
> a SHARED lock on 'ReplicationSlotControlLock'. It will take the lock
> and fetch the info of the source slot (the slot is not invalidated
> till now). and the function 'copy_replication_slot' calls function
> 'create_logical_replication_slot' which takes a EXCLUSIVE lock on
> ReplicationSlotControlLock and hence it will wait for function
> InvalidateObsoleteReplicationSlot to release lock. Once the function
> 'InvalidateObsoleteReplicationSlot' releases the lock, the execution
> of 'create_logical_replication_slot' continues and creates a copy of
> the source slot.
>
> Now with the patch, 'copy_replication_slot' will take an EXCLUSIVE
> lock on  'ReplicationSlotControlLock'. to fetch the slot info. Hence,
> it will wait for the 'InvalidateObsoleteReplicationSlot' to release
> the lock and then fetch the source slot info and try to create the
> copied slot (which will fail as source slot is invalidated before we
> fetch its info)
>
> > Besides, do we need one more invalidated check in the following codes after
> > creating the slot ?
> >
> >                 /*
> >                  * Check if the source slot still exists and is valid. We regard it as
> >                  * invalid if the type of replication slot or name has been changed,
> >                  * or the restart_lsn either is invalid or has gone backward. (The
> >                  ...
> >
>
> This approach seems more feasible to me. It also resolves the issue
> suggested by Vignesh in [1].  I have made changes for the same in v3
> patch.
>

I agree with checking whether the source slot got invalidated during the
copy. But why do we need to search for the slot by name again, as follows?

+       /* Check if source slot was invalidated while copying of slot */
+       LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+
+       for (int i = 0; i < max_replication_slots; i++)
+       {
+           ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
+
+           if (s->in_use && strcmp(NameStr(s->data.name), NameStr(*src_name)) == 0)
+           {
+               /* Copy the slot contents while holding spinlock */
+               SpinLockAcquire(&s->mutex);
+               first_slot_contents = *s;
+               SpinLockRelease(&s->mutex);
+               src = s;
+               break;
+           }
+       }
+
+       LWLockRelease(ReplicationSlotControlLock);

I think 'src' already points to the source slot.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Restrict copying of invalidated replication slots

From
Shlok Kyal
Date:
On Fri, 21 Feb 2025 at 01:14, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Feb 19, 2025 at 3:46 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > On Tue, 18 Feb 2025 at 15:26, Zhijie Hou (Fujitsu)
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > On Monday, February 17, 2025 7:31 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > >
> > > > On Thu, 13 Feb 2025 at 15:54, vignesh C <vignesh21@gmail.com> wrote:
> > > > >
> > > > > On Tue, 4 Feb 2025 at 15:27, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Currently, we can copy an invalidated slot using the function
> > > > > > 'pg_copy_logical_replication_slot'. As per the suggestion in the
> > > > > > thread [1], we should prohibit copying of such slots.
> > > > > >
> > > > > > I have created a patch to address the issue.
> > > > >
> > > > > This patch does not fix all the copy_replication_slot scenarios
> > > > > completely, there is a very corner concurrency case where an
> > > > > invalidated slot still gets copied:
> > > > > +       /* We should not copy invalidated replication slots */
> > > > > +       if (src_isinvalidated)
> > > > > +               ereport(ERROR,
> > > > > +
> > > > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > > > +                                errmsg("cannot copy an invalidated
> > > > > replication slot")));
> > > > >
> > > > > Consider the following scenario:
> > > > > step 1) Set up streaming replication between the primary and standby nodes.
> > > > > step 2) Create a logical replication slot (test1) on the standby node.
> > > > > step 3) Have a breakpoint in InvalidatePossiblyObsoleteSlot if cause
> > > > > is RS_INVAL_WAL_LEVEL, no need to hold other invalidation causes or
> > > > > add a sleep in InvalidatePossiblyObsoleteSlot function like below:
> > > > > if (cause == RS_INVAL_WAL_LEVEL)
> > > > > {
> > > > > while (bsleep)
> > > > > sleep(1);
> > > > > }
> > > > > step 4) Reduce wal_level on the primary to replica and restart the primary
> > > > node.
> > > > > step 5) SELECT 'copy' FROM pg_copy_logical_replication_slot('test1',
> > > > > 'test2');  -- It will wait till the lock held by
> > > > > InvalidatePossiblyObsoleteSlot is released while trying to create a
> > > > > slot.
> > > > > step 6) Increase wal_level back to logical on the primary node and
> > > > > restart the primary.
> > > > > step 7) Now allow the invalidation to happen (continue the breakpoint
> > > > > held at step 3), the replication control lock will be released and the
> > > > > invalidated slot will be copied
> > > > >
> > > > > After this:
> > > > > postgres=# SELECT 'copy' FROM
> > > > > pg_copy_logical_replication_slot('test1', 'test2');  ?column?
> > > > > ----------
> > > > >  copy
> > > > > (1 row)
> > > > >
> > > > > -- The invalidated slot (test1) is copied successfully:
> > > > > postgres=# select * from pg_replication_slots ;
> > > > >  slot_name |    plugin     | slot_type | datoid | database | temporary
> > > > > | active | active_pid | xmin | catalog_xmin | restart_lsn |
> > > > > confirmed_flush_lsn | wal_status | safe_wal_size | two_phas
> > > > > e |          inactive_since          | conflicting |
> > > > > invalidation_reason   | failover | synced
> > > > >
> > > > -----------+---------------+-----------+--------+----------+-----------+
> > > > --------+------------+------+--------------+-------------+---------------
> > > > ------+------------+---------------+---------
> > > > >
> > > > --+----------------------------------+-------------+----------------------
> > > > --+----------+--------
> > > > >  test1     | test_decoding | logical   |      5 | postgres | f
> > > > > | f      |            |      |          745 | 0/4029060   | 0/4029098
> > > > >          | lost       |               | f
> > > > >   | 2025-02-13 15:26:54.666725+05:30 | t           |
> > > > > wal_level_insufficient | f        | f
> > > > >  test2     | test_decoding | logical   |      5 | postgres | f
> > > > > | f      |            |      |          745 | 0/4029060   | 0/4029098
> > > > >          | reserved   |               | f
> > > > >   | 2025-02-13 15:30:30.477836+05:30 | f           |
> > > > >      | f        | f
> > > > > (2 rows)
> > > > >
> > > > > -- A subsequent attempt to decode changes from the invalidated slot
> > > > > (test2) fails:
> > > > > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL,
> > > > > NULL);
> > > > > WARNING:  detected write past chunk end in TXN 0x5e77e6c6f300
> > > > > ERROR:  logical decoding on standby requires "wal_level" >= "logical"
> > > > > on the primary
> > > > >
> > > > > -- Alternatively, the following error may occur:
> > > > > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL,
> > > > > NULL);
> > > > > WARNING:  detected write past chunk end in TXN 0x582d1b2d6ef0
> > > > >     data
> > > > > ------------
> > > > >  BEGIN 744
> > > > >  COMMIT 744
> > > > > (2 rows)
> > > > >
> > > > > This is an edge case that can occur under specific conditions
> > > > > involving replication slot invalidation when there is a huge lag
> > > > > between primary and standby.
> > > > > There might be a similar concurrency case for wal_removed too.
> > > > >
> > > >
> > > > Hi Vignesh,
> > > >
> > > > Thanks for reviewing the patch.
> > >
> > > Thanks for updating the patch. I have a question related to it.
> > >
> > > >
> > > > I have tested the above scenario and was able to reproduce it. I have fixed it in
> > > > the v2 patch.
> > > > Currently we are taking a shared lock on ReplicationSlotControlLock.
> > > > This issue can be resolved if we take an exclusive lock instead.
> > > > Thoughts?
> > >
> > > It's not clear to me why increasing the lock level can solve it, could you
> > > elaborate a bit more on this ?
> > >
> > In HEAD, InvalidateObsoleteReplicationSlots acquires a SHARED lock on
> > 'ReplicationSlotControlLock'
> > Also in function 'copy_replication_slot' we take a  SHARED lock on
> > 'ReplicationSlotControlLock' during fetching of source slot.
> >
> > So, for the case described by Vignesh in [1], first
> > InvalidateObsoleteReplicationSlot is called and we hold a SHARED lock
> > on 'ReplicationSlotControlLock'. We are now holding the function using
> > the sleep
> > if (cause == RS_INVAL_WAL_LEVEL)
> >  {
> > while (bsleep)
> > sleep(1);
> > }
> >
> > Now we create a copy of the slot since 'copy_replication_slot'  takes
> > a SHARED lock on 'ReplicationSlotControlLock'. It will take the lock
> > and fetch the info of the source slot (the slot is not invalidated
> > till now). and the function 'copy_replication_slot' calls function
> > 'create_logical_replication_slot' which takes a EXCLUSIVE lock on
> > ReplicationSlotControlLock and hence it will wait for function
> > InvalidateObsoleteReplicationSlot to release lock. Once the function
> > 'InvalidateObsoleteReplicationSlot' releases the lock, the execution
> > of 'create_logical_replication_slot' continues and creates a copy of
> > the source slot.
> >
> > Now with the patch, 'copy_replication_slot' will take an EXCLUSIVE
> > lock on  'ReplicationSlotControlLock'. to fetch the slot info. Hence,
> > it will wait for the 'InvalidateObsoleteReplicationSlot' to release
> > the lock and then fetch the source slot info and try to create the
> > copied slot (which will fail as source slot is invalidated before we
> > fetch its info)
> >
> > > Besides, do we need one more invalidated check in the following codes after
> > > creating the slot ?
> > >
> > >                 /*
> > >                  * Check if the source slot still exists and is valid. We regard it as
> > >                  * invalid if the type of replication slot or name has been changed,
> > >                  * or the restart_lsn either is invalid or has gone backward. (The
> > >                  ...
> > >
> >
> > This approach seems more feasible to me. It also resolves the issue
> > suggested by Vignesh in [1].  I have made changes for the same in v3
> > patch.
> >
>
> I agree to check if the source slot got invalidated during the copy.
> But why do we need to search the slot by the slot name again as
> follows?
>
> +       /* Check if source slot was invalidated while copying of slot */
> +       LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> +
> +       for (int i = 0; i < max_replication_slots; i++)
> +       {
> +           ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
> +
> +           if (s->in_use && strcmp(NameStr(s->data.name),
> NameStr(*src_name)) == 0)
> +           {
> +               /* Copy the slot contents while holding spinlock */
> +               SpinLockAcquire(&s->mutex);
> +               first_slot_contents = *s;
> +               SpinLockRelease(&s->mutex);
> +               src = s;
> +               break;
> +           }
> +       }
> +
> +       LWLockRelease(ReplicationSlotControlLock);
>
> I think 'src' already points to the source slot.
>
Hi Sawada san,

Thanks for reviewing the patch.
I have used 'src' directly instead of iterating over the slots again. I have
attached the updated v4 patch.

Thanks and Regards,
Shlok Kyal

Attachment

Re: Restrict copying of invalidated replication slots

From
Masahiko Sawada
Date:
On Fri, Feb 21, 2025 at 4:30 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Fri, 21 Feb 2025 at 01:14, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Feb 19, 2025 at 3:46 AM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > >
> > > On Tue, 18 Feb 2025 at 15:26, Zhijie Hou (Fujitsu)
> > > <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > > On Monday, February 17, 2025 7:31 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > > >
> > > > > On Thu, 13 Feb 2025 at 15:54, vignesh C <vignesh21@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, 4 Feb 2025 at 15:27, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Currently, we can copy an invalidated slot using the function
> > > > > > > 'pg_copy_logical_replication_slot'. As per the suggestion in the
> > > > > > > thread [1], we should prohibit copying of such slots.
> > > > > > >
> > > > > > > I have created a patch to address the issue.
> > > > > >
> > > > > > This patch does not fix all the copy_replication_slot scenarios
> > > > > > completely, there is a very corner concurrency case where an
> > > > > > invalidated slot still gets copied:
> > > > > > +       /* We should not copy invalidated replication slots */
> > > > > > +       if (src_isinvalidated)
> > > > > > +               ereport(ERROR,
> > > > > > +
> > > > > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > > > > +                                errmsg("cannot copy an invalidated
> > > > > > replication slot")));
> > > > > >
> > > > > > Consider the following scenario:
> > > > > > step 1) Set up streaming replication between the primary and standby nodes.
> > > > > > step 2) Create a logical replication slot (test1) on the standby node.
> > > > > > step 3) Have a breakpoint in InvalidatePossiblyObsoleteSlot if cause
> > > > > > is RS_INVAL_WAL_LEVEL, no need to hold other invalidation causes or
> > > > > > add a sleep in InvalidatePossiblyObsoleteSlot function like below:
> > > > > > if (cause == RS_INVAL_WAL_LEVEL)
> > > > > > {
> > > > > > while (bsleep)
> > > > > > sleep(1);
> > > > > > }
> > > > > > step 4) Reduce wal_level on the primary to replica and restart the primary
> > > > > node.
> > > > > > step 5) SELECT 'copy' FROM pg_copy_logical_replication_slot('test1',
> > > > > > 'test2');  -- It will wait till the lock held by
> > > > > > InvalidatePossiblyObsoleteSlot is released while trying to create a
> > > > > > slot.
> > > > > > step 6) Increase wal_level back to logical on the primary node and
> > > > > > restart the primary.
> > > > > > step 7) Now allow the invalidation to happen (continue the breakpoint
> > > > > > held at step 3), the replication control lock will be released and the
> > > > > > invalidated slot will be copied
> > > > > >
> > > > > > After this:
> > > > > > postgres=# SELECT 'copy' FROM
> > > > > > pg_copy_logical_replication_slot('test1', 'test2');  ?column?
> > > > > > ----------
> > > > > >  copy
> > > > > > (1 row)
> > > > > >
> > > > > > -- The invalidated slot (test1) is copied successfully:
> > > > > > postgres=# select * from pg_replication_slots ;
> > > > > >  slot_name |    plugin     | slot_type | datoid | database | temporary
> > > > > > | active | active_pid | xmin | catalog_xmin | restart_lsn |
> > > > > > confirmed_flush_lsn | wal_status | safe_wal_size | two_phas
> > > > > > e |          inactive_since          | conflicting |
> > > > > > invalidation_reason   | failover | synced
> > > > > >
> > > > > -----------+---------------+-----------+--------+----------+-----------+
> > > > > --------+------------+------+--------------+-------------+---------------
> > > > > ------+------------+---------------+---------
> > > > > >
> > > > > --+----------------------------------+-------------+----------------------
> > > > > --+----------+--------
> > > > > >  test1     | test_decoding | logical   |      5 | postgres | f
> > > > > > | f      |            |      |          745 | 0/4029060   | 0/4029098
> > > > > >          | lost       |               | f
> > > > > >   | 2025-02-13 15:26:54.666725+05:30 | t           |
> > > > > > wal_level_insufficient | f        | f
> > > > > >  test2     | test_decoding | logical   |      5 | postgres | f
> > > > > > | f      |            |      |          745 | 0/4029060   | 0/4029098
> > > > > >          | reserved   |               | f
> > > > > >   | 2025-02-13 15:30:30.477836+05:30 | f           |
> > > > > >      | f        | f
> > > > > > (2 rows)
> > > > > >
> > > > > > -- A subsequent attempt to decode changes from the invalidated slot
> > > > > > (test2) fails:
> > > > > > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL,
> > > > > > NULL);
> > > > > > WARNING:  detected write past chunk end in TXN 0x5e77e6c6f300
> > > > > > ERROR:  logical decoding on standby requires "wal_level" >= "logical"
> > > > > > on the primary
> > > > > >
> > > > > > -- Alternatively, the following error may occur:
> > > > > > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL,
> > > > > > NULL);
> > > > > > WARNING:  detected write past chunk end in TXN 0x582d1b2d6ef0
> > > > > >     data
> > > > > > ------------
> > > > > >  BEGIN 744
> > > > > >  COMMIT 744
> > > > > > (2 rows)
> > > > > >
> > > > > > This is an edge case that can occur under specific conditions
> > > > > > involving replication slot invalidation when there is a huge lag
> > > > > > between primary and standby.
> > > > > > There might be a similar concurrency case for wal_removed too.
> > > > > >
> > > > >
> > > > > Hi Vignesh,
> > > > >
> > > > > Thanks for reviewing the patch.
> > > >
> > > > Thanks for updating the patch. I have a question related to it.
> > > >
> > > > >
> > > > > I have tested the above scenario and was able to reproduce it. I have fixed it in
> > > > > the v2 patch.
> > > > > Currently we are taking a shared lock on ReplicationSlotControlLock.
> > > > > This issue can be resolved if we take an exclusive lock instead.
> > > > > Thoughts?
> > > >
> > > > It's not clear to me why increasing the lock level can solve it, could you
> > > > elaborate a bit more on this ?
> > > >
> > > In HEAD, InvalidateObsoleteReplicationSlots acquires a SHARED lock on
> > > 'ReplicationSlotControlLock'
> > > Also in function 'copy_replication_slot' we take a  SHARED lock on
> > > 'ReplicationSlotControlLock' during fetching of source slot.
> > >
> > > So, for the case described by Vignesh in [1], first
> > > InvalidateObsoleteReplicationSlot is called and we hold a SHARED lock
> > > on 'ReplicationSlotControlLock'. We are now holding the function using
> > > the sleep
> > > if (cause == RS_INVAL_WAL_LEVEL)
> > >  {
> > > while (bsleep)
> > > sleep(1);
> > > }
> > >
> > > Now we create a copy of the slot since 'copy_replication_slot'  takes
> > > a SHARED lock on 'ReplicationSlotControlLock'. It will take the lock
> > > and fetch the info of the source slot (the slot is not invalidated
> > > till now). and the function 'copy_replication_slot' calls function
> > > 'create_logical_replication_slot' which takes a EXCLUSIVE lock on
> > > ReplicationSlotControlLock and hence it will wait for function
> > > InvalidateObsoleteReplicationSlot to release lock. Once the function
> > > 'InvalidateObsoleteReplicationSlot' releases the lock, the execution
> > > of 'create_logical_replication_slot' continues and creates a copy of
> > > the source slot.
> > >
> > > Now with the patch, 'copy_replication_slot' will take an EXCLUSIVE
> > > lock on  'ReplicationSlotControlLock'. to fetch the slot info. Hence,
> > > it will wait for the 'InvalidateObsoleteReplicationSlot' to release
> > > the lock and then fetch the source slot info and try to create the
> > > copied slot (which will fail as source slot is invalidated before we
> > > fetch its info)
> > >
> > > > Besides, do we need one more invalidated check in the following codes after
> > > > creating the slot ?
> > > >
> > > >                 /*
> > > >                  * Check if the source slot still exists and is valid. We regard it as
> > > >                  * invalid if the type of replication slot or name has been changed,
> > > >                  * or the restart_lsn either is invalid or has gone backward. (The
> > > >                  ...
> > > >
> > >
> > > This approach seems more feasible to me. It also resolves the issue
> > > suggested by Vignesh in [1].  I have made changes for the same in v3
> > > patch.
> > >
> >
> > I agree to check if the source slot got invalidated during the copy.
> > But why do we need to search the slot by the slot name again as
> > follows?
> >
> > +       /* Check if source slot was invalidated while copying of slot */
> > +       LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> > +
> > +       for (int i = 0; i < max_replication_slots; i++)
> > +       {
> > +           ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
> > +
> > +           if (s->in_use && strcmp(NameStr(s->data.name),
> > NameStr(*src_name)) == 0)
> > +           {
> > +               /* Copy the slot contents while holding spinlock */
> > +               SpinLockAcquire(&s->mutex);
> > +               first_slot_contents = *s;
> > +               SpinLockRelease(&s->mutex);
> > +               src = s;
> > +               break;
> > +           }
> > +       }
> > +
> > +       LWLockRelease(ReplicationSlotControlLock);
> >
> > I think 'src' already points to the source slot.
> >
> Hi Sawada san,
>
> Thanks for reviewing the patch.
> I have used the 'src' instead of iterating again. I have attached the
> updated v4 patch.

Thank you for updating the patch! I have one comment:

+       /* Check if source slot was invalidated while copying of slot */
+       SpinLockAcquire(&src->mutex);
+       first_slot_contents = *src;
+       SpinLockRelease(&src->mutex);

We don't need to copy the source slot contents again, since we already do
so here:

        /* Copy data of source slot again */
        SpinLockAcquire(&src->mutex);
        second_slot_contents = *src;
        SpinLockRelease(&src->mutex);

I think we can use second_slot_contents for that check.
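
In other words (an untested sketch), the extra read can simply be removed:

-       /* Check if source slot was invalidated while copying of slot */
-       SpinLockAcquire(&src->mutex);
-       first_slot_contents = *src;
-       SpinLockRelease(&src->mutex);

and the new invalidation check can look at
second_slot_contents.data.invalidated instead, since that copy is already
taken from the source slot under its spinlock.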

I've investigated the slot invalidation and slot copying behaviors. We
cannot copy a slot if it doesn't reserve WAL, but IIUC the slot's
restart_lsn is not reset when the slot is invalidated for a reason other
than RS_INVAL_WAL_REMOVED. Therefore, it's possible that we copy a slot
invalidated by, for example, RS_INVAL_IDLE_TIMEOUT, and the copied slot's
restart_lsn might point to WAL that has already been removed, which
ultimately causes an assertion failure in copy_replication_slot():

#ifdef USE_ASSERT_CHECKING
        /* Check that the restart_lsn is available */
        {
            XLogSegNo   segno;

            XLByteToSeg(copy_restart_lsn, segno, wal_segment_size);
            Assert(XLogGetLastRemovedSegno() < segno);
        }
#endif

I think this issue exists in v16 and later, though I've not tested it yet.
If my understanding is right, this patch has to be backpatched.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com