Re: Reviving lost replication slots - Mailing list pgsql-hackers

From sirisha chamarthi
Subject Re: Reviving lost replication slots
Msg-id CAKrAKeXaSAW4wgGrZgaons4Z8sBTCy_FCKhvgiB000=FO=gbfw@mail.gmail.com
In response to Re: Reviving lost replication slots  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers


On Tue, Nov 8, 2022 at 1:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Nov 8, 2022 at 12:08 PM sirisha chamarthi
<sirichamarthi22@gmail.com> wrote:
>
> On Fri, Nov 4, 2022 at 11:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi
>> <sirichamarthi22@gmail.com> wrote:
>> >
>> > A replication slot can be lost when a subscriber cannot keep up with the load on the primary and the WAL it still needs exceeds max_slot_wal_keep_size. When this happens, the target has to be reseeded (pg_dump) from scratch, which can take a long time. I am investigating options to revive a lost slot.
>> >
>>
>> Why would one set max_slot_wal_keep_size in the first place if they
>> need more WAL retained than that?
>
> Disk full is a typical case where we can't wait for the logical slots to catch up before truncating the log.
>

Ideally, in such a case the subscriber should fall back to the
physical standby of the publisher but unfortunately, we don't yet have
a functionality where subscribers can continue logical replication
from physical standby. Do you think if we had such functionality it
would serve our purpose?
 
I don't think streaming from a standby helps, since the disk layout is expected to be the same on the physical standby and the primary.

 
>> If you have a case where you want to
>> handle this case for some particular slot (where you are okay with the
>> invalidation of other slots exceeding max_slot_wal_keep_size) then the
>> other possibility could be to have a similar variable at the slot
>> level but not sure if that is a good idea because you haven't
>> presented any such case.
>
> IIUC, the ability to fetch WAL from the archive as a fallback mechanism should automatically take care of all the lost slots. Do you see a need to take care of a specific slot?
>

No, I was just trying to see if your use case can be addressed in some
other way. BTW, won't copying the WAL back from the archive again lead
to a disk-full situation?
The idea is to download WAL segments from the archive on demand, as the slot requires them, and to discard each segment once it has been processed.
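
A rough sketch of that flow, in Python rather than backend C, only to illustrate the intended disk-usage behaviour. The function names, the directories, and the segment names are all hypothetical, and the plain file copy merely stands in for whatever the archive fetch would be (something along the lines of restore_command). It is not taken from any patch.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def stream_from_archive(
    archive_dir: Path,
    scratch_dir: Path,
    segment_names: Iterable[str],
    process: Callable[[Path], None],
) -> int:
    """Fetch each segment only when needed, process it, then delete it.

    Returns the peak number of segments present in scratch_dir at once.
    """
    peak = 0
    for name in segment_names:
        local = scratch_dir / name
        # Assumption: a file copy stands in for the real archive fetch.
        shutil.copyfile(archive_dir / name, local)
        peak = max(peak, sum(1 for _ in scratch_dir.iterdir()))
        try:
            process(local)
        finally:
            # Discard the segment as soon as it has been consumed.
            local.unlink()
    return peak


def demo() -> dict:
    """Run the sketch against three fake segments in temporary directories."""
    with tempfile.TemporaryDirectory() as archive, \
            tempfile.TemporaryDirectory() as scratch:
        archive_dir, scratch_dir = Path(archive), Path(scratch)
        # Illustrative 24-hex-digit names: timeline 1, log 0, segments 1..3.
        names = [f"{1:08X}{0:08X}{i:08X}" for i in range(1, 4)]
        for name in names:
            (archive_dir / name).write_bytes(b"\0" * 1024)

        processed = []
        peak = stream_from_archive(
            archive_dir, scratch_dir, names,
            lambda path: processed.append(path.name),
        )
        leftover = len(list(scratch_dir.iterdir()))
    return {"processed": processed, "peak": peak, "leftover": leftover}


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that at most one restored segment sits on local disk at any time, so reading old WAL back from the archive should not by itself recreate the disk-full condition.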
 

--
With Regards,
Amit Kapila.
