On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi <sirichamarthi22@gmail.com> wrote:
>
> A replication slot can be lost when a subscriber is not able to catch up with the load on the primary and the WAL to catch up exceeds max_slot_wal_keep_size. When this happens, target has to be reseeded (pg_dump) from the scratch and this can take longer. I am investigating the options to revive a lost slot.
>
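For illustration, the condition described above can be sketched as a small Python model. This is a simplified sketch, not how the server implements it: the real check works on WAL segments and happens at checkpoint time. The function names `parse_lsn` and `slot_exceeds_limit` are hypothetical, and `max_keep_mb` stands in for max_slot_wal_keep_size expressed in megabytes, with -1 meaning no limit.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a 64-bit byte position.

    The text form is two hexadecimal numbers: high 32 bits, '/', low 32 bits.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def slot_exceeds_limit(current_lsn: str, restart_lsn: str, max_keep_mb: int) -> bool:
    """Simplified model: True if the WAL a slot still needs exceeds the limit.

    max_keep_mb is a stand-in for max_slot_wal_keep_size in megabytes
    (assumption for this sketch); a negative value means "no limit".
    """
    if max_keep_mb < 0:
        return False
    # Bytes of WAL between the slot's restart point and the current position.
    retained_bytes = parse_lsn(current_lsn) - parse_lsn(restart_lsn)
    return retained_bytes > max_keep_mb * 1024 * 1024
```

With a 1 GB limit, a slot whose restart_lsn is 4 GB behind the current position would be flagged by this model, while the same slot with the limit set to -1 would not.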
Why would one set max_slot_wal_keep_size in the first place if they need to retain more WAL than that?
Disk full is a typical case where we can't wait for the logical slots to catch up before truncating the log.
If you want to handle this for a particular slot (and are okay with other slots being invalidated once they exceed max_slot_wal_keep_size), another possibility would be a similar setting at the slot level. I'm not sure that is a good idea, though, because you haven't presented any such case.
IIUC, the ability to fetch WAL from the archive as a fallback mechanism should automatically take care of all lost slots. Do you see a need to handle a specific slot? If the idea is to avoid downloading the WAL files into the pg_wal directory, they can be placed in a slot-specific folder (data/pg_replslot/<slot>/) until they are needed for decoding, and removed afterwards.