On Sat, Sep 2, 2023 at 10:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> During pg_upgrade, we start the server for the old cluster which can
> allow the checkpointer to remove the WAL files. It has been noticed
> that we do generate certain types of WAL records (e.g
> XLOG_RUNNING_XACTS, XLOG_CHECKPOINT_ONLINE, and XLOG_FPI_FOR_HINT)
> even during pg_upgrade for old cluster, so additional WAL records
> could let checkpointer decide that certain WAL segments can be removed
> (e.g. say wal size crosses max_slot_wal_keep_size_mb) and invalidate
> the slots. Currently, I can't see any problem with this but for future
> work where we want to migrate logical slots during an upgrade[1], we
> need to decide what to do for such cases. The initial idea we had was
> that if the old cluster has some invalid slots, we won't allow an
> upgrade unless the user removes such slots or uses some option like
> --exclude-slots. It is quite possible that slots got invalidated
> during pg_upgrade due to no user activity. Now, even though the
> possibility of the same is less I think it is worth considering what
> should be the behavior.
Right.
> The other possibilities apart from not allowing an upgrade in such a
> case could be (a) Before starting the old cluster, we fetch the slots
> directly from the disk using some tool like [2] and make the decisions
> based on that state;
Okay, so IIUC, along with dumping the slot data we would also need to
dump the latest checkpoint LSN, because during the upgrade we check that
the confirmed_flush_lsn of every slot is the same as the latest
checkpoint LSN. But I think we could work this out.
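To make the check I mean concrete, here is a rough sketch. It only assumes
that we somehow have each slot's confirmed_flush_lsn and the latest
checkpoint LSN as text in the usual "hi/lo" hex form; the function names,
slot names and LSN values are made up for illustration and are not taken
from any patch.

```python
# Sketch only: slot names and LSN values below are hypothetical.

def parse_lsn(text):
    """Convert an LSN in PostgreSQL's 'hi/lo' hex text form to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def slots_not_caught_up(slots, latest_checkpoint_lsn):
    """Return names of slots whose confirmed_flush_lsn differs from the
    latest checkpoint LSN (the condition checked during upgrade)."""
    target = parse_lsn(latest_checkpoint_lsn)
    return [name for name, lsn in slots.items() if parse_lsn(lsn) != target]


# Hypothetical data as it might be read from disk before starting the server.
slots = {"sub_a": "0/3000060", "sub_b": "0/2FFFF00"}
print(slots_not_caught_up(slots, "0/3000060"))
```

If both values were dumped before starting the old server, a comparison
like this would not be affected by WAL generated later during the upgrade.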
> (b) During the upgrade, we don't allow WAL to be
> removed if it can invalidate slots; (c) Copy/Migrate the invalid slots
> as well but for that, we need to expose an API to invalidate the
> slots;
> (d) somehow distinguish the slots that are invalidated during
> an upgrade and then simply copy such slots because anyway we ensure
> that all the WAL required by slot is sent before shutdown.
Yeah, this could also be an option, although we need to think about
whether the mechanism for distinguishing those slots looks clean and
fits well with the rest of the architecture.
Alternatively, can't we just ignore all the invalidated slots and not
migrate them at all? Scenarios where some segments get removed just
during the upgrade, and that too from the old cluster, are very rare, so
in such cases not migrating the invalidated slots should be fine, no?
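Roughly what I have in mind, as a sketch. It assumes an invalidated slot
can be recognized by wal_status = 'lost' as reported in
pg_replication_slots; the function name and the sample rows are made up
for illustration.

```python
# Sketch only: the rows below are hypothetical, shaped like a subset of
# the columns in pg_replication_slots.

def split_slots(rows):
    """Separate slots to migrate from invalidated slots to be skipped."""
    migrate, skipped = [], []
    for row in rows:
        # Assumption: an invalidated slot shows up with wal_status 'lost'.
        if row["wal_status"] == "lost":
            skipped.append(row["slot_name"])
        else:
            migrate.append(row["slot_name"])
    return migrate, skipped


rows = [
    {"slot_name": "sub_a", "wal_status": "reserved"},
    {"slot_name": "sub_b", "wal_status": "lost"},
]
migrate, skipped = split_slots(rows)
print("migrate:", migrate)
print("skipped:", skipped)
```

The skipped list could still be reported to the user so that they know
which slots have to be recreated on the new cluster.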
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com