Re: pg_upgrade: optimize replication slot caught-up check - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: pg_upgrade: optimize replication slot caught-up check
Msg-id CAA4eK1JtxEGkgonX+9y9OjER54366iJMq78VUScmvtB+JR7boQ@mail.gmail.com
In response to Re: pg_upgrade: optimize replication slot caught-up check  (shveta malik <shveta.malik@gmail.com>)
Responses Re: pg_upgrade: optimize replication slot caught-up check
Re: pg_upgrade: optimize replication slot caught-up check
List pgsql-hackers
On Fri, Jan 30, 2026 at 9:45 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Fri, Jan 30, 2026 at 2:15 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jan 28, 2026 at 10:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Jan 28, 2026 at 2:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > I missed fixing one place. Attached the new version.
> > > >
> > >
> > > One question/comment on following change:
> > > + bool use_fast_caught_up_check;
> > > +
> > > + logical_slot_infos_query = get_old_cluster_logical_slot_infos_query(cluster,
> > > + &use_fast_caught_up_check);
> > > +
> > >   upgrade_task_add_step(task,
> > >     logical_slot_infos_query,
> > >     process_old_cluster_logical_slot_infos,
> > >     true, NULL);
> > > +
> > > + /*
> > > + * Check whether slots have consumed all WAL records efficiently by
> > > + * using another query, if not during a live_check.
> > > + */
> > > + if (use_fast_caught_up_check && !user_opts.live_check)
> > > + {
> > >
> > > Won't this lead to two steps to set caught_up for slots in PG19 and
> > > following versions? If so, is it possible to use just one step even
> > > for PG19 and following versions?
> >
> > Yes, it seems like a good simplification. I've updated the patch accordingly.
> >
>
> At first glance it looks like a simplification, but on closer look, it
> actually makes the code harder to follow and more prone to errors if
> someone modifies it in the future.
>

I think that is primarily because of the way the patch arranges the
code. It would be better to construct a complete query separately for
the fast and non-fast checks. Some parts will be repeated, but the
chance of mistakes will be lower and the code will be easier to
follow.
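
As a rough illustration of the arrangement suggested above (a sketch, not
code from the patch): each variant is written out as one complete query
string, and a small helper picks between them. The column list is
simplified, choose_slot_infos_query() is a hypothetical helper, and
hypothetical_fast_caught_up_check() is a placeholder for whatever the
fast check ends up calling. The live_check case is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Non-fast variant: every slot is checked with the existing upgrade
 * function.  The column list is shortened for illustration.
 */
static const char *const slot_infos_query_slow =
	"SELECT slot_name, plugin, "
	"(CASE WHEN invalidation_reason IS NOT NULL THEN FALSE "
	"ELSE pg_catalog.binary_upgrade_logical_slot_has_caught_up(slot_name) "
	"END) AS caught_up "
	"FROM pg_catalog.pg_replication_slots "
	"WHERE slot_type = 'logical' AND "
	"database = current_database() AND "
	"temporary IS FALSE;";

/*
 * Fast variant: same shape, repeated in full rather than assembled from
 * shared fragments.  hypothetical_fast_caught_up_check() is a placeholder
 * name (an assumption), not a function from the patch or from PostgreSQL.
 */
static const char *const slot_infos_query_fast =
	"SELECT slot_name, plugin, "
	"(CASE WHEN invalidation_reason IS NOT NULL THEN FALSE "
	"ELSE hypothetical_fast_caught_up_check(slot_name) "
	"END) AS caught_up "
	"FROM pg_catalog.pg_replication_slots "
	"WHERE slot_type = 'logical' AND "
	"database = current_database() AND "
	"temporary IS FALSE;";

/*
 * Hypothetical helper: return the complete query for the chosen variant.
 * The caller gets a pointer to a constant string and must not free it.
 */
static const char *
choose_slot_infos_query(bool use_fast_caught_up_check)
{
	const char *query;

	query = use_fast_caught_up_check ? slot_infos_query_fast
									 : slot_infos_query_slow;
	assert(query != NULL);
	return query;
}
```

The FROM and WHERE clauses are duplicated, but each query can be read and
changed on its own without tracing how fragments are stitched together.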

One minor point:
* Fetch the logical replication slot information. The check whether the
- * slot is considered caught up is done by an upgrade function. This
- * regards the slot as caught up if we don't find any decodable changes.
- * See binary_upgrade_logical_slot_has_caught_up().
+ * slot is considered caught up is done by an upgrade function, unless the
+ * fast check is available on the cluster.

Isn't the caught-up check done by an upgrade function in both the fast
and non-fast cases? If so, this comment needs to be improved to make
that clear.

--
With Regards,
Amit Kapila.


