Home > mailing lists

Re: pg_upgrade: optimize replication slot caught-up check - Mailing list pgsql-hackers

From	Chao Li
Subject	Re: pg_upgrade: optimize replication slot caught-up check
Date	January 23 04:16:17
Msg-id	B098C626-EE7D-4E98-8685-FBB7980C795E@gmail.com Whole thread Raw
In response to	Re: pg_upgrade: optimize replication slot caught-up check (Masahiko Sawada <sawada.mshk@gmail.com>)
List	pgsql-hackers

Tree view


> On Jan 23, 2026, at 04:33, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jan 21, 2026 at 7:49 PM shveta malik <shveta.malik@gmail.com> wrote:
>>
>> On Thu, Jan 22, 2026 at 5:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>
>>> On Mon, Jan 19, 2026 at 10:38 PM shveta malik <shveta.malik@gmail.com> wrote:
>>>>
>>>> On Wed, Jan 14, 2026 at 11:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>>
>>>>> I've attached the updated patch.
>>>>>
>>>>
>>>> Thank You for the patch. I like the idea of optimization. Few initial comments:
>>>
>>> Thank you for reviewing the patch!
>>>
>>>>
>>>> 1)
>>>> + * The query returns the slot names and their caught-up status in
>>>> + * the same order as the results collected by
>>>> + * get_old_cluster_logical_slot_infos(). If this query is changed,
>>>>
>>>> I could not find the function get_old_cluster_logical_slot_infos(), do
>>>> you mean get_old_cluster_logical_slot_infos_query()?
>>>
>>> It seems an oversight in commit 6d3d2e8e541f0. I think it should be
>>> get_db_rel_and_slot_infos().
>>>
>>>>
>>>> 2)
>>>> "  WHERE database = current_database() AND "
>>>> "    slot_type = 'logical' AND "
>>>>
>>>> Is there a reason why database = current_database() is placed before
>>>> slot_type = 'logical'? I am not sure how the PostgreSQL optimizer and
>>>> executor will order these predicates, but from the first look,
>>>> slot_type = 'logical' appears cheaper and could be placed first,
>>>> consistent with the ordering used at other places.
>>>
>>> Changed.
>>>
>>>>
>>>> 3)
>>>> Shouldn’t we add a sanity check inside
>>>> get_old_cluster_logical_slot_infos_query() to ensure that when
>>>> skip_caught_up_check is true, we are on PostgreSQL 18 or lower? This
>>>> would make the function safer for future use if it's called elsewhere.
>>>> I understand the caller already performs a similar check, but I think
>>>> it's more appropriate here since we call
>>>> binary_upgrade_logical_slot_has_caught_up() from inside, which doesn’t
>>>> even exist on newer versions.
>>>
>>> What kind of sanity check did you mean? We can have a check with
>>> pg_fatal() but it seems almost the same to me even if pg_upgrade fails
>>> with an error due to missing
>>> binary_upgrade_logical_slot_has_caught_up().
>>
>> I was referring to a development-level sanity check, something like:
>>
>> /* skip_caught_up_check is required iff PG19 or newer */
>> Assert((GET_MAJOR_VERSION(cluster->major_version) >= 1900) ==
>>   skip_caught_up_check);
>>
>> But performing this check requires access to the cluster version (or
>> cluster information), which this function currently does not have.
>> Given that, do you think it would make sense to pass the cluster as an
>> argument to this function in order to perform the sanity check here?
>
> Hmm, I think it's better not to have the same check in multiple
> places, but it might make sense to have
> get_old_cluster_logical_slot_infos_query() decide whether to use the
> fast method. I've updated the patch accordingly, please review it.
>
>>
>>>>
>>>> 4)
>>>> +# Check the file content. While both test_slot1 and test_slot2 should
>>>> be reporting
>>>> +# that they have unconsumed WAL records, test_slot3 should not be reported as
>>>> +# it has caught up.
>>>>
>>>> Can you please elaborate the reason behind test_slot3 not being
>>>> reported? Also mention in the comment if possible.
>>>
>>> We advance test_slot3 to the current WAL LSN before executing
>>> pg_upgrade, so the test_slot3 should have consumed all pending WALs.
>>> Please refer to the following changes:
>>
>> I understand the test, and the comments are clear to me. I also
>> understand that only test_slot3 is expected to be in the caught-up
>> state. My questions were specifically about the following points:
>> 1)  Why do we expect 'slot3 caught-up' not to be mentioned in the LOG?
>> Is it simply because there is no corresponding logging in the code, or
>> is this behavior related to some aspect of your fix that I may have
>> missed?
>>
>> 2) In general, we do not validate the absence of LOG messages in
>> tests. Why is this considered a special case where such a check is
>> appropriate?
>
> What LOG do you refer to? In these tests, we check the
> invalid_logical_slots.txt file where pg_upgrade reports only invalid
> slots (in terms of pg_upgrade). For test_slot3, it should not be
> mentioned in that file as it has caught up. Given that the file has
> only invalid slots, checking the absence of test_slot3 in the file
> makes sense to me.
>
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
> <v7-0001-pg_upgrade-Optimize-replication-slot-caught-up-ch.patch>

I just went through v7, and overall looks good to me.

Only nitpick is:
```
+ *
+ * use_fast_caught_up_check is set to true on return if available in the given
+ * cluster.
```

Strictly speaking, use_fast_caught_up_check is a pointer, it cannot be set, it is *use_fast_caught_up_check that is set
totrue. So, please add a star sign in front of use_fast_caught_up_check. 

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

pgsql-hackers by date:

From: Chao Li
Date: 23 January, 03:54:33
Subject: Re: tablecmds: reject CLUSTER ON for partitioned tables earlier

From: Andres Freund
Date: 23 January, 04:18:21
Subject: Re: More speedups for tuple deformation

Re: pg_upgrade: optimize replication slot caught-up check - Mailing list pgsql-hackers

Previous

Next