Re: Mention idle_replication_slot_timeout in pg_replication_slots docs - Mailing list pgsql-docs

From Fujii Masao
Subject Re: Mention idle_replication_slot_timeout in pg_replication_slots docs
Date
Msg-id 8147005b-42b5-47d5-84a4-53965568fd6a@oss.nttdata.com
Whole thread Raw
In response to Re: Mention idle_replication_slot_timeout in pg_replication_slots docs  (Nisha Moond <nisha.moond412@gmail.com>)
Responses Re: Mention idle_replication_slot_timeout in pg_replication_slots docs
List pgsql-docs

On 2025/06/30 20:32, Nisha Moond wrote:
> On Fri, Jun 27, 2025 at 5:40 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2025/06/27 15:32, Nisha Moond wrote:
>>> On Thu, Jun 26, 2025 at 1:33 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2025/06/26 15:46, Nisha Moond wrote:
>>>>> On Wed, Jun 25, 2025 at 9:56 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The pg_replication_slots documentation mentions only max_slot_wal_keep_size
>>>>>> as a condition under which the wal_status column can show unreserved or lost.
>>>>>> However, since commit ac0e33136ab, idle_replication_slot_timeout can also
>>>>>> cause this behavior when it is set. This has not been documented yet.
>>>>>> https://www.postgresql.org/docs/devel/view-pg-replication-slots.html
>>>>>>
>>>>>
>>>>> +1 to the doc update.
>>>>
>>>> Thanks for the review!
>>>>
>>>>
>>>>>> So, how about updating the documentation to also mention
>>>>>> idle_replication_slot_timeout as a factor that can cause wal_status to
>>>>>> become unreserved or lost? Patch attached.
>>>>>>
>>>>>
>>>>> Since idle_replication_slot_timeout can only cause wal_status to
>>>>> become 'lost' and not 'unreserved', perhaps we can reword the sentence
>>>>> slightly for clarity, suggestion -
>>>>> "The last two states are seen when max_slot_wal_keep_size is
>>>>> non-negative and, the 'lost' state may also appear when
>>>>> idle_replication_slot_timeout is greater than zero."
>>>>
>>>> I was thinking that when idle_replication_slot_timeout triggers,
>>>> the following functions are called, and that wal_status can become
>>>> "unreserved" before ReplicationSlotRelease() runs. It's very short
>>>> period, though. Am I wrong?
>>>>
>>>> ReplicationSlotMarkDirty();
>>>> ReplicationSlotSave();
>>>> ReplicationSlotRelease();
>>>>
>>>
>>> Thank you for pointing it out.
>>> You are correct that while the checkpointer is in the process of
>>> invalidating a slot, it sets its PID as the slot’s active_pid. During
>>> this short window, if a user queries pg_replication_slot, the
>>> underlying function pg_get_replication_slots will compute the
>>> wal_status as 'unreserved' for the invalidated slot because the slot
>>> has a valid active_pid.
>>>
>>> That said, it's reasonable to mention in the doc that 'unreserved' may
>>> appear when idle_replication_slot_timeout is greater than zero, as
>>> this can indeed happen. So, let's retain the current description.
>>>
>>> However, this behavior isn’t specific to
>>> idle_replication_slot_timeout. For example, when a slot is being
>>> invalidated due to a different cause "wal_level_insufficient",
>>> 'unreserved' may also briefly appear in wal_status.
>>
>> Yes, and "lost" can appear for various reasons, including wal_level_insufficient,
>> so it seems odd to highlight max_slot_wal_keep_size as the cause of the "lost"
>> status in the note. It would probably be better to remove the mention of "lost"
>> from that note.
>>
>
> +1

Is this true starting from v16, when logical replication from standby was introduced?
In other words, in v15 and earlier, only max_slot_wal_keep_size could cause
the wal_status to become "unreserved" or "lost"? I'm wondering where to back-patch
this fix to.


>> As for "unreserved", it can also occur for different reasons, but typically,
>> it happens when max_slot_wal_keep_size is set to a non-negative value.
>> So it might make sense to keep the explanation focused just on "unreserved"
>> and max_slot_wal_keep_size. For example:
>>
>> ----------------------
>>            <listitem>
>>             <para>
>>              <literal>unreserved</literal> means that the slot no longer
>>              retains the required WAL files and some of them are to be removed at
>> -          the next checkpoint.  This state can return
>> +          the next checkpoint.  This can occur when
>> +          <xref linkend="guc-max-slot-wal-keep-size"/> is set to
>> +          a non-negative value.  This state can return
>>              to <literal>reserved</literal> or <literal>extended</literal>.
>>             </para>
>>            </listitem>
>>            <listitem>
>> ----------------------
>>
>> What do you think?
>>
>
> The change LGTM, only a minor suggestion to add "typically", as “This
> can typically occur when…” to indicate that max_slot_wal_keep_size is
> one possible reason, not the only one.

OK.


>> Also, I noticed the note that says “If <structfield>restart_lsn</structfield>
>> is NULL, this field is null” seems inaccurate. For example, when "wal_removed"
>> happens, restart_lsn is NULL but wal_status is "lost". So maybe we should remove
>> that note as well?
>
> You're right, the statement is not accurate.
> We could rephrase it as: "If <structfield>restart_lsn</structfield> is
> NULL, this field is either null or lost." But since 'unreserved' can
> also appear briefly during invalidation, it might be better to remove
> it altogether.
I agree with removing the description. Unless I'm missing something,
it has been incorrect since at least v13, so we should back-patch this fix
to all supported versions.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation




pgsql-docs by date:

Previous
From: Nisha Moond
Date:
Subject: Re: Mention idle_replication_slot_timeout in pg_replication_slots docs
Next
From: Nisha Moond
Date:
Subject: Re: Mention idle_replication_slot_timeout in pg_replication_slots docs