Re: Mention idle_replication_slot_timeout in pg_replication_slots docs - Mailing list pgsql-docs
From | Fujii Masao |
---|---|
Subject | Re: Mention idle_replication_slot_timeout in pg_replication_slots docs |
Date | |
Msg-id | 8147005b-42b5-47d5-84a4-53965568fd6a@oss.nttdata.com Whole thread Raw |
In response to | Re: Mention idle_replication_slot_timeout in pg_replication_slots docs (Nisha Moond <nisha.moond412@gmail.com>) |
Responses |
Re: Mention idle_replication_slot_timeout in pg_replication_slots docs
|
List | pgsql-docs |
On 2025/06/30 20:32, Nisha Moond wrote: > On Fri, Jun 27, 2025 at 5:40 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2025/06/27 15:32, Nisha Moond wrote: >>> On Thu, Jun 26, 2025 at 1:33 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>> >>>> >>>> >>>> On 2025/06/26 15:46, Nisha Moond wrote: >>>>> On Wed, Jun 25, 2025 at 9:56 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> The pg_replication_slots documentation mentions only max_slot_wal_keep_size >>>>>> as a condition under which the wal_status column can show unreserved or lost. >>>>>> However, since commit ac0e33136ab, idle_replication_slot_timeout can also >>>>>> cause this behavior when it is set. This has not been documented yet. >>>>>> https://www.postgresql.org/docs/devel/view-pg-replication-slots.html >>>>>> >>>>> >>>>> +1 to the doc update. >>>> >>>> Thanks for the review! >>>> >>>> >>>>>> So, how about updating the documentation to also mention >>>>>> idle_replication_slot_timeout as a factor that can cause wal_status to >>>>>> become unreserved or lost? Patch attached. >>>>>> >>>>> >>>>> Since idle_replication_slot_timeout can only cause wal_status to >>>>> become 'lost' and not 'unreserved', perhaps we can reword the sentence >>>>> slightly for clarity, suggestion - >>>>> "The last two states are seen when max_slot_wal_keep_size is >>>>> non-negative and, the 'lost' state may also appear when >>>>> idle_replication_slot_timeout is greater than zero." >>>> >>>> I was thinking that when idle_replication_slot_timeout triggers, >>>> the following functions are called, and that wal_status can become >>>> "unreserved" before ReplicationSlotRelease() runs. It's very short >>>> period, though. Am I wrong? >>>> >>>> ReplicationSlotMarkDirty(); >>>> ReplicationSlotSave(); >>>> ReplicationSlotRelease(); >>>> >>> >>> Thank you for pointing it out. >>> You are correct that while the checkpointer is in the process of >>> invalidating a slot, it sets its PID as the slot’s active_pid. During >>> this short window, if a user queries pg_replication_slot, the >>> underlying function pg_get_replication_slots will compute the >>> wal_status as 'unreserved' for the invalidated slot because the slot >>> has a valid active_pid. >>> >>> That said, it's reasonable to mention in the doc that 'unreserved' may >>> appear when idle_replication_slot_timeout is greater than zero, as >>> this can indeed happen. So, let's retain the current description. >>> >>> However, this behavior isn’t specific to >>> idle_replication_slot_timeout. For example, when a slot is being >>> invalidated due to a different cause "wal_level_insufficient", >>> 'unreserved' may also briefly appear in wal_status. >> >> Yes, and "lost" can appear for various reasons, including wal_level_insufficient, >> so it seems odd to highlight max_slot_wal_keep_size as the cause of the "lost" >> status in the note. It would probably be better to remove the mention of "lost" >> from that note. >> > > +1 Is this true starting from v16, when logical replication from standby was introduced? In other words, in v15 and earlier, only max_slot_wal_keep_size could cause the wal_status to become "unreserved" or "lost"? I'm wondering where to back-patch this fix to. >> As for "unreserved", it can also occur for different reasons, but typically, >> it happens when max_slot_wal_keep_size is set to a non-negative value. >> So it might make sense to keep the explanation focused just on "unreserved" >> and max_slot_wal_keep_size. For example: >> >> ---------------------- >> <listitem> >> <para> >> <literal>unreserved</literal> means that the slot no longer >> retains the required WAL files and some of them are to be removed at >> - the next checkpoint. This state can return >> + the next checkpoint. This can occur when >> + <xref linkend="guc-max-slot-wal-keep-size"/> is set to >> + a non-negative value. This state can return >> to <literal>reserved</literal> or <literal>extended</literal>. >> </para> >> </listitem> >> <listitem> >> ---------------------- >> >> What do you think? >> > > The change LGTM, only a minor suggestion to add "typically", as “This > can typically occur when…” to indicate that max_slot_wal_keep_size is > one possible reason, not the only one. OK. >> Also, I noticed the note that says “If <structfield>restart_lsn</structfield> >> is NULL, this field is null” seems inaccurate. For example, when "wal_removed" >> happens, restart_lsn is NULL but wal_status is "lost". So maybe we should remove >> that note as well? > > You're right, the statement is not accurate. > We could rephrase it as: "If <structfield>restart_lsn</structfield> is > NULL, this field is either null or lost." But since 'unreserved' can > also appear briefly during invalidation, it might be better to remove > it altogether. I agree with removing the description. Unless I'm missing something, it has been incorrect since at least v13, so we should back-patch this fix to all supported versions. Regards, -- Fujii Masao NTT DATA Japan Corporation
pgsql-docs by date: