Re: Don't keep closed WAL segment in page cache after replay - Mailing list pgsql-hackers

From Japin Li
Subject Re: Don't keep closed WAL segment in page cache after replay
Date Wed, 04 Mar 2026 09:37:27 +0800
Msg-id SY7PR01MB1092120F55B852C32BB22C1FAB67CA@SY7PR01MB10921.ausprd01.prod.outlook.com
In response to Don't keep closed WAL segment in page cache after replay  (Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>)
Responses Re: Don't keep closed WAL segment in page cache after replay
List pgsql-hackers
Hi, Anthonin

On Tue, 03 Mar 2026 at 15:13, Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> wrote:
> On Mon, Mar 2, 2026 at 9:15 AM Hüseyin Demir <huseyin.d3r@gmail.com> wrote:
>> The idea looks good and efficient, although I have some feedback. The first point is about logical replication slots.
>>
>> The patch checks whether there is an active walsender
>> process. However, it is possible to create a replication slot and
>> then wait for a subscriber to connect to it. During that window,
>> with the patch, PostgreSQL will evict the closed WAL segments from
>> the page cache, and once the subscriber connects it has to read the
>> WAL files from disk. It's a trade-off, though, and others can weigh
>> in on it too.
>
> Good point. This is also the case for physical replication slots: if
> there's at least one used replication slot, it's very likely a
> walsender will start at some point and will need to read the WAL.
> Having to read a large amount of WAL from disk is likely not
> desirable, so it's probably better to add this as a condition.
>
>> Secondly, using a spinlock to check for running walsender processes
>> can make the recovery process inefficient. Assuming a database with
>> max_wal_senders set to 20+ that produces more than 4-5TB of WAL a
>> day, this can add an extra 100-200 spinlock acquisitions per second
>> during replay. Put simply, WalSndRunning() scans every walsender
>> slot under a spinlock on every segment switch, contending with all
>> active walsenders updating their own slots. On high-throughput
>> standbys this creates unnecessary cross-process spinlock contention
>> in the recovery hot path, the exact path that should be as lean as
>> possible for fast replay. Maybe you can implement a single
>> pg_atomic_uint32 counter in WalSndCtlData and achieve the same
>> result with zero contention.
>
> True. I had assumed that segment closes are rare enough that it
> would be fine to go through all walsenders, but there are certainly
> clusters that generate a large number of segments.
>
> I've updated the patch with the suggested approach:
> - a new atomic counter tracking the number of active walsenders
> - a similar atomic counter for the number of used replication slots.
> Otherwise, we would have a similar issue of going through all
> max_replication_slots to check if one is used.
> - Both are now used as conditions for whether to send POSIX_FADV_DONTNEED
>
> Regards,
> Anthonin Bonnefoy

I noticed two different comment styles in the v4 patch:

1.
+/*
+ * Return the number of used replication slots
+ */
+int
+ReplicationSlotNumUsed(void)

2.
+/*
+ * Returns the number of active walsender processes
+ */
+int
+WalSndNumActive(void)

Both "Return ..." and "Returns ..." styles exist in the PostgreSQL codebase,
but within the same patch, would it be better to use a consistent style?

I'd like to use the imperative/singular form.  What do you think?

>
> [2. text/x-diff; v4-0001-Don-t-keep-closed-WAL-segments-in-page-cache-afte.patch]...

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.


