Re: Don't keep closed WAL segment in page cache after replay - Mailing list pgsql-hackers

From Anthonin Bonnefoy
Subject Re: Don't keep closed WAL segment in page cache after replay
Date
Msg-id CAO6_Xqp0s3aVyAhx3K_dsdPBqe4_OoD3Wcd__-CgEcvN+HqPxQ@mail.gmail.com
In response to Re: Don't keep closed WAL segment in page cache after replay  (Hüseyin Demir <huseyin.d3r@gmail.com>)
List pgsql-hackers
On Mon, Mar 2, 2026 at 9:15 AM Hüseyin Demir <huseyin.d3r@gmail.com> wrote:
> The idea looks good and efficient, albeit I have some feedback. The first point is about logical replication slots.
>
> Inside the patch, it checks whether there is an active walsender process. It is possible to create a replication slot
> and then wait until a subscriber connects to it. During this time, due to the patch, PostgreSQL will close the WAL
> segments in memory, and once the subscriber connects it has to read the WAL files from disk. But it's a trade-off and
> can be decided by others too.

Good point. This is also the case for physical replication slots: if at
least one replication slot is in use, it's very likely a walsender will
start at some point and will need to read the WAL. Having to read a
large amount of WAL files from disk is not desirable, so it's probably
better to add this as a condition.

> Secondly, when it comes to using a spinlock to check the running walsender processes, it can lead to an inefficient
> recovery process: assuming a database with max_wal_senders set to 20+ and producing more than 4-5 TB of WAL a day, it
> can cause an additional 100-200 spinlock acquisitions per second on the walreceiver. Put simply, WalSndRunning() scans
> every walsender slot with spinlocks on every segment switch, contending with all active walsenders updating their own
> slots. On high-throughput standbys this creates unnecessary cross-process spinlock contention in the recovery hot
> path, the exact path that should be as lean as possible for fast replay. Maybe you can implement a single
> pg_atomic_uint32 counter in WalSndCtlData and achieve the same result with zero contention.

True. I think I assumed that segment closes are rare enough that going
through all walsenders would be fine, but there are certainly clusters
that generate a large number of segments.

I've updated the patch with the suggested approach:
- a new atomic counter tracking the number of active walsenders
- a similar atomic counter for the number of replication slots in use;
otherwise, we would have the same issue of going through all
max_replication_slots to check whether one is used
- both counters are now used as conditions to decide whether to send
POSIX_FADV_DONTNEED

Regards,
Anthonin Bonnefoy
