Re: Multiple full page writes in a single checkpoint? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Multiple full page writes in a single checkpoint?
Msg-id 20210204010019.pyasieusrgi6cead@alap3.anarazel.de
In response to Re: Multiple full page writes in a single checkpoint?  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Multiple full page writes in a single checkpoint?
List pgsql-hackers
Hi,

On 2021-02-03 19:21:25 -0500, Bruce Momjian wrote:
> On Wed, Feb  3, 2021 at 03:29:13PM -0800, Andres Freund wrote:
> > Changing this is *completely* infeasible. In a lot of workloads it'd
> > cause a *massive* explosion of WAL volume. Like quadratically. You'll
> > need to find another way to generate a nonce.
>
> Do we often do multiple writes to the file system of the same page
> during a single checkpoint, particularly only-hint-bit-modified pages?
> I didn't think so.

It can easily happen. Consider scans that use a ring buffer (like vacuum
or seqscan): they'll force the buffer out to disk soon after it's been
dirtied, and will often read the same page again a short while later. Or
just any workload that's a bit bigger than shared buffers (but whose data
is in the OS cache). Subsequent scans will often have new hint bits to
set.
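
To make the ring buffer case concrete, here is a rough toy model in
Python. It is only a sketch under stated assumptions, not PostgreSQL's
buffer manager: the function name count_page_writes, the FIFO ring, and
the rule that every scan sets a new hint bit on every page are all
invented for illustration.

```python
from collections import Counter, OrderedDict


def count_page_writes(num_pages, ring_size, num_scans):
    """Count how often each page is written out between two checkpoints.

    Toy model (assumption): every scan sets a new hint bit on every page
    it reads, and the scan may only keep `ring_size` pages buffered.
    """
    ring = OrderedDict()  # page number -> dirty flag, oldest entry first
    writes = Counter()

    for _ in range(num_scans):
        for page in range(num_pages):
            if page not in ring:
                if len(ring) >= ring_size:
                    # Reuse the oldest ring slot; a dirty victim must be
                    # written out first.
                    victim, dirty = ring.popitem(last=False)
                    if dirty:
                        writes[victim] += 1
                ring[page] = False  # page is read in clean
            ring[page] = True  # the scan sets a hint bit, dirtying the page

    # Whatever is still dirty is written by the checkpoint itself.
    for page, dirty in ring.items():
        if dirty:
            writes[page] += 1
    return writes


if __name__ == "__main__":
    small_ring = count_page_writes(num_pages=100, ring_size=4, num_scans=3)
    fits_in_ring = count_page_writes(num_pages=4, ring_size=4, num_scans=3)
    print(max(small_ring.values()), max(fits_in_ring.values()))
```

In this model a 100-page table scanned three times through a 4-page
ring has every page written three times within the same checkpoint
cycle, while a table that fits in the ring is written once.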


> Is the logical approach here to modify XLogSaveBufferForHint() so if a
> page write is not needed, to create a dummy WAL record that just
> increments the WAL location and updates the page LSN?
> (Is there a small WAL record I should reuse?)

I think an explicit record type would be better. Or a hint record
without an associated FPW.
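
As a sketch of what a hint record without an FPW would buy, the toy
model below hands out a fresh LSN on every hint-bit-only write-out, but
only logs a full page image for the first one after a checkpoint.
Everything here is hypothetical: ToyWal, log_hint_write, and the 32-byte
record size are made-up names and numbers, and record header overhead is
ignored. It is not how XLogSaveBufferForHint() is implemented.

```python
from dataclasses import dataclass

PAGE_SIZE = 8192         # PostgreSQL's default block size
SMALL_RECORD_SIZE = 32   # assumed size of a hypothetical FPW-less record


@dataclass
class ToyWal:
    insert_lsn: int = 0        # next WAL position to be written
    redo_lsn: int = 0          # where the current checkpoint cycle started
    full_page_bytes: int = 0
    small_record_bytes: int = 0

    def checkpoint(self):
        """Start a new checkpoint cycle at the current insert position."""
        self.redo_lsn = self.insert_lsn

    def log_hint_write(self, page_lsn):
        """Log a hint-bit-only change and return the page's new LSN."""
        if page_lsn <= self.redo_lsn:
            # First change since the checkpoint: a full page image is needed.
            self.insert_lsn += PAGE_SIZE
            self.full_page_bytes += PAGE_SIZE
        else:
            # The page already has an image in this cycle, so a small
            # record is enough to hand out a fresh LSN.
            self.insert_lsn += SMALL_RECORD_SIZE
            self.small_record_bytes += SMALL_RECORD_SIZE
        return self.insert_lsn


wal = ToyWal()
page_lsn = 0
for _ in range(10):  # ten hint-bit-only write-outs of one page, one cycle
    page_lsn = wal.log_hint_write(page_lsn)
print(wal.full_page_bytes, wal.small_record_bytes)
```

In this model the ten write-outs cost one 8192-byte image plus nine
small records (288 bytes), where logging a full page image every time
would cost 81920 bytes.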


> I can try to add a hint-bit-page-write page counter, but that might
> overflow, and then we will need a way to change the LSN anyway.

That's just a question of width...
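
For a rough feel of the width question, the arithmetic below estimates
how long a counter of a given width lasts before wrapping. The rate of
one million increments per second is a deliberately pessimistic
assumption chosen for illustration, and years_until_overflow is a
made-up helper, not anything from the PostgreSQL source.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_overflow(counter_bits, writes_per_second):
    """Years until a counter with `counter_bits` bits wraps around."""
    return 2 ** counter_bits / writes_per_second / SECONDS_PER_YEAR


for bits in (16, 32, 64):
    print(bits, years_until_overflow(bits, 1_000_000))
```

At that assumed rate a 16-bit counter wraps in well under a second and a
32-bit counter in a bit over an hour, while a 64-bit counter would last
several hundred thousand years.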

Greetings,

Andres Freund


