Re: WIP: WAL prefetch (another approach) - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: WIP: WAL prefetch (another approach)
Msg-id CA+hUKGLtn-309+p6pFpR7AdQ+ypx7Zy=LhF6Bardie_bm-pb8Q@mail.gmail.com
In response to Re: WIP: WAL prefetch (another approach)  (Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>)
Responses Re: WIP: WAL prefetch (another approach)  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: WIP: WAL prefetch (another approach)  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Wed, Apr 13, 2022 at 3:57 AM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
> Simon Riggs <simon.riggs@enterprisedb.com> writes:
> > This is a nice feature if it is safe to turn off full_page_writes.

As others have said/shown, it does also help if a block with FPW is
evicted and then read back in during one checkpoint cycle, in other
words if the working set is larger than shared buffers.

This also provides infrastructure for proposals in the next cycle, as
part of commitfest #3316:
* in direct I/O mode, I/O stalls become more likely due to lack of
kernel prefetching/double-buffering, so prefetching becomes more
essential
* even in buffered I/O mode when benefiting from free
double-buffering, the copy from kernel buffer to user space buffer can
be finished in the background instead of calling pread() when you need
the page, but you need to start it sooner
* adjacent blocks accessed by nearby records can be merged into a
single scatter-read, for example with preadv() in the background
* repeated buffer lookups, pins, locks (and maybe eventually replay)
to the same page can be consolidated
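
To make the scatter-read item above concrete, here is a minimal sketch of merging adjacent blocks into one preadv() call. It is only an illustration, not code from the patch set: the names read_adjacent_blocks(), demo_scatter_read() and MAX_MERGE are made up for this example, BLCKSZ is assumed to be the default 8192, and the call is synchronous (the "in the background" part would need an asynchronous I/O layer that is not shown).

```c
#define _GNU_SOURCE             /* exposes preadv() and mkstemp() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define BLCKSZ    8192          /* assumed: PostgreSQL's default block size */
#define MAX_MERGE 16            /* hypothetical cap on blocks merged per call */

/*
 * Read nblocks adjacent blocks, starting at first_block, into separate
 * buffers with a single system call.  Returns the number of bytes read,
 * or -1 on error (errno set by preadv).
 */
ssize_t
read_adjacent_blocks(int fd, unsigned first_block, char *bufs[], int nblocks)
{
    struct iovec iov[MAX_MERGE];

    assert(nblocks > 0 && nblocks <= MAX_MERGE);
    for (int i = 0; i < nblocks; i++)
    {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = BLCKSZ;
    }
    return preadv(fd, iov, nblocks, (off_t) first_block * BLCKSZ);
}

/*
 * Self-check: write three blocks filled with 'A', 'B', 'C' to a temporary
 * file, then fetch blocks 1 and 2 with one call.  Returns 0 on success.
 */
int
demo_scatter_read(void)
{
    char        path[] = "/tmp/scatter_demo_XXXXXX";
    static char block[BLCKSZ], b1[BLCKSZ], b2[BLCKSZ];
    char       *bufs[2] = {b1, b2};
    int         fd = mkstemp(path);
    int         ok;

    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears once fd is closed */
    for (int i = 0; i < 3; i++)
    {
        memset(block, 'A' + i, BLCKSZ);
        if (write(fd, block, BLCKSZ) != BLCKSZ)
        {
            close(fd);
            return -1;
        }
    }
    ok = read_adjacent_blocks(fd, 1, bufs, 2) == 2 * BLCKSZ &&
        b1[0] == 'B' && b1[BLCKSZ - 1] == 'B' &&
        b2[0] == 'C' && b2[BLCKSZ - 1] == 'C';
    close(fd);
    return ok ? 0 : -1;
}
```

The point of the iovec array is that the two blocks can land in unrelated buffers (as they would in a buffer pool) while still costing one system call instead of two.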

Pie-in-the-sky ideas:
* someone might eventually want to be able to replay in parallel
(hard, but certainly requires lookahead)
* I sure hope we'll eventually use different techniques for torn-page
protection to avoid the high online costs of FPW

> > When is it safe to do that? On which platform?
> >
> > I am not aware of any released software that allows full_page_writes
> > to be safely disabled. Perhaps something has been released recently
> > that allows this? I think we have substantial documentation about
> > safety of other settings, so we should carefully document things here
> > also.
>
> Our WAL reliability docs claim that ZFS is safe against torn pages:
>
> https://www.postgresql.org/docs/current/wal-reliability.html:
>
>     If you have file-system software that prevents partial page writes
>     (e.g., ZFS), you can turn off this page imaging by turning off the
>     full_page_writes parameter.

Unfortunately, posix_fadvise(WILLNEED) doesn't do anything on ZFS
right now :-(.  I have some patches to fix that on Linux[1] and
FreeBSD and it seems like there's a good chance of getting them
committed based on feedback, but it needs some more work on tests and
mmap integration.  If anyone's interested in helping get that landed
faster, please ping me off-list.

[1] https://github.com/openzfs/zfs/pull/9807
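
For anyone who wants to poke at the hint on their own file system, this is roughly the shape of the call pattern under discussion: advise early, read later. It is a sketch under assumptions, not the recovery code itself: prefetch_block(), read_block() and demo_prefetch() are hypothetical names, and BLCKSZ is assumed to be 8192. Since the advice is only a hint, a file system may accept it and do nothing, which is why the demo does not treat its result as fatal.

```c
#define _GNU_SOURCE             /* exposes mkstemp() and pread() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* assumed: PostgreSQL's default block size */

/*
 * Tell the kernel that a block will be needed soon.  posix_fadvise()
 * returns 0 on success or an error number directly (it does not set errno).
 */
int
prefetch_block(int fd, unsigned blockno)
{
    return posix_fadvise(fd, (off_t) blockno * BLCKSZ, BLCKSZ,
                         POSIX_FADV_WILLNEED);
}

/* Read the block once it is actually needed. */
ssize_t
read_block(int fd, unsigned blockno, char *buf)
{
    return pread(fd, buf, BLCKSZ, (off_t) blockno * BLCKSZ);
}

/*
 * Self-check: write two blocks to a temporary file, hint block 1, then
 * read it back.  Returns 0 if the read returns the expected contents.
 */
int
demo_prefetch(void)
{
    char        path[] = "/tmp/prefetch_demo_XXXXXX";
    static char block[BLCKSZ], buf[BLCKSZ];
    int         fd = mkstemp(path);
    int         ok;

    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears once fd is closed */
    for (int i = 0; i < 2; i++)
    {
        memset(block, 'x' + i, BLCKSZ);
        if (write(fd, block, BLCKSZ) != BLCKSZ)
        {
            close(fd);
            return -1;
        }
    }
    (void) prefetch_block(fd, 1);   /* advisory only: ignore the result */
    ok = read_block(fd, 1, buf) == BLCKSZ &&
        buf[0] == 'y' && buf[BLCKSZ - 1] == 'y';
    close(fd);
    return ok ? 0 : -1;
}
```

Whether the hint buys anything depends on how much work happens between prefetch_block() and read_block(), and on whether the file system acts on the advice at all.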
