Re: WAL prefetch - Mailing list pgsql-hackers

From Andres Freund
Subject Re: WAL prefetch
Msg-id 20180616190210.pqz42a5nxhqy7jw6@alap3.anarazel.de
In response to Re: WAL prefetch  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: WAL prefetch
Re: WAL prefetch
List pgsql-hackers
On 2018-06-16 11:38:59 +0200, Tomas Vondra wrote:
>
>
> On 06/15/2018 08:01 PM, Andres Freund wrote:
> > On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote:
> > >
> > >
> > > On 14.06.2018 09:52, Thomas Munro wrote:
> > > > On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik
> > > > <k.knizhnik@postgrespro.ru> wrote:
> > > > > pg_wal_prefetch function will infinitely traverse WAL and prefetch block
> > > > > references in WAL records
> > > > > using posix_fadvise(WILLNEED) system call.
> > > > Hi Konstantin,
> > > >
> > > > Why stop at the page cache...  what about shared buffers?
> > > >
> > >
> > > It is good question. I thought a lot about prefetching directly to shared
> > > buffers.
> >
> > I think that's definitely how this should work.  I'm pretty strongly
> > opposed to a prefetching implementation that doesn't read into s_b.
> >
>
> Could you elaborate why prefetching into s_b is so much better (I'm sure it
> has advantages, but I suppose prefetching into page cache would be much
> easier to implement).

I think there are a number of issues with just issuing prefetch requests
via fadvise etc.:

- it leads to guaranteed double buffering, in a way that's just about
  certain to *never* be useful. Because we'd only prefetch when there's
  an upcoming write, there's simply no benefit in the page staying in
  the page cache - we'll write the whole page back out to the OS
  anyway.
- reading from the page cache is far from free - so you add work to the
  replay process that it doesn't need to do.
- you don't have any sort of completion notification, so you basically
  just have to guess how far ahead you want to read. If you read a bit
  too much you suddenly get into synchronous blocking land.
- the OS page cache is actually not particularly scalable to large
  amounts of data either. Nor are its decisions about what to keep
  cached likely to be particularly useful.
- we imo need to add support for direct IO before long, and adding more
  and more work we'd have to redo to reach feature parity there strikes
  me as a bad move.
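For concreteness, here is a minimal sketch of the fadvise-only scheme these points argue against, assuming a POSIX/Linux system (posix_fadvise() is not available everywhere, e.g. macOS). BlockRef, hint_block() and replay_with_prefetch() are hypothetical names, not PostgreSQL code; WAL decoding is elided (refs[] stands in for block references already extracted from WAL records), BLCKSZ is assumed to be the default 8192, and each blkno is treated as an offset within a single open file, ignoring relation segments. The point it illustrates: the look-ahead `distance` is a blind guess, because posix_fadvise() only reports whether the hint was accepted, never when the read has finished.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* exposes posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define BLCKSZ 8192               /* assumed: PostgreSQL's default block size */

/* Hypothetical: one block reference decoded from a WAL record. */
typedef struct BlockRef
{
    int      fd;                  /* open file holding the block */
    uint32_t blkno;               /* block number within that file */
} BlockRef;

/* Ask the kernel to start reading one block; returns 0 or an error number. */
static int
hint_block(const BlockRef *ref)
{
    /* posix_fadvise() returns the error number instead of setting errno. */
    return posix_fadvise(ref->fd, (off_t) ref->blkno * BLCKSZ, BLCKSZ,
                         POSIX_FADV_WILLNEED);
}

/*
 * Walk refs[0..nrefs) in replay order, keeping WILLNEED hints issued up to
 * `distance` references ahead of the record being replayed.  Returns how
 * many hints the kernel accepted.  Nothing tells us when a hinted read has
 * completed, so `distance` can only be guessed.
 */
static size_t
replay_with_prefetch(const BlockRef *refs, size_t nrefs, size_t distance)
{
    size_t hinted = 0;            /* refs[0..hinted) already hinted */
    size_t accepted = 0;

    assert(refs != NULL || nrefs == 0);

    for (size_t i = 0; i < nrefs; i++)
    {
        /* Top up the window so refs[i..i+distance] have been hinted. */
        size_t limit = (distance < nrefs - i) ? i + distance + 1 : nrefs;

        while (hinted < limit)
        {
            int rc = hint_block(&refs[hinted]);

            if (rc == 0)
                accepted++;
            else
                fprintf(stderr, "prefetch hint for block %u failed: %s\n",
                        (unsigned) refs[hinted].blkno, strerror(rc));
            hinted++;
        }

        /*
         * Real replay would now pread() refs[i] into a buffer and apply the
         * record; if the hinted read hasn't completed yet, that pread()
         * blocks synchronously.
         */
    }
    return accepted;
}
```

With distance 0 each hint is issued right before the block is needed, so it buys nothing; a larger distance only helps if the kernel happens to finish in time, and even then the page ends up both in the page cache and in the buffer replay reads it into, which is the double buffering described in the first point.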

Greetings,

Andres Freund

