Re: WAL prefetch - Mailing list pgsql-hackers
From | Andres Freund
---|---
Subject | Re: WAL prefetch
Date |
Msg-id | 20180617000014.dpnevksklxrajufg@alap3.anarazel.de
In response to | Re: WAL prefetch (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses | Re: WAL prefetch
List | pgsql-hackers
On 2018-06-16 23:25:34 +0300, Konstantin Knizhnik wrote:
>
> On 16.06.2018 22:02, Andres Freund wrote:
> > On 2018-06-16 11:38:59 +0200, Tomas Vondra wrote:
> > >
> > > On 06/15/2018 08:01 PM, Andres Freund wrote:
> > > > On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote:
> > > > >
> > > > > On 14.06.2018 09:52, Thomas Munro wrote:
> > > > > > On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik
> > > > > > <k.knizhnik@postgrespro.ru> wrote:
> > > > > > > pg_wal_prefetch function will infinitely traverse WAL and prefetch block
> > > > > > > references in WAL records
> > > > > > > using posix_fadvise(WILLNEED) system call.
> > > > > >
> > > > > > Hi Konstantin,
> > > > > >
> > > > > > Why stop at the page cache... what about shared buffers?
> > > > >
> > > > > It is good question. I thought a lot about prefetching directly to shared
> > > > > buffers.
> > > >
> > > > I think that's definitely how this should work. I'm pretty strongly
> > > > opposed to a prefetching implementation that doesn't read into s_b.
> > >
> > > Could you elaborate why prefetching into s_b is so much better (I'm sure it
> > > has advantages, but I suppose prefetching into page cache would be much
> > > easier to implement).
> >
> > I think there's a number of issues with just issuing prefetch requests
> > via fadvise etc:
> >
> > - it leads to guaranteed double buffering, in a way that's just about
> >   guaranteed to *never* be useful. Because we'd only prefetch whenever
> >   there's an upcoming write, there's simply no benefit in the page
> >   staying in the page cache - we'll write out the whole page back to the
> >   OS.
>
> Sorry, I do not completely understand this.
> Prefetch is only needed for partial update of a page - in this case we need
> to first read page from the disk

Yes.

> before been able to perform update. So before "we'll write out the whole
> page back to the OS" we have to read this page.
> And if page is in OS cached (prefetched) then is can be done much faster.

Yes.
> Please notice that at the moment of prefetch there is no double
> buffering.

Sure, but as soon as it's read there is.

> As far as page is not accessed before, it is not present in shared buffers.
> And once page is updated, there is really no need to keep it in shared
> buffers. We can use cyclic buffers (like in case of sequential scan or
> bulk update) to prevent throwing away useful pages from shared buffers by
> redo process. So once again there will no double buffering.

That's a terrible idea. There's a *lot* of spatial locality of further WAL
records arriving for the same blocks.

> I am not so familiar with current implementation of full page writes
> mechanism in Postgres.
> So may be my idea explained below is stupid or already implemented (but I
> failed to find any traces of this).
> Prefetch is needed only for WAL records performing partial update. Full page
> write doesn't require prefetch.
> Full page write has to be performed when the page is update first time after
> checkpoint.
> But what if slightly extend this rule and perform full page write also when
> distance from previous full page write exceeds some delta
> (which somehow related with size of OS cache)?
>
> In this case even if checkpoint interval is larger than OS cache size, we
> still can expect that updated pages are present in OS cache.
> And no WAL prefetch is needed at all!

We could do so, but I suspect the WAL volume penalty would be prohibitive
in many cases. Worthwhile to try though.

Greetings,

Andres Freund