Re: WAL prefetch - Mailing list pgsql-hackers
From | Andres Freund
---|---
Subject | Re: WAL prefetch
Date |
Msg-id | 20180617000014.dpnevksklxrajufg@alap3.anarazel.de
In response to | Re: WAL prefetch (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses | Re: WAL prefetch
List | pgsql-hackers
On 2018-06-16 23:25:34 +0300, Konstantin Knizhnik wrote:
>
> On 16.06.2018 22:02, Andres Freund wrote:
> > On 2018-06-16 11:38:59 +0200, Tomas Vondra wrote:
> > >
> > > On 06/15/2018 08:01 PM, Andres Freund wrote:
> > > > On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote:
> > > > >
> > > > > On 14.06.2018 09:52, Thomas Munro wrote:
> > > > > > On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik
> > > > > > <k.knizhnik@postgrespro.ru> wrote:
> > > > > > > pg_wal_prefetch function will infinitely traverse WAL and prefetch block
> > > > > > > references in WAL records
> > > > > > > using posix_fadvise(WILLNEED) system call.
> > > > > >
> > > > > > Hi Konstantin,
> > > > > >
> > > > > > Why stop at the page cache... what about shared buffers?
> > > > >
> > > > > It is good question. I thought a lot about prefetching directly to shared
> > > > > buffers.
> > > >
> > > > I think that's definitely how this should work. I'm pretty strongly
> > > > opposed to a prefetching implementation that doesn't read into s_b.
> > >
> > > Could you elaborate why prefetching into s_b is so much better (I'm sure it
> > > has advantages, but I suppose prefetching into page cache would be much
> > > easier to implement).
> >
> > I think there's a number of issues with just issuing prefetch requests
> > via fadvise etc:
> >
> > - it leads to guaranteed double buffering, in a way that's just about
> >   guaranteed to *never* be useful. Because we'd only prefetch whenever
> >   there's an upcoming write, there's simply no benefit in the page
> >   staying in the page cache - we'll write out the whole page back to the
> >   OS.
>
> Sorry, I do not completely understand this.
> Prefetch is only needed for partial update of a page - in this case we need
> to first read page from the disk

Yes.

> before been able to perform update. So before "we'll write out the whole
> page back to the OS" we have to read this page.
> And if page is in OS cached (prefetched) then is can be done much faster.

Yes.
> Please notice that at the moment of prefetch there is no double
> buffering.

Sure, but as soon as it's read there is.

> As far as page is not accessed before, it is not present in shared buffers.
> And once page is updated, there is really no need to keep it in shared
> buffers. We can use cyclic buffers (like in case of sequential scan or
> bulk update) to prevent throwing away useful pages from shared buffers by
> redo process. So once again there will no double buffering.

That's a terrible idea. There's a *lot* of spatial locality of further WAL
records arriving for the same blocks.

> I am not so familiar with current implementation of full page writes
> mechanism in Postgres.
> So may be my idea explained below is stupid or already implemented (but I
> failed to find any traces of this).
> Prefetch is needed only for WAL records performing partial update. Full page
> write doesn't require prefetch.
> Full page write has to be performed when the page is update first time after
> checkpoint.
> But what if slightly extend this rule and perform full page write also when
> distance from previous full page write exceeds some delta
> (which somehow related with size of OS cache)?
>
> In this case even if checkpoint interval is larger than OS cache size, we
> still can expect that updated pages are present in OS cache.
> And no WAL prefetch is needed at all!

We could do so, but I suspect the WAL volume penalty would be prohibitive
in many cases. Worthwhile to try though.

Greetings,

Andres Freund