Re: [EXTERNAL] Re: WIP: WAL prefetch (another approach) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [EXTERNAL] Re: WIP: WAL prefetch (another approach)
Msg-id 20200829221450.t7omssadp2i6bbcx@development
In response to Re: [EXTERNAL] Re: WIP: WAL prefetch (another approach)  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Thu, Aug 27, 2020 at 04:28:54PM -0400, Stephen Frost wrote:
>Greetings,
>
>* Robert Haas (robertmhaas@gmail.com) wrote:
>> On Thu, Aug 27, 2020 at 2:51 PM Stephen Frost <sfrost@snowman.net> wrote:
>> > > Hm? At least earlier versions didn't do prefetching for records with
>> > > an fpw, and only for subsequent records affecting the same or if not
>> > > in s_b anymore.
>> >
>> > We don't actually read the page when we're replaying an FPW though..?
>> > If we don't read it, and we entirely write the page from the FPW, how is
>> > pre-fetching helping..?
>>
>> Suppose there is a checkpoint. Then we replay a record with an FPW,
>> pre-fetching nothing. Then the buffer gets evicted from
>> shared_buffers, and maybe the OS cache too. Then, before the next
>> checkpoint, we again replay a record for the same page. At this point,
>> pre-fetching should be helpful.
>
>Sure- but if we're talking about 25GB of WAL, on a server that's got
>32GB, then why would those pages end up getting evicted from memory
>entirely?  Particularly, enough of them to end up with such a huge
>difference in replay time..
>
>I do agree that if we've got more outstanding WAL between checkpoints
>than the system's got memory then that certainly changes things, but
>that wasn't what I understood the case to be here.
>

I don't think it's very clear how much WAL there actually was in each
case - the message only said there was more than 25GB, but who knows how
many checkpoints that covers? In the cases with FPW=on this may easily
be much less than one checkpoint (because with scale 45GB an update to
every page will log 45GB of full-page images). It'd be interesting to
see some stats from pg_waldump etc.
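
To make that arithmetic concrete, here is a rough back-of-envelope sketch
in Python. It assumes the default 8kB block size and treats "scale 45GB" as
a data set of about 45GB; the function name and the touched_fraction
parameter are made up for illustration, and the result is only an upper
bound (it ignores FPI compression, hole removal and record headers).

```python
BLCKSZ = 8192        # default PostgreSQL block size in bytes (assumed)
GB = 1024 ** 3


def fpi_bytes_after_checkpoint(db_bytes, touched_fraction=1.0,
                               block_size=BLCKSZ):
    """Rough upper bound on full-page image volume after one checkpoint.

    With full_page_writes=on, the first change to a page after a
    checkpoint logs the whole page, so every touched page contributes
    at most one block of WAL.
    """
    pages = db_bytes // block_size
    touched = int(pages * touched_fraction)
    return touched * block_size


db = 45 * GB                              # assumed data set size
fpi = fpi_bytes_after_checkpoint(db)      # every page touched once
print(f"FPI volume if every page is touched: {fpi / GB:.1f} GB")
print(f"25GB of WAL relative to that: {25 * GB / fpi:.2f}")
```

So if most pages get touched after a checkpoint, 25GB of WAL may not even
cover one full checkpoint cycle, which is why the actual pg_waldump stats
would matter here.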

>> Admittedly, I don't quite understand whether that is what is happening
>> in this test case, or why SSD vs. HDD should make any difference. But
>> there doesn't seem to be any reason why it doesn't make sense in
>> theory.
>
>I agree that this could be a reason, but it doesn't seem to quite fit in
>this particular case given the amount of memory and WAL.  I'm suspecting
>that it's something else and I'd very much like to know if it's a
>general "this applies to all (most?  a lot of?) SSDs because the
>hardware has a larger than 8KB page size and therefore the kernel has to
>read it", or if it's something odd about this particular system and
>doesn't apply generally.
>

Not sure. I doubt it has anything to do with the hardware page size,
that's mostly transparent to the kernel anyway. But it might be that the
prefetching on a particular SSD has more overhead than what it saves.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


