Re: [EXTERNAL] Re: WIP: WAL prefetch (another approach) - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [EXTERNAL] Re: WIP: WAL prefetch (another approach) |
Date | |
Msg-id | 20200829221450.t7omssadp2i6bbcx@development |
In response to | Re: [EXTERNAL] Re: WIP: WAL prefetch (another approach) (Stephen Frost <sfrost@snowman.net>) |
Responses | Re: [EXTERNAL] Re: WIP: WAL prefetch (another approach) |
List | pgsql-hackers |
On Thu, Aug 27, 2020 at 04:28:54PM -0400, Stephen Frost wrote:
>Greetings,
>
>* Robert Haas (robertmhaas@gmail.com) wrote:
>> On Thu, Aug 27, 2020 at 2:51 PM Stephen Frost <sfrost@snowman.net> wrote:
>> > > Hm? At least earlier versions didn't do prefetching for records with an fpw, and only for subsequent records affecting the same or if not in s_b anymore.
>> >
>> > We don't actually read the page when we're replaying an FPW though..?
>> > If we don't read it, and we entirely write the page from the FPW, how is
>> > pre-fetching helping..?
>>
>> Suppose there is a checkpoint. Then we replay a record with an FPW,
>> pre-fetching nothing. Then the buffer gets evicted from
>> shared_buffers, and maybe the OS cache too. Then, before the next
>> checkpoint, we again replay a record for the same page. At this point,
>> pre-fetching should be helpful.
>
>Sure- but if we're talking about 25GB of WAL, on a server that's got
>32GB, then why would those pages end up getting evicted from memory
>entirely? Particularly, enough of them to end up with such a huge
>difference in replay time..
>
>I do agree that if we've got more outstanding WAL between checkpoints
>than the system's got memory then that certainly changes things, but
>that wasn't what I understood the case to be here.
>

I don't think it's very clear how much WAL there actually was in each
case - the message only said there was more than 25GB, but who knows how
many checkpoints that covers? In the cases with FPW=on this may easily
be much less than one checkpoint (because with scale 45GB an update to
every page will log 45GB of full-page images). It'd be interesting to
see some stats from pg_waldump etc.

>> Admittedly, I don't quite understand whether that is what is happening
>> in this test case, or why SSD vs. HDD should make any difference. But
>> there doesn't seem to be any reason why it doesn't make sense in
>> theory.
>
>I agree that this could be a reason, but it doesn't seem to quite fit in
>this particular case given the amount of memory and WAL. I'm suspecting
>that it's something else and I'd very much like to know if it's a
>general "this applies to all (most? a lot of?) SSDs because the
>hardware has a larger than 8KB page size and therefore the kernel has to
>read it", or if it's something odd about this particular system and
>doesn't apply generally.
>

Not sure. I doubt it has anything to do with the hardware page size,
that's mostly transparent to the kernel anyway. But it might be that the
prefetching on a particular SSD has more overhead than what it saves.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
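
The full-page-image arithmetic in the reply above can be sketched as a back-of-the-envelope calculation. This is only a rough estimate under stated assumptions: the default 8kB block size, every page modified exactly once after a checkpoint, and no allowance for record headers, hole removal, or wal_compression. The function names are hypothetical and the 25GB and 45GB figures are the ones mentioned in the thread.

```python
BLCKSZ = 8192        # PostgreSQL's default block size, in bytes (assumed here)
GIB = 1024 ** 3


def full_page_image_bytes(dataset_bytes, block_size=BLCKSZ):
    """Rough WAL volume from full-page images if every page of the
    dataset is modified once after a checkpoint (hypothetical helper).

    Ignores record headers, hole removal, and wal_compression.
    """
    pages = -(-dataset_bytes // block_size)  # ceiling division
    return pages * block_size


def checkpoint_fraction(wal_bytes, dataset_bytes, block_size=BLCKSZ):
    """Fraction of one 'touch every page' cycle that the given amount
    of WAL could account for (hypothetical helper)."""
    return wal_bytes / full_page_image_bytes(dataset_bytes, block_size)


if __name__ == "__main__":
    dataset = 45 * GIB   # scale mentioned in the thread
    wal = 25 * GIB       # "more than 25GB" of WAL mentioned in the thread
    fpi = full_page_image_bytes(dataset)
    print(f"full-page images: ~{fpi / GIB:.0f} GiB")
    print(f"25 GiB of WAL covers ~{checkpoint_fraction(wal, dataset):.2f} of that")
```

Under these assumptions the full-page images alone match the dataset size, so 25GB of WAL would cover only part of a single cycle. Real numbers would have to come from pg_waldump statistics, as the reply suggests.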