Re: WIP: WAL prefetch (another approach) - Mailing list pgsql-hackers

From: Stephen Frost
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2021-03-17 21:43:31
Msg-id: 20210317214331.GF20766@tamriel.snowman.net
In response to: Re: WIP: WAL prefetch (another approach)  (Thomas Munro <thomas.munro@gmail.com>)
Responses: Re: WIP: WAL prefetch (another approach)
List: pgsql-hackers

Greetings,

* Tomas Vondra (tomas.vondra@enterprisedb.com) wrote:
> Right, I was just going to point out the FPIs are not necessary - what
> matters is the presence of long streaks of WAL records touching the same
> set of blocks. But people with workloads where this is common likely
> don't need the WAL prefetching at all - the replica can keep up just
> fine, because it doesn't need to do much I/O anyway (and if it can't
> then prefetching won't help much anyway). So just don't enable the
> prefetching, and there'll be no overhead.

Isn't this exactly the common case though?  Checkpoints happening every
5 minutes, the replay of the FPI happens first (restoring the full page
into shared buffers), and then every later record touching that page
finds it already in shared buffers and needs no read I/O at all.  You
mentioned elsewhere that this would improve 80% of cases, but that
doesn't seem to be backed up by anything, and it certainly doesn't seem
likely to hold across all PG deployments.  I also disagree that asking
the kernel to go do random I/O for us, even as a prefetch, is entirely
free simply because we won't actually need those pages.  At the least,
it potentially pushes pages that we might need shortly out of the
filesystem cache, no?
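
To be a bit more concrete about what "asking the kernel" means here: on
Linux, as I understand it, each prefetch ultimately ends up as a
posix_fadvise(POSIX_FADV_WILLNEED) call against the relation segment
file.  A minimal sketch of that call (not the patch's actual code - the
file name and block number are invented purely for illustration):

#define _POSIX_C_SOURCE 200112L

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ 8192				/* PG's default block size */

int
main(void)
{
	/* hypothetical relation segment file, for illustration only */
	int		fd = open("base/16384/16385", O_RDONLY);
	int		rc;

	if (fd < 0)
	{
		perror("open");
		return EXIT_FAILURE;
	}

	/*
	 * Hint that we'll soon read block 42.  This is purely advisory: the
	 * kernel may start reading those 8kB into the page cache, and doing
	 * so can evict other cached pages - which is exactly the concern.
	 */
	rc = posix_fadvise(fd, (off_t) 42 * BLCKSZ, BLCKSZ,
					   POSIX_FADV_WILLNEED);
	if (rc != 0)
		fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

	close(fd);
	return EXIT_SUCCESS;
}

The hint may be ignored entirely, but when the kernel does act on it,
that read competes with everything else for the page cache.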

> If it was up to me, I'd just get the patch committed as is. Delaying the
> feature because of concerns that it might have some negative effect in
> some cases, when that can be simply mitigated by disabling the feature,
> is not really beneficial for our users.

I don't know that we actually know how many cases it might have a
negative effect on, or how large that negative effect might be - which
is really why we should try to actually benchmark it and get real
numbers behind it.  That's particularly true given that the default
configuration (that is, FPWs enabled) on the more typical platforms (as
in, not ZFS) is far more common in the field than FPWs disabled on ZFS,
so any negative effect in the default case would hit far more
installations.

Perhaps more to the point, it'd be nice to see how this change actually
improves the cases where PG is running with more-or-less the defaults
on the more commonly deployed filesystems.  If it doesn't, then maybe
it shouldn't be on by default?  Surely the folks running on ZFS with
FPWs disabled would be able to manage to enable it if they wished to,
and we could avoid entirely the question of whether this has a negative
impact on the more common cases.
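
(Assuming the GUC names in the current patch are what I remember -
recovery_prefetch and recovery_prefetch_fpw; I haven't re-checked, so
take the names with a grain of salt - that'd be just:

	# postgresql.conf on the replica
	recovery_prefetch = on
	recovery_prefetch_fpw = on	# makes sense with full_page_writes = off

with the feature left off everywhere else.)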

Guess I'm just not a fan of pushing out a change that will impact
everyone by default, possibly negatively (or positively, though that
doesn't seem terribly likely, but who knows), without actually
measuring what that impact will look like in those more common cases.
Showing that it's a great win when you're on ZFS or running with FPWs
disabled is good, and that's the expected best case, but we should be
considering the worst case too when it comes to performance
improvements.

Anyhow, ultimately I don't know that there's much more to discuss on
this thread with regard to this particular topic, at least.  As I said
before, if everyone else is on board and not worried about it then so be
it; I feel that at least the concern that I raised has been heard.

Thanks,

Stephen
