Re: linux cachestat in file Readv and Prefetch - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: linux cachestat in file Readv and Prefetch
Date:
Msg-id: CA+TgmoZu0ZY7CXxSsgBydYO6HWcG64OEg=L_43i7shHrihQC3g@mail.gmail.com
In response to: Re: linux cachestat in file Readv and Prefetch (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List: pgsql-hackers
On Sat, Feb 17, 2024 at 6:10 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> I may be missing some important bit behind this idea, but this does not
> seem like a great idea to me. The comment added to FilePrefetch says this:
>
>   /*
>    * Since the last time we visited this file (somewhere),
>    * nr_recently_evicted pages of the range were just removed from the
>    * VM cache; that's a sign of memory pressure, so do not prefetch
>    * further.
>    * It is hard to guess whether this is always the right choice in the
>    * absence of more information, such as:
>    *  - the prefetching distance expected overall
>    *  - the access pattern / backend, maybe
>    */
>
> Firstly, is this even a good way to detect memory pressure? It's clearly
> limited to a single 1GB segment, so what's the chance we'll even see the
> "pressure" on a big database with many files?
>
> If we close/reopen the file (which on large databases we tend to do very
> often) how does that affect the data reported for the file descriptor?
>
> I'm not sure I even agree with the idea that we should stop prefetching
> when there is memory pressure. IMHO it's perfectly fine to keep
> prefetching stuff even if it triggers eviction of unnecessary pages from
> page cache. That's kinda why the eviction exists.

I agree with all of these criticisms. I think it's the job of
pg_prewarm to do what the user requests, not to second-guess whether
the user requested the right thing. One of the things that frustrates
people about the ring-buffer system is that it's hard to get all of
your data cached in shared_buffers by just reading it, e.g. SELECT *
FROM my_table. If pg_prewarm also isn't guaranteed to actually read
your data, and may decide that your data didn't need to be read after
all, then what exactly is a user supposed to do if they're absolutely
sure that they know better than PostgreSQL itself and want to
guarantee that their data actually does get read?
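
For reference, the kind of explicit request at issue looks roughly like
this (a minimal sketch; my_table is just the illustrative name from
above, and the modes shown are pg_prewarm's standard ones):

    CREATE EXTENSION IF NOT EXISTS pg_prewarm;

    -- 'buffer' reads the relation into shared_buffers; 'prefetch' only
    -- issues asynchronous OS readahead, which is the FilePrefetch path
    -- the proposed patch would sometimes skip.
    SELECT pg_prewarm('my_table', 'buffer');
    SELECT pg_prewarm('my_table', 'prefetch');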

So I think a feature like this would at the very least need to be
optional, but it's unclear to me why we'd want it at all, and I feel
like Cedric's email doesn't really answer that question. I suppose
that if you could detect useless prefetching and skip it, you'd save a
bit of work, but under what circumstances does anyone use pg_prewarm
so aggressively as to make that a problem, and why wouldn't the
solution be for the user to just calm down a little bit? There
shouldn't be any particular reason why the user can't know both the
size of shared_buffers and the approximate size of the OS cache;
indeed, they can probably know the latter much more reliably than
PostgreSQL itself can. So it should be fairly easy to avoid just
prefetching more data than will fit, and then you don't have to worry
about any of this. And you'll probably get a better result, too,
because, along the same lines as Tomas's remarks above, I doubt that
this would be an accurate method anyway.
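
A minimal sketch of that kind of manual check, assuming the user simply
compares the relation's size against shared_buffers (or against whatever
free memory the OS reports) before prewarming, could be as simple as:

    SHOW shared_buffers;
    SELECT pg_size_pretty(pg_table_size('my_table'));
    -- prewarm only if the table fits in the target cache
    SELECT pg_prewarm('my_table', 'buffer');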

> Well ... I'd argue at least some basic evaluation of performance is a
> rather important / expected part of a proposal for a patch that aims to
> improve a performance-focused feature. It's impossible to have any sort
> of discussion about such a patch without that.

Right.

I'm going to mark this patch as Rejected in the CommitFest application
for now. If in subsequent discussion that comes to seem like the wrong
result, then we can revise accordingly, but right now it looks
extremely unlikely to me that this is something that we'd want.

--
Robert Haas
EDB: http://www.enterprisedb.com


