Thread: linux cachestat in file Readv and Prefetch
Hi, I was testing the index prefetch and streamIO patches and I added cachestat() syscall to get a better view of the prefetching. It's a new linux syscall, it requires 6.5, it provides numerous interesting information from the VM for the range of pages examined. It's way way faster than the old mincore() and provides much more valuable information: uint64 nr_cache; /* Number of cached pages */ uint64 nr_dirty; /* Number of dirty pages */ uint64 nr_writeback; /* Number of pages marked for writeback. */ uint64 nr_evicted; /* Number of pages evicted from the cache. */ /* * Number of recently evicted pages. A page is recently evicted if its * last eviction was recent enough that its reentry to the cache would * indicate that it is actively being used by the system, and that there * is memory pressure on the system. */ uint64 nr_recently_evicted; While here I also added some quick tweaks to suspend prefetching on memory pressure. It's working but I have absolutely not checked the performance impact of my additions. Sharing here for others to tests and adding in CF in case there is interest to go further in this direction. --- Cédric Villemain +33 (0)6 20 30 22 52 https://Data-Bene.io PostgreSQL Expertise, Support, Training, R&D -- --- Cédric Villemain +33 (0)6 20 30 22 52 https://Data-Bene.io PostgreSQL Expertise, Support, Training, R&D
Attachment
On 1/18/24 01:25, Cedric Villemain wrote: > Hi, > > I was testing the index prefetch and streamIO patches and I added > cachestat() syscall to get a better view of the prefetching. > > It's a new linux syscall, it requires 6.5, it provides numerous > interesting information from the VM for the range of pages examined. > It's way way faster than the old mincore() and provides much more > valuable information: > > uint64 nr_cache; /* Number of cached pages */ > uint64 nr_dirty; /* Number of dirty pages */ > uint64 nr_writeback; /* Number of pages marked for writeback. */ > uint64 nr_evicted; /* Number of pages evicted from the cache. */ > /* > * Number of recently evicted pages. A page is recently evicted if its > * last eviction was recent enough that its reentry to the cache would > * indicate that it is actively being used by the system, and that there > * is memory pressure on the system. > */ > uint64 nr_recently_evicted; > > > While here I also added some quick tweaks to suspend prefetching on > memory pressure. I may be missing some important bit behind this idea, but this does not seem like a great idea to me. The comment added to FilePrefetch says this: /* * last time we visit this file (somewhere), nr_recently_evicted pages * of the range were just removed from vm cache, it's a sign a memory * pressure. so do not prefetch further. * it is hard to guess if it is always the right choice in absence of * more information like: * - prefetching distance expected overall * - access pattern/backend maybe */ Firstly, is this even a good way to detect memory pressure? It's clearly limited to a single 1GB segment, so what's the chance we'll even see the "pressure" on a big database with many files? If we close/reopen the file (which on large databases we tend to do very often) how does that affect the data reported for the file descriptor? I'm not sure I even agree with the idea that we should stop prefetching when there is memory pressure. IMHO it's perfectly fine to keep prefetching stuff even if it triggers eviction of unnecessary pages from page cache. That's kinda why the eviction exists. > It's working but I have absolutely not checked the performance impact of > my additions. > Well ... I'd argue at least some basic evaluation of performance is a rather important / expected part of a proposal for a patch that aims to improve a performance-focused feature. It's impossible to have any sort of discussion about such patch without that. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Feb 17, 2024 at 6:10 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > I may be missing some important bit behind this idea, but this does not > seem like a great idea to me. The comment added to FilePrefetch says this: > > /* > * last time we visit this file (somewhere), nr_recently_evicted pages > * of the range were just removed from vm cache, it's a sign a memory > * pressure. so do not prefetch further. > * it is hard to guess if it is always the right choice in absence of > * more information like: > * - prefetching distance expected overall > * - access pattern/backend maybe > */ > > Firstly, is this even a good way to detect memory pressure? It's clearly > limited to a single 1GB segment, so what's the chance we'll even see the > "pressure" on a big database with many files? > > If we close/reopen the file (which on large databases we tend to do very > often) how does that affect the data reported for the file descriptor? > > I'm not sure I even agree with the idea that we should stop prefetching > when there is memory pressure. IMHO it's perfectly fine to keep > prefetching stuff even if it triggers eviction of unnecessary pages from > page cache. That's kinda why the eviction exists. I agree with all of these criticisms. I think it's the job of pg_prewarm to do what the user requests, not to second-guess whether the user requested the right thing. One of the things that frustrates people about the ring-buffer system is that it's hard to get all of your data cached in shared_buffers by just reading it, e.g. SELECT * FROM my_table. If pg_prewarm also isn't guaranteed to actually read your data, and may decide that your data didn't need to be read after all, then what exactly is a user supposed to do if they're absolutely sure that they know better than PostgreSQL itself and want to guarantee that their data actually does get read? So I think a feature like this would at the very least need to be optional, but it's unclear to me why we'd want it at all, and I feel like Cedric's email doesn't really answer that question. I suppose that if you could detect useless prefetching and skip it, you'd save a bit of work, but under what circumstances does anyone use pg_prewarm so aggressively as to make that a problem, and why wouldn't the solution be for the user to just calm down a little bit? There shouldn't be any particular reason why the user can't know both the size of shared_buffers and the approximate size of the OS cache; indeed, they can probably know the latter much more reliably than PostgreSQL itself can. So it should be fairly easy to avoid just prefetching more data than will fit, and then you don't have to worry about any of this. And you'll probably get a better result, too, because, along the same lines as Tomas's remarks above, I doubt that this would be an accurate method anyway. > Well ... I'd argue at least some basic evaluation of performance is a > rather important / expected part of a proposal for a patch that aims to > improve a performance-focused feature. It's impossible to have any sort > of discussion about such patch without that. Right. I'm going to mark this patch as Rejected in the CommitFest application for now. If in subsequent discussion that comes to seem like the wrong result, then we can revise accordingly, but right now it looks extremely unlikely to me that this is something that we'd want. -- Robert Haas EDB: http://www.enterprisedb.com