Re: Add pg_stat_vfdcache view for VFD cache statistics - Mailing list pgsql-hackers
| From | David Geier |
|---|---|
| Subject | Re: Add pg_stat_vfdcache view for VFD cache statistics |
| Date | |
| Msg-id | c85907b2-1e91-47e3-82dc-dafda295ded4@gmail.com |
| In response to | Re: Add pg_stat_vfdcache view for VFD cache statistics (Jakub Wartak <jakub.wartak@enterprisedb.com>) |
| List | pgsql-hackers |
Hi!

On 23.03.2026 12:22, Jakub Wartak wrote:
> On Sat, Mar 21, 2026 at 5:59 PM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
>>
>> Hello hackers,
>>
>> This comes from Tomas's patch idea from his website [1]; I thought this
>> patch makes sense to have.
>>
>> PostgreSQL's virtual file descriptor (VFD) layer maintains a
>> per-backend cache of open file descriptors, bounded by
>> max_files_per_process (default 1000). When the cache is full, the
>> least-recently-used entry is evicted and its OS fd is closed, so a new
>> file can be opened. On the next access to that file, open() must be
>> called again, incurring a syscall that a larger cache would have
>> avoided.

That's one use case. The other one, which I've recently come across, is just
knowing how many VFD cache entries there are in the first place. While the
number of open files is bounded by max_files_per_process, the number of cache
entries is unbounded. A large database can easily have hundreds of thousands
of files due to our segmentation scheme. Workloads that access a big portion
of these files can end up spending a very considerable amount of memory on the
VFD cache. For example, 100,000 VFD entries per backend * 80 bytes per VFD =
~7.6 MiB. With 1000 backends that's almost 10 GiB just for VFD entries,
assuming that each backend over time accumulates that many files. A production
database I looked at recently had ~300,000 files and many thousands of
backends. It spent close to 30 GiB on the VFD cache.

I've looked at struct Vfd, and some simple changes to the struct would already
cut memory consumption in half. I can look into that. Thoughts?

>> A trivial example is with partitioned tables: a table with 1500
>> partitions requires even more than 1500 file descriptors per full scan
>> (main fork, vm, ...), which is more than the default limit, causing
>> potential evictions and reopens.
>>
>> The problem is well-understood and the fix is straightforward: raise
>> max_files_per_process.
>> Tomas showed a 4-5x throughput improvement in [1] sometimes; on my end
>> I see something less than that, depending on the query itself, but we
>> get the idea.

The question is what the kernel makes of that, especially in the
aforementioned case where the number of total files and backends is large. In
the Linux kernel, each process that opens a file gets its own struct file, and
sizeof(struct file) is ~200 bytes. Hence, increasing max_files_per_process can
measurably impact memory consumption if changed lightheartedly. We should
document that. But I guess in most cases it's rather about changing it from 1k
to 2k than from 1k to 100k.

>> AFAIK there is currently no way from inside PostgreSQL to know whether
>> fd cache pressure is occurring.
>>
>> Implementation is trivial: because the VFD cache is strictly
>> per-backend, the counters are also per-backend and require no shared
>> memory or locking. Three macros (pgstat_count_vfd_hit/miss/eviction)
>> update fields in PendingVfdCacheStats directly from fd.c.
>>
>> I find this a bit useful; I would love to hear anyone's thoughts on
>> whether this is useful or not.
>
> Hi,
>
> My $0.02: for that to be useful it would need to allow viewing the
> global VFD cache picture (across all backends), not just from the
> *current* backend. Applications wouldn't call this function anyway,
> because they would have to be modified.

+1

> In order to get that you technically should collect the hits/misses in
> the local pending pgstat area (see e.g. pgstat_io or, simpler,
> pgstat_bgwriter/checkpointer) like you do already with
> PendingVfdCacheStats, but then copy them to the shared memory pgstat
> area (with some LWLock* protection) that would be queryable.

I would include here the sum of VFD cache entries across all backends and the
total VFD cache size.

--
David Geier