Re: Add pg_stat_vfdcache view for VFD cache statistics - Mailing list pgsql-hackers
| From | KAZAR Ayoub |
|---|---|
| Subject | Re: Add pg_stat_vfdcache view for VFD cache statistics |
| Date | |
| Msg-id | CA+K2Ru=WXpG1Sfx=_DQVn7ENZz6C_j-HxVJ1TmkEKowLVQcYwg@mail.gmail.com |
| In response to | Re: Add pg_stat_vfdcache view for VFD cache statistics (David Geier <geidav.pg@gmail.com>) |
| Responses | Re: Add pg_stat_vfdcache view for VFD cache statistics |
| List | pgsql-hackers |
Hello David and Jakub,
On Mon, Mar 23, 2026 at 1:35 PM David Geier <geidav.pg@gmail.com> wrote:
Hi!
On 23.03.2026 12:22, Jakub Wartak wrote:
> On Sat, Mar 21, 2026 at 5:59 PM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
>>
>> Hello hackers,
>>
>> This comes from Tomas's patch idea on his website[1]; I thought this patch makes sense to have.
>>
>> PostgreSQL's virtual file descriptor (VFD) layer maintains a
>> per-backend cache of open file descriptors, bounded by
>> max_files_per_process (default 1000). When the cache is full, the
>> least-recently-used entry is evicted and its OS fd closed so that a
>> new file can be opened. On the next access to that file, open() must
>> be called again, incurring a syscall that a larger cache would have
>> avoided.
That's one use-case. The other one that I've recently come across is
just knowing how many VFD cache entries there are in the first place.
While the number of open files is bounded by max_files_per_process, the
number of cache entries is unbounded. Large databases can easily have
hundreds of thousands of files due to our segmentation scheme.
Workloads that access a big portion of these files can end up spending
very considerable amounts of memory on the VFD cache. For example,
100,000 VFD entries per backend * 80 bytes per VFD = ~7.6 MiB. With
1000 backends that's almost 10 GiB just for VFD entries, assuming each
backend over time accumulates that many files.
A production database I looked at recently had ~300,000 files and many
thousands of backends. It spent close to 30 GiB on VFD caches.
I've looked at struct vfd and some simple changes to the struct would
already cut memory consumption in half. I can look into that.
Thoughts?
Looking forward to this.
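As a rough sanity check of the arithmetic above (assuming the ~80 bytes per VFD entry stated in the thread; the figures are back-of-the-envelope, not measurements):

```python
# Rough sanity check of the memory figures quoted above.
# Assumes ~80 bytes per VFD entry, as stated in the thread.
VFD_ENTRY_BYTES = 80

entries_per_backend = 100_000
per_backend = entries_per_backend * VFD_ENTRY_BYTES
print(f"per backend: {per_backend / 2**20:.1f} MiB")    # 7.6 MiB

backends = 1_000
total = per_backend * backends
print(f"{backends} backends: {total / 2**30:.1f} GiB")  # ~7.5 GiB
```

So 1000 backends with 100K entries each work out to roughly 7.5 GiB, in line with the "almost 10 GiB" ballpark above.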
What also bothers me in that space is that once a backend allocates 100K entries in the VFD cache, the cache is never shrunk again;
it only grows (whenever it needs more than its lifetime maximum) until the backend dies. This is useful insofar as free entries are
reused instead of being allocated anew, but a spike in file openings can leave a long-lived backend holding far more cache than it
will ever need again. I don't imagine this is common, though. What do you think about this issue, from your experience?
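For illustration, here is a toy LRU model of the eviction behavior discussed in this thread, tracking the hit/miss/eviction counters the patch would expose (a hypothetical Python simulation; it is not the fd.c implementation, and all names are made up):

```python
from collections import OrderedDict

class VfdCacheModel:
    """Toy LRU model of the per-backend VFD cache (illustrative only,
    not the fd.c implementation). Tracks the hit/miss/eviction
    counters the proposed view would expose."""
    def __init__(self, max_open):
        self.max_open = max_open       # plays the role of max_files_per_process
        self.open_fds = OrderedDict()  # filename -> (pretend) OS fd
        self.hits = self.misses = self.evictions = 0

    def access(self, name):
        if name in self.open_fds:
            self.hits += 1
            self.open_fds.move_to_end(name)    # mark as most recently used
            return
        self.misses += 1                       # would require an open() syscall
        if len(self.open_fds) >= self.max_open:
            self.open_fds.popitem(last=False)  # evict least recently used
            self.evictions += 1
        self.open_fds[name] = object()

cache = VfdCacheModel(max_open=3)
for f in ["a", "b", "c", "a", "d", "b"]:
    cache.access(f)
print(cache.hits, cache.misses, cache.evictions)  # 1 5 2
```

Note that evicting an entry in this model (as in the real cache, if I understand it correctly) only closes the fd; it does not return the entry's memory, which is the growth-without-shrinking issue above.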
In the Linux kernel, each process that opens a file gets its own struct
file. sizeof(struct file) is ~200 bytes. Hence, increasing
max_files_per_process can measurably impact memory consumption if
changed carelessly. We should document that.
But I guess in most cases it's rather about changing it from 1k to 2k
than from 1k to 100k.
Indeed, I agree with this.
>> AFAIK there is currently no way from inside PostgreSQL to know whether fd cache pressure is occurring.
>>
>> Implementation is trivial: because the VFD cache is strictly per-backend, the counters are also
>> per-backend and require no shared memory or locking. Three macros (pgstat_count_vfd_hit/miss/eviction) update fields in PendingVfdCacheStats directly from fd.c.
>>
>> I find this a bit useful. I would love to hear anyone's thoughts on whether this is useful or not.
>
> Hi,
>
> My $0.02: for this to be useful it would need to allow viewing the
> global VFD cache picture (across all backends), not just the *current* backend's.
> Applications wouldn't call this function anyway, because they would have to be
> modified.
+1
Would it be reasonable to have both? I've changed the way I was thinking about it.
It's clear one would want global VFD cache behavior monitoring, since its configuration is global, but its effect is mostly backend-specific.
That leads me to think it could ALSO be useful to maintain per-backend metrics, to help identify which backends are putting the most pressure on the VFD cache.
> In order to get that you technically should collect the hits/misses in local
> pending pgstat io area (see e.g. pgstat_io or simpler pgstat_bgwriter/
> checkpointer) like you do already with PendingVfdCacheStats, but then copy them
> to shared memory pgstat area (with some LWLock* protection) that would be
> queryable.
I would include here the sum of VFD cache entries across all backends and
the total VFD cache size.
I'll be doing this soon.
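As a rough sketch of the pending-then-flush pattern Jakub describes, modeled loosely on how pgstat_bgwriter/checkpointer accumulate locally and flush to shared memory under a lock (a hypothetical Python analogue; all names are illustrative, and a threading.Lock stands in for the LWLock):

```python
import threading

# Shared pgstat-like area, queryable by everyone; protected by a lock.
shared_stats = {"hits": 0, "misses": 0, "evictions": 0}
shared_lock = threading.Lock()  # stands in for the LWLock protection

class BackendPendingStats:
    """Per-backend pending counters, updated locklessly on the hot path
    (the role PendingVfdCacheStats plays in the patch)."""
    def __init__(self):
        self.hits = self.misses = self.evictions = 0

    def flush(self):
        """Copy pending counts into the shared area under the lock,
        then reset them, mirroring a pgstat flush callback."""
        with shared_lock:
            shared_stats["hits"] += self.hits
            shared_stats["misses"] += self.misses
            shared_stats["evictions"] += self.evictions
        self.hits = self.misses = self.evictions = 0

pending = BackendPendingStats()
pending.hits += 3    # cheap, lock-free updates from the hot path
pending.misses += 1
pending.flush()      # periodic flush into the shared area
print(shared_stats)
```

The point of the split is that the hot path (every Vfd lookup) touches only backend-local memory; the lock is taken only at flush time.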
--
David Geier
Regards,
Ayoub Kazar