Add pg_stat_vfdcache view for VFD cache statistics - Mailing list pgsql-hackers

From KAZAR Ayoub
Subject Add pg_stat_vfdcache view for VFD cache statistics
Date
Msg-id CA+K2RumP33Cpj--88E+rNADa8fzSBBiav=rvzyaMM=sYNaOkfA@mail.gmail.com
Whole thread
Responses Re: Add pg_stat_vfdcache view for VFD cache statistics
List pgsql-hackers
Hello hackers,

This comes from Tomas's patch idea from his website[1], i thought this patch makes sense to have. 

PostgreSQL's virtual file descriptor (VFD) maintains a
per-backend cache of open file descriptors, bounded by
max_files_per_process (default 1000).  When the cache is full, the
least-recently-used entry is evicted so its OS fd is closed, so a new
file can be opened. On the next access to that file, open() must be
called again, incurring a syscall that a larger cache would have
avoided.
 
A trivial example is with partitioned tables: a table with 1500
partitions requires even more than 1500 file descriptors per full scan (main
fork, vm ...), which is more than the default limit, causing potential evictions and reopens.
 
The problem is well-understood and the fix is straightforward: raise
max_files_per_process. Tomas showed a 4-5x throughput
improvement in [1] sometimes, on my end i see something less than that, depending on the query itself, but we get the idea.
 
AFAIK there is currently no way from inside PostgreSQL to know whether fd cache pressure is occurring.

Implementation is trivial, because the VFD cache is strictly per-backend, the counters are also
per-backend and require no shared memory or locking. Three macros (pgstat_count_vfd_hit/miss/eviction) update fields in PendingVfdCacheStats directly from fd.c.

I find this a bit useful, I would love to hear about anyone's thoughts whether this is useful or not.


Regards,
Ayoub
Attachment

pgsql-hackers by date:

Previous
From: "Greg Burd"
Date:
Subject: Add RISC-V Zbb popcount optimization
Next
From: Amul Sul
Date:
Subject: Re: pg_waldump: support decoding of WAL inside tarfile