Re: Add pg_stat_vfdcache view for VFD cache statistics - Mailing list pgsql-hackers
| From | David Geier |
|---|---|
| Subject | Re: Add pg_stat_vfdcache view for VFD cache statistics |
| Date | |
| Msg-id | c85907b2-1e91-47e3-82dc-dafda295ded4@gmail.com |
| In response to | Re: Add pg_stat_vfdcache view for VFD cache statistics (Jakub Wartak <jakub.wartak@enterprisedb.com>) |
| List | pgsql-hackers |
Hi!

On 23.03.2026 12:22, Jakub Wartak wrote:
> On Sat, Mar 21, 2026 at 5:59 PM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
>>
>> Hello hackers,
>>
>> This comes from Tomas's patch idea from his website [1]; I thought this
>> patch makes sense to have.
>>
>> PostgreSQL's virtual file descriptor (VFD) layer maintains a
>> per-backend cache of open file descriptors, bounded by
>> max_files_per_process (default 1000). When the cache is full, the
>> least-recently-used entry is evicted and its OS fd is closed, so a new
>> file can be opened. On the next access to that file, open() must be
>> called again, incurring a syscall that a larger cache would have
>> avoided.

That's one use case. The other one, which I've recently come across, is just
knowing how many VFD cache entries there are in the first place. While the
number of open files is bounded by max_files_per_process, the number of cache
entries is unbounded. A large database can easily have hundreds of thousands
of files due to our segmentation scheme. Workloads that access a big portion
of these files can end up spending a very considerable amount of memory on the
VFD cache. For example, 100,000 VFD entries per backend * 80 bytes per VFD =
~7.6 MiB. With 1000 backends that's almost 10 GiB just for VFD entries,
assuming that each backend over time accumulates that many files. A production
database I looked at recently had ~300,000 files and many thousands of
backends. It spent close to 30 GiB on the VFD cache.

I've looked at struct Vfd, and some simple changes to the struct would already
cut memory consumption in half. I can look into that. Thoughts?

>> A trivial example is with partitioned tables: a table with 1500
>> partitions requires even more than 1500 file descriptors per full scan
>> (main fork, vm, ...), which is more than the default limit, causing
>> potential evictions and reopens.
>>
>> The problem is well-understood and the fix is straightforward: raise
>> max_files_per_process.
>> Tomas showed a 4-5x throughput improvement in [1] sometimes; on my end
>> I see something less than that, depending on the query itself, but we
>> get the idea.

The question is what the kernel makes of that, especially in the
aforementioned case where the number of total files and backends is large. In
the Linux kernel, each process that opens a file gets its own struct file, and
sizeof(struct file) is ~200 bytes. Hence, increasing max_files_per_process can
measurably impact memory consumption if changed lightheartedly. We should
document that. But I guess in most cases it's rather about changing it from 1k
to 2k than from 1k to 100k.

>> AFAIK there is currently no way from inside PostgreSQL to know whether
>> fd cache pressure is occurring.
>>
>> Implementation is trivial: because the VFD cache is strictly
>> per-backend, the counters are also per-backend and require no shared
>> memory or locking. Three macros (pgstat_count_vfd_hit/miss/eviction)
>> update fields in PendingVfdCacheStats directly from fd.c.
>>
>> I find this a bit useful; I would love to hear anyone's thoughts on
>> whether this is useful or not.
>
> Hi,
>
> My $0.02: for that to be useful it would need to allow viewing the
> global VFD cache picture (across all backends), not just from the
> *current* backend. Applications wouldn't call this function anyway,
> because they would have to be modified.

+1

> In order to get that you technically should collect the hits/misses in
> the local pending pgstat area (see e.g. pgstat_io or, simpler,
> pgstat_bgwriter/checkpointer) like you do already with
> PendingVfdCacheStats, but then copy them to the shared memory pgstat
> area (with some LWLock* protection) that would be queryable.

I would include here the sum of VFD cache entries across all backends and the
total VFD cache size.

--
David Geier