On 2/11/25 21:18, Tom Lane wrote:
> Tomas Vondra <tomas@vondra.me> writes:
>> I did run into bottlenecks due to "too few file descriptors" during
>> recent experiments with partitioning, which made it pretty trivial to
>> get into a situation where we start thrashing the VfdCache. I have a
>> half-written draft of a blog post about that somewhere.
>
>> But my conclusion was that it's damn difficult to even realize that's
>> happening, especially if you don't have access to the OS / perf, etc.
>
> Yeah. fd.c does its level best to keep going even with only a few FDs
> available, and it's hard to tell that you have a performance problem
> arising from that. (Although I recall old war stories about Postgres
> continuing to chug along just fine after it'd run the kernel out of
> FDs, while every other service on the system was crashing left and
> right, making it difficult e.g. even to log in. That scenario is why
> I'm resistant to pushing our allowed number of FDs to the moon...)
>
>> So
>> my takeaway was we should improve that first, so that people have a
>> chance to realize they have this issue, and can do the tuning. The
>> improvements I thought about were:
>
>> - track hits/misses for the VfdCache (and add a system view for that)
>
> I think what we actually would like to know is how often we have to
> close an open FD in order to make room to open a different file.
> Maybe that's the same thing you mean by "cache miss", but it doesn't
> seem like quite the right terminology. Anyway, +1 for adding some way
> to discover how often that's happening.
>
We can count the evictions (i.e. closing a file so that we can open a
new one) too, but AFAICS that's about the same as counting "misses"
(opening a file after not finding it in the cache). Once the cache
warms up it stays full, so every miss forces an eviction, and the two
counts should end up about the same, I think.
Or am I missing something?
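
To make this concrete, here's a rough sketch of where such counters
could live in fd.c (the counter names are invented, and this is not a
real patch, just an illustration of the idea):

    /* Hypothetical counters, to be exposed via some new system view. */
    static uint64 vfd_evictions = 0;  /* closed a file to free an FD slot */
    static uint64 vfd_reopens = 0;    /* reopened a VFD we closed earlier */

    static bool
    ReleaseLruFile(void)
    {
        if (nfile > 0)
        {
            vfd_evictions++;    /* every forced close is an eviction */
            LruDelete(VfdCache[0].lruMoreRecently);
            return true;
        }
        return false;
    }

    static int
    FileAccess(File file)
    {
        if (FileIsNotOpen(file))
        {
            vfd_reopens++;      /* a "miss": the file wasn't open anymore */
            return LruInsert(file);
        }
        /* ... existing LRU-ring bookkeeping for already-open files ... */
        return 0;
    }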
>> - maybe have wait event for opening/closing file descriptors
>
> Not clear that that helps, at least for this specific issue.
>
I don't think Jelte described any specific issue, but the symptoms I've
observed were a query accessing a partitioned table with ~1000 relations
(partitions + indexes), thrashing the vfd cache and getting ~0% cache
hits. The open/close calls were taking a lot of time (~25% of CPU time).
That'd be very visible as a wait event, I believe.
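
For illustration, the reopen path in LruInsert() could be wrapped in a
wait event roughly like this (WAIT_EVENT_VFD_OPEN is made up, no such
event exists today; pgstat_report_wait_start/end are the existing
wait-event hooks):

    /* In LruInsert(), around the actual open() of the file. */
    pgstat_report_wait_start(WAIT_EVENT_VFD_OPEN);
    vfdP->fd = BasicOpenFilePerm(vfdP->fileName, vfdP->fileFlags,
                                 vfdP->fileMode);
    pgstat_report_wait_end();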
>> - show max_safe_fds value somewhere, not just max_files_per_process
>> (which we may silently override and use a lower value)
>
> Maybe we should just assign max_safe_fds back to max_files_per_process
> after running set_max_safe_fds? The existence of two variables is a
> bit confusing anyhow. I vaguely recall that we had a reason for
> keeping them separate, but I can't think of the reasoning now.
>
That might work. I don't recall what the reasons for keeping them
separate were, but I suppose there were some at the time.
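
Something like this at the end of set_max_safe_fds(), I guess (an
untested sketch; a real patch would presumably go through the GUC
machinery rather than assigning the variable directly):

    void
    set_max_safe_fds(void)
    {
        /* ... existing logic that computes max_safe_fds ... */

        /*
         * Reflect the effective limit back into the GUC, so that
         * SHOW max_files_per_process reports the value we actually use.
         */
        max_files_per_process = max_safe_fds;
    }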
regards
--
Tomas Vondra