Re: Bump soft open file limit (RLIMIT_NOFILE) to hard limit on startup - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Bump soft open file limit (RLIMIT_NOFILE) to hard limit on startup
Date
Msg-id a4c0388f-02f8-4e5a-9638-616aabf3f9e3@vondra.me
In response to Re: Bump soft open file limit (RLIMIT_NOFILE) to hard limit on startup  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Bump soft open file limit (RLIMIT_NOFILE) to hard limit on startup
List pgsql-hackers

On 2/11/25 21:18, Tom Lane wrote:
> Tomas Vondra <tomas@vondra.me> writes:
>> I did run into bottlenecks due to "too few file descriptors" during
>> recent experiments with partitioning, which made it pretty trivial to
>> get into a situation where we start thrashing the VfdCache. I have a
>> half-written draft of a blog post about that somewhere.
> 
>> But my conclusion was that it's damn difficult to even realize that's
>> happening, especially if you don't have access to the OS / perf, etc.
> 
> Yeah.  fd.c does its level best to keep going even with only a few FDs
> available, and it's hard to tell that you have a performance problem
> arising from that.  (Although I recall old war stories about Postgres
> continuing to chug along just fine after it'd run the kernel out of
> FDs, although every other service on the system was crashing left and
> right, making it difficult e.g. even to log in.  That scenario is why
> I'm resistant to pushing our allowed number of FDs to the moon...)
> 
>> So
>> my takeaway was we should improve that first, so that people have a
>> chance to realize they have this issue, and can do the tuning. The
>> improvements I thought about were:
> 
>> - track hits/misses for the VfdCache (and add a system view for that)
> 
> I think what we actually would like to know is how often we have to
> close an open FD in order to make room to open a different file.
> Maybe that's the same thing you mean by "cache miss", but it doesn't
> seem like quite the right terminology.  Anyway, +1 for adding some way
> to discover how often that's happening.
> 

We can count the evictions (i.e. closing a file so that we can open a
new one) too, but AFAICS that's about the same as counting "misses"
(opening a file after not finding it in the cache). After the cache
warms up, those counts should be about the same, I think.

Or am I missing something?

>> - maybe have wait event for opening/closing file descriptors
> 
> Not clear that that helps, at least for this specific issue.
> 

I don't think Jelte described any specific issue, but the symptoms I've
observed were that a query accessing a table with ~1000 relations
(partitions + indexes) was thrashing the vfd cache, getting ~0% cache
hits. And the open/close calls were taking a lot of time (~25% of CPU
time). That'd be very visible as a wait event, I believe.

>> - show max_safe_fds value somewhere, not just max_files_per_process
>>   (which we may silently override and use a lower value)
> 
> Maybe we should just assign max_safe_fds back to max_files_per_process
> after running set_max_safe_fds?  The existence of two variables is a
> bit confusing anyhow.  I vaguely recall that we had a reason for
> keeping them separate, but I can't think of the reasoning now.
> 

That might work. I don't recall the reasons for keeping the two
variables separate either, though I assume there were some.


regards

-- 
Tomas Vondra



