Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
Date
Msg-id 20240208215920.myib264liwbkdndp@awork3.anarazel.de
Whole thread Raw
In response to Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Hi,

On 2024-02-08 12:04:24 -0500, Tom Lane wrote:
> Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> > IIUC I think the assertion failure was caused by an
> > error-during-error-recovery loop caused by the "epoll_create1 failed:
> > Too many open files" error raised in WaitLatchOrSocket called from
> > pgfdw_get_cleanup_result, which is called during abort cleanup.  I
> > think a simple fix to avoid such a loop is to modify the PG_CATCH
> > block in pgfdw_get_cleanup_result so that it just ignores the passed
> > error, not re-throwing it, and restores InterruptHoldoffCount and the
> > memory context, like the attached.  In the patch I also modified
> > callers of pgfdw_get_cleanup_result to issue a warning when ignoring
> > the error.  I might be missing something, though.
> 
> I do not think ignoring the passed error is *ever* acceptable.
> You have no idea what the error condition is or whether your
> hack is sufficient to recover from it.

+1


I think we ought to understand *why* we are getting the "Too many open
files". The AcquireExternalFD() in CreateWaitEventSet() should prevent
that.

One annoying bit is that AcquireExternalFD() failing emits the same error as
if epoll_create1() itself failing, including the same errno. So we don't know
if the problem is that there are too many connections and because of that we
are running out of "file descriptor slots", or whether something was using up
file descriptors outside of our system, or ...

I also wonder if postgres_fdw should strive to use a longer lived wait event
set. For efficiency, if nothing else? That'd avoid the need to create one
during error handling.

Greetings,

Andres Freund



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
Next
From: Tom Lane
Date:
Subject: Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds