Hi,
I ran something that triggered the error in $subject. However, it turns out
that
a) epoll_create1() was not being called, and
b) we didn't actually hit EMFILE, or even max_safe_fds.
The reason for the failure is that we have:
    if (!AcquireExternalFD())
    {
        /* treat this as though epoll_create1 itself returned EMFILE */
        elog(ERROR, "epoll_create1 failed: %m");
    }
and
    bool
    AcquireExternalFD(void)
    {
        /*
         * We don't want more than max_safe_fds / 3 FDs to be consumed for
         * "external" FDs.
         */
        if (numExternalFDs < max_safe_fds / 3)
        {
            ReserveExternalFD();
            return true;
        }
        errno = EMFILE;
        return false;
    }
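To make the failure mode concrete, here is a minimal standalone model of the
budget check quoted above. It is a simplified sketch, not fd.c itself:
max_safe_fds is hard-coded to a hypothetical 30, and ReserveExternalFD() only
bumps a counter. Once the max_safe_fds / 3 budget is used up, the function
returns false with errno set to EMFILE, even though no open(),
epoll_create1() or other system call has failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Simplified stand-ins for fd.c state. Assumption: modelled on the
 * excerpt above; the value 30 is hypothetical, chosen for illustration.
 */
static int max_safe_fds = 30;
static int numExternalFDs = 0;

static void
ReserveExternalFD(void)
{
    /* In this simplified model only the counter is updated. */
    numExternalFDs++;
}

static bool
AcquireExternalFD(void)
{
    /* Budget check: at most max_safe_fds / 3 "external" FDs. */
    if (numExternalFDs < max_safe_fds / 3)
    {
        ReserveExternalFD();
        return true;
    }
    /* No system call failed here; EMFILE is set purely by convention. */
    errno = EMFILE;
    return false;
}
```

With the hypothetical max_safe_fds of 30, the first ten calls succeed and the
eleventh fails with errno == EMFILE, which is the value %m then renders in the
"epoll_create1 failed" message.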
I think it's rather confusing to claim that epoll_create1() failed when we
didn't even call it.
Why are we misattributing the failure to a system call that we didn't make?
The current behaviour was introduced in
commit 3d475515a15f70a4a3f36fbbba93db6877ff8346
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   2020-02-24 17:28:33 -0500

    Account explicitly for long-lived FDs that are allocated outside fd.c.
I also wish we wouldn't report EMFILE when we didn't actually reach any hard
limit; that makes the system's behaviour unnecessarily confusing. But that's
not quite so easy to fix.
How about changing the error message to something like:
    elog(ERROR, "AcquireExternalFD, for epoll_create1, failed: %m");
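A rough sketch of how a caller could choose a message that names the step
that actually failed, assuming it records whether AcquireExternalFD()
succeeded before epoll_create1() was attempted. format_wait_set_error() is a
hypothetical helper, and strerror() stands in for elog's %m, which is
expanded from errno.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the error text for whichever step failed, so
 * the message never names a system call that was not made.
 *   reserved    - true if AcquireExternalFD() succeeded, i.e. the failure
 *                 came from epoll_create1() itself
 *   saved_errno - errno captured immediately after the failing step
 */
static void
format_wait_set_error(char *buf, size_t len, bool reserved, int saved_errno)
{
    if (!reserved)
        snprintf(buf, len, "AcquireExternalFD, for epoll_create1, failed: %s",
                 strerror(saved_errno));
    else
        snprintf(buf, len, "epoll_create1 failed: %s",
                 strerror(saved_errno));
}
```

Only the prefix is deterministic here; the strerror() text after it varies by
platform.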
Greetings,
Andres Freund