Misleading "epoll_create1 failed: Too many open files" - Mailing list pgsql-hackers

From Andres Freund
Subject Misleading "epoll_create1 failed: Too many open files"
Msg-id xjjx7r4xa7beixuu4qtkdhnwdbchrrpo3gaeb3jsbinvvdiat5@cwjw55mna5of
Responses Re: Misleading "epoll_create1 failed: Too many open files"
List pgsql-hackers
Hi,

I ran something which triggered the error in $subject. Except that it turns
out that
a) epoll_create1() was not being called
b) we didn't actually hit EMFILE or even max_safe_fds

The reason for the failure is that we have:
    if (!AcquireExternalFD())
    {
        /* treat this as though epoll_create1 itself returned EMFILE */
        elog(ERROR, "epoll_create1 failed: %m");
    }

and

bool
AcquireExternalFD(void)
{
    /*
     * We don't want more than max_safe_fds / 3 FDs to be consumed for
     * "external" FDs.
     */
    if (numExternalFDs < max_safe_fds / 3)
    {
        ReserveExternalFD();
        return true;
    }
    errno = EMFILE;
    return false;
}

I think it's rather confusing to claim that epoll_create1() failed when we
didn't even call it.

Why are we misattributing the failure to a system call that we didn't make?

The current behaviour was introduced in

commit 3d475515a15f70a4a3f36fbbba93db6877ff8346
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   2020-02-24 17:28:33 -0500

    Account explicitly for long-lived FDs that are allocated outside fd.c.



I also wish we wouldn't report EMFILE when we didn't actually reach any hard
limit - that makes the system behaviour unnecessarily confusing. But that's
not quite so easy to fix.


How about changing the error message to something like

    elog(ERROR, "AcquireExternalFD, for epoll_create1, failed: %m");
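To make the difference concrete, here is a minimal standalone sketch of the proposed wording. It is not the fd.c code: acquire_external_fd_sketch() and reserve_for_epoll_sketch() are hypothetical stand-ins, max_safe_fds = 9 is an arbitrary value chosen so the soft limit is hit after three reservations, and snprintf() with strerror() stands in for elog() with %m.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for fd.c's counters; the value 9 is arbitrary. */
static int max_safe_fds = 9;
static int numExternalFDs = 0;

/*
 * Mirrors the shape of AcquireExternalFD(): refuse once a third of
 * max_safe_fds is reserved, and set errno = EMFILE even though no
 * system call failed and no kernel limit was reached.
 */
static bool
acquire_external_fd_sketch(void)
{
    if (numExternalFDs < max_safe_fds / 3)
    {
        numExternalFDs++;
        return true;
    }
    errno = EMFILE;
    return false;
}

/*
 * Hypothetical caller: on refusal, write a message into buf that names
 * the check that actually failed instead of blaming epoll_create1().
 * Returns true if an external FD slot was reserved.
 */
static bool
reserve_for_epoll_sketch(char *buf, size_t buflen)
{
    if (!acquire_external_fd_sketch())
    {
        int saved_errno = errno;    /* capture before any libc call */

        snprintf(buf, buflen,
                 "AcquireExternalFD, for epoll_create1, failed: %s",
                 strerror(saved_errno));
        return false;
    }
    buf[0] = '\0';
    return true;
}
```

With these assumed values the first three calls to reserve_for_epoll_sketch() succeed and the fourth fills buf with a message that starts with "AcquireExternalFD", so a reader of the log can see that the internal limit, not the system call, was what refused.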

Greetings,

Andres Freund


