pgsql: Account explicitly for long-lived FDs that are allocated outside - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Account explicitly for long-lived FDs that are allocated outside
Date
Msg-id E1j6MDR-00049W-Ca@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Account explicitly for long-lived FDs that are allocated outside fd.c.

The comments in fd.c have long claimed that all file allocations should
go through that module, but in reality that's not always practical.
fd.c doesn't supply APIs for invoking some FD-producing syscalls like
pipe() or epoll_create(); and the APIs it does supply for non-virtual
FDs are mostly insistent on releasing those FDs at transaction end;
and in some cases the actual open() call is in code that can't be made
to use fd.c, such as libpq.

This has led to a situation where, in a modern server, there are likely
to be seven or so long-lived FDs per backend process that are not known
to fd.c.  Since NUM_RESERVED_FDS is only 10, that meant we had *very*
few spare FDs if max_files_per_process is >= the system ulimit and
fd.c had opened all the files it thought it safely could.  The
contrib/postgres_fdw regression test, in particular, could easily be
made to fall over by running it under a restrictive ulimit.

To improve matters, invent functions Acquire/Reserve/ReleaseExternalFD
that allow outside callers to tell fd.c that they have or want to allocate
a FD that's not directly managed by fd.c.  Add calls to track all the
fixed FDs in a standard backend session, so that we are honestly
guaranteeing that NUM_RESERVED_FDS FDs remain unused below the EMFILE
limit in a backend's idle state.  The coding rules for these functions say
that there's no need to call them in code that just allocates one FD over
a fairly short interval; we can dip into NUM_RESERVED_FDS for such cases.
That means that there aren't all that many places where we need to worry.
But postgres_fdw and dblink must use this facility to account for
long-lived FDs consumed by libpq connections.  There may be other places
where it's worth doing such accounting, too, but this seems like enough
to solve the immediate problem.

Internally to fd.c, "external" FDs are limited to max_safe_fds/3 FDs.
(Callers can choose to ignore this limit, but of course it's unwise
to do so except for fixed file allocations.)  I also reduced the limit
on "allocated" files to max_safe_fds/3 FDs (it had been max_safe_fds/2).
Conceivably a smarter rule could be used here --- but in practice,
on reasonable systems, max_safe_fds should be large enough that this
isn't much of an issue, so KISS for now.  To avoid possible regression
in the number of external or allocated files that can be opened,
increase FD_MINFREE and the lower limit on max_files_per_process a
little bit; we now insist that the effective "ulimit -n" be at least 64.

This seems like pretty clearly a bug fix, but in view of the lack of
field complaints, I'll refrain from risking a back-patch.

Discussion: https://postgr.es/m/E1izCmM-0005pV-Co@gemulon.postgresql.org

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/3d475515a15f70a4a3f36fbbba93db6877ff8346

Modified Files
--------------
contrib/dblink/dblink.c                       |  53 ++++++++++++
contrib/postgres_fdw/connection.c             |  30 ++++++-
src/backend/access/transam/xlog.c             |  13 +++
src/backend/postmaster/pgstat.c               |   3 +
src/backend/postmaster/postmaster.c           |  33 +++++++-
src/backend/postmaster/syslogger.c            |   5 ++
src/backend/storage/file/fd.c                 | 116 +++++++++++++++++++++++---
src/backend/storage/ipc/dsm_impl.c            |  15 +++-
src/backend/storage/ipc/latch.c               |  44 ++++++++++
src/backend/utils/misc/guc.c                  |   2 +-
src/backend/utils/misc/postgresql.conf.sample |   2 +-
src/include/storage/fd.h                      |  11 ++-
12 files changed, 304 insertions(+), 23 deletions(-)


pgsql-committers by date:

Previous
From: Peter Eisentraut
Date:
Subject: pgsql: Change client-side fsync_fname() to report errors fatally
Next
From: Michael Paquier
Date:
Subject: pgsql: Issue properly WAL record for CID of first catalog tuple in mult