While investigating code paths that use system() and popen() and
trying to write latch-multiplexing replacements, which I'll write
about separately, leaks of $SUBJECT became obvious. On a FreeBSD box,
you can see the sockets, pipes and data and WAL files that the
subprocess inherits:
create table t (x text);
copy t from program 'procstat -f $$';
select * from t;
(On Linux, you might try something like 'ls -slap /proc/self/fd'. Not
sure how to do it on a Mac but note that 'lsof' is no good for this
purpose because the first thing it does is close all descriptors > 2;
maybe 'lsof -p $PPID' inside a shell wrapper so you're analysing the
shell process that sits between postgres and lsof, rather than lsof
itself?)
Since we've started assuming a few other bits of SUSv3 (POSIX 2008)
functionality, the standard that specified O_CLOEXEC, and since in
practice Linux, *BSD, macOS, AIX, Solaris, illumos all have it, I
think we can unconditionally just use it on all files we open. That
is, if we were to make fallback code, it would be untested, and if we
were to do it with fcntl() always it would be a frequent extra system
call that we don't need to support a computer that doesn't exist. For
sockets and pipes, much more rarely created, some systems have
non-standard extensions along the same lines, but I guess we should
stick with standards and call fcntl(FD_CLOEXEC) for now.
There is a place in fd.c that already referenced O_CLOEXEC (it wasn't
really using it, just making an assertion that flags don't collide),
with #ifdef around it, but that was only conditional because at the
time of commit 04cad8f7 we had a macOS 10.4 system (released 2005) in
the 'farm which obviously didn't know about POSIX 2008 interfaces. We
can just remove that #ifdef. (It's probably OK to remove the test of
O_DSYNC too but I'll think about that another time.)
On Windows, handles, at least as we create them, are not inherited so
the problem doesn't come up AFAICS. I *think* if we were to use
Windows' own open(), that would be an issue, but we have our own
CreateFile() thing and it doesn't turn on inheritance IIUC. So I just
gave O_CLOEXEC a zero definition there. It would be interesting to
know what handles a subprocess sees. If someone who knows how to
drive Windows could run a subprogram that just does the equivalent of
'sleep 60' they might be able to see that in one of those handle spy
tools, to visually check the above. (If I'm wrong about that, it
might be possible for a subprocess to interfere with a
ProcSignalBarrier command to close all files, so I'd love to know
about it.)
We were already doing FD_CLOEXEC on the latch self-pipe with comment
"we surely do not want any child processes messing with them", so it's
not like this wasn't a well-known issue before, but I guess it just
never bothered anyone enough to do anything about the more general
problem.
With the attached, the test at the top of this email shows only in,
out, error, and one thing that procstat opened itself.
Are there any more descriptors we need to think about?