pgsql: Fix incorrect order of lock file removal and failure to close() - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Fix incorrect order of lock file removal and failure to close()
Date
Msg-id E1ZLyPq-0007L8-VF@gemulon.postgresql.org
List pgsql-committers
Fix incorrect order of lock file removal and failure to close() sockets.

Commit c9b0cbe98bd783e24a8c4d8d8ac472a494b81292 accidentally broke the
order of operations during postmaster shutdown: it resulted in removing
the per-socket lockfiles after, not before, postmaster.pid.  This creates
a race-condition hazard for a new postmaster that's started immediately
after observing that postmaster.pid has disappeared; if it sees the
socket lockfile still present, it will quite properly refuse to start.
This error appears to be the explanation for at least some of the
intermittent buildfarm failures we've seen in the pg_upgrade test.
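
The hazard described above can be modelled with a small, self-contained sketch. It is not PostgreSQL code: the helper name is hypothetical, and both files are placed in a single temporary directory for simplicity, whereas a real socket lock file lives in the socket directory. It only shows what an observer finds at the instant postmaster.pid disappears, for each removal order.

```python
import os
import tempfile

PID_FILE = "postmaster.pid"
SOCKET_LOCK = ".s.PGSQL.5432.lock"  # illustrative name; port 5432 is assumed


def leftovers_when_pid_file_vanishes(removal_order):
    """Remove lock files in the given order and return the names that
    still exist immediately after PID_FILE has been removed."""
    with tempfile.TemporaryDirectory() as d:
        # Create both lock files, as a running server would have done.
        for name in (PID_FILE, SOCKET_LOCK):
            open(os.path.join(d, name), "w").close()
        for name in removal_order:
            os.unlink(os.path.join(d, name))
            if name == PID_FILE:
                # This is the moment a waiting starter believes the old
                # server is gone and begins its own startup.
                return sorted(os.listdir(d))
    return []


# Order after the regression: postmaster.pid first, socket lock file second.
broken = leftovers_when_pid_file_vanishes([PID_FILE, SOCKET_LOCK])
# Intended order: socket lock file first, postmaster.pid last.
fixed = leftovers_when_pid_file_vanishes([SOCKET_LOCK, PID_FILE])
```

With the broken order, the socket lock file is still present when postmaster.pid vanishes, which is the window in which a new postmaster would refuse to start.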

Another problem, which has been there all along, is that the postmaster
has never bothered to close() its listen sockets, but has just allowed them
to close at process death.  This creates a different race condition for an
incoming postmaster: it might be unable to bind to the desired listen
address because the old postmaster is still incumbent.  This might explain
some odd failures we've seen in the past, too.  (Note: this is not related
to the fact that individual backends don't close their client communication
sockets.  That behavior is intentional and is not changed by this patch.)
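
A minimal sketch of the second race, assuming a TCP listen socket on the loopback interface and a platform where a second bind to an address already in use fails (as on typical Unix-like systems). The helper `can_bind` is hypothetical and the port is chosen by the operating system. Nothing here is taken from the PostgreSQL sources.

```python
import socket


def can_bind(port: int) -> bool:
    """Return True if a fresh socket can bind to 127.0.0.1:port."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind(("127.0.0.1", port))
        return True
    except OSError:
        # Typically EADDRINUSE: the address is still held by another socket.
        return False
    finally:
        probe.close()


# The "old postmaster": it holds a listen socket and never close()s it.
old = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
old.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
old.listen(1)
port = old.getsockname()[1]

# While the old socket is open, an incoming server cannot take the address.
blocked_while_open = not can_bind(port)

# An explicit close() releases the address without waiting for process exit.
old.close()
free_after_close = can_bind(port)
```

Until the socket is closed, whether explicitly or by process death, the newcomer's bind fails. Closing explicitly during shutdown shortens that window.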

Fix by adding an on_proc_exit function that closes the postmaster's ports
explicitly, and (in 9.3 and up) reshuffling the responsibility for where
to unlink the Unix socket files.  Lock file unlinking can stay where it
is, but teach it to unlink the lock files in reverse order of creation.
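
The shape of the fix can be sketched as follows. This is a simplified model, not the patch itself: `ExitHooks` is a hypothetical stand-in for an exit-callback registry, it is assumed to run callbacks last-registered-first, and the file names and helper functions are illustrative.

```python
import os
import tempfile


class ExitHooks:
    """Hypothetical exit-callback registry; runs hooks last-registered-first."""

    def __init__(self):
        self._hooks = []

    def register(self, fn):
        self._hooks.append(fn)

    def run(self):
        while self._hooks:
            self._hooks.pop()()


def simulate_shutdown():
    events = []      # what happened during shutdown, in order
    lock_files = []  # lock file paths, in creation order
    hooks = ExitHooks()

    with tempfile.TemporaryDirectory() as d:

        def unlink_lock_files():
            # Reverse order of creation: the first file created
            # (postmaster.pid) is the last one removed.
            for path in reversed(lock_files):
                os.unlink(path)
                events.append("unlink " + os.path.basename(path))
            lock_files.clear()

        def create_lock_file(name):
            path = os.path.join(d, name)
            with open(path, "w") as f:
                f.write(str(os.getpid()))
            if not lock_files:
                # Register the cleanup hook once, with the first lock file.
                hooks.register(unlink_lock_files)
            lock_files.append(path)

        def close_ports():
            # Stand-in for explicitly closing every listen socket.
            events.append("close listen sockets")

        create_lock_file("postmaster.pid")
        create_lock_file(".s.PGSQL.5432.lock")
        hooks.register(close_ports)  # registered after the lock files exist

        hooks.run()  # simulated process exit
    return events


events = simulate_shutdown()
```

In this model the listen sockets are closed first, then the socket lock file is removed, and postmaster.pid goes last, so an observer who sees postmaster.pid disappear finds nothing left behind.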

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/d73d14c271653dff10c349738df79ea03b85236c

Modified Files
--------------
src/backend/libpq/pqcomm.c          |   54 ++++++++++++++++-------------------
src/backend/postmaster/postmaster.c |   47 ++++++++++++++++++++++++++++++
src/backend/utils/init/miscinit.c   |    6 +++-
src/include/libpq/libpq.h           |    1 +
4 files changed, 78 insertions(+), 30 deletions(-)

