Order of operations in SubPostmasterMain() - Mailing list pgsql-hackers

From Tom Lane
Subject Order of operations in SubPostmasterMain()
Date
Msg-id 4157.1475178360@sss.pgh.pa.us
Whole thread Raw
Responses Re: Order of operations in SubPostmasterMain()  (Greg Stark <stark@mit.edu>)
Re: Order of operations in SubPostmasterMain()  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
I noticed that buildfarm member culicidae, which is running an
EXEC_BACKEND build on Linux, occasionally falls over like this:

FATAL:  could not reattach to shared memory (key=6280001, addr=0x7fa9df845000): Invalid argument

That's probably because Andres failed to disable ASLR on that machine,
but the exact cause isn't too important right now.  What I'm finding
distressing is that that message doesn't show up on the client side,
making it look like a postmaster crash:

*** /home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/../pgsql/contrib/intarray/expected/_int.out    2016-09-15
22:02:39.512951557+0000
 
--- /home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/contrib/intarray/results/_int.out    2016-09-29
17:52:56.921948612+0000
 
***************
*** 1,568 ****
! .... lots ...
--- 1,3 ----
! psql: server closed the connection unexpectedly
!     This probably means the server terminated abnormally
!     before or while processing the request.

The reason we don't see anything is that backend libpq hasn't been
initialized yet, so it won't send the ereport message to the client.

The original design of SubPostmasterMain() intended that such cases
would be reported, because it runs BackendInitialize (which calls
pq_init()) before CreateSharedMemoryAndSemaphores (which is where
the reattach used to happen --- the comments at postmaster.c 4730ff
and 4747ff reflect this expectation).  We broke it when we added
the earlier reattach call at lines 4669ff.

We could probably refactor things enough so that we do pq_init()
before PGSharedMemoryReAttach().  It would be a little bit ugly,
and it would fractionally increase the chance of a reattach failure
because pq_init() palloc's a few KB worth of buffers.  I'm not quite
sure if it's worth it; thoughts?  In any case the mentioned comments
are obsolete and need to be moved/rewritten.

Another thing that I'm finding distressing while I look at this is
the placement of ClosePostmasterPorts().  That needs to happen *early*,
because as long as the child process is holding open the postmaster side
of the postmaster death pipe, we are risking unhappiness.  Not only is
it not particularly early, it's after process_shared_preload_libraries()
which could invoke arbitrarily stupid stuff.  Whether or not we do
anything about moving pq_init(), I'm thinking we'd better move the
ClosePostmasterPorts() call so that it's done as soon as possible,
probably right after read_backend_variables() which loads the data
it needs.

Comments?
        regards, tom lane



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: pageinspect: Hash index support
Next
From: Peter Eisentraut
Date:
Subject: Re: pageinspect: Hash index support