Re: BUG #15804: Assertion failure when using logging_collector with EXEC_BACKEND - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #15804: Assertion failure when using logging_collector with EXEC_BACKEND
Date
Msg-id 1664.1558325417@sss.pgh.pa.us
In response to Re: BUG #15804: Assertion failure when using logging_collector with EXEC_BACKEND (Michael Paquier <michael@paquier.xyz>)
Responses Re: BUG #15804: Assertion failure when using logging_collector with EXEC_BACKEND (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Michael Paquier <michael@paquier.xyz> writes:
> I have not tested on Windows this one, but on Linux with EXEC_BACKEND
> the test is still not able to detect correctly the failures of the
> syslogger if one reverts 8334515 to re-enable the early syslogger
> startup, so that's a bit disappointing,

[ pokes at that... ]  Hah, it proves the syslogger restart logic
works anyway.  Because when we restart the crashed syslogger,
we're doing so after shmem exists, so the asserts don't fire.


However, I had a sudden realization about this, which is that
we need to think harder about the question of how the startup
sequence interlocks with the possibility of a pre-existing
postmaster or orphan backends.  There's code down inside
CreateDataDirLockFile that attempts to detect a pre-existing
postmaster, but if the postmaster died leaving orphan backends,
that interlock will not detect them.  Where we will notice
surviving backends is where we look for a pre-existing shared
memory segment, which is down inside reset_shared.
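
For illustration only, here is a minimal sketch of how a leftover System V
shared memory segment can reveal orphan backends: ask the kernel how many
processes are still attached. The function name is hypothetical and this is
not the actual sysv_shmem.c code, which performs additional checks; only
shmctl(), IPC_STAT and shm_nattch are the standard interface.

```c
#include <stdbool.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical helper: report whether any process is still attached to
 * the given System V shared memory segment.
 *
 * A dead postmaster leaves no usable lock-file evidence about its
 * children, but each surviving backend stays attached to the old
 * segment, so the kernel's attach count remains above zero.
 */
bool
segment_has_attached_processes(int shmid)
{
    struct shmid_ds shmstat;

    /*
     * Simplification for this sketch: if the segment is gone or cannot
     * be inspected, report "not in use".  Real code would need to treat
     * errors other than a missing segment far more conservatively.
     */
    if (shmctl(shmid, IPC_STAT, &shmstat) < 0)
        return false;

    return shmstat.shm_nattch > 0;
}
```

A startup sequence would have to run a check of this kind before touching
anything in the data directory, which is the point of the next paragraph.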

And: we really should not do anything much to the data directory
until we know that no such old processes remain.  Otherwise we
risk problems such as deleting active temp files.

This line of thought suggests that trying to fix things so that
we can launch child processes before creating shared memory
is the wrong thing, because it seriously risks creating problems
in the leftover-child-processes scenario.

This means that the change that 57431a911 wanted to make is only
going to be safe if we're willing to re-order things so that the
startup sequence is

    * create datadir lock file
    * create shmem
    * launch syslogger
    * create sockets

Historically we've opened the sockets before making shmem.  I'm
not sure offhand if there's any compelling reason for that order
... but if there is, getting 57431a911 to work is a whole lot
trickier than we've been thinking.
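
As a rough sketch of the ordering proposed above (not the real postmaster
code), the sequence can be written as a table of steps that stops at the
first failure. Every function and variable name here is a hypothetical
stub; the only thing taken from the discussion is the order of the four
steps and the idea that shmem creation is where surviving backends get
noticed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One startup step: a label plus a function returning true on success. */
typedef struct StartupStep
{
    const char *name;
    bool        (*run) (void);
} StartupStep;

/* Test knobs for the sketch, not real postmaster state. */
bool        orphan_backends_present = false;
bool        syslogger_launched = false;

/* Hypothetical stubs; the real steps do far more work than this. */
static bool create_datadir_lock_file(void) { return true; }
static bool create_shmem(void) { return !orphan_backends_present; }
static bool launch_syslogger(void) { syslogger_launched = true; return true; }
static bool create_sockets(void) { return true; }

/* The proposed order: shmem comes before any child process is launched. */
const StartupStep proposed_order[] = {
    {"create datadir lock file", create_datadir_lock_file},
    {"create shmem", create_shmem},
    {"launch syslogger", launch_syslogger},
    {"create sockets", create_sockets},
};

#define N_PROPOSED_STEPS (sizeof(proposed_order) / sizeof(proposed_order[0]))

/* Run the steps in order; return how many completed before a failure. */
size_t
run_startup(const StartupStep *steps, size_t nsteps)
{
    size_t      i;

    assert(steps != NULL);
    for (i = 0; i < nsteps; i++)
    {
        if (!steps[i].run())
        {
            fprintf(stderr, "startup aborted at step \"%s\"\n", steps[i].name);
            break;
        }
    }
    return i;
}
```

With orphan_backends_present set, run_startup(proposed_order,
N_PROPOSED_STEPS) stops at the shmem step, so the syslogger stub is never
called. That is the property the reordering is meant to guarantee.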

            regards, tom lane
