Re: assertion at postmaster start - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: assertion at postmaster start |
Date | |
Msg-id | 27707.1566680146@sss.pgh.pa.us Whole thread Raw |
In response to | Re: assertion at postmaster start (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: assertion at postmaster start
|
List | pgsql-hackers |
I wrote: > I think what this demonstrates is that that Assert is just wrong: > we *can* reach PM_RUN with the flag still set, so we should do > StartupStatus = STARTUP_NOT_RUNNING; > FatalError = false; > - Assert(AbortStartTime == 0); > + AbortStartTime = 0; > ReachedNormalRunning = true; > pmState = PM_RUN; > Probably likewise for the similar Assert in sigusr1_handler. Poking further at this, I noticed that the code just above here completely fails to do what the comments say it intends to do, which is restart the startup process after we've SIGQUIT'd it. That's because the careful manipulation of StartupStatus in reaper (lines 2943ff in HEAD) is stomped on by HandleChildCrash, which will just unconditionally reset it to STARTUP_CRASHED (line 3507). So we end up shutting down the database after all, which is not what the intention seems to be. Hence, commit 45811be94 was still a few bricks shy of a load :-(. I propose the attached. I'm inclined to think that the risk/benefit of back-patching this is not very good, so I just want to stick it in HEAD, unless somebody can explain why dead_end children are likely to crash in the field. regards, tom lane diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 3339804..ecb7892 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -2920,7 +2920,9 @@ reaper(SIGNAL_ARGS) * during PM_STARTUP is treated as catastrophic. There are no * other processes running yet, so we can just exit. */ - if (pmState == PM_STARTUP && !EXIT_STATUS_0(exitstatus)) + if (pmState == PM_STARTUP && + StartupStatus != STARTUP_SIGNALED && + !EXIT_STATUS_0(exitstatus)) { LogChildExit(LOG, _("startup process"), pid, exitstatus); @@ -2937,11 +2939,24 @@ reaper(SIGNAL_ARGS) * then we previously sent the startup process a SIGQUIT; so * that's probably the reason it died, and we do want to try to * restart in that case. + * + * This stanza also handles the case where we sent a SIGQUIT + * during PM_STARTUP due to some dead_end child crashing: in that + * situation, if the startup process dies on the SIGQUIT, we need + * to transition to PM_WAIT_BACKENDS state which will allow + * PostmasterStateMachine to restart the startup process. (On the + * other hand, the startup process might complete normally, if we + * were too late with the SIGQUIT. In that case we'll fall + * through and commence normal operations.) */ if (!EXIT_STATUS_0(exitstatus)) { if (StartupStatus == STARTUP_SIGNALED) + { StartupStatus = STARTUP_NOT_RUNNING; + if (pmState == PM_STARTUP && Shutdown == NoShutdown) + pmState = PM_WAIT_BACKENDS; + } else StartupStatus = STARTUP_CRASHED; HandleChildCrash(pid, exitstatus, @@ -2954,7 +2969,7 @@ reaper(SIGNAL_ARGS) */ StartupStatus = STARTUP_NOT_RUNNING; FatalError = false; - Assert(AbortStartTime == 0); + AbortStartTime = 0; ReachedNormalRunning = true; pmState = PM_RUN; @@ -3504,7 +3519,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname) if (pid == StartupPID) { StartupPID = 0; - StartupStatus = STARTUP_CRASHED; + /* Caller adjusts StartupStatus, so don't touch it here */ } else if (StartupPID != 0 && take_action) { @@ -5100,7 +5115,7 @@ sigusr1_handler(SIGNAL_ARGS) { /* WAL redo has started. We're out of reinitialization. */ FatalError = false; - Assert(AbortStartTime == 0); + AbortStartTime = 0; /* * Crank up the background tasks. It doesn't matter if this fails,
pgsql-hackers by date: