Re: Race conditions with checkpointer and shutdown - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Race conditions with checkpointer and shutdown
Date
Msg-id 7164.1555646568@sss.pgh.pa.us
In response to Re: Race conditions with checkpointer and shutdown  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Race conditions with checkpointer and shutdown
List pgsql-hackers
>>> Maybe what we should be looking for is "why doesn't the walreceiver
>>> shut down"?  But the dragonet log you quote above shows the walreceiver
>>> exiting, or at least starting to exit.  'Tis a puzzlement.

huh ... take a look at this little stanza in PostmasterStateMachine:

    if (pmState == PM_SHUTDOWN_2)
    {
        /*
         * PM_SHUTDOWN_2 state ends when there's no other children than
         * dead_end children left. There shouldn't be any regular backends
         * left by now anyway; what we're really waiting for is walsenders and
         * archiver.
         *
         * Walreceiver should normally be dead by now, but not when a fast
         * shutdown is performed during recovery.
         */
        if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL) == 0 &&
            WalReceiverPID == 0)
        {
            pmState = PM_WAIT_DEAD_END;
        }
    }

I'm too tired to think through exactly what that last comment might be
suggesting, but it sure seems like it might be relevant to our problem.
If the walreceiver *isn't* dead yet, what's going to ensure that we
can move forward later?
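To make the worry concrete, here is a deliberately tiny C model of the stanza above. It is a sketch under stated assumptions, not postmaster code: the Postmaster struct, its fields, state_machine() and walreceiver_exited() are hypothetical stand-ins for pmState, PgArchPID, CountChildren(BACKEND_TYPE_ALL) and WalReceiverPID, and only two of the real PMState values are kept. What it shows is that the PM_SHUTDOWN_2 test is only evaluated when something calls the state machine. In this model the only such trigger is a child-exit event, loosely analogous to reaper() re-running PostmasterStateMachine() after reaping a child.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified model: only two of the postmaster's real states are kept. */
typedef enum { PM_SHUTDOWN_2, PM_WAIT_DEAD_END } PMState;

/* Hypothetical stand-ins for pmState, PgArchPID, CountChildren(), WalReceiverPID. */
typedef struct {
    PMState pmState;
    int archiver_pid;     /* 0 once the archiver has exited */
    int other_children;   /* backends and walsenders still alive */
    int walreceiver_pid;  /* 0 once the walreceiver has exited */
} Postmaster;

/* Mirrors the quoted stanza: the state advances only if every
 * condition holds at the moment this function happens to run. */
static void state_machine(Postmaster *pm)
{
    if (pm->pmState == PM_SHUTDOWN_2 &&
        pm->archiver_pid == 0 &&
        pm->other_children == 0 &&
        pm->walreceiver_pid == 0)
        pm->pmState = PM_WAIT_DEAD_END;
}

/* Hypothetical child-exit handler: clears the exited walreceiver's PID
 * and re-runs the state machine.  Nothing else in this model re-checks. */
static void walreceiver_exited(Postmaster *pm)
{
    assert(pm->walreceiver_pid != 0);  /* it must have been running */
    pm->walreceiver_pid = 0;
    state_machine(pm);
}

static const char *state_name(PMState s)
{
    return s == PM_WAIT_DEAD_END ? "PM_WAIT_DEAD_END" : "PM_SHUTDOWN_2";
}

/* Prints the current state; used only to trace the model. */
static void show(const char *label, const Postmaster *pm)
{
    printf("%s: %s\n", label, state_name(pm->pmState));
}
```

Starting from `{ PM_SHUTDOWN_2, 0, 0, 4242 }` (archiver and backends gone, walreceiver still alive), a call to state_machine() leaves the state at PM_SHUTDOWN_2; only walreceiver_exited() moves it to PM_WAIT_DEAD_END. If the walreceiver never exits, nothing in the model re-runs the check and shutdown stalls, which is exactly the question raised above.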

            regards, tom lane


