Archiver behavior at shutdown - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Archiver behavior at shutdown |
Date | |
Msg-id | 20074.1198787370@sss.pgh.pa.us |
Responses |
Re: Archiver behavior at shutdown
Re: Archiver behavior at shutdown |
List | pgsql-hackers |
The problem complained of in bug #3843 was something I'd noticed a few days ago and meant to fix. ISTM the recent change to have the archiver outlive the postmaster was incompletely thought out, and we really need to take two steps back and reconsider, if we want to fix it so it works.

As of CVS HEAD, the behavior after the postmaster receives a shutdown request and has seen its last regular-backend child die is:

1. Issue SIGUSR2 to the bgwriter to make it start a shutdown checkpoint.
2. Immediately SIGQUIT the archiver.
3. Back at the main loop, restart the archiver, if it exits before the bgwriter finishes the checkpoint (as is highly likely).
4. After postmaster exits, archiver eventually notices it's gone, but that takes a good while since we are guaranteed to be just starting the delay loop inside the fresh archiver process.

This is just plain dumb. Aside from the uselessness of killing a process only to immediately re-fork it, we should not be SIGQUIT'ing the archiver during normal operation --- that might abort an archive copy partway through, and it's anybody's guess whether the archive_command script is smart enough to deal with that situation.

ISTM the postmaster should leave the archiver alone at the PM_WAIT_BACKENDS -> PM_SHUTDOWN transition, and instead send it a WAKEN signal (SIGUSR1) when it sees normal exit of the bgwriter. That will afford an opportunity to archive anything that was pushed out during the shutdown checkpoint. A possibly better alternative, since the archiver isn't using SIGUSR2, is to send SIGUSR2 which would be defined as "archive what you can and then quit". (In that case, the !PostmasterIsAlive exit would be taken only in the event of a true postmaster crash, which is improbable.)

Another case that seems not to have been thought about very much is whether the archiver should behave differently in a "mode fast" shutdown as opposed to "mode smart".
I would argue that it should not, since both cases are supposed to be equally safe for your data. I notice though that the postmaster suppresses forwarding of WAKEN signals after entering FastShutdown mode; that doesn't seem like a good idea.

Another case that needs some revisiting is the archiver's response to SIGTERM, which is currently SIG_IGN. Since the postmaster will never send it SIGTERM, we should assume that receipt of SIGTERM means that init is telling us we have N seconds left before system shutdown. Is it a good idea to continue archiving in that situation? I doubt it --- it seems like we are just asking to get SIGKILL'd partway through a copy step. I suggest that the response to SIGTERM ought to be to finish out the current copy operation (if possible) but then quit without initiating any new ones.

And while I'm griping: I see that the pgstats process is SIGQUIT'ed at the entry to PM_SHUTDOWN state, same as the archiver. This likewise seems out of step with current reality, since the bgwriter now sends messages to the stats collector. This step needs to be moved to after bgwriter termination, too.

Comments? Anyone see any other bugs here?

regards, tom lane
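
For illustration, the archiver-side signal semantics proposed above (SIGUSR2 meaning "archive what you can and then quit", SIGTERM meaning "finish the current copy but start no new ones") could be sketched roughly as below. This is a standalone toy, not PostgreSQL source: the flag names, install_archiver_handlers(), archive_pass(), and copy_one_segment() are all hypothetical, the pending-segment count stands in for scanning the archive status directory, and plain signal() is used only to keep the sketch portable.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Set only from signal handlers, read by the archive loop (hypothetical names). */
static volatile sig_atomic_t finish_then_quit = 0;  /* SIGUSR2 from the postmaster */
static volatile sig_atomic_t stop_after_copy = 0;   /* SIGTERM, presumably from init */

static void handle_sigusr2(int signo) { (void) signo; finish_then_quit = 1; }
static void handle_sigterm(int signo) { (void) signo; stop_after_copy = 1; }

/* Install the two handlers; a real implementation would likely use sigaction(). */
void install_archiver_handlers(void)
{
    signal(SIGUSR2, handle_sigusr2);
    signal(SIGTERM, handle_sigterm);
}

/* Stand-in for running archive_command on one WAL segment; always succeeds here. */
static int segments_copied = 0;

static bool copy_one_segment(void)
{
    segments_copied++;
    return true;
}

/*
 * One pass over the pending segments.  Returns the number archived and sets
 * *should_exit when the archiver ought to quit after this pass.
 */
int archive_pass(int pending, bool *should_exit)
{
    int archived = 0;

    assert(pending >= 0);

    while (archived < pending)
    {
        /* SIGTERM: a copy already running has finished; do not start another. */
        if (stop_after_copy)
            break;
        /* On failure, give up for now; the segment stays pending. */
        if (!copy_one_segment())
            break;
        archived++;
    }

    /* SIGUSR2: everything we could archive is done, so it is time to quit. */
    *should_exit = (stop_after_copy || finish_then_quit);
    return archived;
}
```

Because the SIGTERM flag is tested only between copies, a copy in progress is never interrupted by it, while SIGUSR2 lets the pass drain whatever is pending before the caller exits. Neither path depends on SIGQUIT.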