Re: Refactoring postmaster's code to cleanup after child exit - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Refactoring postmaster's code to cleanup after child exit |
Date | |
Msg-id | CA+hUKGKWqSFoJXzRC9D0owFj1Xby0qvWr-k6p8z9gQzT7+nr7w@mail.gmail.com Whole thread Raw |
In response to | Re: Refactoring postmaster's code to cleanup after child exit (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: Refactoring postmaster's code to cleanup after child exit
|
List | pgsql-hackers |
On Fri, Aug 2, 2024 at 11:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > * v3-0001-Make-BackgroundWorkerList-doubly-linked.patch LGTM. > [v3-0002-Refactor-code-to-handle-death-of-a-backend-or-bgw.patch] Currently, when a child process exits, the postmaster first scans through BackgroundWorkerList, to see if it the child process was a background worker. If not found, then it scans through BackendList to see if it was a regular backend. That leads to some duplication between the bgworker and regular backend cleanup code, as both have an entry in the BackendList that needs to be cleaned up in the same way. Refactor that so that we scan just the BackendList to find the child process, and if it was a background worker, do the additional bgworker-specific cleanup in addition to the normal Backend cleanup. Makes sense. On Windows, if a child process exits with ERROR_WAIT_NO_CHILDREN, it's now logged with that exit code, instead of 0. Also, if a bgworker exits with ERROR_WAIT_NO_CHILDREN, it's now treated as crashed and is restarted. Previously it was treated as a normal exit. Interesting. So when that error was first specially handled in this thread: https://www.postgresql.org/message-id/flat/AANLkTimCTkNKKrHCd3Ot6kAsrSS7SeDpOTcaLsEP7i%2BM%40mail.gmail.com#41f60947571b75377f04af67ba6baf40 ... it went from being considered a crash, to being considered like exit(0). It's true that shared memory can't be corrupted by a process that never enters main(), but it's better not to hide the true reason for the failure (if it is still possible -- I don't find many references to that phenomenon in recent times). Clobbering exitstatus with 0 doesn't seem right at all, now that we have background workers whose restart behaviour is affected by that. If a child process is not found in the BackendList, the log message now calls it "untracked child process" rather than "server process". Arguably that should be a PANIC, because we do track all the child processes in the list, so failing to find a child process is highly unexpected. But if we want to change that, let's discuss and do that as a separate commit. Yeah, it would be highly unexpected if waitpid() told you about some random other process (or we screwed up the bookkeeping and didn't recognise it). So at least having a different message seems good. > * v3-0003-Fix-comment-on-processes-being-kept-over-a-restar.patch +1 > * v3-0004-Consolidate-postmaster-code-to-launch-background-.patch Much of the code in process_pm_child_exit() to launch replacement processes when one exits or when progressing to next postmaster state was unnecessary, because the ServerLoop will launch any missing background processes anyway. Remove the redundant code and let ServerLoop handle it. +1, makes sense. In ServerLoop, move the code to launch all the processes to a new subroutine, to group it all together. +1, makes sense. More soon...
pgsql-hackers by date: