Home > mailing lists

Re: Refactoring postmaster's code to cleanup after child exit - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Re: Refactoring postmaster's code to cleanup after child exit
Date	August 8, 2024 10:47:42
Msg-id	CA+hUKGKWqSFoJXzRC9D0owFj1Xby0qvWr-k6p8z9gQzT7+nr7w@mail.gmail.com Whole thread
In response to	Re: Refactoring postmaster's code to cleanup after child exit (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses	Re: Refactoring postmaster's code to cleanup after child exit
List	pgsql-hackers

Tree view

On Fri, Aug 2, 2024 at 11:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> * v3-0001-Make-BackgroundWorkerList-doubly-linked.patch

LGTM.

> [v3-0002-Refactor-code-to-handle-death-of-a-backend-or-bgw.patch]

Currently, when a child process exits, the postmaster first scans
through BackgroundWorkerList, to see if it the child process was a
background worker. If not found, then it scans through BackendList to
see if it was a regular backend. That leads to some duplication
between the bgworker and regular backend cleanup code, as both have an
entry in the BackendList that needs to be cleaned up in the same way.
Refactor that so that we scan just the BackendList to find the child
process, and if it was a background worker, do the additional
bgworker-specific cleanup in addition to the normal Backend cleanup.

Makes sense.

On Windows, if a child process exits with ERROR_WAIT_NO_CHILDREN, it's
now logged with that exit code, instead of 0. Also, if a bgworker
exits with ERROR_WAIT_NO_CHILDREN, it's now treated as crashed and is
restarted. Previously it was treated as a normal exit.

Interesting. So when that error was first specially handled in this thread:

https://www.postgresql.org/message-id/flat/AANLkTimCTkNKKrHCd3Ot6kAsrSS7SeDpOTcaLsEP7i%2BM%40mail.gmail.com#41f60947571b75377f04af67ba6baf40

... it went from being considered a crash, to being considered like
exit(0). It's true that shared memory can't be corrupted by a process
that never enters main(), but it's better not to hide the true reason
for the failure (if it is still possible -- I don't find many
references to that phenomenon in recent times). Clobbering exitstatus
with 0 doesn't seem right at all, now that we have background workers
whose restart behaviour is affected by that.

If a child process is not found in the BackendList, the log message
now calls it "untracked child process" rather than "server process".
Arguably that should be a PANIC, because we do track all the child
processes in the list, so failing to find a child process is highly
unexpected. But if we want to change that, let's discuss and do that
as a separate commit.

Yeah, it would be highly unexpected if waitpid() told you about some
random other process (or we screwed up the bookkeeping and didn't
recognise it). So at least having a different message seems good.

> * v3-0003-Fix-comment-on-processes-being-kept-over-a-restar.patch

> * v3-0004-Consolidate-postmaster-code-to-launch-background-.patch

Much of the code in process_pm_child_exit() to launch replacement
processes when one exits or when progressing to next postmaster state
was unnecessary, because the ServerLoop will launch any missing
background processes anyway. Remove the redundant code and let
ServerLoop handle it.

+1, makes sense.

In ServerLoop, move the code to launch all the processes to a new
subroutine, to group it all together.

+1, makes sense.

More soon...

pgsql-hackers by date:

From: Peter Eisentraut
Date: 08 August 2024, 10:11:38
Subject: Re: Enable data checksums by default

From: Shlok Kyal
Date: 08 August 2024, 10:54:22
Subject: Re: long-standing data loss bug in initial sync of logical replication

Re: Refactoring postmaster's code to cleanup after child exit - Mailing list pgsql-hackers

Previous

Next