Re: Trap errors from streaming child in pg_basebackup to exit early - Mailing list pgsql-hackers

From Daniel Gustafsson
Subject Re: Trap errors from streaming child in pg_basebackup to exit early
Date
Msg-id 1CCAA16A-4A58-4023-928C-598AB865FC56@yesql.se
Whole thread Raw
In response to Re: Trap errors from streaming child in pg_basebackup to exit early  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Trap errors from streaming child in pg_basebackup to exit early
List pgsql-hackers

> On 30 Aug 2021, at 12:31, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:

> Here are some comments on the patch:
> 1) Do we need volatile keyword here to read the value of the variables
> always from the memory?
> +static volatile sig_atomic_t bgchild_exited = false;

Yes, fixed.

> 2) Do we need #ifndef WIN32 ... #endif around sigchld_handler function
> definition?

Ah yes, good point. Fixed.

> 3) I'm not sure if the new value of bgchild_exited being set in the
> child thread will reflect in the main process on Windows? But
> theoretically, I can understand that the memory will be shared between
> the main process thread and child thread.

The child does not have it’s own copy of bgchild_exited.

> 4) How about "set the same flag that we use on Unix to signal
> SIGCHLD."  instead of "* set the flag used on Unix to signal
> SIGCHLD."?

Fixed.

> 5) How about "background WAL receiver terminated unexpectedly" instead
> of "log streamer child terminated unexpectedly"? This will be in sync
> with the existing message "starting background WAL receiver". "log
> streamer" is the word used internally in the code, user doesn't know
> it with that name.

Good point, that’s better.

> 6) How about giving the exit code (like postmaster's reaper function
> does) instead of just a message saying unexpected termination? It will
> be useful to know for what reason the process exited. For Windows, we
> can use GetExitCodeThread (I'm referring to the code around waitpid in
> pg_basebackup) and for Unix we can use waitpid.

The rest of the program is doing exit(1) regardless of the failure of the
child/thread, so it seems more consistent to keep doing that for this class of
error as well.

A v2 with the above fixes is attached.

--
Daniel Gustafsson        https://vmware.com/


Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Failure of subscription tests with topminnow
Next
From: Amit Kapila
Date:
Subject: Re: Returning to Postgres community work