On Sat, Sep 07, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
> With extra logging added, I got:
> ### Stopping node "CIC_2PC_test" using mode fast
> # Running: pg_ctl -D
C:\src\postgresql\build/testrun/amcheck_3/003_cic_2pc\data/t_003_cic_2pc_CIC_2PC_test_data/pgdata
> -m fast stop
> waiting for server to shut down......!!!pgkill| GetLastError(): 231
> postmaster (9596) died untimely? res: -1, errno: 22
> failed
>
> Thus, CallNamedPipe() in pgkill() returned ERROR_PIPE_BUSY (All pipe
> instances are busy) and it was handled as an unexpected error.
> (The error code 231 returned 10 times out of 10 failures of this ilk for
> me.)
Thanks for discovering that.
> Noah, what do you think of handling this error in line with handling of
> ERROR_BROKEN_PIPE and ERROR_BAD_PIPE (which was done in 0ea1f2a3a)?
>
> I tried the following change:
> switch (GetLastError())
> {
> case ERROR_BROKEN_PIPE:
> case ERROR_BAD_PIPE:
> + case ERROR_PIPE_BUSY:
> and saw no issues.
That would be a strict improvement over returning EINVAL like we do today. We
do use PIPE_UNLIMITED_INSTANCES, so I expect the causes of ERROR_PIPE_BUSY are
process exit and ENOMEM-like situations. While that change is the best thing
if the process is exiting, it could silently drop the signal in ENOMEM-like
situations. Consider the following alternative. If sig==0, just return 0
like you propose, because the process isn't completely gone. Otherwise, sleep
and retry the signal, like pgwin32_open_handle() retries after certain errors.
What do you think of that?