Re: pg_upgrade test failure - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: pg_upgrade test failure
Msg-id CA+hUKGJMynw7BNpsaF3c7hPBEBpzP8T_8WbO-93s=XzcoEsoxw@mail.gmail.com
In response to Re: pg_upgrade test failure  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: pg_upgrade test failure  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Wed, Feb 1, 2023 at 10:08 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Wed, Feb 1, 2023 at 10:04 AM Andres Freund <andres@anarazel.de> wrote:
> > Maybe we should just handle it by sleeping and retrying, if on windows? Sad to even propose...
>
> Yeah, that's what that code I posted would do automatically, though
> it's a bit hidden.  The second attempt to unlink() would see delete
> already pending, and activate its secret internal sleep/retry loop.

OK, I pushed that.  Third time lucky?
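For anyone following along, here is a minimal sketch of the sleep-and-retry idea quoted above. It is not the committed code. The helper name unlink_with_retry(), the 100 ms / 100-attempt limits and the stderr warning are all hypothetical. It also assumes that when a file's delete is already pending because another handle is still open, a further unlink() fails with EACCES. The sketch uses POSIX calls so it compiles on any system that has them; a real Windows build would sleep with Sleep() instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Illustrative limits: 100 tries, 100 ms apart, i.e. give up after ~10 s. */
#define RETRY_SLEEP_MS  100L
#define RETRY_MAX_TRIES 100

/*
 * Hypothetical helper, not PostgreSQL's code.  Assumption: when another
 * handle still holds a file whose delete is already pending, a further
 * unlink() fails with EACCES, so that error is treated as transient and
 * retried.  Any other error is returned at once.  Returns 0, or -1 with
 * errno set by the last unlink().
 */
static int unlink_with_retry(const char *path)
{
    assert(path != NULL);
    for (int attempt = 1;; attempt++)
    {
        if (unlink(path) == 0)
            return 0;
        if (errno != EACCES)
            return -1;                  /* not the transient case */
        if (attempt >= RETRY_MAX_TRIES)
        {
            int saved_errno = errno;
            fprintf(stderr, "giving up on removing \"%s\" after %d attempts\n",
                    path, attempt);
            errno = saved_errno;        /* fprintf may clobber errno */
            return -1;
        }
        /* Give the other handle holder a chance to close it. */
        struct timespec ts = {0, RETRY_SLEEP_MS * 1000000L};
        nanosleep(&ts, NULL);
    }
}
```

Bounding the loop matters. A handle that never goes away should eventually surface as an error rather than hang the test forever.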

I tracked down the discussion of that existing comment about pg_ctl,
which comes from the 9.2 days:

https://www.postgresql.org/message-id/flat/5044DE59.5020500%40dunslane.net

I guess that back then fopen() may have been Windows' own fopen(), which
wouldn't allow two handles to the same file at once?  These days we redirect
it to a wrapper with the magic "shared" flags, so the kluge installed
by commit f8c81c5dde2 may not even be needed anymore.  It does
demonstrate that there are long standing timing races around log
files, process exit and wait-for-shutdown logic, though.
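Roughly, a "shared" open looks like the following sketch. The helper name open_log_shared() is hypothetical, and this is not PostgreSQL's actual wrapper. On Windows it asks CreateFileA() for FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, so that a second handle (or a pending delete) does not collide with this one. It then wraps the Win32 handle in a FILE * with _open_osfhandle() and _fdopen(). On POSIX systems, open() has no share modes, so plain fopen() already behaves this way.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#include <fcntl.h>
#include <io.h>
#include <stdint.h>
#endif

/*
 * Hypothetical helper: open a log file for appending without locking other
 * openers out of it.  Returns NULL on failure.
 */
static FILE *open_log_shared(const char *path)
{
    assert(path != NULL);
#ifdef _WIN32
    /* Let other handles read, write, or even delete the file meanwhile. */
    HANDLE h = CreateFileA(path, GENERIC_WRITE,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return NULL;

    /* Wrap the Win32 handle in a CRT file descriptor, then in a FILE *. */
    int fd = _open_osfhandle((intptr_t) h, _O_WRONLY | _O_APPEND);
    if (fd < 0)
    {
        CloseHandle(h);
        return NULL;
    }
    FILE *f = _fdopen(fd, "a");
    if (f == NULL)
        _close(fd);             /* also closes the underlying handle */
    return f;
#else
    /* POSIX open() has no share modes, so plain fopen() is enough. */
    return fopen(path, "a");
#endif
}
```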

Someone who develops for Windows could probably chase this right down,
and make sure that we do certain things in the right order, and/or
find better kernel facilities; at a wild guess, something like calling
OpenProcess() before you initiate shutdown, so that you can then wait on
the process handle.  The docs for ExitProcess() make it clear
that handles are synchronously closed, so I think it's probably just
that our tests for when processes have exited are too fuzzy.
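To make the OpenProcess() idea a little more concrete, here is a sketch under stated assumptions. It is not what pg_ctl does today. The names shutdown_and_wait(), shutdown_request_fn and send_sigterm() are hypothetical. The Windows branch follows the documented pattern of OpenProcess() with SYNCHRONIZE access followed by WaitForSingleObject(). The POSIX branch is only a rough analogue that works when the target is a child of the caller, via waitpid(). The point is the ordering: take a waitable reference first, then request shutdown, then wait for the process itself to exit rather than for a side effect such as a log file being released.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
#endif

/* Hypothetical callback type: ask process `pid` to shut down, 0 on success. */
typedef int (*shutdown_request_fn)(long pid);

/*
 * Take a waitable reference to the process, then request shutdown, then
 * wait for the process itself to exit.
 * Returns 1 if it exited, 0 on timeout, -1 on error.
 */
static int shutdown_and_wait(long pid, shutdown_request_fn request_shutdown,
                             unsigned timeout_ms)
{
    assert(request_shutdown != NULL);
#ifdef _WIN32
    /* Open the handle *before* shutdown, so the pid cannot be recycled. */
    HANDLE h = OpenProcess(SYNCHRONIZE, FALSE, (DWORD) pid);
    if (h == NULL)
        return -1;
    if (request_shutdown(pid) != 0)
    {
        CloseHandle(h);
        return -1;
    }
    /* A process handle becomes signaled once the process has exited. */
    DWORD rc = WaitForSingleObject(h, timeout_ms);
    CloseHandle(h);
    return rc == WAIT_OBJECT_0 ? 1 : (rc == WAIT_TIMEOUT ? 0 : -1);
#else
    /* Rough POSIX analogue, valid only when pid is our own child. */
    if (request_shutdown(pid) != 0)
        return -1;
    for (unsigned waited = 0;; waited += 10)
    {
        int status;
        pid_t r = waitpid((pid_t) pid, &status, WNOHANG);
        if (r == (pid_t) pid)
            return 1;           /* child has exited and been reaped */
        if (r < 0)
            return -1;
        if (waited >= timeout_ms)
            return 0;
        struct timespec ts = {0, 10L * 1000000L};   /* poll every 10 ms */
        nanosleep(&ts, NULL);
    }
#endif
}

#ifndef _WIN32
/* Example shutdown request for the POSIX branch: send SIGTERM. */
static int send_sigterm(long pid)
{
    return kill((pid_t) pid, SIGTERM);
}
#endif
```

On Windows, holding the handle across the shutdown request keeps the process object alive. This means the wait cannot be confused by the pid being reused, and it returns only once ExitProcess() has finished closing the process's handles.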


