Re: Random pg_upgrade 004_subscription test failure on drongo - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Random pg_upgrade 004_subscription test failure on drongo
Msg-id aPX09h2QbrPDKrS5@paquier.xyz
In response to Re: Random pg_upgrade 004_subscription test failure on drongo  (vignesh C <vignesh21@gmail.com>)
List pgsql-hackers
On Mon, Sep 22, 2025 at 02:28:35PM +0530, vignesh C wrote:
> CFBot reported an issue in one of the machines, here is an updated
> version for the same.

@@ -235,6 +248,9 @@ mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
[...]
+#if defined(WIN32) && !defined(__CYGWIN__)
+            if (!retryattempted && pg_RtlGetLastNtStatus() == STATUS_DELETE_PENDING)
+            {
+                retryattempted = true;
+                WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
+                goto retry;
+            }

Adding more WIN32-specific awesomeness into a single backend code path
that we try to keep POSIX-consistent does not seem right to me,
because the problem may apply to more FDs than this one, no?  One
alternative would be to send the signal from pgwin32_open_handle(),
only when we see a STATUS_DELETE_PENDING.  There is already a retry
loop in our open() wrapper in src/port/open.c, partially for this
reason.
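
To illustrate the idea of handling the delete-pending case once, inside
the open wrapper, rather than in each caller, here is a minimal sketch
in portable C.  It is not the code in src/port/open.c nor the proposed
patch: SketchFs, sketch_raw_open(), sketch_request_release() and
SKETCH_DELETE_PENDING are hypothetical stand-ins that only simulate a
file kept alive by stale handles held elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result code standing in for STATUS_DELETE_PENDING. */
#define SKETCH_DELETE_PENDING (-2)

/* Simulated state: stale handles keep an unlinked file in limbo. */
typedef struct SketchFs
{
    int         stale_handles;      /* handles still held by other processes */
    int         open_attempts;      /* calls made to the low-level open */
    int         release_requests;   /* times we asked others to let go */
} SketchFs;

/* Stand-in for the low-level open: fails while stale handles remain. */
static int
sketch_raw_open(SketchFs *fs)
{
    fs->open_attempts++;
    if (fs->stale_handles > 0)
        return SKETCH_DELETE_PENDING;
    return 3;                   /* pretend file descriptor */
}

/* Stand-in for asking every process to close its cached handles. */
static void
sketch_request_release(SketchFs *fs)
{
    fs->release_requests++;
    fs->stale_handles = 0;
}

/*
 * Open wrapper: on a delete-pending result, ask for a release once and
 * retry.  Every caller of the wrapper gets this behavior for free.
 */
int
sketch_open(SketchFs *fs)
{
    bool        retried = false;

    assert(fs != NULL);
    for (;;)
    {
        int         fd = sketch_raw_open(fs);

        if (fd != SKETCH_DELETE_PENDING || retried)
            return fd;
        sketch_request_release(fs);
        retried = true;
    }
}
```

With two stale handles outstanding, sketch_open() makes two open
attempts and one release request; with none, it opens on the first try
and never asks for a release.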

As with any failure of this type, how can we reliably make sure that
these issues are really gone?  Perhaps it is time to have a test
module dedicated to concurrent file-system operations?  We could
hold on to FDs while making backends wait, for example, with various
concurrent in-core calls stepping on each other.  Perhaps that would
be beneficial in the long term, given the number of platforms we need
to support.
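
As a rough idea of the smallest scenario such a module could exercise,
here is a sketch in plain POSIX C that holds a descriptor on a file,
unlinks it, and then re-creates the same name.  The function name and
its return convention are assumptions made for illustration, and this
is not an existing PostgreSQL test.  On POSIX systems the second open
is expected to succeed; on Windows it is the step where a
delete-pending failure could show up while the old handle is held.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Returns 0 if "path" can be re-created while a descriptor on the
 * previous, unlinked file is still held, and -1 otherwise.  The file
 * must not exist beforehand and is removed before returning.
 */
int
recreate_while_fd_held(const char *path)
{
    char        buf[3];
    int         held;
    int         fresh;
    int         rc = -1;

    assert(path != NULL);

    /* Create the file and keep the descriptor open. */
    held = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (held < 0)
        return -1;

    if (write(held, "old", 3) == 3 && unlink(path) == 0)
    {
        /* Reuse the name while the old descriptor is still held. */
        fresh = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
        if (fresh >= 0)
        {
            /* The held descriptor must still see the old contents. */
            if (lseek(held, 0, SEEK_SET) == 0 &&
                read(held, buf, 3) == 3 &&
                memcmp(buf, "old", 3) == 0)
                rc = 0;
            close(fresh);
        }
    }

    /* Clean up; a failure here just means the name is already gone. */
    unlink(path);
    close(held);
    return rc;
}
```

A real test module would do the holding and the re-creation in separate
backends, with one of them made to wait at a chosen point; this
single-process version only pins down the expected POSIX behavior.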
--
Michael
