Thread: pgsql: Fix unlink() for STATUS_DELETE_PENDING on Windows.

pgsql: Fix unlink() for STATUS_DELETE_PENDING on Windows.

From
Thomas Munro
Date:
Fix unlink() for STATUS_DELETE_PENDING on Windows.

Commit f357233c assumed that it was OK to return ENOENT directly if
lstat() failed that way.  If we got STATUS_DELETE_PENDING while trying
to unlink a file that we had already unlinked successfully once before
but someone else still had open (on a kernel version that has "pending"
unlinks by default), then we would no longer reach the retry loop in
pgunlink().  That loop claims to be only for handling sharing violations
(a different phenomenon), but the errno is the same.

Restore that behavior with an explicit check, to see if it fixes the
occasional 'directory not empty' failures seen in the pg_upgrade tests
on CI.  Further improvements are possible with proposed upgrades to
modern Windows APIs that would replace this convoluted code.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20220920013122.GA31833%40telsasoft.com
Discussion: https://postgr.es/m/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/e109e43921d21d069c03f18d7c9d8f4e5cb6a0c3

Modified Files
--------------
src/port/dirmod.c | 38 ++++++++++++++++++++++++++++++++++++--
1 file changed, 36 insertions(+), 2 deletions(-)


Re: pgsql: Fix unlink() for STATUS_DELETE_PENDING on Windows.

From
Justin Pryzby
Date:
On Tue, Oct 25, 2022 at 03:29:42AM +0000, Thomas Munro wrote:
> Fix unlink() for STATUS_DELETE_PENDING on Windows.
> 
> Commit f357233c assumed that it was OK to return ENOENT directly if
> lstat() failed that way.  If we got STATUS_DELETE_PENDING while trying
> to unlink a file that we had already unlinked successfully once before
> but someone else still had open (on a kernel version that has "pending"
> unlinks by default), then we would no longer reach the retry loop in
> pgunlink().  That loop claims to be only for handling sharing violations
> (a different phenomenon), but the errno is the same.
> 
> Restore that behavior with an explicit check, to see if it fixes the
> occasional 'directory not empty' failures seen in the pg_upgrade tests
> on CI.  Further improvements are possible with proposed upgrades to
> modern Windows APIs that would replace this convoluted code.
> 
> Reported-by: Justin Pryzby <pryzby@telsasoft.com>
> Reviewed-by: Michael Paquier <michael@paquier.xyz>
> Discussion: https://postgr.es/m/20220920013122.GA31833%40telsasoft.com
> Discussion: https://postgr.es/m/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com
...
> Details
> -------
> https://git.postgresql.org/pg/commitdiff/e109e43921d21d069c03f18d7c9d8f4e5cb6a0c3

If I'm not wrong, this didn't fix the issue you said it fixed.

cfbot says this ran 1h ago, and its HEAD^3 is e109e43.
https://cirrus-ci.com/task/5939314583404544



Re: pgsql: Fix unlink() for STATUS_DELETE_PENDING on Windows.

From
Justin Pryzby
Date:
On Tue, Oct 25, 2022 at 04:21:02PM -0500, Justin Pryzby wrote:
> > Restore that behavior with an explicit check, to see if it fixes the
> > occasional 'directory not empty' failures seen in the pg_upgrade tests
> > on CI.  Further improvements are possible with proposed upgrades to
> > modern Windows APIs that would replace this convoluted code.
> > 
> > Reported-by: Justin Pryzby <pryzby@telsasoft.com>
> > Reviewed-by: Michael Paquier <michael@paquier.xyz>
> > Discussion: https://postgr.es/m/20220920013122.GA31833%40telsasoft.com
> > Discussion: https://postgr.es/m/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com
> ...
> > Details
> > -------
> > https://git.postgresql.org/pg/commitdiff/e109e43921d21d069c03f18d7c9d8f4e5cb6a0c3
> 
> If I'm not wrong, this didn't fix the issue you said it fixed.

s/said/hoped/sorry



Re: pgsql: Fix unlink() for STATUS_DELETE_PENDING on Windows.

From
Thomas Munro
Date:
On Wed, Oct 26, 2022 at 10:31 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Tue, Oct 25, 2022 at 04:21:02PM -0500, Justin Pryzby wrote:
> > > Restore that behavior with an explicit check, to see if it fixes the
> > > occasional 'directory not empty' failures seen in the pg_upgrade tests
> > > on CI.  Further improvements are possible with proposed upgrades to
> > > modern Windows APIs that would replace this convoluted code.

> > If I'm not wrong, this didn't fix the issue you said it fixed.
>
> s/said/hoped/sorry

Drat.  More theories needed then.  Perhaps it has nothing to do with
my recent changes.  I am starting to wonder if we should have an
rmdir() wrapper that waits for zombie files to go away, and spits out
the name of the file that's in the way, and also the name/pid of the
process that has it open[1] if it times out, so we have a fighting
chance of debugging this type of stuff.

[1] https://devblogs.microsoft.com/oldnewthing/20120217-00/?p=8283



Re: pgsql: Fix unlink() for STATUS_DELETE_PENDING on Windows.

From
Justin Pryzby
Date:
On Wed, Oct 26, 2022 at 11:15:16AM +1300, Thomas Munro wrote:
> On Wed, Oct 26, 2022 at 10:31 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > On Tue, Oct 25, 2022 at 04:21:02PM -0500, Justin Pryzby wrote:
> > > > Restore that behavior with an explicit check, to see if it fixes the
> > > > occasional 'directory not empty' failures seen in the pg_upgrade tests
> > > > on CI.  Further improvements are possible with proposed upgrades to
> > > > modern Windows APIs that would replace this convoluted code.
> 
> > > If I'm not wrong, this didn't fix the issue you said it fixed.
> >
> > s/said/hoped/sorry
> 
> Drat.  More theories needed then.  Perhaps it has nothing to do with
> my recent changes.  I am starting to wonder if we should have an
> rmdir() wrapper that waits for zombie files to go away, and spits out
> the name of the file that's in the way, and also the name/pid of the
> process that has it open[1] if it times out, so we have a fighting
> chance of debugging this type of stuff.
> 
> [1] https://devblogs.microsoft.com/oldnewthing/20120217-00/?p=8283

FWIW, your description of the problem sounded a bit off to me.

You seemed to say that the problem is with rmdir() on a directory, which
contains a file which was "recently removed", but still opened by
something.

But as I recall, at the point that pg_upgrade is running rmdir(), the
contained files have been unlinked, and the postgres process has ended:

| # Running: pg_upgrade --no-sync -d
C:\cirrus\build/testrun/pg_upgrade/002_pg_upgrade\data/t_002_pg_upgrade_old_node_data/pgdata-D
C:\cirrus\build/testrun/pg_upgrade/002_pg_upgrade\data/t_002_pg_upgrade_new_node_data/pgdata-b
C:/cirrus/build/tmp_install/bin-B C:/cirrus/build/tmp_install/bin -s
C:/Users/ContainerAdministrator/AppData/Local/Temp/f67bWmckRH-p 57611 -P 57612 --check
 
| Performing Consistency Checks
| -----------------------------
| Checking cluster versions                                   ok
| Checking database user is the install user                  ok
| Checking database connection settings                       ok
| Checking for prepared transactions                          ok
| Checking for system-defined composite types in user tables  ok
| Checking for reg* data types in user tables                 ok
| Checking for contrib/isn with bigint-passing mismatch       ok
| Checking for presence of required libraries                 ok
| Checking database user is the install user                  ok
| Checking for prepared transactions                          ok
| Checking for new cluster tablespace directories             ok
| 
| *Clusters are compatible*
| pg_upgrade: warning: could not remove file or directory
"C:/cirrus/build/testrun/pg_upgrade/002_pg_upgrade/data/t_002_pg_upgrade_new_node_data/pgdata/pg_upgrade_output.d/20220925T193346.158/log":
Directorynot empty
 
| pg_upgrade: warning: could not remove file or directory
"C:/cirrus/build/testrun/pg_upgrade/002_pg_upgrade/data/t_002_pg_upgrade_new_node_data/pgdata/pg_upgrade_output.d/20220925T193346.158":
Directorynot empty
 

My thinking has always been that this is essentially a bug/deficiency in
the windows FS.  It seems absurd that unlink/rmdir would be
asynchronous.

It'd be swell if we could use a separate device in CI, to be used for
running tests.  Bonus points if it supports COW.

-- 
Justin



Re: pgsql: Fix unlink() for STATUS_DELETE_PENDING on Windows.

From
Thomas Munro
Date:
On Wed, Oct 26, 2022 at 12:57 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Wed, Oct 26, 2022 at 11:15:16AM +1300, Thomas Munro wrote:
> > On Wed, Oct 26, 2022 at 10:31 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > > If I'm not wrong, this didn't fix the issue you said it fixed.
> > >
> > > s/said/hoped/sorry
> >
> > Drat.  More theories needed then.  Perhaps it has nothing to do with
> > my recent changes.  I am starting to wonder if we should have an
> > rmdir() wrapper that waits for zombie files to go away, and spits out
> > the name of the file that's in the way, and also the name/pid of the
> > process that has it open[1] if it times out, so we have a fighting
> > chance of debugging this type of stuff.
> >
> > [1] https://devblogs.microsoft.com/oldnewthing/20120217-00/?p=8283
>
> FWIW, your description of the problem sounded a bit off to me.
>
> You seemed to say that the problem is with rmdir() on a directory, which
> contains a file which was "recently removed", but still opened by
> something.

Right, that's what I was guessing because that is a known phenomenon,
and we can see that the error is "Directory not empty".  If, on the
other hand, that's caused by a file that's never been unlinked, we
might expect to see occasional failures on non-Windows systems too.

Wait a minute.  Even if that hunch were correct, what I changed would
not be enough to fix it.  Looking at the code that presumably performs
the recursive unlink and emits the warning, namely
src/common/rmtree.c, I see that it calls lstat() itself before calling
unlink().  So a zombie pending-deleted file would be skipped (because
ENOENT) before even reaching the code in question, and that was
already the case before anything I changed in the past few months in
this area AFAIK.

Which is a clue that we've been looking in the wrong place.  Perhaps
it has to do with file handles opened by pg_upgrade itself, which did
indeed recently change to a new path, though you'd think that'd be
more deterministic.  Now I'm tempted to write the patch that would
tell us the names of the files that are in the way to get more
visibility here.

> But as I recall, at the point that pg_upgrade is running rmdir(), the
> contained files have been unlinked, and the postgres process has ended:

Oh.  Hmm.  I wondered if it might be a logger process (they shut down
slightly after the postmaster IIRC), but it doesn't look like we start
one.

> My thinking has always been that this is essentially a bug/deficiency in
> the windows FS.  It seems absurd that unlink/rmdir would be
> asynchronous.

It certainly doesn't make life easy for open source projects from
planet Unix.  I wonder if those semantics came from the VMS orbit.

> It'd be swell if we could use a separate device in CI, to be used for
> running tests.  Bonus points if it supports COW.

I think it might be possible to create a ReFS filesystem inside a
loopback file.  On that filesystem, the tests I propose in CF #3951
take the !have_posix_unlink_semantics path, even if we commit the
patch to turn on POSIX semantics (which has useful effect only on
NTFS, according to experiments so far).  If we decided to commit that
without also setting up coverage for non-POSIX-semantics filesystems,
I predict that our support for non-POSIX unlink semantics would very
soon decay and become unsalvageable, which is why it amounts to a
policy decision about future support...



Re: pgsql: Fix unlink() for STATUS_DELETE_PENDING on Windows.

From
Andres Freund
Date:
Hi,

On 2022-10-25 18:57:54 -0500, Justin Pryzby wrote:
> It'd be swell if we could use a separate device in CI, to be used for
> running tests.  Bonus points if it supports COW.

My colleague Bilal just got the generation of windows VMs and running the test
with cirrus working, not done by any means, but progress. I'm pretty sure that
we could repartition during so that there's space for a separate refs
partition.

Both start and test times are vastly improved: https://cirrus-ci.com/build/5239781717180416

Greetings,

Andres Freund