Re: stat() vs ERROR_DELETE_PENDING, round N + 1 - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: stat() vs ERROR_DELETE_PENDING, round N + 1
Date
Msg-id CA+hUKGJMPVJ9-vGRYjd1tBqC9yoiiuBzteX5gBudTwuDVxxwdA@mail.gmail.com
Whole thread Raw
In response to Re: stat() vs ERROR_DELETE_PENDING, round N + 1  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: stat() vs ERROR_DELETE_PENDING, round N + 1
List pgsql-hackers
On Thu, Sep 2, 2021 at 11:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > I'm no expert, but not AFAICS.  We managed to delete the file while
> > some other backend had it open, which FILE_SHARE_DELETE allowed.  We
> > just can't open it or create a new file with the same name until it's
> > really gone (all handles closed).
>
> Right, but we shouldn't ever need to access such a file --- we couldn't do
> so on Unix, certainly.  So for the open() case, it's sufficient to return
> ENOENT, and the problem is only to make sure that that's what we return if
> the underlying error is ERROR_DELETE_PENDING.

Yeah.  The problem is that it still shows up in directory listings
AFAIK, so something like basebackup.c sees it, and even if it didn't,
it reads the directory, and then stats the files, and then opens the
files at different times.  The non-public API way to ask for the real
reason after such a failure would apparently be to call
NtFileCreate(), which can return STATUS_DELETE_PENDING.  I don't know
what complications that might involve, but I see now that we have code
that digs such non-public APIs out of ntdll.dll already (for long dead
OS versions only).  Hmm.

(Another thing you can't do is delete the directory that contains such
a file, which is a problem for DROP TABLESPACE and the reason I
developed the global barrier thing.)

> It's harder if the desire is to create a new file of the same name.
> I'm inclined to think that the best answer might be "if it hurts,
> don't do that".  We should not have such a case for ordinary relation
> files or WAL files, but maybe there are individual other cases where
> some redesign is indicated?

I guess GetNewRelFileNode()’s dilemma branch is an example; it'd
probably be nicer to users to treat a pending-deleted file as a
collision.

        if (access(rpath, F_OK) == 0)
        {
            /* definite collision */
            collides = true;
        }
        else
        {
            /*
             * Here we have a little bit of a dilemma: if errno is something
             * other than ENOENT, should we declare a collision and loop? In
             * practice it seems best to go ahead regardless of the errno.  If
             * there is a colliding file we will get an smgr failure when we
             * attempt to create the new relation file.
             */
            collides = false;
        }



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
Next
From: Fujii Masao
Date:
Subject: Re: Fix typo in comments