Thread: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

From
Thomas Munro
Date:
Hello,

In tablespace.c, a comment explains that DROP TABLESPACE can fail
bogusly because of Windows file semantics:

         * XXX On Windows, an unlinked file persists in the directory listing
         * until no process retains an open handle for the file.  The DDL
         * commands that schedule files for unlink send invalidation messages
         * directing other PostgreSQL processes to close the files.  DROP
         * TABLESPACE should not give up on the tablespace becoming empty
         * until all relevant invalidation processing is complete.

While trying to get the AIO patchset working on more operating
systems, this turned out to be a problem.  Andres mentioned the new
ProcSignalBarrier stuff as a good way to tackle this, so I tried it
and it seems to work well so far.

The idea in this initial version is to tell every process in the
cluster to close all fds, and then try again.  That's a pretty large
hammer, but it isn't reached on Unix, and with slightly more work it
could be made to happen only after 2 failures on Windows.  It was
tempting to try to figure out how to use the sinval mechanism to close
precisely the right files instead, but it doesn't look safe to run
sinval at arbitrary CFI() points.  It's easier to see that the
pre-existing closeAllVfds() function has an effect that is local to
fd.c and doesn't affect the VFDs or SMgrRelations, so any CFI() should
be an OK time to run that.

While reading the ProcSignalBarrier code, I couldn't resist replacing
its poll/sleep loop with condition variables.

Thoughts?

Attachment

Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

From
Thomas Munro
Date:
> While reading the ProcSignalBarrier code, I couldn't resist replacing
> its poll/sleep loop with condition variables.

Oops, that version accidentally added and then removed an unnecessary
change due to incorrect commit squashing.  Here's a better pair of
patches.

Attachment

Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

From
Andres Freund
Date:
Hi,

Thanks for developing this.

On 2021-01-31 02:11:06 +1300, Thomas Munro wrote:
> --- a/src/backend/commands/tablespace.c
> +++ b/src/backend/commands/tablespace.c
> @@ -520,15 +520,23 @@ DropTableSpace(DropTableSpaceStmt *stmt)
>           * but we can't tell them apart from important data files that we
>           * mustn't delete.  So instead, we force a checkpoint which will clean
>           * out any lingering files, and try again.
> -         *
> -         * XXX On Windows, an unlinked file persists in the directory listing
> -         * until no process retains an open handle for the file.  The DDL
> -         * commands that schedule files for unlink send invalidation messages
> -         * directing other PostgreSQL processes to close the files.  DROP
> -         * TABLESPACE should not give up on the tablespace becoming empty
> -         * until all relevant invalidation processing is complete.
>           */
>          RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
> +        /*
> +         * On Windows, an unlinked file persists in the directory listing until
> +         * no process retains an open handle for the file.  The DDL
> +         * commands that schedule files for unlink send invalidation messages
> +         * directing other PostgreSQL processes to close the files, but nothing
> +         * guarantees they'll be processed in time.  So, we'll also use a
> +         * global barrier to ask all backends to close all files, and wait
> +         * until they're finished.
> +         */
> +#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
> +        LWLockRelease(TablespaceCreateLock);
> +        WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
> +        LWLockAcquire(TablespaceCreateLock, LW_EXCLUSIVE);
> +#endif
> +        /* And now try again. */
>          if (!destroy_tablespace_directories(tablespaceoid, false))
>          {
>              /* Still not empty, the files must be important then */

It's probably rare enough to care, but this still made me thing whether
we could avoid the checkpoint at all somehow. Requiring an immediate
checkpoint for dropping relations is quite a heavy hammer that
practically cannot be used in production without causing performance
problems. But it seems hard to process the fsync deletion queue in
another way.


> diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
> index 4dc24649df..0f8548747c 100644
> --- a/src/backend/storage/smgr/smgr.c
> +++ b/src/backend/storage/smgr/smgr.c
> @@ -298,6 +298,12 @@ smgrcloseall(void)
>          smgrclose(reln);
>  }
>  
> +void
> +smgrrelease(void)
> +{
> +    mdrelease();
> +}

Probably should be something like
    for (i = 0; i < NSmgr; i++)
    {
        if (smgrsw[i].smgr_release)
            smgrsw[i].smgr_release();
    }

Greetings,

Andres Freund



Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

From
Thomas Munro
Date:
On Tue, Feb 2, 2021 at 8:02 AM Andres Freund <andres@anarazel.de> wrote:
> It's probably rare enough to care, but this still made me thing whether
> we could avoid the checkpoint at all somehow. Requiring an immediate
> checkpoint for dropping relations is quite a heavy hammer that
> practically cannot be used in production without causing performance
> problems. But it seems hard to process the fsync deletion queue in
> another way.

Right, the checkpoint itself is probably worse than this
"close-all-your-files!" thing in some cases (though it seems likely
that once we start using ProcSignalBarrier we're going to find out
about places that take a long time to get around to processing them
and that's going to be a thing to work on).  As a separate project,
perhaps we should find some other way to stop GetNewRelFileNode() from
recycling the relfilenode until the next checkpoint, so that we can
unlink the file eagerly at commit time, while still avoiding the
hazard described in the comment for mdunlink().  A straw-man idea
would be to touch a file under PGDATA/pg_dropped and fsync it so it
survives a power outage, have checkpoints clean that out, and have
GetNewRelFileNode() to try access() it.  Then we wouldn't need the
checkpoint here, I think; we'd just need this ProcSignalBarrier for
Windows.

> > +void
> > +smgrrelease(void)
> > +{
> > +     mdrelease();
> > +}
>
> Probably should be something like
>         for (i = 0; i < NSmgr; i++)
>         {
>                 if (smgrsw[i].smgr_release)
>                         smgrsw[i].smgr_release();
>         }

Fixed.  Thanks!

Attachment

Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

From
Thomas Munro
Date:
On Tue, Feb 2, 2021 at 11:16 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> ...  A straw-man idea
> would be to touch a file under PGDATA/pg_dropped and fsync it so it
> survives a power outage, have checkpoints clean that out, and have
> GetNewRelFileNode() to try access() it.  ...

I should add, the reason I mentioned fsyncing it is that in another
thread we've also discussed making the end-of-crash-recovery
checkpoint optional, and then I think you'd need to be sure you can
avoid reusing the relfilenode even after crash recovery, because if
you recycle the relfilenode and then crash again you'd be exposed to
that hazard during the 2nd run thought recovery.  But perhaps it's
enough to recreate the hypothetical pg_dropped file while replaying
the drop-relation record.  Not sure, would need more thought.



Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

From
Thomas Munro
Date:
On Sat, Mar 6, 2021 at 12:10 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > On 3 Mar 2021, at 23:19, Thomas Munro <thomas.munro@gmail.com> wrote:
> > That's why I release and reacquire that LWLock.  But does that break some
> > logic?
>
> One clear change to current behavior is naturally that a concurrent
> TablespaceCreateDbspace can happen while barrier absorption is performed.
> Given where we are that might not be a problem, but I don't have enough
> caffeine at the moment to conclude anything there.  Testing nu inducing
> concurent calls while absorption was stalled didn't trigger anything, but I'm
> sure I didn't test every scenario. Do you see anything off the cuff?

Now I may have the opposite problem (too much coffee) but it looks
like it should work about as well as it does today.  At this new point
where we released the LWLock, all we've really done is possibly unlink
some empty database directories in destroy_tablespace_directories(),
and that's harmless, they'll be recreated on demand if we abandon
ship.  If TablespaceCreateDbspace() happened while we were absorbing
the barrier and not holding the lock in this new code, then a
concurrent mdcreate() is running and so we have a race where we'll
again try to drop all empty directories, and it'll try to create its
relfile in the new empty directory, and one of us will fail (possibly
with an ugly ENOENT error message).  But that's already the case in
the master branch: mdcreate() could have run TablespaceCreateDbspace()
before we acquire the lock in the master branch, and (with
pathological enough scheduling) it could reach its attempt to create
its relfile after DropTableSpace() has unlinked the empty directory.

The interlocking here is hard to follow.  I wonder why we don't use
heavyweight locks to do per-tablespace interlocking between
DefineRelation() and DropTableSpace().  I'm sure this question is
hopelessly naive and I should probably go and read some history.



Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

From
Thomas Munro
Date:
Just as an FYI: this entry currently fails with "Timed out!" on cfbot
because of an oversight in the master branch[1], AFAICS.  It should
pass again once that's fixed.

[1] https://www.postgresql.org/message-id/CA%2BhUKGLah2w1pWKHonZP_%2BEQw69%3Dq56AHYwCgEN8GDzsRG_Hgw%40mail.gmail.com



Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

From
Robert Haas
Date:
On Sun, Jun 13, 2021 at 8:25 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Just as an FYI: this entry currently fails with "Timed out!" on cfbot
> because of an oversight in the master branch[1], AFAICS.  It should
> pass again once that's fixed.
>
> [1] https://www.postgresql.org/message-id/CA%2BhUKGLah2w1pWKHonZP_%2BEQw69%3Dq56AHYwCgEN8GDzsRG_Hgw%40mail.gmail.com

That's fixed now. So what should we do about this patch? This is a
bug, so it would be nice to do *something*. I don't really like the
fact that this makes the behavior contingent on USE_ASSERT_CHECKING,
and I suggest that you make a new symbol like USE_BARRIER_SMGR_RELEASE
which by default gets defined on WIN32, but can be defined elsewhere
if you want (see the treatment of EXEC_BACKEND in pg_config_manual.h).
Furthermore, I can't see back-patching this, given that it would be
the very first use of the barrier machinery. But I think it would be
good to get something into master, because then we'd actually be using
this procsignalbarrier stuff for something. On a good day we've fixed
a bug. On a bad day we'll learn something new about how
procsignalbarrier needs to work.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

From
Thomas Munro
Date:
On Thu, Jan 6, 2022 at 10:23 AM Robert Haas <robertmhaas@gmail.com> wrote:
> That's fixed now. So what should we do about this patch? This is a
> bug, so it would be nice to do *something*. I don't really like the
> fact that this makes the behavior contingent on USE_ASSERT_CHECKING,
> and I suggest that you make a new symbol like USE_BARRIER_SMGR_RELEASE
> which by default gets defined on WIN32, but can be defined elsewhere
> if you want (see the treatment of EXEC_BACKEND in pg_config_manual.h).

Ok, done like that.

> Furthermore, I can't see back-patching this, given that it would be
> the very first use of the barrier machinery. But I think it would be
> good to get something into master, because then we'd actually be using
> this procsignalbarrier stuff for something. On a good day we've fixed
> a bug. On a bad day we'll learn something new about how
> procsignalbarrier needs to work.

Agreed.

Pushed.  The basic Windows/tablespace bug seen occasionally in CI[1]
should now be fixed.

For the sake of the archives, here's a link to the ongoing discussion
about further potential uses of this mechanism:

https://www.postgresql.org/message-id/flat/20220209220004.kb3dgtn2x2k2gtdm%40alap3.anarazel.de

[1] https://www.postgresql.org/message-id/CA%2BhUKGJp-m8uAD_wS7%2BdkTgif013SNBSoJujWxvRUzZ1nkoUyA%40mail.gmail.com