Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier? - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date
Msg-id CA+hUKGJ8gSaCcu8ky-UBtdAfyHRGwU9zEgsXQH5SuV3iOLaMGQ@mail.gmail.com
In response to Fix DROP TABLESPACE on Windows with ProcSignalBarrier?  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Sat, Mar 6, 2021 at 12:10 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > On 3 Mar 2021, at 23:19, Thomas Munro <thomas.munro@gmail.com> wrote:
> > That's why I release and reacquire that LWLock.  But does that break some
> > logic?
>
> One clear change to current behavior is naturally that a concurrent
> TablespaceCreateDbspace can happen while barrier absorption is performed.
> Given where we are that might not be a problem, but I don't have enough
> caffeine at the moment to conclude anything there.  Testing by inducing
> concurrent calls while absorption was stalled didn't trigger anything, but I'm
> sure I didn't test every scenario. Do you see anything off the cuff?

Now I may have the opposite problem (too much coffee), but it looks
like it should work about as well as it does today.  At this new point
where we release the LWLock, all we've really done is possibly unlink
some empty database directories in destroy_tablespace_directories(),
and that's harmless: they'll be recreated on demand if we abandon
ship.  If TablespaceCreateDbspace() ran while we were absorbing the
barrier and not holding the lock in this new code, then a concurrent
mdcreate() is in progress and we have a race: we'll again try to drop
all empty directories while it tries to create its relfile in the
newly created empty directory, and one of us will fail (possibly with
an ugly ENOENT error message).  But that's already the case in the
master branch: mdcreate() could have run TablespaceCreateDbspace()
before we acquire the lock, and (with pathological enough scheduling)
it could reach its attempt to create its relfile after
DropTableSpace() has unlinked the empty directory.
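
To make the failure mode concrete, here is a minimal stand-alone sketch
of the interleaving described above.  It is plain POSIX C, not
PostgreSQL code: the function name, the scratch path under /tmp and the
file name 16384 are all made up for illustration, and the two "sides"
are simply run in the pathological order within one process rather
than as real concurrent backends.

```c
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical illustration only.  Returns the errno seen by the
 * "creator" when it tries to create its file after the "dropper" has
 * removed the empty directory, 0 if the create unexpectedly succeeded,
 * or -1 if the scratch directory could not be set up.
 */
static int
create_after_drop_errno(void)
{
    char        base[] = "/tmp/tblspc_race_XXXXXX";
    char        dbdir[256];
    char        relfile[512];
    int         fd;
    int         saved_errno = 0;

    /* Scratch directory standing in for the tablespace. */
    if (mkdtemp(base) == NULL)
        return -1;

    /* Creator, step 1: make the per-database directory
     * (the role TablespaceCreateDbspace() plays in the mail). */
    snprintf(dbdir, sizeof(dbdir), "%s/dbdir", base);
    if (mkdir(dbdir, 0700) != 0)
    {
        rmdir(base);
        return -1;
    }

    /* Dropper: finds the directory empty and removes it
     * (the role destroy_tablespace_directories() plays). */
    assert(rmdir(dbdir) == 0);

    /* Creator, step 2: try to create the relation file
     * (the role mdcreate() plays).  The parent is gone by now. */
    snprintf(relfile, sizeof(relfile), "%s/16384", dbdir);
    fd = open(relfile, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        saved_errno = errno;
    else
    {
        /* Not expected in this ordering; clean up anyway. */
        close(fd);
        unlink(relfile);
        rmdir(dbdir);
    }

    rmdir(base);
    return saved_errno;
}
```

On a POSIX system the open() fails because the parent directory no
longer exists, so the function returns ENOENT.  That is the same class
of error the creating side would report, whether the directory was
removed before the lock was taken (master) or while the lock was
released for barrier absorption (the patch).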

The interlocking here is hard to follow.  I wonder why we don't use
heavyweight locks to do per-tablespace interlocking between
DefineRelation() and DropTableSpace().  I'm sure this question is
hopelessly naive and I should probably go and read some history.


