Re: Race in "tablespace" test on Windows - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Race in "tablespace" test on Windows
Msg-id 20141113031619.GB781371@tornado.leadboat.com
In response to Re: Race in "tablespace" test on Windows  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Race in "tablespace" test on Windows  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, Nov 11, 2014 at 10:21:26AM +0530, Amit Kapila wrote:
> On Sat, Nov 8, 2014 at 10:34 AM, Noah Misch <noah@leadboat.com> wrote:
> > Here is a briefer command sequence exhibiting the same problem:
> >
> > CREATE TABLESPACE testspace LOCATION '...somewhere...';
> > CREATE TABLE atable (c int) tablespace testspace;
> > SELECT COUNT(*) FROM atable;    -- open heap
> > \c -
> > ALTER TABLE atable SET TABLESPACE pg_default;
> > DROP TABLESPACE testspace;      -- bug: fails sometimes
> > DROP TABLESPACE testspace;      -- second one ~always works
> > DROP TABLE atable;
> >
> 
> For me, it doesn't succeed even the second time; I get the same
> error until I execute some command in the first session, that is,
> until the first session has processed the invalidation messages.
> 
> postgres=# Drop tablespace tbs;
> ERROR:  tablespace "tbs" is not empty
> postgres=# Drop tablespace tbs;
> ERROR:  tablespace "tbs" is not empty
> 
> I have tested this on Windows 7.

The behavior you see makes sense if you have a third, idle backend.  I had
only the initial backend and the "\c"-created second one.

> > To make this work as well on Windows as it does elsewhere, DROP TABLESPACE
> > would need to wait for other backends to close relevant unlinked files.
> > Perhaps implement "wait_unlinked_files(const char *dirname)" to poll unlinked,
> > open files until they disappear.  (An attempt to open an unlinked file reports
> > ERROR_ACCESS_DENIED.  It might be tricky to reliably distinguish this cause
> > from other causes of that error, but it should be possible.)
> 
> I think the proposed mechanism can work, but the wait can be very long
> (until the backend holding the descriptor executes another command).

The DROP TABLESPACE could send a catchup interrupt.

> Can we think of some other solution?  For example, in DROP TABLESPACE,
> instead of checking whether the directory is empty, check that no object
> belonging to the database/cluster remains, then forcibly delete that
> directory in some way.

I'm not aware of a way to forcibly delete the directory.  One could rename
files to the tablespace top-level directory just before unlinking them.  Since
DROP TABLESPACE never removes that directory, their continued presence there
would not pose a problem.  (Compare use of the rename-before-unlink trick in
RemoveOldXlogFiles().)  That adds the overhead of an additional system call to
every unlink, which might be acceptable.  It may be possible to rename after
unlink, as-needed in DROP TABLESPACE.
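
A minimal sketch of the rename-before-unlink idea described above, assuming
POSIX rename() and unlink().  The function name rename_then_unlink, its
parameters, and the caller-supplied unique name are hypothetical and are not
taken from the PostgreSQL source; a real implementation would have to choose
unique names itself and use the backend's error reporting.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: move "path" into "topdir" under
 * the caller-supplied "unique_name", then unlink it from there.
 *
 * Returns 0 on success, -1 on failure (errno is set by rename/unlink when
 * one of those calls fails).
 */
int
rename_then_unlink(const char *path, const char *topdir,
                   const char *unique_name)
{
    char        parked[1024];
    int         len;

    len = snprintf(parked, sizeof(parked), "%s/%s", topdir, unique_name);
    if (len < 0 || (size_t) len >= sizeof(parked))
        return -1;              /* destination path would be truncated */

    /*
     * The extra system call: take the directory entry out of the
     * per-database directory before deleting it.
     */
    if (rename(path, parked) != 0)
        return -1;

    /*
     * If another process still holds the file open on Windows, the entry
     * may linger after this call.  It would then linger in topdir, which
     * DROP TABLESPACE does not remove.
     */
    return unlink(parked);
}
```

A caller would pass the tablespace top-level directory as topdir, so that the
per-database directory is already empty by the time DROP TABLESPACE checks it.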

> > I propose to add
> > this as a TODO, then bandage the test case with s/^\\c -$/RESET ROLE;/.
> 
> Yeah, this makes sense.

Done.


