DROP DATABASE vs patch to not remove files right away - Mailing list pgsql-hackers

From Tom Lane
Subject DROP DATABASE vs patch to not remove files right away
Date
Msg-id 18026.1208300591@sss.pgh.pa.us
Responses Re: DROP DATABASE vs patch to not remove files right away  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: DROP DATABASE vs patch to not remove files right away  (Heikki Linnakangas <heikki@enterprisedb.com>)
Re: DROP DATABASE vs patch to not remove files right away  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-hackers
Over the last couple of days I twice saw complaints like this during
DROP DATABASE:

WARNING:  could not remove file or directory "base/80750/80825": No such file or directory
WARNING:  could not remove database directory "base/80750"

I poked at it for a while and was eventually able to extract a
repeatable test case:

while true
do
    psql -c "create database foo;" postgres || exit 1
    psql -c "create table foo(f1 int primary key);" foo || exit 1
    psql -c "drop table foo;" foo || exit 1
    # run the checkpoint concurrently with the DROP DATABASE below
    psql -c "checkpoint" postgres &
    psql -c "drop database foo;" postgres || exit 1
done

On my machine this fairly consistently draws warnings in both 8.3 and
HEAD.  I believe what is happening is that the bgwriter has a
PendingUnlinkEntry for table foo, and completion of the checkpoint
prompts it to exercise that.  Meanwhile in the DROP DATABASE, rmtree is
working through a list of files to drop, and when it hits the
already-deleted one it complains --- and not only does it complain,
it stops trying to delete any more.  (The second WARNING is quite
misleading, because what it really means is "I stopped trying".)

Without the CHECKPOINT, what we get instead is that each cycle builds up
some more PendingUnlinkEntrys, which will all fail when the checkpoint
comes.  The bgwriter is coded to not report ENOENT, so you don't see any
evidence of that, but it's clearly a possible case and the comment
saying it shouldn't happen is misleading.

Actually ... what if the same DB OID and relfilenode get re-made before
the checkpoint?  Then we'd be unlinking live data.  This is improbable
but hardly less so than the scenario the PendingUnlinkEntry code was
put in to prevent.
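A toy C model may make the OID-reuse hazard easier to see. It is a sketch under stated assumptions, not bgwriter code: the struct and the functions `remember_unlink`, `forget_database_unlinks` and `unlinks_hitting` are hypothetical, and a queued entry is reduced to a (database OID, relfilenode) pair, whereas the real PendingUnlinkEntry carries more. The OIDs reuse the numbers from the warnings above purely as example values. If the database is dropped and recreated with the same OIDs before the checkpoint, a stale entry still matches the new file's identity; discarding every entry for the dropped database removes that match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a pending-unlink queue entry:
 * only the database OID and relfilenode are modeled. */
typedef unsigned int Oid;

typedef struct
{
    Oid  dbid;
    Oid  relfilenode;
    bool valid;                 /* false once the entry has been forgotten */
} PendingUnlink;

#define MAX_PENDING 64

static PendingUnlink pending[MAX_PENDING];
static size_t npending = 0;

/* Queue a file for removal at the next checkpoint (e.g. after DROP TABLE). */
static void
remember_unlink(Oid dbid, Oid relfilenode)
{
    assert(npending < MAX_PENDING);
    pending[npending++] = (PendingUnlink) {dbid, relfilenode, true};
}

/* Discard every queued unlink belonging to a dropped database, so that a
 * later database reusing the same OIDs cannot be hit by a stale entry. */
static void
forget_database_unlinks(Oid dbid)
{
    for (size_t i = 0; i < npending; i++)
        if (pending[i].valid && pending[i].dbid == dbid)
            pending[i].valid = false;
}

/* How many still-queued unlinks would fire against this
 * (dbid, relfilenode) when the checkpoint completes. */
static size_t
unlinks_hitting(Oid dbid, Oid relfilenode)
{
    size_t hits = 0;

    for (size_t i = 0; i < npending; i++)
        if (pending[i].valid && pending[i].dbid == dbid &&
            pending[i].relfilenode == relfilenode)
            hits++;
    return hits;
}
```

For example, after `remember_unlink(80750, 80825)`, `unlinks_hitting(80750, 80825)` returns 1; after `forget_database_unlinks(80750)` it returns 0, while entries queued for other databases are left alone.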

ISTM that we must fix the bgwriter so that ForgetDatabaseFsyncRequests
causes PendingUnlinkEntrys for the doomed DB to be thrown away too.
This should prevent the unlink-live-data scenario, I think.
Even then, concurrent deletion attempts are probably possible (since
ForgetDatabaseFsyncRequests is asynchronous) and rmtree() is being far
too fragile about dealing with them.  I think that it should be coded
to ignore ENOENT the same as the bgwriter does, and that it should press
on and keep trying to delete things even if it gets a failure.
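The rmtree() behavior proposed here, ignoring ENOENT and pressing on after a failure, could look roughly like the following C sketch. It is not PostgreSQL's rmtree(): `tolerant_rmtree` and `PATH_BUF` are hypothetical names, it walks the directory with POSIX opendir()/readdir() rather than first building a list of file names, and it simply reports errors on stderr. A file that vanishes between readdir() and lstat() or unlink() is treated as already removed; any other failure is remembered in the return value while the loop continues with the remaining entries.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define PATH_BUF 4096           /* hypothetical fixed path buffer size */

/*
 * Hypothetical ENOENT-tolerant recursive delete.  Returns true only if
 * everything was removed (or was already gone); on any other failure it
 * reports the problem, remembers it, and keeps going.
 */
static bool
tolerant_rmtree(const char *path)
{
    bool        ok = true;
    DIR        *dir;
    struct dirent *de;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL)
    {
        if (errno == ENOENT)
            return true;        /* someone else already removed it */
        fprintf(stderr, "could not open directory \"%s\": %s\n",
                path, strerror(errno));
        return false;
    }

    while ((de = readdir(dir)) != NULL)
    {
        char        child[PATH_BUF];
        struct stat st;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        snprintf(child, sizeof(child), "%s/%s", path, de->d_name);

        if (lstat(child, &st) != 0)
        {
            if (errno != ENOENT)    /* vanished concurrently: not an error */
            {
                fprintf(stderr, "could not stat \"%s\": %s\n",
                        child, strerror(errno));
                ok = false;
            }
            continue;
        }

        if (S_ISDIR(st.st_mode))
        {
            if (!tolerant_rmtree(child))
                ok = false;     /* note the failure, then press on */
        }
        else if (unlink(child) != 0 && errno != ENOENT)
        {
            fprintf(stderr, "could not remove file \"%s\": %s\n",
                    child, strerror(errno));
            ok = false;
        }
    }
    closedir(dir);

    if (rmdir(path) != 0 && errno != ENOENT)
    {
        fprintf(stderr, "could not remove directory \"%s\": %s\n",
                path, strerror(errno));
        ok = false;
    }
    return ok;
}
```

Calling it on a directory that no longer exists returns true. A false return means at least one entry could not be removed even though every other entry was still attempted, so a caller's closing warning would describe real leftovers rather than "I stopped trying".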

Thoughts?
        regards, tom lane

