Re: DROP DATABASE vs patch to not remove files right away - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: DROP DATABASE vs patch to not remove files right away
Date
Msg-id 4805B282.3090203@enterprisedb.com
In response to DROP DATABASE vs patch to not remove files right away  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: DROP DATABASE vs patch to not remove files right away  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> ISTM that we must fix the bgwriter so that ForgetDatabaseFsyncRequests
> causes PendingUnlinkEntrys for the doomed DB to be thrown away too.
> This should prevent the unlink-live-data scenario, I think.
> Even then, concurrent deletion attempts are probably possible (since
> ForgetDatabaseFsyncRequests is asynchronous) and rmtree() is being far
> too fragile about dealing with them.  I think that it should be coded
> to ignore ENOENT the same as the bgwriter does, and that it should press
> on and keep trying to delete things even if it gets a failure.

Yep. I can write a patch for that, unless you're onto it already?
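
To make the rmtree() behaviour described above concrete (treat ENOENT as
"already gone", and keep deleting after a failure instead of bailing out),
here is a minimal sketch in plain POSIX C. It is not the actual rmtree()
code; the name tolerant_rmtree and the fixed 4096-byte path buffer are
assumptions made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL's rmtree(): remove "path" and
 * everything below it.  ENOENT is treated as success (somebody else
 * deleted the entry first), and any other failure is remembered but does
 * not stop the walk.  Returns true only if no non-ENOENT failure was seen.
 */
bool
tolerant_rmtree(const char *path)
{
    struct stat st;
    bool        ok = true;

    assert(path != NULL);

    if (lstat(path, &st) != 0)
        return errno == ENOENT;     /* already gone counts as success */

    if (S_ISDIR(st.st_mode))
    {
        DIR        *dir = opendir(path);

        if (dir == NULL)
        {
            if (errno != ENOENT)
                ok = false;
        }
        else
        {
            struct dirent *de;

            while ((de = readdir(dir)) != NULL)
            {
                char        child[4096];    /* assumed large enough here */
                int         len;

                if (strcmp(de->d_name, ".") == 0 ||
                    strcmp(de->d_name, "..") == 0)
                    continue;

                len = snprintf(child, sizeof(child), "%s/%s",
                               path, de->d_name);
                if (len < 0 || (size_t) len >= sizeof(child))
                {
                    ok = false;     /* path too long: note it, move on */
                    continue;
                }

                if (!tolerant_rmtree(child))
                    ok = false;     /* remember the failure, keep going */
            }
            closedir(dir);
        }

        /* The directory itself may have vanished concurrently, too. */
        if (rmdir(path) != 0 && errno != ENOENT)
            ok = false;
    }
    else if (unlink(path) != 0 && errno != ENOENT)
        ok = false;

    return ok;
}
```

Calling it on a path that no longer exists returns true, which is the
point: a concurrent deleter getting there first is not an error.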

However, this makes me reconsider Florian's suggestion to just make
relfilenode larger and avoid reusing them altogether. It would simplify
the code quite a bit, and make it more robust. That is good because even 
if we fix these problems per your suggestion, I'm left wondering if 
we've missed some even weirder corner-cases.

Florian suggested a scheme where the xid and epoch are embedded in the 
filename, but that's unnecessarily complex. We could just make 
relfilenode a 64-bit integer. 2^64 should be enough for everyone.
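
As a back-of-the-envelope check on that claim, the sketch below computes
how long a counter of a given width lasts at a fixed allocation rate. The
function name and the rate of one million new relfilenodes per second are
assumptions picked only to show the scale, not measurements.

```c
#include <assert.h>

/*
 * Hypothetical helper: seconds until a counter with "bits" bits has
 * handed out every value, at "allocs_per_second" allocations per second.
 */
double
seconds_to_exhaust(unsigned bits, double allocs_per_second)
{
    double      values = 1.0;
    unsigned    i;

    assert(allocs_per_second > 0.0);

    /* 2^bits, computed in floating point to avoid integer overflow */
    for (i = 0; i < bits; i++)
        values *= 2.0;

    return values / allocs_per_second;
}
```

At that assumed rate a 32-bit counter is used up in about 72 minutes,
while a 64-bit counter lasts roughly 585,000 years.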

You listed these problems with Florian's suggestion back then:

> 1. Zero chance of ever backpatching.  (I know I said I wasn't excited
>    about that, but it's still a strike against a proposed fix.)

Still true. We would need to do what you suggested for 8.3, but 
simplifying the code would be a good thing in the long run.

> 2. Adds new fields to RelFileNode, which will be a major code change,
>    and possibly a noticeable performance hit (bigger hashtable keys).

We talked about this wrt. map forks, and concluded that it's not an 
issue. If we add the map forks as well, BufferTag struct would grow from 
16 bytes to 24 bytes. It's worth doing some more micro-benchmarking 
with that, but it's probably acceptable. Or we could allocate a few bits 
of the 64-bit relfilenode field in RelFileNode to indicate the map fork.
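
To illustrate the two options, here is a rough sketch. It assumes the
current layout is three Oids plus a block number. The struct names with
32/64 suffixes, the forkNum field, and the choice of two fork bits are all
made up for the example and are not a proposed patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;
typedef uint32_t BlockNumber;

/* Assumed current layout: 3 x 4-byte Oid + 4-byte block number = 16 bytes */
typedef struct
{
    Oid         spcNode;
    Oid         dbNode;
    Oid         relNode;
} RelFileNode32;

typedef struct
{
    RelFileNode32 rnode;
    BlockNumber blockNum;
} BufferTag32;

/* Hypothetical wide layout: 64-bit relNode plus a separate fork number */
typedef struct
{
    Oid         spcNode;
    Oid         dbNode;
    uint64_t    relNode;
} RelFileNode64;

typedef struct
{
    RelFileNode64 rnode;
    uint32_t    forkNum;        /* hypothetical map-fork indicator */
    BlockNumber blockNum;
} BufferTag64;

/* Alternative: keep the fork in the top bits of the 64-bit relNode */
#define FORK_BITS       2       /* assumed width, for illustration only */
#define FORK_SHIFT      (64 - FORK_BITS)
#define RELNODE_MASK    ((UINT64_C(1) << FORK_SHIFT) - 1)

uint64_t
pack_relnode(uint64_t relnode, unsigned fork)
{
    assert(relnode <= RELNODE_MASK);
    assert(fork < (1u << FORK_BITS));
    return ((uint64_t) fork << FORK_SHIFT) | relnode;
}

unsigned
packed_fork(uint64_t packed)
{
    return (unsigned) (packed >> FORK_SHIFT);
}

uint64_t
packed_relnode(uint64_t packed)
{
    return packed & RELNODE_MASK;
}
```

With these definitions sizeof(BufferTag32) is 16 and sizeof(BufferTag64)
is 24 on common platforms. Packing the fork into relNode drops the extra
field, but where uint64_t is 8-byte aligned the tag gets padded to 24
bytes anyway, so the saving would have to be checked per platform.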

> 3. Adds new columns to pg_class, which is a real PITA ...

We would only have to change relfilenode from oid to int64.

> 4. Breaks oid2name and all similar code that knows about relfilenode.

True, but they're not hard to fix.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com