Re: [PATCHES] Cleaning up unreferenced table files - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [PATCHES] Cleaning up unreferenced table files
Msg-id 12069.1115570122@sss.pgh.pa.us
In response to Re: [PATCHES] Cleaning up unreferenced table files  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: [PATCHES] Cleaning up unreferenced table files
Re: [PATCHES] Cleaning up unreferenced table files
List pgsql-hackers
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> Consider the variant with extra marker files. In that case, backend B 
> doesn't have to know about the .notcommitted status to flush the buffers.

[ shrug ]  It's still broken, and the reason is that there's no
equivalent of fsync for directory operations.  Consider
    A creates 1234 and 1234.notcommitted
    A commits
    B performs a checkpoint
    crash

all before A manages to delete 1234.notcommitted, or at least before
that deletion has made its way to disk.  Upon restart, only WAL
events after the checkpoint will be replayed, so 1234.notcommitted
doesn't go away, and then you've got a problem.
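
A minimal toy model may help make the sequence above concrete. It rests on two assumptions about the marker scheme that are spelled out here rather than taken from the message: replaying A's commit record would remove the marker, and on restart any relation file whose marker survives is deleted as uncommitted. `ToyNode`, `writeback` and `run` are hypothetical names for illustration, not PostgreSQL code, and the "directory cache vs. disk" split is a deliberate simplification of kernel behaviour.

```python
from dataclasses import dataclass, field

MARKER = ".notcommitted"


@dataclass
class ToyNode:
    """Hypothetical crash model, not PostgreSQL code.

    Directory changes go into `cached_dir` first and reach `disk_dir`
    only when the kernel writes them back (`writeback`).
    """
    disk_dir: set[str] = field(default_factory=set)
    cached_dir: set[str] = field(default_factory=set)
    wal: list[tuple[str, str]] = field(default_factory=list)
    redo_start: int = 0

    def create(self, name: str) -> None:
        self.cached_dir.add(name)

    def unlink(self, name: str) -> None:
        self.cached_dir.discard(name)

    def writeback(self) -> None:
        # The kernel persists the directory whenever it chooses to.
        self.disk_dir = set(self.cached_dir)

    def commit(self, rel: str) -> None:
        self.wal.append(("commit", rel))

    def checkpoint(self) -> None:
        # Recovery will replay WAL only from this point on.
        self.redo_start = len(self.wal)

    def crash_and_recover(self) -> set[str]:
        files = set(self.disk_dir)  # the kernel cache is lost in the crash
        for kind, rel in self.wal[self.redo_start:]:
            if kind == "commit":  # assumed: replaying a commit removes its marker
                files.discard(rel + MARKER)
        # Assumed cleanup rule: a surviving marker means "never committed",
        # so the marker and its relation file are both removed.
        for marker in [f for f in files if f.endswith(MARKER)]:
            files.discard(marker)
            files.discard(marker[: -len(MARKER)])
        return files


def run(checkpoint_before_crash: bool) -> set[str]:
    node = ToyNode()
    node.create("1234")               # A creates the relation file
    node.create("1234" + MARKER)      # ...and its marker
    node.writeback()                  # both entries reach disk
    node.commit("1234")               # A commits
    if checkpoint_before_crash:
        node.checkpoint()             # B checkpoints after A's commit
    node.unlink("1234" + MARKER)      # the deletion is still only in the cache
    return node.crash_and_recover()


print(sorted(run(checkpoint_before_crash=False)))  # ['1234']
print(sorted(run(checkpoint_before_crash=True)))   # []
```

Without the checkpoint, replay of the commit record cleans up the marker and the committed file survives. With it, the commit lies before the redo point, the stale marker survives the crash, and under the assumed cleanup rule the committed file 1234 is removed.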

To fix this there would need to be a way (1) for B to be aware of the
pending file deletion and (2) for B to delay committing the checkpoint
until the directory update is surely down on disk.  Your proposal
doesn't provide for (1), and even if we fixed that, I know of no
portable kernel API for (2).  fsync isn't applicable.
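
To make the two requirements concrete, here is a hypothetical sketch of the bookkeeping they imply. `PendingDeletions` and its methods are illustrative names, not an existing PostgreSQL structure, and `confirm_durable` stands in for exactly the portable "this directory change is on disk" primitive the message says does not exist; the sketch shows what would be needed, not a way to get it.

```python
from dataclasses import dataclass, field


@dataclass
class PendingDeletions:
    """Hypothetical shared registry, not an existing PostgreSQL structure."""
    # marker file name -> True once its unlink is known to be on disk
    entries: dict[str, bool] = field(default_factory=dict)

    def register(self, marker: str) -> None:
        # Requirement (1): the committing backend announces the pending
        # deletion somewhere the checkpointer can see it.
        self.entries[marker] = False

    def confirm_durable(self, marker: str) -> None:
        # Requirement (2): placeholder for a portable durability signal
        # for directory updates, which the message says is unavailable.
        self.entries[marker] = True

    def checkpoint_may_complete(self) -> bool:
        # The checkpoint must wait while any marker deletion is unconfirmed.
        return all(self.entries.values())


registry = PendingDeletions()
registry.register("1234.notcommitted")
print(registry.checkpoint_may_complete())  # False: B must delay its checkpoint
registry.confirm_durable("1234.notcommitted")
print(registry.checkpoint_may_complete())  # True
```

The registry part is easy to model; the hard part is that nothing portable would ever call `confirm_durable`, so B could only wait indefinitely or guess.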

While your original patch is buggy, it's at least fixable and has
localized, limited impact.  I don't think these schemes are safe
at all --- they put a great deal more weight on the semantics of
the filesystem than I care to do.
        regards, tom lane

