On Mon, Apr 25, 2011 at 8:26 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> On 04/24/2011 10:06 PM, Daniel Farina wrote:
>>
>> On Thu, Apr 21, 2011 at 8:51 PM, Greg Smith<greg@2ndquadrant.com> wrote:
>>
>>>
>>> There's still the "fsync'd a data block but not the directory entry yet"
>>> issue as fall-out from this too. Why doesn't PostgreSQL run into this
>>> problem? Because the exact code sequence used is this one:
>>>
>>> open
>>> write
>>> fsync
>>> close
>>>
>>> And Linux shouldn't ever screw that up, or the similar rename path.
>>> Here's what the close man page says, from
>>> http://linux.die.net/man/2/close :
>>>
>>
>> Theodore Ts'o addresses this *exact* sequence of events, and suggests
>> if you want that rename to definitely stick that you must fsync the
>> directory:
>>
>>
>> http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/don%E2%80%99t-fear-fsync
>>
>
> Not exactly. That's talking about the sequence used for creating a file,
> plus a rename. When new WAL files are being created, I believe the ugly
> part of this is avoided. The path when WAL files are recycled using rename
> does seem to be the one with the most likely edge case.

Hmm, how do we avoid this in the creation case? My current
understanding is that there are cases where you can do open(afile),
write(), fsync(), then crash, and the file will not be linked into its
parent directory, or at the very least is *entitled* not to be linked.
The recycling case also sucks.
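
To make the concern concrete, here is a minimal sketch of the sequence I
think would be needed for both the create and the rename paths to stick,
assuming Linux semantics where fsync() on a directory descriptor flushes
its entries. It is Python only for brevity; the helper names (fsync_dir,
durable_create, durable_rename) are made up for illustration and are not
what PostgreSQL actually does.

```python
import os


def fsync_dir(dirpath):
    """Flush a directory's entries by fsyncing a descriptor opened on it."""
    dfd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)


def durable_create(path, data):
    """open, write, fsync, close, then fsync the parent directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flushes the file's data, not its directory entry
    finally:
        os.close(fd)
    fsync_dir(os.path.dirname(os.path.abspath(path)))


def durable_rename(src, dst):
    """rename, then fsync the directories whose entries changed."""
    os.rename(src, dst)
    dst_dir = os.path.dirname(os.path.abspath(dst))
    src_dir = os.path.dirname(os.path.abspath(src))
    fsync_dir(dst_dir)
    if src_dir != dst_dir:
        fsync_dir(src_dir)
```

Without the fsync_dir() calls this is exactly the open/write/fsync/close
sequence quoted above, which is the case I am worried about.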

Would it be insane to use the MTA approach and just set chattr +D on
the directory? That also models the behavior of other systems with
synchronous directory modifications, which (maybe? I could very well
be wrong) include BSD.

--
fdr