Re: subscriptionCheck failures on nightjar - Mailing list pgsql-hackers

From Tom Lane
Subject Re: subscriptionCheck failures on nightjar
Msg-id 30608.1550080759@sss.pgh.pa.us
In response to Re: subscriptionCheck failures on nightjar  (Andres Freund <andres@anarazel.de>)
Responses Re: subscriptionCheck failures on nightjar
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2019-02-13 12:37:35 -0500, Tom Lane wrote:
>> Bleah.  But in any case, the rename should not create a situation
>> in which we need to fsync the file data again.

> Well, it's not super well defined which of either you need to make the
> rename durable, and it appears to differ between OSs. Any argument
> against fixing it up like I suggested, by using an fd from before the
> rename?

I'm unimpressed.  You're speculating about the filesystem having random
deviations from POSIX behavior, and using that weak argument to justify a
totally untested technique having its own obvious portability hazards.
Who's to say that an fsync on a file opened before a rename is going to do
anything good after the rename?  (On, e.g., NFS there are obvious reasons
why it might not.)
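For comparison with the hold-an-fd-across-the-rename idea, here is a minimal sketch of the commonly described POSIX sequence for a durable rename. It flushes the file's data under its old name, calls rename(), and then fsyncs the containing directory so that the new directory entry is persistent. In line with the position above that the rename should not require fsyncing the file data again, it does not fsync the file a second time. The helper names (fsync_path, sketch_durable_rename) are hypothetical, not PostgreSQL functions. The sketch also assumes that both names live in the same directory.

```c
/* Expose POSIX.1-2008/XSI declarations (fsync, mkdtemp) under strict C modes. */
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: open path with the given flags and fsync it.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int
fsync_path(const char *path, int flags)
{
    int         fd = open(path, flags);

    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        int         save_errno = errno;

        close(fd);
        errno = save_errno;
        return -1;
    }
    return close(fd);
}

/*
 * Hypothetical sketch of a durable rename.  Assumes oldpath and newpath
 * are both entries in directory dir.  Some platforms refuse fsync() on a
 * directory; production code may need to tolerate that case.
 */
static int
sketch_durable_rename(const char *oldpath, const char *newpath,
                      const char *dir)
{
    /* 1. Make the file's contents durable before the new name appears. */
    if (fsync_path(oldpath, O_RDWR) != 0)
        return -1;

    /* 2. Replace (or create) the directory entry atomically. */
    if (rename(oldpath, newpath) != 0)
        return -1;

    /* 3. Persist the directory-entry change itself. */
    return fsync_path(dir, O_RDONLY);
}
```

Whether step 3 alone is enough on every filesystem (NFS included) is exactly what is being argued here. The sketch simply assumes POSIX semantics.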

Also, I wondered why this is coming out as a PANIC.  I thought originally
that somebody must be causing this code to run in a critical section,
but it looks like the real issue is just that fsync_fname() uses
data_sync_elevel, which is

int
data_sync_elevel(int elevel)
{
    return data_sync_retry ? elevel : PANIC;
}

I really really don't want us doing questionably-necessary fsyncs with a
PANIC as the result.  Perhaps more to the point, the way this was coded,
the PANIC applies to open() failures in fsync_fname_ext(), not just fsync()
failures; that's painting with too broad a brush, isn't it?
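One way to narrow the brush is to keep the caller's error level for open() failures and to escalate only genuine fsync() failures through data_sync_elevel(). This is a sketch under stated assumptions, not the actual fix. The enum, the sketch_data_sync_retry flag, and sketch_fsync_fname() are hypothetical stand-ins, not PostgreSQL's ereport machinery. Instead of raising an error, the function returns the level it would report at, so the policy can be checked directly.

```c
/* Expose POSIX/XSI declarations (fsync, mkstemp) under strict C modes. */
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-ins for PostgreSQL error levels. */
typedef enum
{
    SKETCH_NO_ERROR,            /* nothing to report */
    SKETCH_LOG,
    SKETCH_ERROR,
    SKETCH_PANIC
} sketch_elevel;

/* Hypothetical stand-in for the data_sync_retry setting. */
static bool sketch_data_sync_retry = false;

/* Mirrors data_sync_elevel(): without retry, fsync failures become PANIC. */
static sketch_elevel
sketch_data_sync_elevel(sketch_elevel elevel)
{
    return sketch_data_sync_retry ? elevel : SKETCH_PANIC;
}

/*
 * Hypothetical: fsync a regular file and return the level at which a
 * failure would be reported (SKETCH_NO_ERROR on success).  An open()
 * failure keeps the caller's elevel; only an fsync() failure is escalated.
 */
static sketch_elevel
sketch_fsync_fname(const char *fname, sketch_elevel elevel)
{
    int         fd = open(fname, O_RDWR);
    int         rc;

    if (fd < 0)
        return elevel;          /* e.g. ENOENT: no reason to PANIC */

    rc = fsync(fd);
    close(fd);

    if (rc != 0)
        return sketch_data_sync_elevel(elevel);
    return SKETCH_NO_ERROR;
}
```

With this split, a missing file is reported at the caller's level (ERROR here). A failed fsync() still goes through the data_sync_retry policy.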

            regards, tom lane
