Thread: pendingOps table is not cleared with fsync=off

pendingOps table is not cleared with fsync=off

From

Heikki Linnakangas

Date:

08 May 2020, 21:21:18

Hi!

I noticed that commit 3eb77eba5a changed the logic in 
ProcessSyncRequests() (formerly mdsync()) so that if you have fsync=off, 
the entries are not removed from the pendingOps hash table. I don't 
think that was intended.

I propose the attached patch to move the test for enableFsync so that 
the entries are removed from pendingOps again. It looks larger than it 
really is because it re-indents the block of code that is now inside the 
"if (enableFsync)" condition.

- Heikki

Attachment

clear-pendingOps-with-fsync-off-1.patch

Re: pendingOps table is not cleared with fsync=off

From

Thomas Munro

Date:

08 May 2020, 23:53:13

On Sat, May 9, 2020 at 9:21 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I noticed that commit 3eb77eba5a changed the logic in
> ProcessSyncRequests() (formerly mdsync()) so that if you have fsync=off,
> the entries are not removed from the pendingOps hash table. I don't
> think that was intended.

Perhaps we got confused about what the comment "... so that changing
fsync on the fly behaves sensibly" means.  Fsyncing everything you
missed when you turn it back on after a period running with it off
does sound a bit like behaviour that someone might want or expect,
though it probably isn't really enough to guarantee durability,
because requests queued here aren't the only fsyncs you missed while
you had it off, among other problems.  Unfortunately, if you try that
on an assertion build, you get a failure anyway, so that probably
wasn't deliberate:

TRAP: FailedAssertion("(CycleCtr) (entry->cycle_ctr + 1) ==
sync_cycle_ctr", File: "sync.c", Line: 335)

> I propose the attached patch to move the test for enableFsync so that
> the entries are removed from pendingOps again. It looks larger than it
> really is because it re-indents the block of code that is now inside the
> "if (enableFsync)" condition.

Yeah, I found that git diff/show -w made it easier to understand that
change.  LGTM, though I'd be tempted to use "goto skip" instead of
incurring that much indentation but up to you ...

Re: pendingOps table is not cleared with fsync=off

From

Heikki Linnakangas

Date:

14 May 2020, 05:41:26

On 09/05/2020 02:53, Thomas Munro wrote:
> On Sat, May 9, 2020 at 9:21 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I noticed that commit 3eb77eba5a changed the logic in
>> ProcessSyncRequests() (formerly mdsync()) so that if you have fsync=off,
>> the entries are not removed from the pendingOps hash table. I don't
>> think that was intended.
> 
> Perhaps we got confused about what the comment "... so that changing
> fsync on the fly behaves sensibly" means.  Fsyncing everything you
> missed when you turn it back on after a period running with it off
> does sound a bit like behaviour that someone might want or expect,
> though it probably isn't really enough to guarantee durability,
> because requests queued here aren't the only fsyncs you missed while
> you had it off, among other problems.

Yeah, you need to run "sync" after turning fsync=on to be safe, there's 
no way around it.

> Unfortunately, if you try that on an assertion build, you get a
> failure anyway, so that probably wasn't deliberate:
> 
> TRAP: FailedAssertion("(CycleCtr) (entry->cycle_ctr + 1) ==
> sync_cycle_ctr", File: "sync.c", Line: 335)

Ah, I didn't notice that.

>> I propose the attached patch to move the test for enableFsync so that
>> the entries are removed from pendingOps again. It looks larger than it
>> really is because it re-indents the block of code that is now inside the
>> "if (enableFsync)" condition.
> 
> Yeah, I found that git diff/show -w made it easier to understand that
> change.  LGTM, though I'd be tempted to use "goto skip" instead of
> incurring that much indentation but up to you ...

I considered a goto too, but I found it confusing. If we need any more 
nesting here in the future, I think extracting the inner parts into a 
function would be good.

Committed, thanks!

- Heikki

Re: pendingOps table is not cleared with fsync=off

From

Tom Lane

Date:

06 August 2020, 15:42:00

Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 09/05/2020 02:53, Thomas Munro wrote:
>> On Sat, May 9, 2020 at 9:21 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> I noticed that commit 3eb77eba5a changed the logic in
>>> ProcessSyncRequests() (formerly mdsync()) so that if you have fsync=off,
>>> the entries are not removed from the pendingOps hash table. I don't
>>> think that was intended.

I'm looking at this commit in connection with writing release notes
for next week's releases.  Am I right in thinking that this bug leads
to indefinite bloat of the pendingOps hash table when fsync is off?
If so, that seems much more worth documenting than the assertion
failure.

            regards, tom lane

Re: pendingOps table is not cleared with fsync=off

From

Heikki Linnakangas

Date:

06 August 2020, 17:35:54

On 06/08/2020 18:42, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> On 09/05/2020 02:53, Thomas Munro wrote:
>>> On Sat, May 9, 2020 at 9:21 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>>> I noticed that commit 3eb77eba5a changed the logic in
>>>> ProcessSyncRequests() (formerly mdsync()) so that if you have fsync=off,
>>>> the entries are not removed from the pendingOps hash table. I don't
>>>> think that was intended.
> 
> I'm looking at this commit in connection with writing release notes
> for next week's releases.  Am I right in thinking that this bug leads
> to indefinite bloat of the pendingOps hash table when fsync is off?
> If so, that seems much more worth documenting than the assertion
> failure.

That's correct.

- Heikki

Re: pendingOps table is not cleared with fsync=off

From

Tom Lane

Date:

06 August 2020, 18:10:15

Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 06/08/2020 18:42, Tom Lane wrote:
>> I'm looking at this commit in connection with writing release notes
>> for next week's releases.  Am I right in thinking that this bug leads
>> to indefinite bloat of the pendingOps hash table when fsync is off?
>> If so, that seems much more worth documenting than the assertion
>> failure.

> That's correct.

OK, thanks for confirming.

            regards, tom lane

Re: pendingOps table is not cleared with fsync=off

From

Shawn Debnath

Date:

10 August 2020, 16:35:56

On Sat, May 09, 2020 at 11:53:13AM +1200, Thomas Munro wrote:

> On Sat, May 9, 2020 at 9:21 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > I noticed that commit 3eb77eba5a changed the logic in
> > ProcessSyncRequests() (formerly mdsync()) so that if you have fsync=off,
> > the entries are not removed from the pendingOps hash table. I don't
> > think that was intended.
> 
> Perhaps we got confused about what the comment "... so that changing
> fsync on the fly behaves sensibly" means.  Fsyncing everything you
> missed when you turn it back on after a period running with it off
> does sound a bit like behaviour that someone might want or expect,
> though it probably isn't really enough to guarantee durability,
> because requests queued here aren't the only fsyncs you missed while
> you had it off, among other problems.

Good catch. Question is, are the users aware of the requirement to do a 
manual fsync if they flip the fsync GUC off and then on? Should we do 
this on their behalf to make a good faith attempt to ensure things are 
flushed properly via an assign hook?


-- 
Shawn Debnath
Amazon Web Services (AWS)

Re: pendingOps table is not cleared with fsync=off

From

Tom Lane

Date:

10 August 2020, 18:50:26

Shawn Debnath <sdn@amazon.com> writes:
> Good catch. Question is, are the users aware of the requirement to do a 
> manual fsync if they flip the fsync GUC off and then on? Should we do 
> this on their behalf to make a good faith attempt to ensure things are 
> flushed properly via an assign hook?

No.  Or at least, expecting that you can do that from an assign hook
is impossibly wrong-headed.  GUC assign hooks can't have failure modes.

            regards, tom lane

Re: pendingOps table is not cleared with fsync=off

From

Shawn Debnath

Date:

10 August 2020, 21:02:35

On Mon, Aug 10, 2020 at 02:50:26PM -0400, Tom Lane wrote:
>
> Shawn Debnath <sdn@amazon.com> writes:
> > Good catch. Question is, are the users aware of the requirement to do a
> > manual fsync if they flip the fsync GUC off and then on? Should we do
> > this on their behalf to make a good faith attempt to ensure things are
> > flushed properly via an assign hook?
> 
> No.  Or at least, expecting that you can do that from an assign hook
> is impossibly wrong-headed.  GUC assign hooks can't have failure modes.

Okay agree, will remind myself to drink more coffee next time.

If we think a fsync should be performed in this case, assign hook
could set a value to indicate parameter was reset via SIGHUP. Next call
to ProcessSyncRequests() could check for this, do a fsync prior to
absorbing the newly submitted sync requests, and reset the flag.
fsync_pgdata() comes to mind to be inclusive.

If folks are not inclined to do the fsync, the change is good as is.

-- 
Shawn Debnath
Amazon Web Services (AWS)