Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation) - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
Date
Msg-id CA+U5nM+GTCjEcpSvb=L-=bZ+6ngnhCsRmVuXEYih_4q2CcMcBw@mail.gmail.com
In response to Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 17 July 2012 23:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Jul 16, 2012 at 3:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> BTW, while we are on the subject: hasn't this split completely broken
>>> the statistics about backend-initiated writes?
>
>> Yes, it seems to have done just that.
>
> So I went to fix this in the obvious way (attached), but while testing
> it I found that the number of buffers_backend events reported during
> a regression test run barely changed; which surprised the heck out of
> me, so I dug deeper.  The cause turns out to be extremely scary:
> ForwardFsyncRequest isn't getting called at all in the bgwriter process,
> because the bgwriter process has a pendingOpsTable.  So it just queues
> its fsync requests locally, and then never acts on them, since it never
> runs any checkpoints anymore.
>
> This implies that nobody has done pull-the-plug testing on either HEAD
> or 9.2 since the checkpointer split went in (2011-11-01), because even
> a modicum of such testing would surely have shown that we're failing to
> fsync a significant fraction of our write traffic.
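
For readers following the thread, the failure mode quoted above can be
sketched as a toy model. This is only an illustration written under
assumptions, not PostgreSQL source: the Proc class, the shared_queue list
and the segment names are hypothetical stand-ins, and appending to
shared_queue merely plays the role that ForwardFsyncRequest has in the
description. The one rule it encodes is the one Tom describes: a process
that has its own pendingOpsTable queues fsync requests locally instead of
forwarding them.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Proc:
    """Toy process. pending_ops=None means "no local pendingOpsTable"."""
    name: str
    pending_ops: Optional[Set[str]] = None


def register_dirty_segment(proc: Proc, segment: str, shared_queue: List[str]) -> None:
    # Rule from the thread: a process with a local table queues the request
    # locally; only processes without one forward it (the shared_queue
    # append is a stand-in for ForwardFsyncRequest).
    if proc.pending_ops is not None:
        proc.pending_ops.add(segment)
    else:
        shared_queue.append(segment)


def run_checkpoint(checkpointer: Proc, shared_queue: List[str]) -> Set[str]:
    # The checkpointer absorbs forwarded requests, then "fsyncs" everything
    # in its own table. It never sees another process's local table.
    checkpointer.pending_ops.update(shared_queue)
    shared_queue.clear()
    synced = set(checkpointer.pending_ops)
    checkpointer.pending_ops.clear()
    return synced


def simulate(bgwriter_has_table: bool) -> Set[str]:
    """Return the segments that were written but never fsynced."""
    shared_queue: List[str] = []
    checkpointer = Proc("checkpointer", pending_ops=set())
    bgwriter = Proc("bgwriter", pending_ops=set() if bgwriter_has_table else None)
    backend = Proc("backend")  # ordinary backend: always forwards

    register_dirty_segment(backend, "rel_a.0", shared_queue)
    register_dirty_segment(bgwriter, "rel_b.0", shared_queue)

    synced = run_checkpoint(checkpointer, shared_queue)
    return {"rel_a.0", "rel_b.0"} - synced


if __name__ == "__main__":
    print("bgwriter has its own table:", simulate(True))
    print("bgwriter forwards requests:", simulate(False))
```

In this model the bgwriter's segment is left unsynced whenever the bgwriter
keeps a local table but never runs a checkpoint, which is the situation
described after the checkpointer split.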

That problem was reported to me on list some time ago, and I made a note
to fix it after the last CF.

I added a note to the 9.2 open items about it myself, but it appears my
fix was too simple and fixed only the reported problem, not the
underlying issue. Reading your patch gave me strong deja vu, so I'm not
sure what happened there.

Not very good on my part. Feel free to thwack me to fix such things if I
seem not to respond quickly enough.

I'm now looking at the other open items in my area.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

