bgwriter idle-mode behavior (was Re: Latch for the WAL writer) - Mailing list pgsql-hackers

From Tom Lane
Subject bgwriter idle-mode behavior (was Re: Latch for the WAL writer)
Date
Msg-id 14938.1336599266@sss.pgh.pa.us
Whole thread Raw
In response to Re: Latch for the WAL writer - further reducing idle wake-ups.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: bgwriter idle-mode behavior (was Re: Latch for the WAL writer)
List pgsql-hackers
I wrote:
> I believe the bgwriter code needs to be fixed similarly to the way
> I changed the walwriter patch, namely that there needs to be a separate
> flag (not the latch's isSet state) advertising whether the bgwriter is
> currently in hibernation mode.

After further study of the bgwriter hibernation patch (commit
6d90eaaa89a007e0d365f49d6436f35d2392cfeb), I think that my worries about
race conditions in the use of the bgwriter's latch are really the least
of its problems.  BgBufferSync embodies a rather carefully tuned
feedback control loop, and I think these changes broke it.  In the first
place, that feedback loop is built on the assumption that BgBufferSync
is executed at fixed intervals, which isn't true anymore; and in the
second place, the loop is not driven so much by the rate of buffers
being dirtied as it is by the rate of buffers being allocated.  To be
concrete, if there is a constant but slow rate of buffer consumption,
the bgwriter will eventually lap the strategy scan and then stay there,
resulting in BgBufferSync returning true every time even though actually
the system is doing things.  This results in bgwriter.c deciding it's in
hibernation mode, whereupon we have a scenario where backends will be
signaling it all the time.  The way BgWriterNap is coded, that means
BgBufferSync is not executed at a fixed BgWriterDelay interval, but at
variable intervals from BgWriterDelay up to BGWRITER_HIBERNATE_MS, which
pretty much destroys the predictability of the feedback loop.

My proposal for fixing this is that

(1) BgBufferSync should return true (OK to hibernate) only if it's
lapped the strategy scan *and* recent_alloc is zero, meaning no new
buffers were allocated anywhere since last time.

(2) We should remove the bgwriter wakening calls from MarkBufferDirty
and SetBufferCommitInfoNeedsSave, and instead place one in buffer
allocation.

(3) The bottom-of-loop logic in bgwriter should be along the lines of
rc = WaitLatch(..., BgWriterDelay);if (rc == WL_TIMEOUT && can_hibernate){    set global flag to tell backends to kick
bgwriter       if they allocate a buffer;    WaitLatch(..., BGWRITER_HIBERNATE_MS);    clear global flag;}
 

In comparison to the existing code, this method guarantees immediate
response to any signal (latch-setting event), and it ensures that
if we extend the normal sleep time for the bgwriter, the extra sleep
covers only an interval in which no new buffer allocations happened.
That provision seems to me to justify pretending that that interval
simply didn't exist for the purposes of the feedback control loop,
which allows us to not have to rethink how that loop works.

Comments?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: synchronous_commit and remote_write
Next
From: Bruce Momjian
Date:
Subject: Re: synchronous_commit and remote_write