Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Andres Freund
Subject Re: checkpointer continuous flushing
Msg-id 20160119214321.GE10447@awork2.anarazel.de
In response to Re: checkpointer continuous flushing  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: checkpointer continuous flushing  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 2016-01-19 12:58:38 -0500, Robert Haas wrote:
> This seems like a problem with the WAL writer quite independent of
> anything else.  It seems likely to be inadvertent fallout from this
> patch:
> 
> Author: Simon Riggs <simon@2ndQuadrant.com>
> Branch: master Release: REL9_2_BR [4de82f7d7] 2011-11-13 09:00:57 +0000
> 
>     Wakeup WALWriter as needed for asynchronous commit performance.
>     Previously we waited for wal_writer_delay before flushing WAL. Now
>     we also wake WALWriter as soon as a WAL buffer page has filled.
>     Significant effect observed on performance of asynchronous commits
>     by Robert Haas, attributed to the ability to set hint bits on tuples
>     earlier and so reducing contention caused by clog lookups.

In addition to that the "powersaving" effort also plays a role - without
the latch we'd not wake up at any meaningful rate at all atm.


> If I understand correctly, prior to that commit, WAL writer woke up 5
> times per second and flushed just that often (unless you changed the
> default settings).    But as the commit message explained, that turned
> out to suck - you could make performance go up very significantly by
> radically decreasing wal_writer_delay.  This commit basically lets it
> flush at maximum velocity - as fast as we finish one flush, we can
> start the next.  That must have seemed like a win at the time from the
> way the commit message was written, but you seem to now be seeing the
> opposite effect, where performance is suffering because flushes are
> too frequent rather than too infrequent.  I wonder if there's an ideal
> flush rate and what it is, and how much it depends on what hardware
> you have got.

I think the problem isn't really that it's flushing too much WAL in
total; it's that it's flushing WAL at too fine a granularity. I suspect
we want something where we attempt a minimum number of flushes per
second (presumably tied to wal_writer_delay) and, once that rate is
exceeded, require a minimum number of pages per flush. I think we could
even continue to write() the data at the same rate as today; we would
just need to reduce the number of fdatasync()s we issue. We could
possibly also make the eventual fdatasync()s cheaper by hinting the
kernel to write the data out earlier.
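
To illustrate the write()/fdatasync() split, here is a minimal Python
sketch, not PostgreSQL code. It writes every page immediately but only
syncs once a batch of pages is pending. The names write_pages,
min_pages_per_sync and demo are made up for this example, the 8192-byte
page size is an assumption, and the kernel hint is left out entirely.

```python
import os
import tempfile

PAGE_SIZE = 8192  # assumed page size, for illustration only

# fdatasync() is not available on every platform; fall back to fsync().
_sync = getattr(os, "fdatasync", os.fsync)


def write_pages(fd, n_pages, min_pages_per_sync):
    """write() every page right away, but sync only once enough are pending.

    Returns the number of sync calls issued (hypothetical helper).
    """
    page = b"\0" * PAGE_SIZE
    pending = 0
    syncs = 0
    for _ in range(n_pages):
        os.write(fd, page)      # data is handed to the kernel immediately
        pending += 1
        if pending >= min_pages_per_sync:
            _sync(fd)           # one sync covers the whole pending batch
            syncs += 1
            pending = 0
    if pending:
        _sync(fd)               # final sync so no written page stays unsynced
        syncs += 1
    return syncs


def demo():
    """Compare syncing after every page with syncing every 16 pages."""
    with tempfile.TemporaryFile() as f:
        per_page = write_pages(f.fileno(), 64, 1)
    with tempfile.TemporaryFile() as f:
        batched = write_pages(f.fileno(), 64, 16)
    return per_page, batched
```

The write rate is identical in both runs of demo(); only the number of
sync calls differs, which is the quantity the paragraph above proposes
to reduce.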

Now, the question of what minimum number of pages we want per flush
(setting aside flushes triggered by wal_writer_delay) isn't easy to
answer. A simple model would be to statically tie it to the size of
wal_buffers; say, don't flush unless at least 10% of XLogBuffers have
been written since the last flush. More complex approaches would be to
measure the continuous WAL writeout rate.
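
The simple static model could be sketched roughly as follows, in Python
rather than backend C. FlushPolicy, should_flush and the 512-page buffer
size in the usage note are hypothetical names and values chosen for the
example; the 200 ms delay matches the "5 times per second" default
mentioned in the quoted text, and the 10% figure is the one floated
above.

```python
from dataclasses import dataclass


@dataclass
class FlushPolicy:
    """Hypothetical flush decision: time-based floor plus a page threshold."""
    wal_buffers_pages: int           # size of wal_buffers, in pages
    wal_writer_delay_ms: int = 200   # 5 wakeups per second
    min_percent: int = 10            # the "10% of XLogBuffers" from the text

    def min_pages(self):
        # Integer arithmetic avoids float rounding; never go below one page.
        return max(1, self.wal_buffers_pages * self.min_percent // 100)

    def should_flush(self, pages_since_flush, ms_since_flush):
        if pages_since_flush <= 0:
            return False             # nothing written, nothing to flush
        if ms_since_flush >= self.wal_writer_delay_ms:
            return True              # delay-triggered flush, regardless of size
        # Under activity, only flush once enough pages have accumulated.
        return pages_since_flush >= self.min_pages()
```

With FlushPolicy(wal_buffers_pages=512), min_pages() is 51, so a wakeup
50 ms after the last flush with 10 pages written would skip the flush,
while the same wakeup with 51 pages written would issue it.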

By tying it to both a minimum rate under activity (ensuring things go to
disk fast) and a minimum number of pages to sync (ensuring a reasonable
number of cache flush operations) we should be able to mostly accommodate
the different types of workloads. I think.

Andres


