On Wed, Jan 11, 2012 at 12:13 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> At the moment, double-writes are done in one batch, fsyncing the
> double-write area first and the data files immediately after that. That's
> probably beneficial if you have a BBU, and/or a fairly large shared_buffers
> setting, so that pages don't get swapped between OS and PostgreSQL cache too
> much. But when those assumptions don't hold, it would be interesting to
> treat the double-write buffers more like a 2nd WAL for full-page images.
> Whenever a dirty page is evicted from shared_buffers, write it to the
> double-write area, but don't fsync it or write it back to the data file yet.
> Instead, let it sit in the double-write area, and grow the double-write
> file(s) as necessary, until the next checkpoint comes along.
>
> In general, I must say that I'm pretty horrified by all these extra fsync's
> this introduces. You really need a BBU to absorb them, and even then, you're
> fsyncing data files to disk much more frequently than you otherwise would.
Agreed. That's almost exactly the design I've been mulling over while
waiting for the patch to get tidied up.
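
For concreteness, here's roughly how I picture the eviction side of
that design, as a standalone sketch in plain C with invented names --
none of this is from the patch or from the backend:

/*
 * Sketch only: invented names and plain POSIX I/O, not code from the
 * patch or from PostgreSQL itself.
 */
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define DW_PAGE_SIZE 8192

typedef struct
{
    uint32_t    rel_id;                 /* which relation */
    uint32_t    block_no;               /* block within the relation */
    char        image[DW_PAGE_SIZE];    /* full page image */
} DoubleWriteRecord;

static int  dw_fd = -1;     /* double-write file, opened once at startup */

static void
dw_startup(const char *path)
{
    dw_fd = open(path, O_RDWR | O_CREAT | O_APPEND, 0600);
}

/*
 * Called when a dirty page is evicted from shared_buffers: append the
 * page image to the double-write file and do nothing else.  No fsync,
 * no write to the data file; the file just grows until the next
 * checkpoint deals with it.
 */
static void
dw_append_on_evict(uint32_t rel_id, uint32_t block_no, const char *page)
{
    DoubleWriteRecord rec;

    rec.rel_id = rel_id;
    rec.block_no = block_no;
    memcpy(rec.image, page, DW_PAGE_SIZE);

    (void) write(dw_fd, &rec, sizeof(rec));
}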
Interestingly, you use the term "double write buffer", which is a
concept that doesn't exist in the patch, but should.
You don't say it, but presumably the bgwriter would flush the
double-write buffers as needed. Perhaps the checkpointer could do it
when the bgwriter doesn't, so we wouldn't need to send as many fsync
request messages.
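
Something like this on the flush side, continuing the invented names
above (dw_fd is the same double-write file descriptor); the ordering
is the part that matters:

/*
 * Invented helper, not shown: copy every page image accumulated in the
 * double-write file back to its home data file, and return the fds of
 * the data files it touched so each can be fsynced once.
 */
extern int dw_copy_images_home(int dw_fd, int *data_fds, int max_fds);

static void
dw_flush_at_checkpoint(int dw_fd)
{
    int     data_fds[64];
    int     nfiles;
    int     i;

    /* 1. Make the accumulated page images durable before touching any
     *    data file, so a torn data-file write can always be repaired
     *    from the double-write area. */
    (void) fsync(dw_fd);

    /* 2. Write the images back to their home data files. */
    nfiles = dw_copy_images_home(dw_fd, data_fds, 64);

    /* 3. One fsync per data file, not one per evicted page -- this
     *    batching is what keeps the number of data-file fsyncs, and
     *    fsync request messages, under control. */
    for (i = 0; i < nfiles; i++)
        (void) fsync(data_fds[i]);

    /* 4. Recycle the double-write file for the next cycle. */
    (void) ftruncate(dw_fd, 0);
}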
The bottom line is that an increased number of fsyncs on the main data
files will throw the performance balance out, so other performance
tuning will go awry.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services