Re: BBU Cache vs. spindles - Mailing list pgsql-performance

From Rob Wultsch
Subject Re: BBU Cache vs. spindles
Date
Msg-id AANLkTinkCRczRWwPFnkyYaE+vX1tRPuB2fmF8=kZooNc@mail.gmail.com
In response to Re: BBU Cache vs. spindles  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: BBU Cache vs. spindles
List pgsql-performance
On Tue, Oct 26, 2010 at 5:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Oct 22, 2010 at 3:05 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>> Rob Wultsch <wultsch@gmail.com> wrote:
>>
>>> I would think full_page_writes=off + double write buffer should be
>>> far superior, particularly given that the WAL is shipped over the
>>> network to slaves.
>>
>> For a reasonably brief description of InnoDB double write buffers, I
>> found this:
>>
>> http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/
>>
>> One big question before even considering this would be how to
>> determine whether a potentially torn page "is inconsistent".
>> Without a page CRC or some such mechanism, I don't see how this
>> technique is possible.
>
> There are two sides to this problem: figuring out when to write a page
> to the double write buffer, and figuring out when to read it back from
> the double write buffer.  The first seems easy: we just do it whenever
> we would XLOG a full page image.  As to the second, when we write the
> page out to the double write buffer, we could also write to the double
> write buffer the LSN of the WAL record which depends on that full page
> image.  Then, at the start of recovery, we scan the double write
> buffer and remember all those LSNs.  When we reach one of them, we
> replay the full page image.
>
> The good thing about this is that it would reduce WAL volume; the bad
> thing about it is that it would probably mean doing two fsyncs where
> we only now do one.
>
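
As a rough illustration of the recovery scheme Robert describes above,
here is a toy model in Python. It is not PostgreSQL code: WalRecord,
DoubleWriteBuffer and recover() are hypothetical names, pages are plain
byte strings, and I am assuming the saved page image already includes
the change made by the WAL record it is tagged with, so replay restores
the image and skips that record's delta.

```python
from dataclasses import dataclass, field


@dataclass
class WalRecord:
    lsn: int        # position of this record in the log
    page_id: int    # page the record modifies
    delta: bytes    # toy "change": bytes appended to the page


@dataclass
class DoubleWriteBuffer:
    # Maps the LSN of the dependent WAL record to (page_id, full page image).
    entries: dict = field(default_factory=dict)

    def write(self, lsn: int, page_id: int, image: bytes) -> None:
        self.entries[lsn] = (page_id, image)


def recover(pages: dict, wal: list, dwb: DoubleWriteBuffer) -> dict:
    """Replay wal over pages, restoring saved images when their LSN is reached."""
    # At the start of recovery, scan the buffer and remember all its LSNs.
    pending = dict(dwb.entries)
    for rec in sorted(wal, key=lambda r: r.lsn):
        if rec.lsn in pending:
            page_id, image = pending[rec.lsn]
            # Overwrite the possibly torn page with the saved image.
            # Assumption: the image already reflects this record's change.
            pages[page_id] = image
            continue
        # Ordinary record: apply the (small) change to the page.
        pages[rec.page_id] = pages.get(rec.page_id, b"") + rec.delta
    return pages
```

In this model the WAL records carry only the small deltas, which is
where the reduction in WAL volume comes from; the full images live only
in the local double write buffer and never need to be shipped.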

The double write buffer is one of the few areas where InnoDB does more
IO (in the form of fsyncs) than PG. InnoDB also has fuzzy
checkpoints (which help to keep dirty pages in memory longer),
buffering of changes to secondary indexes, and, more recently,
tunable page-level compression.

Given that InnoDB is not shipping its logs across the wire, I don't
think many users would really care whether it used the double write
buffer or the full page writes approach to the redo log (other than the
fact that the log files would be bigger). PG, on the other hand, *is*
pushing its logs over the wire...

--
Rob Wultsch
wultsch@gmail.com
