Re: Keeping separate WAL segments for each database - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Keeping separate WAL segments for each database
Date
Msg-id AANLkTimPTN_iNnudLTh_htzdWUNs5MWEhBNiHYgwfD5A@mail.gmail.com
In response to Re: Keeping separate WAL segments for each database  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Sat, Jul 3, 2010 at 10:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, 2010-06-30 at 22:21 -0400, Tom Lane wrote:
>
>> What about having a single WAL stream for all commit records (thereby
>> avoiding any possible xact-serialization funnies) and other WAL records
>> divided up among multiple streams in some fashion or other?  A commit
>> record would bear minimum-LSN pointers for all the streams that its
>> transaction had written to.  Things like HEAP_CLEAN records would bear
>> minimum-LSN pointers for the commit stream.  Workable?
>
> I'm interested in the idea of putting full page writes into one stream
> and all other WAL records into another.
>
> That would allow us to stream less data for log shipping.

Yeah, that would be great.  Heikki and I were discussing this a bit.
I think the standby can have problems with torn pages, too, but maybe
the full page writes could be moved completely outside the xlog
system.  In other words, you write the full-page writes to a separate
log which is maintained locally, and independently, on the master and
standby, and which (I think) can be recycled after each
checkpoint/restartpoint.  You'd have to figure out when to refer back
to that log during redo, of course; I'm not sure if that would require
fencepost records of some kind or just careful accounting.  One
disadvantage of this approach is that you'd be fsync-ing the WAL and
the full-page-log separately - I'm not sure whether that would suck or
not.
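
To make the separate-log idea concrete, here is a toy sketch in Python.
It is not PostgreSQL code: FullPageLog, modify() and redo() are all
hypothetical names, pages are plain bytearrays, and I'm assuming the
image is saved just before the first change to a page after a
checkpoint. It only shows the bookkeeping: one image per page per
checkpoint cycle, kept out of the shipped WAL, and thrown away at the
next checkpoint.

```python
class FullPageLog:
    """Toy local log of page images, kept apart from the WAL stream."""

    def __init__(self):
        self.images = {}  # page_id -> image saved since the last checkpoint

    def save_if_first_touch(self, page_id, page):
        """Save an image only for the first change after a checkpoint."""
        if page_id in self.images:
            return False
        self.images[page_id] = bytes(page)
        return True

    def image_for_redo(self, page_id):
        """Return a fresh copy of the saved image, or None if there is none."""
        img = self.images.get(page_id)
        return bytearray(img) if img is not None else None

    def recycle(self):
        """Called at a checkpoint/restartpoint: old images are not needed."""
        self.images.clear()


def modify(pages, fpl, wal, page_id, offset, data):
    # Hypothetical write path: image goes to the local log, the small
    # change record goes to the (shippable) WAL, then the page is changed.
    fpl.save_if_first_touch(page_id, pages[page_id])
    wal.append((page_id, offset, data))
    pages[page_id][offset:offset + len(data)] = data


def redo(disk_pages, fpl, wal):
    # Start each page from its saved image if one exists, so a torn
    # on-disk copy is never trusted, then replay the change records.
    recovered = {}
    for page_id, offset, data in wal:
        if page_id not in recovered:
            img = fpl.image_for_redo(page_id)
            if img is None:
                img = bytearray(disk_pages[page_id])
            recovered[page_id] = img
        recovered[page_id][offset:offset + len(data)] = data
    return recovered


pages = {1: bytearray(b"aaaaaaaa")}
fpl, wal = FullPageLog(), []
modify(pages, fpl, wal, 1, 0, b"XY")
modify(pages, fpl, wal, 1, 6, b"Z")
images_saved = len(fpl.images)           # two changes, one saved image

torn_disk = {1: bytearray(b"XY??????")}  # pretend the on-disk copy is torn
recovered = redo(torn_disk, fpl, wal)
```

Here the decision of when to refer back to the image log is the
simplest possible one (always, on first touch during redo); the
fencepost-versus-accounting question above is exactly what this toy
skips.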

An even more radical idea is to try to find a way to reduce the need
for full page writes or even eliminate them altogether.  If a
particular WAL record can be replayed without reference to the
existing page contents, it's immune to any problem that might be
caused by a torn page.  So I suppose if all of our records had that
property, we wouldn't need full page writes at all.  Or maybe we could
decide that xlog is allowed to rely on the first n bytes of the page
but not the full contents, and then instead of full page writes, just
xlog that much of
it.
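
A small Python illustration of the distinction, under heavy
simplifying assumptions: a 16-byte "page" made of two 8-byte
"sectors", a made-up slot counter in byte 0, and no page LSN check
(real redo skips records the page LSN says are already applied, which
this toy ignores). The names replay_blind, replay_append and torn are
hypothetical. One record type overwrites bytes at a fixed offset
without reading the page; the other reads the counter to decide where
to write.

```python
SECTOR = 8
PAGE_SIZE = 16


def replay_blind(page, records):
    """Replay (offset, data) records; never reads the existing page."""
    for offset, data in records:
        page[offset:offset + len(data)] = data
    return page


def replay_append(page, values):
    """Replay appends; each one reads the slot counter stored in byte 0."""
    for v in values:
        n = page[0]
        page[SECTOR + n] = v   # slots live in the second sector
        page[0] = n + 1
    return page


def torn(old, new):
    """First sector from the new page version, second from the old one."""
    return bytearray(new[:SECTOR] + old[SECTOR:])


base = bytearray(PAGE_SIZE)   # page as of the last checkpoint
values = [0xAA, 0xBB]
# The same two appends written as blind byte overwrites.
blind_records = [(8, b"\xAA"), (0, b"\x01"), (9, b"\xBB"), (0, b"\x02")]

final = replay_append(bytearray(base), values)   # intended page contents
torn_page = torn(base, final)                    # crash mid-write

blind_ok = replay_blind(bytearray(torn_page), blind_records) == final
append_ok = replay_append(bytearray(torn_page), values) == final
```

Replaying the blind records over the torn page reproduces the intended
contents, while the append records trust a counter that is already
ahead of the slot data and put the values in the wrong slots.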

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

