Re: tackling full page writes - Mailing list pgsql-hackers

From Robert Haas
Subject Re: tackling full page writes
Date
Msg-id BANLkTimnv6SeMGeK9HaukDc=X=VU4aj7=g@mail.gmail.com
In response to Re: tackling full page writes  (Bruce Momjian <bruce@momjian.us>)
Responses Re: tackling full page writes  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Tue, May 24, 2011 at 11:52 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> 2. The other fairly obvious alternative is to adjust our existing WAL
>> record types to be idempotent - i.e. to not rely on the existing page
>> contents.  For XLOG_HEAP_INSERT, we currently store the target tid and
>> the tuple contents.  I'm not sure if there's anything else, but we
>> would obviously need the offset where the new tuple should be written,
>> which we currently infer from reading the existing page contents.  For
>> XLOG_HEAP_DELETE, we store just the TID of the target tuple; we would
>> certainly need to store its offset within the block, and maybe the
>> infomask.  For XLOG_HEAP_UPDATE, we'd need the old and new offsets and
>> perhaps also the old and new infomasks.  Assuming that's all we need
>> and I'm not missing anything (which I won't bet on), that means we'd
>> be adding, say, 4 bytes per insert or delete and 8 bytes per update.
>> So, if checkpoints are spread out widely enough that there will be
>> more than ~2K operations per page between checkpoints, then it makes
>> more sense to just do a full page write and call it good.  If not,
>> this idea might have legs.
>
> I vote for "wal_level = idempotent" because so few people will know what
> idempotent means.  ;-)

That idea has the additional advantage of confusing the level of
detail of our WAL logging (minimal vs. archive vs. hot standby) with
the mechanism used to protect against torn pages (full page writes vs.
idempotent WAL records vs. prayer).  When they set it wrong and
destroy their system, we can tell them it's their own fault for not
configuring the system properly!  Bwahahahaha!

In all seriousness, I can't imagine that we'd make this
user-configurable in the first place, since that would amount to
having two sets of WAL records each of which would be even less well
tested than what we have now; and for a project this complex, we
probably shouldn't even consider changing things that seem to work now
unless the new system is clearly better than the old.

> Idempotent does seem like the most promising idea.

I tend to agree with you, but I'm worried it won't actually work out
to a win.  By the time we augment the records with enough additional
information we may have eaten up a lot of the benefit we were hoping
to get.
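
To put rough numbers on that worry, here is a minimal sketch of the
break-even arithmetic from the quoted proposal. It assumes the default
8192-byte page, assumes a full-page image costs the whole page (it ignores
any savings from omitting the unused hole), and uses the guessed 4 and 8
extra bytes per record from above. The function names are made up for
illustration and are not PostgreSQL code.

```python
# Rough break-even estimate: extra bytes for idempotent WAL records versus
# the cost of one full-page image per page per checkpoint cycle.
# All figures are assumptions taken from the discussion, not measurements.

BLCKSZ = 8192  # default PostgreSQL page size in bytes


def breakeven_ops(extra_bytes_per_op: int, page_size: int = BLCKSZ) -> int:
    """Operations on one page at which the added bytes equal one page image."""
    return page_size // extra_bytes_per_op


def cheaper_scheme(
    inserts_deletes: int,
    updates: int,
    page_size: int = BLCKSZ,
    extra_insdel: int = 4,   # guessed extra bytes per insert or delete
    extra_update: int = 8,   # guessed extra bytes per update
) -> str:
    """Compare WAL volume for one page between two checkpoints."""
    extra = inserts_deletes * extra_insdel + updates * extra_update
    return "idempotent" if extra < page_size else "full_page_write"


if __name__ == "__main__":
    print(breakeven_ops(4))          # inserts/deletes only
    print(breakeven_ops(8))          # updates only
    print(cheaper_scheme(100, 50))   # lightly modified page
```

With 4 extra bytes per record the break-even is 2048 operations per page,
which is where the "~2K operations" figure comes from; any additional
fields the records turn out to need pull that threshold down.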

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

