Re: [PATCHES] Full page writes improvement, code update - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: [PATCHES] Full page writes improvement, code update
Date
Msg-id 1176478259.3635.176.camel@silverbirch.site
Whole thread Raw
In response to Re: [PATCHES] Full page writes improvement, code update  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PATCHES] Full page writes improvement, code update  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [PATCHES] Full page writes improvement, code update  (Koichi Suzuki <suzuki.koichi@oss.ntt.co.jp>)
List pgsql-hackers
On Fri, 2007-04-13 at 10:36 -0400, Tom Lane wrote:
> "Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes:
> > But you also turn off the optimization that avoids writing regular
> > WAL records when the info is already contained in a full-page image
> > (increasing the uncompressed size of WAL).
> > It was that part I questioned.

I think its right to question it, certainly.

> That's what bothers me about this patch, too.  It will be increasing
> the cost of writing WAL (more data -> more CRC computation and more
> I/O, not to mention more contention for the WAL locks) which translates
> directly to a server slowdown.

I don't really understand this concern. Koichi-san has included a
parameter setting that would prevent any change at all in the way WAL is
written. If you don't want this slight increase in WAL, don't enable it.
If you do enable it, you'll also presumably be compressing the xlog too,
which works much better than gzip using less CPU. So overall it saves
more than it costs, ISTM, and nothing at all if you choose not to use
it.

> The main arguments that I could see against Andreas' alternative are:
>
> 1. Some WAL record types are arranged in a way that actually would not
> permit the reconstruction of the short form from the long form, because
> they throw away too much data when the full-page image is substituted.
> An example that's fresh in my mind is that the current format of the
> btree page split WAL record discards newitemoff in that case, so you
> couldn't identify the inserted item in the page image.  Now this is only
> saving two bytes in what's usually going to be a darn large record
> anyway, and it complicates the code to do it, so I wouldn't cry if we
> changed btree split to include newitemoff always.  But there might be
> some other cases where more data is involved.  In any case, someone
> would have to look through every single WAL record type to determine
> whether reconstruction is possible and fix it if not.
>
> 2. The compresslog utility would have to have specific knowledge about
> every compressible WAL record type, to know how to convert it to the
> short format.  That means an ongoing maintenance commitment there.
> I don't think this is unacceptable, simply because we need only teach
> it about a few common record types, not everything under the sun ---
> anything it doesn't know how to fix, just leave alone, and if it's an
> uncommon record type it really doesn't matter.  (I guess that means
> that we don't really have to do #1 for every last record type, either.)
>
> So I don't think either of these is a showstopper.  Doing it this way
> would certainly make the patch more acceptable, since the argument that
> it might hurt rather than help performance in some cases would go away.

Yeh, its additional code paths, but it sounds like Koichi-san and
colleagues are going to be trail blazing any bugs there and will be
around to fix any more that emerge.

> > What about disconnecting WAL LSN from physical WAL record position
> > during replay ?
> > Add simple short WAL records in pg_compresslog like: advance LSN by 8192
> > bytes.
>
> I don't care for that, as it pretty much destroys some of the more
> important sanity checks that xlog replay does.  The page boundaries
> need to match the records contained in them.  So I think we do need
> to have pg_decompresslog insert dummy WAL entries to fill up the
> space saved by omitting full pages.

Agreed. I don't want to start touching something that works so well.


We've been thinking about doing this for at least 3 years now, so I
don't see any reason to baulk at it now. I'm happy with Koichi-san's
patch as-is, assuming further extensive testing will be carried out on
it during beta.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "Hiroshi Saito"
Date:
Subject: Re: Vista/IPv6
Next
From: Tom Lane
Date:
Subject: Re: [PATCHES] Full page writes improvement, code update