Re: [HACKERS] Full page writes improvement, code update - Mailing list pgsql-patches

From Koichi Suzuki
Subject Re: [HACKERS] Full page writes improvement, code update
Date
Msg-id 46288450.1050703@oss.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] Full page writes improvement, code update  ("Simon Riggs" <simon@2ndquadrant.com>)
List pgsql-patches
Hi,

I agree that pg_compresslog should be aware of all the WAL records'
details so that it can optimize archive log safely.   In my patch, I've
examined 8.2's WAL records to make pg_compresslog/pg_decompresslog safe.

Also I agree further pg_compresslog maintenance needs to examine changes
in WAL record format.   Because the number of such format will be
limited, I think the amount of work will be reasonable enough.

Regards;

Simon Riggs wrote:
> On Fri, 2007-04-13 at 10:36 -0400, Tom Lane wrote:
>> "Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes:
>>> But you also turn off the optimization that avoids writing regular
>>> WAL records when the info is already contained in a full-page image
>>> (increasing the uncompressed size of WAL).
>>> It was that part I questioned.
>
> I think its right to question it, certainly.
>
>> That's what bothers me about this patch, too.  It will be increasing
>> the cost of writing WAL (more data -> more CRC computation and more
>> I/O, not to mention more contention for the WAL locks) which translates
>> directly to a server slowdown.
>
> I don't really understand this concern. Koichi-san has included a
> parameter setting that would prevent any change at all in the way WAL is
> written. If you don't want this slight increase in WAL, don't enable it.
> If you do enable it, you'll also presumably be compressing the xlog too,
> which works much better than gzip using less CPU. So overall it saves
> more than it costs, ISTM, and nothing at all if you choose not to use
> it.
>
>> The main arguments that I could see against Andreas' alternative are:
>>
>> 1. Some WAL record types are arranged in a way that actually would not
>> permit the reconstruction of the short form from the long form, because
>> they throw away too much data when the full-page image is substituted.
>> An example that's fresh in my mind is that the current format of the
>> btree page split WAL record discards newitemoff in that case, so you
>> couldn't identify the inserted item in the page image.  Now this is only
>> saving two bytes in what's usually going to be a darn large record
>> anyway, and it complicates the code to do it, so I wouldn't cry if we
>> changed btree split to include newitemoff always.  But there might be
>> some other cases where more data is involved.  In any case, someone
>> would have to look through every single WAL record type to determine
>> whether reconstruction is possible and fix it if not.
>>
>> 2. The compresslog utility would have to have specific knowledge about
>> every compressible WAL record type, to know how to convert it to the
>> short format.  That means an ongoing maintenance commitment there.
>> I don't think this is unacceptable, simply because we need only teach
>> it about a few common record types, not everything under the sun ---
>> anything it doesn't know how to fix, just leave alone, and if it's an
>> uncommon record type it really doesn't matter.  (I guess that means
>> that we don't really have to do #1 for every last record type, either.)
>>
>> So I don't think either of these is a showstopper.  Doing it this way
>> would certainly make the patch more acceptable, since the argument that
>> it might hurt rather than help performance in some cases would go away.
>
> Yeh, its additional code paths, but it sounds like Koichi-san and
> colleagues are going to be trail blazing any bugs there and will be
> around to fix any more that emerge.
>
>>> What about disconnecting WAL LSN from physical WAL record position
>>> during replay ?
>>> Add simple short WAL records in pg_compresslog like: advance LSN by 8192
>>> bytes.
>> I don't care for that, as it pretty much destroys some of the more
>> important sanity checks that xlog replay does.  The page boundaries
>> need to match the records contained in them.  So I think we do need
>> to have pg_decompresslog insert dummy WAL entries to fill up the
>> space saved by omitting full pages.
>
> Agreed. I don't want to start touching something that works so well.
>
>
> We've been thinking about doing this for at least 3 years now, so I
> don't see any reason to baulk at it now. I'm happy with Koichi-san's
> patch as-is, assuming further extensive testing will be carried out on
> it during beta.
>


--
-------------
Koichi Suzuki

pgsql-patches by date:

Previous
From: Zoltan Boszormenyi
Date:
Subject: Re: [HACKERS] parser dilemma
Next
From: Koichi Suzuki
Date:
Subject: Re: [HACKERS] Full page writes improvement, code update