
From Amit Kapila
Subject Re: Performance Improvement by reducing WAL for Update Operation
Date
Msg-id CAA4eK1LRXQ8TJ2gGUA3kip0gAZsMpu5AR4CpNpP=r1kAvYMMTw@mail.gmail.com
In response to Re: Performance Improvement by reducing WAL for Update Operation  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Tue, Feb 11, 2014 at 10:07 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Feb  5, 2014 at 10:57:57AM -0800, Peter Geoghegan wrote:
>> On Wed, Feb 5, 2014 at 12:50 AM, Heikki Linnakangas
>> <hlinnakangas@vmware.com> wrote:
>> >> I think there's zero overlap. They're completely complementary features.
>> >> It's not like normal WAL records have an irrelevant volume.
>> >
>> >
>> > Correct. Compressing a full-page image happens on the first update after a
>> > checkpoint, and the diff between old and new tuple is not used in that case.
>>
>> Uh, I really just meant that one thing that might overlap is
>> considerations around the choice of compression algorithm. I think
>> that there was some useful discussion of that on the other thread as
>> well.
>
> Yes, that was my point.  I thought the compression of full-page images
> was a huge win and that compression was pretty straightforward, except
> for the compression algorithm.  If the compression algorithm issue is
> resolved,

By issue, I assume you mean which compression algorithm is best for this
patch. Currently there are two algorithms for which results have been
posted. As far as I understand, Heikki is fairly sure that the latest
algorithm (compression using a prefix-suffix match between the old and new
tuple) used for this patch is better than the other algorithm in terms of
CPU overhead. However, the performance data I collected for this
algorithm's worst case shows that it has a CPU overhead as well.
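
For illustration, below is a minimal sketch in C of the prefix-suffix idea
(the struct and function names are invented here, not taken from the
patch): find the longest common prefix and suffix of the old and new tuple
images, and WAL-log only the differing middle of the new tuple plus the
two match lengths.

#include <stddef.h>
#include <stdint.h>

typedef struct
{
    size_t  prefix_len;     /* bytes shared at the start of old and new */
    size_t  suffix_len;     /* bytes shared at the end of old and new */
    const uint8_t *diff;    /* differing middle portion of the new tuple */
    size_t  diff_len;       /* length of that middle portion */
} TupleDelta;

static void
compute_delta(const uint8_t *oldtup, size_t oldlen,
              const uint8_t *newtup, size_t newlen,
              TupleDelta *delta)
{
    size_t  minlen = (oldlen < newlen) ? oldlen : newlen;
    size_t  prefix = 0;
    size_t  suffix = 0;

    /* longest common prefix */
    while (prefix < minlen && oldtup[prefix] == newtup[prefix])
        prefix++;

    /* longest common suffix, not overlapping the prefix */
    while (suffix < minlen - prefix &&
           oldtup[oldlen - 1 - suffix] == newtup[newlen - 1 - suffix])
        suffix++;

    delta->prefix_len = prefix;
    delta->suffix_len = suffix;
    delta->diff = newtup + prefix;
    delta->diff_len = newlen - prefix - suffix;
}

Only prefix_len, suffix_len, and the diff bytes need to go into the WAL
record; recovery can rebuild the new tuple from the old one.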

OTOH, the other algorithm (compression using the old tuple as history) can
be a bigger win in terms of I/O reduction in a larger number of cases.
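
To show what "old tuple as history" means, here is an illustrative sketch
in C (the instruction format and names are invented, not the patch's
actual encoding): the delta is a stream of instructions that either copy a
byte range from the old tuple or emit literal bytes stored in the WAL
record, so recovery can reconstruct the new tuple.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { DELTA_COPY, DELTA_LITERAL } DeltaOp;

typedef struct
{
    DeltaOp op;
    size_t  offset;         /* DELTA_COPY: start offset in the old tuple */
    size_t  len;            /* number of bytes to copy or emit */
    const uint8_t *data;    /* DELTA_LITERAL: literal bytes from the WAL */
} DeltaInsn;

/* Rebuild the new tuple from the old tuple plus the delta stream. */
static size_t
apply_delta(const uint8_t *oldtup,
            const DeltaInsn *insns, int ninsns,
            uint8_t *newtup)
{
    size_t  pos = 0;
    int     i;

    for (i = 0; i < ninsns; i++)
    {
        if (insns[i].op == DELTA_COPY)
            memcpy(newtup + pos, oldtup + insns[i].offset, insns[i].len);
        else
            memcpy(newtup + pos, insns[i].data, insns[i].len);
        pos += insns[i].len;
    }
    return pos;             /* length of the reconstructed new tuple */
}

Because copies can reference any offset in the old tuple, this approach
can compress moved or repeated column data that a pure prefix-suffix match
misses, which is why it can win in more cases.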

In short, it is still not decided which algorithm to choose, whether it
can be enabled by default, or whether it is better to have a table-level
switch to enable/disable it.

So I think the decision to be taken here is about the below points:

1. Are we okay with I/O reduction at the expense of CPU for *worst* cases,
   and I/O reduction without impacting CPU (better overall tps) for
   *favourable* cases?

2. If we are not okay with the worst-case behaviour, then can we provide a
   table-level switch, so that it can be decided by the user?

3. If none of the above, then is there any other way to mitigate the
   worst-case behaviour, or shall we just reject this patch and move on?


Given the choice, I would go with option 2, because I think in most cases
an UPDATE statement will have the same data in the old and new tuples
except for some part of the tuple (generally, columns having large text
data are not modified), so we will mostly end up in the favourable cases;
and for the worst cases we surely don't want the user to suffer the CPU
overhead, so a table-level switch is also required.
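
As a hypothetical sketch (the option name and heuristic below are invented
for illustration, not from the patch), the update path could gate the
delta encoding on a per-table option and bail out early when a good match
is unlikely, bounding the worst-case CPU cost:

#include <stdbool.h>
#include <stddef.h>

typedef struct
{
    bool compress_update;   /* hypothetical per-table switch */
} TableOptions;

static bool
should_delta_encode(const TableOptions *opts,
                    size_t oldlen, size_t newlen)
{
    /* Respect the table-level switch first. */
    if (!opts->compress_update)
        return false;

    /*
     * Even when enabled, skip the encoding when the new tuple is much
     * larger than the old one, since a useful match is then unlikely
     * (an assumed heuristic, purely for illustration).
     */
    return newlen <= 2 * oldlen;
}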

One might argue here that for some users it is not feasible to decide
whether their UPDATE tuple data is going to be similar or completely
different, and that while they cannot accept any risk of CPU overhead,
they would still be happy to see the I/O reduction; in such a case it is
difficult to decide the value of the table-level switch. Here I think the
only answer is that nothing is free in this world: either verify the
application's UPDATE behaviour before going to production, or just don't
enable the switch and be happy with the current behaviour.

On the other side, there will be users who are pretty certain about their
usage of UPDATE statements, or who are at least ready to evaluate their
application if they can get such a huge gain, so it would be quite a
useful feature for them.

> can we move forward with the full-page compression patch?

In my opinion, it is not certain that whatever compression algorithm gets
chosen for this patch (if any) can be used directly for full-page
compression; some of its ideas could be reused, or perhaps the algorithm
could be tweaked a bit to make it usable for full-page compression.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


