Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Performance Improvement by reducing WAL for Update Operation
Msg-id 20140212144918.GB12551@momjian.us
In response to Re: Performance Improvement by reducing WAL for Update Operation  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Performance Improvement by reducing WAL for Update Operation  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Feb 12, 2014 at 10:02:32AM +0530, Amit Kapila wrote:
> By issue, I assume you mean to say, which compression algorithm is
> best for this patch.
> For this patch, currently we have 2 algorithms for which results have been
> posted. As far as I understand Heikki is pretty sure that the latest algorithm
> (compression using prefix-suffix match in old and new tuple) used for this
> patch is better than the other algorithm in terms of CPU gain or overhead.
> The performance data taken by me for the worst case for this algorithm
> shows there is a CPU overhead for this algorithm as well.
> 
> OTOH the other algorithm (compression using old tuple as history) can be
> a bigger win in terms of I/O reduction in a larger number of cases.
> 
> In short, it is still not decided which algorithm to choose and whether
> it can be enabled by default or it is better to have table level switch
> to enable/disable it.
> 
> So I think the decision to be taken here is about below points:
> 1.  Are we okay with I/O reduction at the expense of CPU for *worst* cases
>      and I/O reduction without impacting CPU (better overall tps) for
>      *favourable* cases?
> 2.  If we are not okay with worst case behaviour, then can we provide
>      a table-level switch, so that it can be decided by user?
> 3.  If none of above, then is there any other way to mitigate the worst
>      case behaviour or shall we just reject this patch and move on.
> 
> Given a choice to me, I would like to go with option-2, because I think
> for most cases UPDATE statement will have same data for old and
> new tuples except for some part of tuple (generally columns having large
> text data are not modified), so we will end up mostly in favourable cases
> and surely for worst cases we don't want user to suffer from CPU overhead,
> so a table-level switch is also required.

I think 99.9% of users are never going to adjust this so we had better
choose something we are happy to enable for effectively everyone.  In my
reading, prefix/suffix seemed safe for everyone.  We can always revisit
this if we think of something better later, as WAL format changes are not
a problem for pg_upgrade.

I also think that if we make it user-tunable, it will be so hard for users
to know when to adjust it that it is almost not worth the user interface
complexity it adds.

I suggest we go with always-on prefix/suffix mode, then add some check
so the worst case is avoided by just giving up on compression.
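
To make the "give up" idea concrete, here is a minimal sketch in C of a
prefix/suffix match with a bail-out check. It is not the code from the
patch: the names prefix_suffix_match, PrefixSuffixMatch, and the 25%
threshold MIN_MATCH_PERCENT are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: give up unless at least 25% of the new tuple matches. */
#define MIN_MATCH_PERCENT 25

/* Hypothetical result type, not taken from the patch. */
typedef struct
{
    size_t prefix_len;  /* bytes shared at the start of old and new tuple */
    size_t suffix_len;  /* bytes shared at the end, never overlapping the prefix */
} PrefixSuffixMatch;

/*
 * Find the common prefix and suffix of the old and new tuple data.
 * Returns false ("give up, log the whole new tuple") when the match is
 * too small to be worth encoding; *match is only set on success.
 */
static bool
prefix_suffix_match(const unsigned char *old_data, size_t old_len,
                    const unsigned char *new_data, size_t new_len,
                    PrefixSuffixMatch *match)
{
    size_t min_len = old_len < new_len ? old_len : new_len;
    size_t prefix = 0;
    size_t suffix = 0;

    if (new_len == 0)
        return false;

    /* Each loop stops at the first mismatching byte. */
    while (prefix < min_len && old_data[prefix] == new_data[prefix])
        prefix++;

    /* Bound the suffix so it cannot overlap the prefix in the shorter tuple. */
    while (suffix < min_len - prefix &&
           old_data[old_len - 1 - suffix] == new_data[new_len - 1 - suffix])
        suffix++;

    assert(prefix + suffix <= min_len);

    /* The check: too little in common, so skip compression entirely. */
    if ((prefix + suffix) * 100 < new_len * MIN_MATCH_PERCENT)
        return false;

    match->prefix_len = prefix;
    match->suffix_len = suffix;
    return true;
}
```

In this sketch the scan cost is proportional to the number of matching
bytes, so two tuples with nothing in common cost one comparison at each
end before we give up. Where the threshold should sit would have to come
from measurement.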

As I said previously, I think compressing the page images is the next
big win in this area.

> I think here one might argue that for some users it is not feasible to
> decide whether their tuples data for UPDATE is going to be similar
> or completely different and they are not at all ready for any risk for
> CPU overhead, but they would be happy to see I/O reduction in which
> case it is difficult to decide what should be the value of table-level
> switch. Here I think the only answer is "nothing is free" in this world,
> so either make sure about the application's behaviour for UPDATE
> statement before going to production or just don't enable this switch and
> be happy with the current behaviour.

Again, can't we do a minimal attempt at prefix/suffix compression so
there is no measurable overhead?

> On the other side there will be users who will be pretty certain about their
> usage of UPDATE statement or at least are ready to evaluate their
> application if they can get such a huge gain, so it would be quite useful
> feature for such users.
> 
> >can we move forward with the full-page compression patch?
> 
> In my opinion, it is not certain that whatever compression algorithm got
> decided for this patch (if any) can be directly used for full-page
> compression, some ideas could be used or maybe the algorithm could be
> tweaked a bit to make it usable for full-page compression.

Thanks, I understand that now.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +


