Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From: Gavin Flower
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date:
Msg-id: 52F31990.6060003@archidevsys.co.nz
In response to: Re: Performance Improvement by reducing WAL for Update Operation (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On 06/02/14 16:59, Robert Haas wrote:
> On Wed, Feb 5, 2014 at 6:43 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> So, I came up with the attached worst case test, modified from your latest
>> test suite.
>>
>> unpatched:
>>
>>
>>                 testname               | wal_generated |     duration
>> --------------------------------------+---------------+------------------
>>   ten long fields, all but one changed |     343385312 | 2.20806908607483
>>   ten long fields, all but one changed |     336263592 | 2.18997097015381
>>   ten long fields, all but one changed |     336264504 | 2.17843413352966
>> (3 rows)
>>
>>
>> pgrb_delta_encoding_v8.patch:
>>
>>                 testname               | wal_generated |     duration
>> --------------------------------------+---------------+------------------
>>   ten long fields, all but one changed |     338356944 | 3.33501315116882
>>   ten long fields, all but one changed |     344059272 | 3.37364101409912
>>   ten long fields, all but one changed |     336257840 | 3.36244201660156
>> (3 rows)
>>
>> So with this test, the overhead is very significant.
> Yuck.  Well that sucks.
>
>> With the skipping logic, another kind of "worst case" case is that you have
>> a lot of similarity between the old and new tuple, but you miss it because
>> you skip. For example, if you change the first few columns, but leave a
>> large text column at the end of the tuple unchanged.
> I suspect there's no way to have our cake and eat it, too.  Most of
> the work that Amit has done on this patch in the last few revs is to
> cut back CPU overhead in the cases where the patch can't help because
> the tuple has been radically modified.  If we're trying to get maximum
> compression, we need to go the other way: for example, we could just
> feed both the old and new tuples through pglz (or snappy, or
> whatever).  That would allow us to take advantage not only of
> similarity between the old and new tuples but also internal
> duplication within either the old or the new tuple, but it would also
> cost more CPU.  The concern with minimizing overhead in cases where
> the compression doesn't help has thus far pushed us in the opposite
> direction, namely passing over compression opportunities that a more
> aggressive algorithm could find in order to keep the overhead low.
>
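For illustration, the "just feed old and new through a generic compressor"
approach might look roughly like the sketch below. zlib is used here purely
as a stand-in for pglz or snappy, and all names are made up; this is not the
patch's code.

/*
 * Sketch only: compress the old and new tuple images together so that both
 * cross-tuple similarity and duplication internal to either tuple can be
 * exploited.  zlib stands in for pglz/snappy; names are hypothetical.
 */
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

/*
 * Returns the compressed size of (old || new), or 0 on failure.  dst should
 * be at least compressBound(oldlen + newlen) bytes.  A caller would compare
 * this against the size of logging the new tuple as-is before deciding
 * whether the extra CPU bought any WAL reduction.
 */
static size_t
compress_old_plus_new(const char *oldtup, size_t oldlen,
                      const char *newtup, size_t newlen,
                      char *dst, size_t dstcap)
{
    uLongf      dstlen = (uLongf) dstcap;
    size_t      srclen = oldlen + newlen;
    char       *src = malloc(srclen);

    if (src == NULL)
        return 0;
    memcpy(src, oldtup, oldlen);
    memcpy(src + oldlen, newtup, newlen);

    if (compress2((Bytef *) dst, &dstlen,
                  (const Bytef *) src, (uLong) srclen, Z_BEST_SPEED) != Z_OK)
        dstlen = 0;

    free(src);
    return (size_t) dstlen;
}
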
> Off-hand, I'm wondering why we shouldn't apply the same skipping
> algorithm that Amit is using at the beginning of the string for the
> rest of it as well.  It might be a little too aggressive (maybe the
> skip distance shouldn't increase by quite as much as doubling every
> time, or not beyond 16/32 bytes?) but I don't see why the general
> principle isn't sound wherever we are in the tuple.
>
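For what it's worth, a rough sketch of that exponential-skip idea applied
over the whole tuple (hypothetical names, not the actual patch logic) might
look like this:

/*
 * Sketch only: generalise the skip-ahead used at the start of the string to
 * the rest of the tuple.  On a run of mismatches the stride doubles, but is
 * capped so that a long unchanged tail is not skipped over entirely.
 */
#include <stddef.h>

#define MAX_SKIP 16             /* or 32; the cap under discussion */

/*
 * Count how many probe positions find the same byte in the old and new
 * images.  The stride resets to 1 whenever similarity is found and backs
 * off exponentially while none is.
 */
static size_t
count_matching_probes(const char *oldp, const char *newp, size_t len)
{
    size_t      i = 0;
    size_t      skip = 1;
    size_t      hits = 0;

    while (i < len)
    {
        if (oldp[i] == newp[i])
        {
            hits++;
            skip = 1;           /* similarity here: scan densely again */
        }
        else if (skip < MAX_SKIP)
            skip *= 2;          /* nothing here: back off exponentially */

        i += skip;
    }
    return hits;
}

Whether the cap should be 16, 32, or something else would presumably fall
out of the same sort of worst-case tests Heikki ran above.
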
> Unfortunately, despite changing things to make a history entry only
> every 4th character, building the history is still pretty expensive.
> By the time we even begin looking at the tuple we're gonna compress,
> we've already spent something like half the total effort, and of
> course we have to go further than that before we know whether our
> attempt to compress is actually going anywhere.  I think that's the
> central problem here.  pglz has several safeguards to ensure that it
> doesn't do too much work in vain: we abort if we find nothing
> compressible within first_success_by bytes, or if we emit enough total
> output to be certain that we won't meet the need_rate threshold.
> Those safeguards are a lot less effective here because they can't be
> applied until *after* we've already paid the cost of building the
> history.  If we could figure out some way to apply those guards, or
> other guards, earlier in the algorithm, we could do a better job
> mitigating the worst-case scenarios, but I don't have a good idea.
>
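To make the cost being described concrete, here is a very rough sketch of
building a history over the old tuple only every 4th byte, with a purely
hypothetical work budget standing in for the kind of early guard being
wished for (none of these names come from pg_lzcompress.c or the patch):

/*
 * Sketch only: index the old tuple every HIST_STRIDE bytes and give up if
 * doing so exceeds a budget before any compression has been attempted.
 * All sizes, hash, and structure are made up for illustration.
 */
#include <stddef.h>
#include <stdint.h>

#define HIST_SIZE   4096
#define HIST_STRIDE 4
#define HASH3(p) \
    ((((uint32_t) (p)[0] << 8) ^ ((uint32_t) (p)[1] << 4) ^ \
      (uint32_t) (p)[2]) % HIST_SIZE)

typedef struct
{
    int32_t     pos[HIST_SIZE]; /* offset of last occurrence, or -1 */
} delta_history;

/*
 * Build the history over the old tuple; returns 0 if the budget is blown,
 * letting the caller fall back to logging the new tuple uncompressed.
 */
static int
build_history(delta_history *h, const char *oldp, size_t oldlen,
              size_t budget)
{
    size_t      i;
    size_t      work = 0;

    for (i = 0; i < HIST_SIZE; i++)
        h->pos[i] = -1;

    for (i = 0; i + 3 <= oldlen; i += HIST_STRIDE)
    {
        h->pos[HASH3(oldp + i)] = (int32_t) i;
        if (++work > budget)
            return 0;           /* too expensive: bail before compressing */
    }
    return 1;
}

The hard part, of course, is choosing a budget that bails out on the "ten
long fields, all but one changed" case above without also giving up on
tuples where the delta would have paid off.
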
Surely the weighting should be done according to the relative scarcity
of processing power versus I/O bandwidth? I get the impression that
different workloads and hardware configurations may favour conserving
either processor or I/O resources. Would it be feasible to have
different logic, depending on the trade-offs identified?
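
Something as crude as the following is the sort of switch I have in mind
(entirely hypothetical; none of these names exist in PostgreSQL):

/*
 * Hypothetical sketch: pick the WAL-delta strategy from how scarce CPU and
 * I/O are on a given system.  Only meant to illustrate the trade-off
 * question above.
 */
typedef enum
{
    WAL_DELTA_OFF,              /* never spend CPU; log the full new tuple */
    WAL_DELTA_CONSERVATIVE,     /* cheap prefix/suffix matching only */
    WAL_DELTA_AGGRESSIVE        /* full LZ-style delta: best WAL, most CPU */
} wal_delta_mode;

static wal_delta_mode
choose_delta_mode(int cpu_headroom_pct, int io_saturation_pct)
{
    if (cpu_headroom_pct < 10)
        return WAL_DELTA_OFF;           /* CPU is the scarce resource */
    if (io_saturation_pct > 80)
        return WAL_DELTA_AGGRESSIVE;    /* WAL volume is what hurts */
    return WAL_DELTA_CONSERVATIVE;
}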


Cheers,
Gavin


