Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Performance Improvement by reducing WAL for Update Operation
Date
Msg-id CAA4eK1JwMaYZUYh8N+TsTnVRO-XZ-fpg22a_WqRRdo2RjpU_MA@mail.gmail.com
In response to Re: Performance Improvement by reducing WAL for Update Operation  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Feb 5, 2014 at 8:56 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Feb 5, 2014 at 5:13 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> On 02/05/2014 07:54 AM, Amit Kapila wrote:
>>
>> That's not the worst case, by far.
>>
>> First, note that the skipping while scanning new tuple is only performed in
>> the first loop. That means that as soon as you have a single match, you fall
>> back to hashing every byte. So for the worst case, put one 4-byte field as
>> the first column, and don't update it.
>>
>> Also, I suspect the runtimes in your test were dominated by I/O. When I
>> scale down the number of rows involved so that the whole test fits in RAM, I
>> get much bigger differences with and without the patch. You might also want
>> to turn off full_page_writes, to make the effect clear with less data.
>>
>> So with this test, the overhead is very significant.
>>
>> With the skipping logic, another kind of "worst case" case is that you have
>> a lot of similarity between the old and new tuple, but you miss it because
>> you skip.
>
> This is exactly the reason I did not keep the skipping logic in the
> second pass (loop), though in hindsight it might have been better to
> keep it, just less aggressively than in the first pass.

I have tried merging pass-1 and pass-2 while keeping the skipping logic
the same, and that reduces the overhead to a good extent, though not
completely, for the new case you added. This change is meant to check
whether the overhead can be reduced; if we want to proceed, we could
limit the skip factor so that the chance of skipping over matching data
is reduced.
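To illustrate the idea being discussed (this is a hypothetical sketch, not
the patch code; scan_with_skip, find_match, and MAX_SKIP are made-up names),
the scan grows a skip distance while matches keep failing, resets it on a
match, and caps it at a "skip factor" limit so that matching data is less
likely to be jumped over:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cap on the skip factor, as suggested above. */
#define MAX_SKIP 8

/*
 * Stand-in for a history lookup: does the 4-byte chunk at newt[pos]
 * occur anywhere in the old tuple?  The real patch uses a hash table
 * over the old tuple instead of a linear scan.
 */
static int
find_match(const char *oldt, int oldlen, const char *newt, int pos, int newlen)
{
    if (pos + 4 > newlen)
        return 0;
    for (int i = 0; i + 4 <= oldlen; i++)
        if (memcmp(oldt + i, newt + pos, 4) == 0)
            return 1;
    return 0;
}

/*
 * Count how many history lookups we perform while scanning the new
 * tuple; skipped bytes are emitted as literals without a lookup.
 */
static int
scan_with_skip(const char *oldt, int oldlen, const char *newt, int newlen)
{
    int lookups = 0;
    int skip = 1;
    int pos = 0;

    while (pos < newlen)
    {
        lookups++;
        if (find_match(oldt, oldlen, newt, pos, newlen))
        {
            pos += 4;       /* emit a match, continue after it */
            skip = 1;       /* reset: be thorough near matches */
        }
        else
        {
            pos += skip;    /* no match: skip ahead as literal bytes */
            if (skip < MAX_SKIP)
                skip *= 2;  /* grow the skip while nothing matches */
        }
    }
    return lookups;
}
```

With identical tuples every position matches, so every byte region is
inspected; with completely dissimilar tuples the skip grows quickly and
far fewer lookups are done, which is the source of both the speedup and
the risk of missing similarity.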

A new version of the patch is attached to this mail.

Unpatched

           testname           | wal_generated |     duration
------------------------------+---------------+------------------
 ten long fields, all changed |     348842856 | 6.93688106536865
 ten long fields, all changed |     348843672 | 7.53063702583313
 ten long fields, all changed |     352662344 | 7.76640701293945
(3 rows)


pgrb_delta_encoding_v8.patch
             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, but one changed |     348848144 | 9.22694897651672
 ten long fields, but one changed |     348841376 | 9.11818099021912
 ten long fields, but one changed |     352963488 | 8.37875485420227
(3 rows)


pgrb_delta_encoding_v9.patch

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, but one changed |     350166320 | 8.84561610221863
 ten long fields, but one changed |     348840728 | 8.45299792289734
 ten long fields, but one changed |     348846656 | 8.34846496582031
(3 rows)


It appears to me that it could be a good idea to merge both patches
(prefix-suffix encoding + delta encoding) such that if we get reasonable
compression (50% or so) from prefix-suffix matching alone, we return
without doing delta encoding; if the compression is less than that, we
do delta encoding for the rest of the tuple. I think this would be good
because prefix-suffix matching alone would miss many cases where good
compression is possible.
If you think this is a viable approach, I can merge both patches and
check the results.
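The proposed decision flow could look roughly like this (a hedged sketch
under the assumptions above, not patch code; prefix_suffix_sufficient and
the 50% threshold are illustrative names/values):

```c
#include <assert.h>

/* Length of the common prefix of two buffers. */
static int
common_prefix(const char *a, const char *b, int len)
{
    int n = 0;
    while (n < len && a[n] == b[n])
        n++;
    return n;
}

/* Length of the common suffix, not overlapping the given prefix. */
static int
common_suffix(const char *a, int alen, const char *b, int blen, int prefix)
{
    int n = 0;
    while (n < alen - prefix && n < blen - prefix &&
           a[alen - 1 - n] == b[blen - 1 - n])
        n++;
    return n;
}

/*
 * Returns 1 if prefix + suffix matching alone covers at least ~50%
 * of the new tuple; in that case delta encoding would be skipped,
 * otherwise it would run on the unmatched middle portion.
 */
static int
prefix_suffix_sufficient(const char *oldt, int oldlen,
                         const char *newt, int newlen)
{
    int minlen = (oldlen < newlen) ? oldlen : newlen;
    int pre = common_prefix(oldt, newt, minlen);
    int suf = common_suffix(oldt, oldlen, newt, newlen, pre);

    return 2 * (pre + suf) >= newlen;
}
```

For a "ten long fields, but one changed" style tuple, the prefix and
suffix together cover almost everything and the (more expensive) delta
encoding step is skipped; for heavily changed tuples it still runs.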



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
