Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Performance Improvement by reducing WAL for Update Operation
Date
Msg-id 009001ce2c6e$9bea4790$d3bed6b0$@kapila@huawei.com
In response to Re: Performance Improvement by reducing WAL for Update Operation  (Amit Kapila <amit.kapila@huawei.com>)
List pgsql-hackers
On Wednesday, March 13, 2013 5:50 PM Amit Kapila wrote:
> On Friday, March 08, 2013 9:22 PM Amit Kapila wrote:
> > On Wednesday, March 06, 2013 2:57 AM Heikki Linnakangas wrote:
> > > On 04.03.2013 06:39, Amit Kapila wrote:
> > > > On Sunday, March 03, 2013 8:19 PM Craig Ringer wrote:
> > > >> On 02/05/2013 11:53 PM, Amit Kapila wrote:
> > > >>>> Performance data for the patch is attached with this mail.
> > > >>>> Conclusions from the readings (these are same as my previous patch):
> > > >>>>
> > >
> > > I've been investigating the pglz option further, and doing
> > > performance comparisons of the pglz approach and this patch. I'll
> > > begin with some numbers:
> > >
> >
> > Based on your patch, I have tried some more optimizations:
> >

Based on the numbers Daniel provided for compression methods, I tried the Snappy
algorithm for encoding, and it addresses most of your concern that the encoding
should not degrade performance in the majority of cases.

postgres original:
                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1232916160 | 34.0338308811188
 two short fields, one changed           |    1232909704 | 32.8722319602966
 two short fields, both changed          |    1236770128 | 35.445415019989
 one short and one long field, no change |    1053000144 | 23.2983899116516
 ten tiny fields, all changed            |    1397452584 | 40.2718069553375
 hundred tiny fields, first 10 changed   |     622082664 | 21.7642788887024
 hundred tiny fields, all changed        |     626461528 | 20.964781999588
 hundred tiny fields, half changed       |     621900472 | 21.6473519802094
 hundred tiny fields, half nulled        |     557714752 | 19.0088789463043
(9 rows)


postgres with WAL encoded using snappy:

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1232915128 | 34.6910920143127
 two short fields, one changed           |    1238902520 | 34.2287850379944
 two short fields, both changed          |    1233882056 | 35.3292708396912
 one short and one long field, no change |     733095168 | 20.3494939804077
 ten tiny fields, all changed            |    1314959744 | 38.969575881958
 hundred tiny fields, first 10 changed   |     483275136 | 19.6973309516907
 hundred tiny fields, all changed        |     481755280 | 19.7665288448334
 hundred tiny fields, half changed       |     488693616 | 19.7246761322021
 hundred tiny fields, half nulled        |     483425712 | 18.6299569606781
(9 rows)

The change is to call snappy compress and decompress for the encoding and
decoding in the patch.  I do the encoding only for tuples longer than 32 bytes,
as encoding very small tuples does not make much sense.
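
To make the threshold concrete, here is a minimal sketch assuming the C
bindings in snappy-c.h; the helper name, the exact cutoff handling and the
fall-back behaviour are my assumptions, not the patch itself:

/*
 * Hedged sketch only, not the actual patch: encode the tuple data with
 * snappy when the tuple is long enough for compression to pay off, and
 * fall back to logging the full tuple otherwise.
 */
#include <stdbool.h>
#include <stddef.h>
#include <snappy-c.h>

#define MIN_ENCODE_LEN 32       /* skip very small tuples, as above */

static bool
encode_wal_tuple(const char *src, size_t srclen,
                 char *dst, size_t *dstlen)
{
    if (srclen <= MIN_ENCODE_LEN)
        return false;           /* caller logs the full tuple instead */

    /*
     * Caller passes the available size of dst in *dstlen, which should be
     * at least snappy_max_compressed_length(srclen).
     */
    if (snappy_compress(src, srclen, dst, dstlen) != SNAPPY_OK)
        return false;

    return *dstlen < srclen;    /* use the encoding only if it saved space */
}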

On my machine, snappy compress/decompress was corrupting the stack in the first
4 bytes, so I put in the workaround below to proceed; I am still looking into
the reason for it.
1. snappy_compress - advance the encoded-data buffer by 4 bytes before
compression starts.
2. snappy_uncompress - undo the 4-byte increment done during compress.
3. snappy_uncompressed_length - undo the 4-byte increment done during compress.
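
For illustration only, the workaround amounts to wrappers along these lines;
the names are hypothetical, and this is a stop-gap while the stack corruption
is investigated, not a proper fix:

/*
 * Illustration of the temporary workaround described above.  The wrapper
 * names are hypothetical; the real change simply adjusts the buffer
 * pointers by 4 bytes around the snappy calls.
 */
#include <stddef.h>
#include <snappy-c.h>

#define ENC_HEAD_PAD 4          /* spare bytes kept at the buffer head */

static snappy_status
wal_snappy_compress(const char *src, size_t srclen,
                    char *encbuf, size_t *enclen)
{
    /* write the compressed image ENC_HEAD_PAD bytes into the buffer */
    return snappy_compress(src, srclen, encbuf + ENC_HEAD_PAD, enclen);
}

static snappy_status
wal_snappy_uncompress(const char *encbuf, size_t enclen,
                      char *dst, size_t *dstlen)
{
    /* the compressed image starts ENC_HEAD_PAD bytes into the buffer */
    return snappy_uncompress(encbuf + ENC_HEAD_PAD, enclen, dst, dstlen);
}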


For the LZ compression patch, there was a small problem in identifying the
maximum length, which I have corrected in the separate patch
'pglz-with-micro-optimizations-4.patch'.


In my opinion, there are the following ways forward for this patch:
1. Use LZ compression, but provide a way for the user to avoid it in cases
where little compression is possible.  I see this as viable because most
updates change only a few columns and the rest of the data stays the same.
2. Use the snappy APIs; does anyone know of a standard library for snappy?
3. Provide multiple compression methods, so the user can pick the appropriate
one for their usage (a sketch follows this list).
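
To illustrate option 3, a minimal sketch of a dispatch point, assuming a
hypothetical wal_update_compression setting and a hypothetical pglz wrapper;
none of these names exist in the patch:

/*
 * Hedged sketch for option 3 only: dispatch WAL update-delta encoding to
 * whichever algorithm the user configured.  The enum, the setting variable
 * and the pglz wrapper are hypothetical names, not part of the patch.
 */
#include <stdbool.h>
#include <stddef.h>
#include <snappy-c.h>

typedef enum WalUpdateCompression
{
    WAL_UPDATE_COMPRESSION_OFF,
    WAL_UPDATE_COMPRESSION_PGLZ,
    WAL_UPDATE_COMPRESSION_SNAPPY
} WalUpdateCompression;

/* hypothetical user-visible setting and pglz wrapper */
extern int  wal_update_compression;
extern bool pglz_encode_delta(const char *src, size_t srclen,
                              char *dst, size_t *dstlen);

static bool
encode_update_delta(const char *src, size_t srclen,
                    char *dst, size_t *dstlen)
{
    switch (wal_update_compression)
    {
        case WAL_UPDATE_COMPRESSION_PGLZ:
            return pglz_encode_delta(src, srclen, dst, dstlen);
        case WAL_UPDATE_COMPRESSION_SNAPPY:
            /* *dstlen holds the available output size on entry */
            return snappy_compress(src, srclen, dst, dstlen) == SNAPPY_OK;
        default:
            return false;       /* no encoding: log the full new tuple */
    }
}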

Feedback?

With Regards,
Amit Kapila.

