Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Performance Improvement by reducing WAL for Update Operation
Date
Msg-id CAA4eK1JeUbY16uwrDA2TaBkk+rLRL3Giyyqy1mVh_6CThmDR8w@mail.gmail.com
In response to Re: Performance Improvement by reducing WAL for Update Operation  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Nov 29, 2013 at 3:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Nov 27, 2013 at 9:31 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Sure, but to explore (a), the scope is bit bigger. We have below
>> options to explore (a):
>> 1. try to optimize existing algorithm as used in patch, which we have
>> tried but ofcourse we can spend some more time to see if anything more
>>     can be tried out.
>> 2. try fingerprint technique as suggested by you above.
>> 3. try some other standard methods like vcdiff, lz4 etc.
>
> Well, obviously, I'm hot on idea #2 and think that would be worth
> spending some time on.  If we can optimize the algorithm used in the
> patch some more (option #1), that would be fine, too, but the code
> looks pretty tight to me, so I'm not sure how successful that's likely
> to be.  But if you have an idea, sure.

I have been experimenting with chunk-wise delta encoding (using a technique
similar to the Rabin fingerprint method) for the last few days, and here are
the results of my investigation.
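
To make the idea concrete, the chunking works roughly like the sketch below.
This is only an illustration, not the patch code; the constants, names, and
the simple rolling hash are mine. The point is that chunk boundaries are
chosen by the content itself (a rolling hash of the last few bytes hitting a
fixed bit pattern), so most boundaries, and therefore most chunks, survive
small edits to the tuple.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RB_WINDOW   4       /* bytes covered by the rolling hash */
#define RB_MASK     0x1F    /* cut when (fp & RB_MASK) == 0: ~32-byte chunks */
#define RB_MINCHUNK 8       /* never cut chunks shorter than this */
#define RB_MAXCHUNK 64      /* force a cut at this length */

/* Record chunk end offsets for buf[0..len); returns the number of chunks. */
static int
rb_chunkify(const char *buf, int len, int *bounds, int maxbounds)
{
    uint32_t    fp = 0;
    int         chunkstart = 0;
    int         nchunks = 0;

    for (int i = 0; i < len; i++)
    {
        /* add the new byte, drop the byte that just left the window */
        fp = (fp << 1) ^ (unsigned char) buf[i];
        if (i >= RB_WINDOW)
            fp ^= ((uint32_t) (unsigned char) buf[i - RB_WINDOW]) << RB_WINDOW;

        if (i - chunkstart + 1 < RB_MINCHUNK)
            continue;
        if ((fp & RB_MASK) == 0 || i - chunkstart + 1 >= RB_MAXCHUNK)
        {
            if (nchunks < maxbounds)
                bounds[nchunks++] = i + 1;  /* chunk ends after byte i */
            chunkstart = i + 1;
        }
    }
    if (chunkstart < len && nchunks < maxbounds)
        bounds[nchunks++] = len;            /* tail chunk */
    return nchunks;
}

int
main(void)
{
    const char *tuple = "the quick brown fox jumps over the lazy dog, again and again";
    int         bounds[64];
    int         n = rb_chunkify(tuple, (int) strlen(tuple), bounds, 64);

    for (int i = 0; i < n; i++)
        printf("chunk %d ends at offset %d\n", i, bounds[i]);
    return 0;
}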

Performance Data
----------------------------
Non-default settings:
autovacuum =off
checkpoint_segments =128
checkpoint_timeout = 10min

unpatched

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 one short and one long field, no change |    1054921328 | 25.5855557918549
 hundred tiny fields, all changed        |     634483328 | 20.8992719650269
 hundred tiny fields, half changed       |     635948640 | 19.8670389652252
 hundred tiny fields, half nulled        |     571388552 | 18.9413228034973


lz-delta-encoding

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 one short and one long field, no change |     662984384 | 21.7335519790649
 hundred tiny fields, all changed        |     633944320 | 24.1207830905914
 hundred tiny fields, half changed       |     633944344 | 24.4657719135284
 hundred tiny fields, half nulled        |     492200208 | 22.0337791442871


rabin-delta-encoding

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 one short and one long field, no change |     662235752 | 20.1823079586029
 hundred tiny fields, all changed        |     633950080 | 22.0473308563232
 hundred tiny fields, half changed       |     633950880 | 21.8351459503174
 hundred tiny fields, half nulled        |     508943072 | 20.9554698467255


Results Summarization
-------------------------------------
1. With the chunk-wise approach, WAL reduction is almost the same as with LZ,
barring the half-nulled case, which can be improved.
2. With the chunk-wise approach, the CPU overhead is roughly halved compared
to the LZ approach in most cases where there is little or no compression;
still, there is a 5~10% overhead for cases where the data is not
compressible. I think there will certainly be a small overhead from forming
the hash table and scanning it only to conclude that the data is
non-compressible.
3. I have not run the other tests, which will anyway return from the top of
the encoding function because the tuple length is less than 32 (see the
length check sketched after this list).
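
For clarity, the early exit in point 3 is just a length check at the top of
the encoding routine, roughly as below; the 32-byte threshold is from the
patch, the names are not:

#include <stdbool.h>

#define PGRB_MIN_NEWTUP_LEN 32      /* shorter new tuples are not delta encoded */

static bool
pgrb_worth_trying(int newtup_len)
{
    if (newtup_len < PGRB_MIN_NEWTUP_LEN)
        return false;       /* caller WAL-logs the full new tuple as before */
    return true;            /* otherwise build the chunk table and scan */
}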

Main reasons of improvement
---------------------------------------------
1. Fewer hash entries for the old tuple and fewer calculations while
compressing the new tuple.
2. The memset of the hash-table data structure covers a smaller size.
3. We don't copy into the output buffer until we find a match (the sketch
after this list illustrates all three points).
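
To show where each of these comes from, here is the shape of the matching
loop I have in mind. Identifiers, the table size, and the chunk fingerprint
below are illustrative only, not the patch's code:

#include <stdint.h>
#include <string.h>

#define HTSIZE 64               /* one slot per old-tuple chunk is enough;
                                 * collisions simply overwrite (it's a sketch) */

typedef struct ChunkEntry
{
    uint32_t    fp;             /* chunk fingerprint */
    int         off;            /* chunk start offset in the old tuple */
    int         len;            /* chunk length */
} ChunkEntry;

static uint32_t
chunk_fp(const char *p, int len)
{
    uint32_t    fp = 0;

    for (int i = 0; i < len; i++)
        fp = (fp << 1) ^ (unsigned char) p[i];
    return fp;
}

static void
encode_chunks(const char *oldtup, const int *oldbounds, int noldchunks,
              const char *newtup, const int *newbounds, int nnewchunks)
{
    ChunkEntry  table[HTSIZE];
    int         start;

    /* reason 2: the memset covers a chunk-count-sized table, not a
     * byte-position-sized one */
    memset(table, 0, sizeof(table));

    /* reason 1: one hash entry per old-tuple chunk, not per byte */
    start = 0;
    for (int i = 0; i < noldchunks; i++)
    {
        int         len = oldbounds[i] - start;
        uint32_t    fp = chunk_fp(oldtup + start, len);
        ChunkEntry *e = &table[fp % HTSIZE];

        e->fp = fp;
        e->off = start;
        e->len = len;
        start = oldbounds[i];
    }

    start = 0;
    for (int i = 0; i < nnewchunks; i++)
    {
        int         len = newbounds[i] - start;
        uint32_t    fp = chunk_fp(newtup + start, len);
        ChunkEntry *e = &table[fp % HTSIZE];

        if (e->fp == fp && e->len == len &&
            memcmp(oldtup + e->off, newtup + start, len) == 0)
        {
            /* match: emit an (offset, length) copy reference into the delta */
        }
        else
        {
            /* reason 3: only now copy the literal chunk bytes to the output */
        }
        start = newbounds[i];
    }
}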

Further Actions
------------------------
1. Need to decide whether this reduction in CPU usage is acceptable, and
whether we need an enable/disable flag at the table level (a hypothetical
sketch follows this list).
2. We can do further micro-optimisations in the chunk-wise approach, such as
improving the hash function.
3. Some code improvements are pending, e.g. for cases where the data to be
compressed is non-contiguous.
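
If we do go with a table-level switch, it could look something like the
hypothetical sketch below; neither the option name nor the struct exists in
the patch or in core, it is only to make the open question concrete:

#include <stdbool.h>

/* hypothetical per-table storage parameter; not an existing reloption */
typedef struct WalUpdateOptions
{
    bool        compress_update_wal;
} WalUpdateOptions;

static bool
delta_encoding_enabled(const WalUpdateOptions *opts)
{
    /*
     * Default on; a table whose updates are known to be incompressible
     * could turn it off to avoid the 5~10% CPU overhead measured above.
     */
    return opts == NULL || opts->compress_update_wal;
}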

Attached files
---------------------
1. pgrb_delta_encoding_v1 - In heaptuple.c there is a parameter,
rabin_fingerprint_comp; set it to true for chunk-wise delta encoding and to
false for LZ encoding. By default it is true. I wanted to provide a better
way to enable both modes and tried to do so, but ended up with this approach
(see the sketch after this list).
2. wal-update-testsuite.sh - test script developed by Heikki to test this patch.
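
The switch inside the prototype is just a boolean, along the lines of the
sketch below; only the variable name rabin_fingerprint_comp is from the
patch, the encoder functions here are stand-ins:

#include <stdbool.h>

static bool rabin_fingerprint_comp = true;  /* true: chunk-wise, false: LZ */

/* stand-ins for the two encoders; the real ones take the old and new
 * tuples plus an output buffer */
static bool pgrb_encode_stub(void) { return false; }
static bool pglz_encode_stub(void) { return false; }

static bool
encode_wal_update(void)
{
    return rabin_fingerprint_comp ? pgrb_encode_stub() : pglz_encode_stub();
}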

Note -
a. The performance data was taken on my laptop; it needs to be tested on a
better machine.
b. The attached patch is just a prototype of the chunk-wise concept; the code
needs to be improved, and decode handling/testing is pending.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

