Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From Amit kapila
Subject Re: Performance Improvement by reducing WAL for Update Operation
Date
Msg-id 6C0B27F7206C9E4CA54AE035729E9C383BEAAE32@szxeml509-mbx
Whole thread Raw
In response to Re: Performance Improvement by reducing WAL for Update Operation  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Performance Improvement by reducing WAL for Update Operation
List pgsql-hackers
On Friday, January 04, 2013 8:03 PM Simon Riggs wrote:
On 4 January 2013 13:53, Amit Kapila <amit.kapila@huawei.com> wrote:
> On Friday, December 28, 2012 3:52 PM Simon Riggs wrote:
>> On 28 December 2012 08:07, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>
>> > Hello, I saw this patch and confirmed that
>> >
>> >  - Coding style looks good.
>> >  - Appliable onto HEAD.
>> >  - Some mis-codings are fixed.
>>
>> I've had a quick review of the patch to see how close we're getting.
>> The perf tests look to me like we're getting what we wanted from this
>> and I'm happy with the recovery performance trade-offs. Well done to
>> both author and testers.
>>
Update patch contains handling of below Comments
>
>* There is a fixed 75% heuristic in the patch. Can we document where
>that came from? Can we have a parameter that sets that please? This
>can be used to have further tests to confirm the useful setting of
>this. I expect it to be removed before we release, but it will help
>during beta.

Added a guc variable wal_update_compression_ratio to set the compression ratio.
It can be removed during beta.

>* The compression algorithm depends completely upon new row length
>savings. If the new row is short, it would seem easier to just skip
>the checks and include it anyway. We can say if old and new vary in
>length by > 50% of each other, just include new as-is, since the rows
>very clearly differ in a big way.

Added a check in heap_delta_encode to identify whether the tuples are differ in length by 50%.

>* If full page writes is on and the page is very old, we are just
>going to copy the whole block. So why not check for that rather than
>do all these push ups and then just copy the page anyway?

Added a function which is used to identify whether the page needs a backup block or not.
based on the result the optimization is applied.


>* I'd like to see a specific test in regression that is designed to
>exercise the code here. That way we will be certain that the code is
>getting regularly tested.

Added the regression tests which covers all the changes done for the optimization except recovery.

>* The internal docs are completely absent. We need at least a whole
>page of descriptive comment, discussing trade-offs and design
>decisions. This is very important because it will help locate bugs
>much faster if these things are clealry documented. It also helps
>reviewers. This is a big timewaster for committers because you have to
>read the whole patch and understand it before you can attempt to form
>opinions. Commits happen quicker and easier with good comments.
>* Need to mention the WAL format change, or include the change within
>the patch so we can see

backend/access/transam/README is updated with details.

>* Lots of typos in comments. Many comments say nothing more than the
>words already used in the function name itself

corrected the typos and removed unnecessary comments.

>* "flags" variables are almost always int or uint in PG source.
>
>* PGLZ_HISTORY_SIZE needs to be documented in the place it is defined,
>not the place its used. The test if (oldtuplen < PGLZ_HISTORY_SIZE)
>really needs to be a test inside the compression module to maintain
>better modularity, so the value itself needn't be exported

(oldtuplen < PGLZ_HISTORY_SIZE) validation is moved inside the heap_delta_encode
and updated the flags variable also.

Test results with modified pgbench (1800 record size) on the latest patch:

-Patch-             -tps@-c1-     -WAL@-c1-      -tps@-c2-      -WAL@-c2-
Head                831           4.17 GB        1416           7.13 GB
WAL modification    846           2.36 GB        1712           3.31 GB

-Patch-             -tps@-c4-     -WAL@-c4-      -tps@-c8-      -WAL@-c8-
Head                2196          11.01 GB       2758           13.88 GB
WAL modification    3295           5.87 GB       5472            9.02 GB


With Regards,
Amit Kapila.
Attachment

pgsql-hackers by date:

Previous
From: Benedikt Grundmann
Date:
Subject: Re: Improve compression speeds in pg_lzcompress.c
Next
From: Andrew Dunstan
Date:
Subject: Re: PL/perl should fail on configure, not make