Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From Amit kapila
Subject Re: Performance Improvement by reducing WAL for Update Operation
Date
Msg-id 6C0B27F7206C9E4CA54AE035729E9C38285495B0@szxeml509-mbx
Whole thread Raw
Responses Re: Performance Improvement by reducing WAL for Update Operation  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers

On Thu, 8 Nov 2012 17:33:54 +0000 Amit Kapila wrote:
On Mon, 29 Oct 2012 20:02:11 +0530 Amit Kapila wrote:
On Sunday, October 28, 2012 12:28 AM Heikki Linnakangas wrote:
>>> One idea is to use the LZ format in the WAL record, but use your
>>> memcmp() code to construct it. I believe the slow part in LZ compression
>>> is in trying to locate matches in the "history", so if you just replace
>>> that with your code that's aware of the column boundaries and uses
>>> simple memcmp() to detect what parts changed, you could create LZ
>>> compressed output just as quickly as the custom encoded format. It would
>>> leave the door open for making the encoding smarter or to do actual
>>> compression in the future, without changing the format and the code to
>>> decode it.

>>This is good idea. I shall try it.

>>In the existing algorithm for storing the new data which is not present in
>>the history, it needs 1 control byte for
>>every 8 bytes of new data which can increase the size of the compressed
>>output as compare to our delta encoding approach.

>>Approach-2
>---------------
>>Use only one bit for control data [0 - Length and new data, 1 - pick from
>>history based on OFFSET-LENGTH]
>>The modified bit value (0) is to handle the new field data as a continuous
>>stream of data, instead of treating every byte as a new data.

> Attached are the patches
> 1. wal_update_changes_lz_v4 - to use LZ Approach with memcmp to construct WAL record
> 2. wal_update_changes_modified_lz_v5 - to use modified LZ Approach as mentioned above as Approach-2


> The main Changes as compare to previous patch are as follows:
> 1. In heap_delta_encode, use LZ encoding instead of Custom encoding.
> 2. Instead of get_tup_info(), introduced heap_getattr_with_len() macro based on suggestion from Noah.
> 3. LZ macro's moved from .c to .h, as they need to be used for encoding.
> 4. Changed the format for function arguments for heap_delta_encode()/heap_delta_decode() based on suggestion from Noah.

 

Please find the updated patches attached with this mail.

 

Modification in these Patches apart from above:

1. Traverse the tuple only once (previously it needs to traverse 3 times) to check if particular offset matches and get the offset to generate encoded tuple.

    To achieve this I have modified function heap_tuple_attr_equals() to heap_attr_get_length_and_check_equals(), so that it can get the length of tuple attribute

    which can be used to calculate offset. A separate function can also be written to achieve the same.

2. Improve the comments in code.

 

Performance Data:

1. Please refer testcase in attached file pgbench_250.c

   Refer Function used to create random string at end of mail.

2. The detail data and configuration settings can be reffered in attached files (pgbench_encode_withlz_ff100 & pgbench_encode_withlz_ff80).

 

Benchmark results with -F 100:

-Patch-             -tps@-c1-   -tps@-c2-   -tps@-c4-   -tps@-c8-   -WAL@-c8-
xlogscale           802         1453        2253        2643        13.99 GB
xlogscale+org lz    807         1602        3168        5140        9.50 GB
xlogscale+mod lz    796         1620        3216        5270        9.16 GB


Benchmark results with -F 80:

-Patch-             -tps@-c1-   -tps@-c2-   -tps@-c4-   -tps@-c8-   -WAL@-c8-
xlogscale           811         1455        2148        2704        13.6 GB
xlogscale+org lz    829         1684        3223        5325        9.13 GB
xlogscale+mod lz    801         1657        3263        5488        8.86 GB

 

> I shall write the wal_update_changes_custom_delta_v6, and then we can compare all the three patches performance data and decide which one to go based on results.

   The results with this are not better than above 2 Approaches, so I am not attaching it.

 

Function used to create randome string

--------------------------------------------------------

CREATE OR REPLACE FUNCTION random_text_md5_v2(INTEGER)
RETURNS TEXT
LANGUAGE SQL
AS $$

    select upper(
        substring(
            (
                SELECT string_agg(md5(random()::TEXT), '')
                FROM generate_series(1, CEIL($1 / 32.)::integer)
                ),
        $1)
    );

$$;

Suggestions/Comments?

 

With Regards,

Amit Kapila.



Attachment

pgsql-hackers by date:

Previous
From: "Etsuro Fujita"
Date:
Subject: Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY
Next
From: Simon Riggs
Date:
Subject: Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY