Home > mailing lists

Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From	Amit kapila
Subject	Re: Performance Improvement by reducing WAL for Update Operation
Date	November 14, 2012 11:27:08
Msg-id	6C0B27F7206C9E4CA54AE035729E9C38285495B0@szxeml509-mbx Whole thread Raw
Responses	Re: Performance Improvement by reducing WAL for Update Operation
List	pgsql-hackers

Tree view

On Thu, 8 Nov 2012 17:33:54 +0000 Amit Kapila wrote:
On Mon, 29 Oct 2012 20:02:11 +0530 Amit Kapila wrote:
On Sunday, October 28, 2012 12:28 AM Heikki Linnakangas wrote:
>>> One idea is to use the LZ format in the WAL record, but use your
>>> memcmp() code to construct it. I believe the slow part in LZ compression
>>> is in trying to locate matches in the "history", so if you just replace
>>> that with your code that's aware of the column boundaries and uses
>>> simple memcmp() to detect what parts changed, you could create LZ
>>> compressed output just as quickly as the custom encoded format. It would
>>> leave the door open for making the encoding smarter or to do actual
>>> compression in the future, without changing the format and the code to
>>> decode it.

>>This is good idea. I shall try it.

>>In the existing algorithm for storing the new data which is not present in
>>the history, it needs 1 control byte for
>>every 8 bytes of new data which can increase the size of the compressed
>>output as compare to our delta encoding approach.

>>Approach-2
>---------------
>>Use only one bit for control data [0 - Length and new data, 1 - pick from
>>history based on OFFSET-LENGTH]
>>The modified bit value (0) is to handle the new field data as a continuous
>>stream of data, instead of treating every byte as a new data.

> Attached are the patches
> 1. wal_update_changes_lz_v4 - to use LZ Approach with memcmp to construct WAL record
> 2. wal_update_changes_modified_lz_v5 - to use modified LZ Approach as mentioned above as Approach-2

> The main Changes as compare to previous patch are as follows:
> 1. In heap_delta_encode, use LZ encoding instead of Custom encoding.
> 2. Instead of get_tup_info(), introduced heap_getattr_with_len() macro based on suggestion from Noah.
> 3. LZ macro's moved from .c to .h, as they need to be used for encoding.
> 4. Changed the format for function arguments for heap_delta_encode()/heap_delta_decode() based on suggestion from Noah.

Please find the updated patches attached with this mail.

Modification in these Patches apart from above:

1. Traverse the tuple only once (previously it needs to traverse 3 times) to check if particular offset matches and get the offset to generate encoded tuple.

To achieve this I have modified function heap_tuple_attr_equals() to heap_attr_get_length_and_check_equals(), so that it can get the length of tuple attribute

which can be used to calculate offset. A separate function can also be written to achieve the same.

2. Improve the comments in code.

Performance Data:

1. Please refer testcase in attached file pgbench_250.c

Refer Function used to create random string at end of mail.

2. The detail data and configuration settings can be reffered in attached files (pgbench_encode_withlz_ff100 & pgbench_encode_withlz_ff80).

Benchmark results with -F 100:

-Patch- -tps@-c1- -tps@-c2- -tps@-c4- -tps@-c8- -WAL@-c8-
xlogscale 802 1453 2253 2643 13.99 GB
xlogscale+org lz 807 1602 3168 5140 9.50 GB
xlogscale+mod lz 796 1620 3216 5270 9.16 GB

Benchmark results with -F 80: -Patch- -tps@-c1- -tps@-c2- -tps@-c4- -tps@-c8- -WAL@-c8- xlogscale 811 1455 2148 2704 13.6 GB xlogscale+org lz 829 1684 3223 5325 9.13 GB xlogscale+mod lz 801 1657 3263 5488 8.86 GB

> I shall write the wal_update_changes_custom_delta_v6, and then we can compare all the three patches performance data and decide which one to go based on results.

The results with this are not better than above 2 Approaches, so I am not attaching it.

Function used to create randome string

--------------------------------------------------------

CREATE OR REPLACE FUNCTION random_text_md5_v2(INTEGER)
RETURNS TEXT
LANGUAGE SQL
AS $$

select upper(
substring(
(
SELECT string_agg(md5(random()::TEXT), '')
FROM generate_series(1, CEIL($1 / 32.)::integer)
),
$1)
);

$$;

Suggestions/Comments?

With Regards,

Amit Kapila.

Attachment

pgsql-hackers by date:

From: "Etsuro Fujita"
Date: 14 November 2012, 11:23:29
Subject: Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY

From: Simon Riggs
Date: 14 November 2012, 11:41:28
Subject: Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY

Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

Attachment

Previous

Next