Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From Amit kapila
Subject Re: Performance Improvement by reducing WAL for Update Operation
Date
Msg-id 6C0B27F7206C9E4CA54AE035729E9C382854833D@szxeml509-mbx
List pgsql-hackers

On Mon, 29 Oct 2012 20:02:11 +0530 Amit Kapila wrote:

On Sunday, October 28, 2012 12:28 AM Heikki Linnakangas wrote:
>> One idea is to use the LZ format in the WAL record, but use your
>> memcmp() code to construct it. I believe the slow part in LZ compression
>> is in trying to locate matches in the "history", so if you just replace
>> that with your code that's aware of the column boundaries and uses
>> simple memcmp() to detect what parts changed, you could create LZ
>> compressed output just as quickly as the custom encoded format. It would
>> leave the door open for making the encoding smarter or to do actual
>> compression in the future, without changing the format and the code to
>> decode it.

>This is a good idea. I shall try it.

>In the existing algorithm, storing new data that is not present in
>the history needs 1 control byte for
>every 8 bytes of new data, which can increase the size of the compressed
>output as compared to our delta encoding approach.

>Approach-2
>---------------
>Use only one bit for control data [0 - Length and new data, 1 - pick from
>history based on OFFSET-LENGTH]
>The modified bit value (0) handles the new field data as a continuous
>stream of data, instead of treating every byte as new data.
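
To make the overhead difference concrete, here is a rough sketch that counts the control bits spent on one run of new data in each format. The first function follows the "1 control byte for every 8 bytes of new data" rule stated above. The second is only an illustration of Approach-2: the one-byte length field (and hence the 255-byte chunk limit) is my assumption for this example, not something fixed by the patch, and both function names are made up.

```c
#include <stddef.h>

/*
 * Control bits spent on a run of n new-data bytes in the existing LZ
 * format: 1 control byte per 8 bytes of new data, i.e. 1 bit per byte.
 */
static size_t
lz_newdata_control_bits(size_t n)
{
    return n;
}

/*
 * ASSUMPTION (for illustration only): under Approach-2 the run length is
 * stored in a single byte, so a run is split into chunks of at most
 * 255 bytes, each costing 1 control bit plus 8 length bits.
 */
#define ASSUMED_MAX_RUN 255

static size_t
modified_lz_newdata_control_bits(size_t n)
{
    size_t      chunks = (n + ASSUMED_MAX_RUN - 1) / ASSUMED_MAX_RUN;

    return chunks * (1 + 8);
}
```

Under these assumptions a 100-byte changed column costs 100 control bits in the existing format and 9 in the modified one, while a very short run (for example 4 bytes) is slightly cheaper in the existing format.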

 

Attached are the patches

1. wal_update_changes_lz_v4 - to use LZ Approach with memcmp to construct WAL record

2. wal_update_changes_modified_lz_v5 - to use modified LZ Approach as mentioned above as Approach-2

 

The main changes compared to the previous patch are as follows:

1. In heap_delta_encode, use LZ encoding instead of Custom encoding.

2. Instead of get_tup_info(), introduced heap_getattr_with_len() macro based on suggestion from Noah.

3. LZ macros moved from .c to .h, as they need to be used for encoding.

4. Changed the format for function arguments for heap_delta_encode()/heap_delta_decode() based on suggestion from Noah.
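
As a rough illustration of change 1 (the idea of finding matches with memcmp() on column boundaries instead of searching the LZ history), the sketch below walks the columns of the old and new tuple and emits either a "copy from old tuple" or a "new data" operation. This is not the code in the patches: DeltaOp, ColSpan and delta_encode_columns() are hypothetical names, the column offsets are assumed to be already known, and the real WAL format packs these operations into LZ control bits and tags rather than an array of structs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operation kinds; not the actual WAL record format. */
typedef enum
{
    OP_COPY_OLD,                /* take len bytes from old tuple at off */
    OP_NEW_DATA                 /* take len bytes from new tuple at off */
} DeltaOpKind;

typedef struct
{
    DeltaOpKind kind;
    size_t      off;
    size_t      len;
} DeltaOp;

/* Hypothetical column descriptor: where a column sits inside a tuple. */
typedef struct
{
    size_t      off;
    size_t      len;
} ColSpan;

/*
 * Compare the tuples column by column and fill ops[] (which must have room
 * for ncols entries). Adjacent operations of the same kind over contiguous
 * bytes are merged. Returns the number of operations written.
 */
static size_t
delta_encode_columns(const char *oldp, const ColSpan *oldcols,
                     const char *newp, const ColSpan *newcols,
                     size_t ncols, DeltaOp *ops)
{
    size_t      nops = 0;
    size_t      i;

    assert(ops != NULL);

    for (i = 0; i < ncols; i++)
    {
        int         same;
        DeltaOpKind kind;
        size_t      off;

        if (newcols[i].len == 0)
            continue;           /* nothing to emit for an empty column */

        same = oldcols[i].len == newcols[i].len &&
            memcmp(oldp + oldcols[i].off, newp + newcols[i].off,
                   newcols[i].len) == 0;
        kind = same ? OP_COPY_OLD : OP_NEW_DATA;
        off = same ? oldcols[i].off : newcols[i].off;

        if (nops > 0 && ops[nops - 1].kind == kind &&
            ops[nops - 1].off + ops[nops - 1].len == off)
            ops[nops - 1].len += newcols[i].len;    /* extend previous op */
        else
        {
            ops[nops].kind = kind;
            ops[nops].off = off;
            ops[nops].len = newcols[i].len;
            nops++;
        }
    }
    return nops;
}
```

For a three-column tuple where only the middle column changes, this produces copy, new data, copy; when only the last column changes, the first two columns collapse into a single copy operation.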

 


 

 

Performance Data:

 






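Results:

Patch                   |   1 thread     |   2 threads    |   4 threads    |   8 threads
                        |  Tps  WAL (GB) |  Tps  WAL (GB) |  Tps  WAL (GB) |  Tps  WAL (GB)
------------------------+----------------+----------------+----------------+----------------
Xlog scale              |  861      4.36 | 1463      7.33 | 2135     10.74 | 2689     13.56
Xlog scale +Original LZ |  892      2.46 | 1685      3.35 | 3232      6.02 | 5296      9.20
Xlog scale +Modified LZ |  852      2.35 | 1664      3.25 | 3229      5.71 | 5431      8.68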
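
For reading the figures above, the WAL saving relative to the "Xlog scale" baseline can be worked out with a small helper like the one below (pct_reduction() is just a name I picked for this note). It is only arithmetic on the reported numbers, not a new measurement.

```c
#include <assert.h>

/* Percentage by which `after` is smaller than `before`. */
static double
pct_reduction(double before, double after)
{
    assert(before > 0.0);
    return (before - after) * 100.0 / before;
}
```

With the reported WAL sizes this gives roughly 44% (Original LZ) and 46% (Modified LZ) at 1 thread, and roughly 32% and 36% at 8 threads.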


These are still WIP patches. Some cleanup has to be done.

 

Apart from that, I think the reason the performance is still not the same as with the custom delta encoding approach is that the custom format has an IGN command: for all the unchanged data at the end of the tuple no commands are emitted, and decode was able to form the tuple using the old tuple.

I shall write wal_update_changes_custom_delta_v6, and then we can compare the performance data of all three patches and decide which one to go with based on the results.

 

Suggestions/Comments?

 

With Regards,

Amit Kapila.


 

Attachment
