Thread: Re: Performance Improvement by reducing WAL for Update Operation

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit kapila

Date:

14 November 2012, 11:27:08

On Thu, 8 Nov 2012 17:33:54 +0000 Amit Kapila wrote:
On Mon, 29 Oct 2012 20:02:11 +0530 Amit Kapila wrote:
On Sunday, October 28, 2012 12:28 AM Heikki Linnakangas wrote:
>>> One idea is to use the LZ format in the WAL record, but use your
>>> memcmp() code to construct it. I believe the slow part in LZ compression
>>> is in trying to locate matches in the "history", so if you just replace
>>> that with your code that's aware of the column boundaries and uses
>>> simple memcmp() to detect what parts changed, you could create LZ
>>> compressed output just as quickly as the custom encoded format. It would
>>> leave the door open for making the encoding smarter or to do actual
>>> compression in the future, without changing the format and the code to
>>> decode it.

>>This is good idea. I shall try it.

>>The existing algorithm, when storing new data that is not present in
>>the history, needs 1 control byte for every 8 bytes of new data, which
>>can increase the size of the compressed output as compared to our
>>delta encoding approach.

>>Approach-2
>---------------
>>Use only one bit for control data [0 - Length and new data, 1 - pick from
>>history based on OFFSET-LENGTH]
>>The modified bit value (0) is to handle the new field data as a continuous
>>stream of data, instead of treating every byte as new data.
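As an illustration of Approach-2, here is a minimal Python model; it is not code from either patch. It assumes a simplified layout in which the control flag occupies a whole byte rather than a single bit, lengths and offsets are 2-byte big-endian integers, and offsets count from the start of the old tuple (the "history"). The names encode_ops and decode are hypothetical.

```python
from typing import List, Tuple, Union

# Hypothetical op representation (not taken from the patch):
#   ("new", data)            -> control 0: length followed by the new data
#   ("hist", offset, length) -> control 1: copy from the history (old tuple)
Op = Union[Tuple[str, bytes], Tuple[str, int, int]]


def encode_ops(ops: List[Op]) -> bytes:
    """Serialize ops; each new-data run is stored once, prefixed by its length."""
    out = bytearray()
    for op in ops:
        if op[0] == "new":
            data = op[1]
            out.append(0)                        # control 0: length + new data
            out += len(data).to_bytes(2, "big")
            out += data
        else:
            _, offset, length = op
            out.append(1)                        # control 1: OFFSET-LENGTH pair
            out += offset.to_bytes(2, "big")
            out += length.to_bytes(2, "big")
    return bytes(out)


def decode(encoded: bytes, history: bytes) -> bytes:
    """Rebuild the new tuple from the encoded stream and the old tuple."""
    out = bytearray()
    pos = 0
    while pos < len(encoded):
        control = encoded[pos]
        pos += 1
        if control == 0:
            n = int.from_bytes(encoded[pos:pos + 2], "big")
            pos += 2
            out += encoded[pos:pos + n]
            pos += n
        else:
            offset = int.from_bytes(encoded[pos:pos + 2], "big")
            length = int.from_bytes(encoded[pos + 2:pos + 4], "big")
            pos += 4
            out += history[offset:offset + length]
    return bytes(out)


old_tuple = b"id=0001|name=alice|balance=0100"
new_tuple = b"id=0001|name=alice|balance=0250"
encoded = encode_ops([("hist", 0, 27), ("new", b"0250")])
restored = decode(encoded, old_tuple)
```

In this model a run of new data costs a fixed header no matter how long the run is, which is the point of treating new field data as one continuous stream.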

> Attached are the patches
> 1. wal_update_changes_lz_v4 - to use LZ Approach with memcmp to construct WAL record
> 2. wal_update_changes_modified_lz_v5 - to use modified LZ Approach as mentioned above as Approach-2

> The main changes as compared to the previous patch are as follows:
> 1. In heap_delta_encode, use LZ encoding instead of custom encoding.
> 2. Instead of get_tup_info(), introduced the heap_getattr_with_len() macro based on a suggestion from Noah.
> 3. LZ macros moved from .c to .h, as they need to be used for encoding.
> 4. Changed the format of function arguments for heap_delta_encode()/heap_delta_decode() based on a suggestion from Noah.

Please find the updated patches attached with this mail.

Modifications in these patches apart from the above:

1. Traverse the tuple only once (previously it had to be traversed 3 times) to check whether a particular offset matches and to get the offset needed to generate the encoded tuple.

To achieve this I have changed the function heap_tuple_attr_equals() into heap_attr_get_length_and_check_equals(), so that it also returns the length of the tuple attribute, which can be used to calculate the offset. A separate function could also be written to achieve the same.
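The single-pass idea can be sketched roughly as follows. This is a simplified Python model, not the patch's C code. It assumes both tuples have the same number of attributes, represents each attribute as a bytes value, and ignores alignment padding and NULLs. The names delta_ops and apply_ops are hypothetical.

```python
from typing import List, Tuple


def delta_ops(old_attrs: List[bytes], new_attrs: List[bytes]) -> List[Tuple]:
    """Walk the attributes once, comparing each pair and tracking the old offset."""
    ops: List[Tuple] = []
    old_off = 0  # byte offset of the current attribute inside the old tuple
    for old, new in zip(old_attrs, new_attrs):
        if old == new:
            # Unchanged attribute: refer to the old tuple (history).
            last = ops[-1] if ops else None
            if last and last[0] == "hist" and last[1] + last[2] == old_off:
                # Extend the previous copy instead of emitting a new one.
                ops[-1] = ("hist", last[1], last[2] + len(old))
            else:
                ops.append(("hist", old_off, len(old)))
        else:
            # Changed attribute: emit the new bytes, merging adjacent runs.
            if ops and ops[-1][0] == "new":
                ops[-1] = ("new", ops[-1][1] + new)
            else:
                ops.append(("new", new))
        old_off += len(old)  # the attribute length gives the next offset
    return ops


def apply_ops(ops: List[Tuple], old_tuple: bytes) -> bytes:
    """Reconstruct the new tuple from the ops and the old tuple."""
    out = bytearray()
    for op in ops:
        if op[0] == "hist":
            out += old_tuple[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)


old_attrs = [b"0001", b"alice", b"0100"]
new_attrs = [b"0001", b"alice", b"0250"]
ops = delta_ops(old_attrs, new_attrs)
rebuilt = apply_ops(ops, b"".join(old_attrs))
```

The equality check and the length lookup happen in the same loop iteration, so the offsets needed for the encoded output come out of the one traversal.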

2. Improve the comments in code.

Performance Data:

1. Please refer to the test case in the attached file pgbench_250.c.

The function used to create the random string is given at the end of this mail.

2. The detailed data and configuration settings can be found in the attached files (pgbench_encode_withlz_ff100 & pgbench_encode_withlz_ff80).

Benchmark results with -F 100:

-Patch- -tps@-c1- -tps@-c2- -tps@-c4- -tps@-c8- -WAL@-c8-
xlogscale 802 1453 2253 2643 13.99 GB
xlogscale+org lz 807 1602 3168 5140 9.50 GB
xlogscale+mod lz 796 1620 3216 5270 9.16 GB

Benchmark results with -F 80:

-Patch- -tps@-c1- -tps@-c2- -tps@-c4- -tps@-c8- -WAL@-c8-
xlogscale 811 1455 2148 2704 13.6 GB
xlogscale+org lz 829 1684 3223 5325 9.13 GB
xlogscale+mod lz 801 1657 3263 5488 8.86 GB

> I shall write the wal_update_changes_custom_delta_v6, and then we can compare all the three patches performance data and decide which one to go based on results.

The results with this are not better than above 2 Approaches, so I am not attaching it.

Function used to create random string

--------------------------------------------------------

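CREATE OR REPLACE FUNCTION random_text_md5_v2(INTEGER)
RETURNS TEXT
LANGUAGE SQL
AS $$
    -- Concatenate enough md5 hex digests (32 characters each) to cover the
    -- requested length $1, keep the first $1 characters, and upper-case them.
    -- Note: substring(str, $1) would return the tail starting at position $1,
    -- so the start position 1 is given explicitly.
    SELECT upper(
        substring(
            (
                SELECT string_agg(md5(random()::TEXT), '')
                FROM generate_series(1, CEIL($1 / 32.)::integer)
            ),
            1, $1)
    );
$$;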

Suggestions/Comments?

With Regards,

Amit Kapila.

Attachment

Re: Performance Improvement by reducing WAL for Update Operation

From

Kyotaro HORIGUCHI

Date:

07 December 2012, 08:58:45

Hello, I looked into the patch and have some comments.

Because of time constraints and the size of this patch,
please excuse that these comments cover only part of it. Others
will follow in a few days.


==== heaptuple.c

nocachegetattr(_with_len):

- att_getlength has to do strlen in the worst case, or VARSIZE_ANY, which is heavier than doing one comparison, so I
recommend adding 'if (len)' as the condition for doing this, and passing NULL as &len to nocachegetattr_with_len in
nocachegetattr.

heap_attr_get_length_and_check_equals:

- Size seems to be used conventionally as the type for memory object lengths, so it might be better to use Size instead of
int32 as the type for *tup[12]_attr_len in the parameters.

- This function always returns false for attrnum <= 0 (whole-tuple or some system attribute comparisons) regardless of the
real result, which is a bit different from what the name leads one to expect. If you need to keep this optimization,
it should have a name more specific to its purpose.

heap_delta_encode:

- Some misleading variable names (like match_not_found), some repetitions of similar code fragments (att_align_pointer,
pglz_out_tag), misleading slight differences in the meanings of variables with similar names (old_off and new_off and
similar pairs), and a somewhat tricky use of pglz_out_add and pglz_out_tag with length = 0.

  Changes to these for better readability are welcome.

==== heapam.c

fastgetattr_with_len

- Missing left paren in the line 867 ('nocachegetattr_with_len(tup)...')

- Missing enclosing paren in heapam.c:879 (len, only on style)

- Allowing len = NULL will be good for better performance, like nocachegetattr.


fastgetattr

- I suppose that the coding convention here is that the macro and the alternative C code are expected to look similar.
fastgetattr looks quite different from the corresponding macro.

...


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit kapila

Date:

07 December 2012, 13:06:57

On Friday, December 07, 2012 2:28 PM Kyotaro HORIGUCHI wrote:

> Hello, I looked into the patch and have some comments.

Thank you for reviewing the patch.

> From the restriction of the time for this rather big patch,
> please excuse that these comments are on a part of it. Others
> will follow in few days.

It's perfectly fine.


>==== heaptuple.c
>
>nocachegetattr(_with_len):
>
>- att_getlength has to do strlen in the worst case, or VARSIZE_ANY,
>  which is heavier than doing one comparison, so I recommend
>  adding 'if (len)' as the condition for doing this, and passing NULL
>  as &len to nocachegetattr_with_len in nocachegetattr.
Fixed.

>heap_attr_get_length_and_check_equals:
>
>- Size seems to be used conventionally as the type for memory
>  object lengths, so it might be better to use Size instead of
>  int32 as the type for *tup[12]_attr_len in the parameters.

Fixed.

>- This function always returns false for attrnum <= 0 (whole-tuple
>  or some system attribute comparisons) regardless of the real
>  result, which is a bit different from what the name leads one
>  to expect. If you need to keep this optimization, it
>  should have a name more specific to its purpose.

The heap_attr_get_length_and_check_equals function is similar to heap_tuple_attr_equals;
the attrnum <= 0 check is required for heap_tuple_attr_equals.

>heap_delta_encode:
>
>- Some misleading variable names (like match_not_found),
>  some repetitions of similar code fragments (att_align_pointer, pglz_out_tag),
>  misleading slight differences in the meanings of variables with
>  similar names (old_off and new_off and similar pairs),
>  and a somewhat tricky use of pglz_out_add and pglz_out_tag with length = 0.
>
>  Changes to these for better readability are welcome.

The variable names have been modified; please check them once.

The repeated (att_align_pointer, pglz_out_tag) code was added to take care of padding only in case the values are equal.

pglz_out_add and pglz_out_tag are used with length = 0 for the sake of code readability.

>==== heapam.c
>
>fastgetattr_with_len
>
>- Missing left paren in the line 867 ('nocachegetattr_with_len(tup)...')
>
>- Missing enclosing paren in heapam.c:879 (len, only on style)
>
>- Allowing len = NULL will be good for better performance, like
>  nocachegetattr.

Fixed, except for len = NULL, because fastgetattr is modified as per the comment below.

>fastgetattr
>
>- I suppose that the coding convention here is that the macro and
>  the alternative C code are expected to look similar. fastgetattr
>  looks quite different from the corresponding macro.

Fixed.


Another change was also made to handle a history size of 2 bytes, which is possible when the LZ macros are used for
delta encoding.


With Regards,
Amit Kapila.

On 28 December 2012 08:07, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

> Hello, I saw this patch and confirmed that
>
> - Coding style looks good.
> - Appliable onto HEAD.
> - Some mis-codings are fixed.

I've had a quick review of the patch to see how close we're getting.
The perf tests look to me like we're getting what we wanted from this
and I'm happy with the recovery performance trade-offs. Well done to
both author and testers.

My comments

* There is a fixed 75% heuristic in the patch. Can we document where
that came from? Can we have a parameter that sets that please? This
can be used to have further tests to confirm the useful setting of
this. I expect it to be removed before we release, but it will help
during beta.

* The compression algorithm depends completely upon new row length
savings. If the new row is short, it would seem easier to just skip
the checks and include it anyway. We can say if old and new vary in
length by > 50% of each other, just include new as-is, since the rows
very clearly differ in a big way. Also, if tuple is same length as
before, can we compare the whole tuple at once to save doing
per-column checks?

* If full page writes is on and the page is very old, we are just
going to copy the whole block. So why not check for that rather than
do all these push ups and then just copy the page anyway?

* TOAST is not handled at all. No comments about it, nothing. Does
that mean it hasn't been considered? Or did we decide not to care in
this release? Presumably that means we are comparing toast pointers
byte by byte to see if they are the same?

* I'd like to see a specific test in regression that is designed to
exercise the code here. That way we will be certain that the code is
getting regularly tested.

* The internal docs are completely absent. We need at least a whole
page of descriptive comment, discussing trade-offs and design
decisions. This is very important because it will help locate bugs
much faster if these things are clearly documented. It also helps
reviewers. This is a big timewaster for committers because you have to
read the whole patch and understand it before you can attempt to form
opinions. Commits happen quicker and easier with good comments.

* Lots of typos in comments. Many comments say nothing more than the
words already used in the function name itself

* "flags" variables are almost always int or uint in PG source.

* PGLZ_HISTORY_SIZE needs to be documented in the place it is defined,
not the place it's used. The test if (oldtuplen < PGLZ_HISTORY_SIZE)
really needs to be a test inside the compression module to maintain
better modularity, so the value itself needn't be exported

* Need to mention the WAL format change, or include the change within
the patch so we can see

-- Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit Kapila

Date:

28 December 2012, 11:27:50

On Friday, December 28, 2012 3:52 PM Simon Riggs wrote:
> On 28 December 2012 08:07, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> 
> > Hello, I saw this patch and confirmed that
> >
> >  - Coding style looks good.
> >  - Appliable onto HEAD.
> >  - Some mis-codings are fixed.
> 
> I've had a quick review of the patch to see how close we're getting.
> The perf tests look to me like we're getting what we wanted from this
> and I'm happy with the recovery performance trade-offs. Well done to
> both author and testers.
> 
> My comments
> 
> * There is a fixed 75% heuristic in the patch. Can we document where
> that came from? 

It is from LZ compression strategy. Refer PGLZ_Strategy.
I will add comment for it.

> Can we have a parameter that sets that please? This
> can be used to have further tests to confirm the useful setting of
> this. I expect it to be removed before we release, but it will help
> during beta.

I shall add that for test purpose.
> * The compression algorithm depends completely upon new row length
> savings. If the new row is short, it would seem easier to just skip
> the checks and include it anyway. We can say if old and new vary in
> length by > 50% of each other, just include new as-is, since the rows
> very clearly differ in a big way.

I think it makes more sense. So I shall update the patch.

> Also, if tuple is same length as
> before, can we compare the whole tuple at once to save doing
> per-column checks?

I shall evaluate and discuss with you.

> * If full page writes is on and the page is very old, we are just
> going to copy the whole block. So why not check for that rather than
> do all these push ups and then just copy the page anyway?

I shall check once and update the patch.

> * TOAST is not handled at all. No comments about it, nothing. Does
> that mean it hasn't been considered? Or did we decide not to care in
> this release? 

> Presumably that means we are comparing toast pointers
> byte by byte to see if they are the same?

Yes, currently this patch is doing byte by byte comparison for toast
pointers. I shall add comment.
In future, we can evaluate if further optimizations can be done.

> * I'd like to see a specific test in regression that is designed to
> exercise the code here. That way we will be certain that the code is
> getting regularly tested.

I shall add more specific tests.

> * The internal docs are completely absent. We need at least a whole
> page of descriptive comment, discussing trade-offs and design
> decisions. This is very important because it will help locate bugs
> much faster if these things are clearly documented. It also helps
> reviewers. This is a big timewaster for committers because you have to
> read the whole patch and understand it before you can attempt to form
> opinions. Commits happen quicker and easier with good comments.

Do you have any suggestion for where to put this information, any particular
ReadMe?

> * Lots of typos in comments. Many comments say nothing more than the
> words already used in the function name itself
> 
> * "flags" variables are almost always int or uint in PG source.

> * PGLZ_HISTORY_SIZE needs to be documented in the place it is defined,
> not the place it's used. The test if (oldtuplen < PGLZ_HISTORY_SIZE)
> really needs to be a test inside the compression module to maintain
> better modularity, so the value itself needn't be exported

I shall update the patch to address it.

> * Need to mention the WAL format change, or include the change within
> the patch so we can see

Sure, I will update this in code comments and internals docs.


With Regards,
Amit Kapila.

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit Kapila

Date:

28 December 2012, 11:28:59

On Friday, December 28, 2012 1:38 PM Kyotaro HORIGUCHI wrote:
> Hello, I saw this patch and confirmed that
> 
>  - Coding style looks good.
>  - Appliable onto HEAD.
>  - Some mis-codings are fixed.
> 
> And took the performance figures for 4 types of modification versus 2
> benchmarks.
> 
> As a whole, this patch brings a very large gain in its effective range -
> e.g. updates of relatively small portions of a tuple - but a negligible
> loss of performance is observed outside of its effective range on the
> test machine. I suppose the losses will become more pronounced with
> higher sequential-write performance of the WAL devices.


Thank you very much for the review.

With Regards,
Amit Kapila.

Re: Performance Improvement by reducing WAL for Update Operation

From

Simon Riggs

Date:

28 December 2012, 11:55:56

On 28 December 2012 11:27, Amit Kapila <amit.kapila@huawei.com> wrote:

>> * The internal docs are completely absent. We need at least a whole
>> page of descriptive comment, discussing trade-offs and design
>> decisions. This is very important because it will help locate bugs
>> much faster if these things are clearly documented. It also helps
>> reviewers. This is a big timewaster for committers because you have to
>> read the whole patch and understand it before you can attempt to form
>> opinions. Commits happen quicker and easier with good comments.
>
> Do you have any suggestion for where to put this information, any particular
> ReadMe?

Location is less relevant, since it will show up as additions in the patch.

Put it wherever makes most sense in comparison to existing related
comments/README. I have no preference myself.

If it's any consolation, I notice a common issue with patches is lack
of *explanatory* comments, as opposed to line by line comments. So
same review comment to 50-75% of patches I've reviewed recently, which
is also likely why.

-- Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance Improvement by reducing WAL for Update Operation

From

Simon Riggs

Date:

28 December 2012, 12:03:27

On 28 December 2012 11:27, Amit Kapila <amit.kapila@huawei.com> wrote:

>> * TOAST is not handled at all. No comments about it, nothing. Does
>> that mean it hasn't been considered? Or did we decide not to care in
>> this release?
>
>> Presumably that means we are comparing toast pointers
>> byte by byte to see if they are the same?
>
> Yes, currently this patch is doing byte by byte comparison for toast
> pointers. I shall add comment.
> In future, we can evaluate if further optimizations can be done.

Just a comment to say that the comparison takes place after TOASTed
columns have been removed. TOAST is already optimised for whole value
UPDATE anyway, so that is the right place to produce the delta.

It does make me think that we can further optimise TOAST by updating
only the parts of a toasted datum that have changed. That will be
useful for JSON and XML applications that change only a portion of
large documents.

-- Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit Kapila

Date:

04 January 2013, 13:54:30

On Friday, December 28, 2012 3:52 PM Simon Riggs wrote:
> On 28 December 2012 08:07, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> 
> > Hello, I saw this patch and confirmed that
> >
> >  - Coding style looks good.
> >  - Appliable onto HEAD.
> >  - Some mis-codings are fixed.
> 
> I've had a quick review of the patch to see how close we're getting.
> The perf tests look to me like we're getting what we wanted from this
> and I'm happy with the recovery performance trade-offs. Well done to
> both author and testers.
> 
> 
> * The compression algorithm depends completely upon new row length
> savings. If the new row is short, it would seem easier to just skip
> the checks and include it anyway. We can say if old and new vary in
> length by > 50% of each other, just include new as-is, since the rows
> very clearly differ in a big way. 

> Also, if tuple is same length as 
> before, can we compare the whole tuple at once to save doing
> per-column checks?

If we have to do a whole-tuple comparison, then the changed parts might
need to be stored byte by byte rather than at column offset boundaries.
This might not be possible with the current algorithm, as it stores the
information in WAL column by column and also decodes it in the same way.

> The internal docs are completely absent. We need at least a whole page of
> descriptive comment, discussing trade-offs and design decisions.

Currently I have planned to put it in transam/README, as most of the WAL
description is present there.

With Regards,
Amit Kapila.

Re: Performance Improvement by reducing WAL for Update Operation

From

Simon Riggs

Date:

04 January 2013, 14:34:07

On 4 January 2013 13:53, Amit Kapila <amit.kapila@huawei.com> wrote:
> On Friday, December 28, 2012 3:52 PM Simon Riggs wrote:
>> On 28 December 2012 08:07, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>
>> > Hello, I saw this patch and confirmed that
>> >
>> >  - Coding style looks good.
>> >  - Appliable onto HEAD.
>> >  - Some mis-codings are fixed.
>>
>> I've had a quick review of the patch to see how close we're getting.
>> The perf tests look to me like we're getting what we wanted from this
>> and I'm happy with the recovery performance trade-offs. Well done to
>> both author and testers.
>>
>>
>> * The compression algorithm depends completely upon new row length
>> savings. If the new row is short, it would seem easier to just skip
>> the checks and include it anyway. We can say if old and new vary in
>> length by > 50% of each other, just include new as-is, since the rows
>> very clearly differ in a big way.
>
>> Also, if tuple is same length as
>> before, can we compare the whole tuple at once to save doing
>> per-column checks?
>
> If we have to do a whole-tuple comparison, then the changed parts might
> need to be stored byte by byte rather than at column offset boundaries.
> This might not be possible with the current algorithm, as it stores the
> information in WAL column by column and also decodes it in the same way.

OK, please explain in comments.

>> The internal docs are completely absent. We need at least a whole page of
>> descriptive comment, discussing trade-offs and design decisions.
>
> Currently I have planned to put it in transam/README, as most of the WAL
> description is present there.

But also in comments for each major function.

Thanks

-- Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit kapila

Date:

09 January 2013, 08:05:25

On Friday, January 04, 2013 8:03 PM Simon Riggs wrote:
On 4 January 2013 13:53, Amit Kapila <amit.kapila@huawei.com> wrote:
> On Friday, December 28, 2012 3:52 PM Simon Riggs wrote:
>> On 28 December 2012 08:07, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>
>> > Hello, I saw this patch and confirmed that
>> >
>> >  - Coding style looks good.
>> >  - Appliable onto HEAD.
>> >  - Some mis-codings are fixed.
>>
>> I've had a quick review of the patch to see how close we're getting.
>> The perf tests look to me like we're getting what we wanted from this
>> and I'm happy with the recovery performance trade-offs. Well done to
>> both author and testers.
>>
Update patch contains handling of below Comments
>
>* There is a fixed 75% heuristic in the patch. Can we document where
>that came from? Can we have a parameter that sets that please? This
>can be used to have further tests to confirm the useful setting of
>this. I expect it to be removed before we release, but it will help
>during beta.

Added a GUC variable, wal_update_compression_ratio, to set the compression ratio.
It can be removed during beta.

>* The compression algorithm depends completely upon new row length
>savings. If the new row is short, it would seem easier to just skip
>the checks and include it anyway. We can say if old and new vary in
>length by > 50% of each other, just include new as-is, since the rows
>very clearly differ in a big way.

Added a check in heap_delta_encode to identify whether the tuples differ in length by more than 50%.

>* If full page writes is on and the page is very old, we are just
>going to copy the whole block. So why not check for that rather than
>do all these push ups and then just copy the page anyway?

Added a function that identifies whether the page needs a backup block;
the optimization is applied based on the result.


>* I'd like to see a specific test in regression that is designed to
>exercise the code here. That way we will be certain that the code is
>getting regularly tested.

Added regression tests which cover all the changes done for the optimization except recovery.

>* The internal docs are completely absent. We need at least a whole
>page of descriptive comment, discussing trade-offs and design
>decisions. This is very important because it will help locate bugs
>much faster if these things are clearly documented. It also helps
>reviewers. This is a big timewaster for committers because you have to
>read the whole patch and understand it before you can attempt to form
>opinions. Commits happen quicker and easier with good comments.
>* Need to mention the WAL format change, or include the change within
>the patch so we can see

backend/access/transam/README is updated with details.

>* Lots of typos in comments. Many comments say nothing more than the
>words already used in the function name itself

Corrected the typos and removed unnecessary comments.

>* "flags" variables are almost always int or uint in PG source.
>
>* PGLZ_HISTORY_SIZE needs to be documented in the place it is defined,
>not the place it's used. The test if (oldtuplen < PGLZ_HISTORY_SIZE)
>really needs to be a test inside the compression module to maintain
>better modularity, so the value itself needn't be exported

The (oldtuplen < PGLZ_HISTORY_SIZE) validation has been moved inside heap_delta_encode,
and the flags variable has also been updated.
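To make the combined checks easier to follow, here is a rough Python sketch of the decision logic described above; it is not the patch's code. It makes several assumptions: the function names should_try_delta and delta_is_worthwhile are hypothetical, PGLZ_HISTORY_SIZE is taken to be 4096, "differ in length by more than 50%" is taken relative to the old tuple length, and the compression ratio is taken to mean that the encoded output may be at most that percentage of the new tuple length.

```python
PGLZ_HISTORY_SIZE = 4096  # assumed value, for illustration only


def should_try_delta(old_len: int, new_len: int, needs_backup_block: bool) -> bool:
    """Cheap checks made before any per-column comparison is attempted."""
    if needs_backup_block:
        # The whole block will be copied into WAL anyway.
        return False
    if old_len >= PGLZ_HISTORY_SIZE:
        # The old tuple must fit into the history window.
        return False
    if abs(new_len - old_len) * 2 > old_len:
        # Lengths differ by more than 50%: log the new tuple as-is.
        return False
    return True


def delta_is_worthwhile(encoded_len: int, new_len: int, max_ratio_pct: int = 75) -> bool:
    """Accept the encoded tuple only if it is small enough relative to the new tuple."""
    return encoded_len * 100 <= new_len * max_ratio_pct


decision = should_try_delta(old_len=1800, new_len=1800, needs_backup_block=False)
```

Ordering the checks from cheapest to most expensive means that a tuple which cannot benefit is rejected before any attribute is compared.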

Test results with modified pgbench (1800 record size) on the latest patch:

-Patch-             -tps@-c1-     -WAL@-c1-      -tps@-c2-      -WAL@-c2-
Head                831           4.17 GB        1416           7.13 GB
WAL modification    846           2.36 GB        1712           3.31 GB

-Patch-             -tps@-c4-     -WAL@-c4-      -tps@-c8-      -WAL@-c8-
Head                2196          11.01 GB       2758           13.88 GB
WAL modification    3295           5.87 GB       5472            9.02 GB
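For reference, the relative changes implied by the table above can be worked out with a few lines of Python. The figures are copied from the table; the function name summarize and the data layout are only for illustration.

```python
# (tps, WAL in GB) per client count, copied from the table above.
RESULTS = {
    1: {"head": (831, 4.17), "patched": (846, 2.36)},
    2: {"head": (1416, 7.13), "patched": (1712, 3.31)},
    4: {"head": (2196, 11.01), "patched": (3295, 5.87)},
    8: {"head": (2758, 13.88), "patched": (5472, 9.02)},
}


def summarize(results):
    """Return {clients: (tps gain in %, WAL reduction in %)}."""
    rows = {}
    for clients, r in results.items():
        head_tps, head_wal = r["head"]
        new_tps, new_wal = r["patched"]
        tps_gain = (new_tps - head_tps) / head_tps * 100
        wal_cut = (head_wal - new_wal) / head_wal * 100
        rows[clients] = (round(tps_gain, 1), round(wal_cut, 1))
    return rows


for clients, (tps_gain, wal_cut) in summarize(RESULTS).items():
    print(f"c{clients}: tps {tps_gain:+.1f}%, WAL -{wal_cut:.1f}%")
```

The WAL figures are totals for each run, so where tps also increased, the WAL written per transaction presumably fell by more than these percentages suggest.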


With Regards,
Amit Kapila.

Attachment

wal_update_changes_v7.patch

Re: Performance Improvement by reducing WAL for Update Operation

From

Simon Riggs

Date:

09 January 2013, 11:27:00

On 9 January 2013 08:05, Amit kapila <amit.kapila@huawei.com> wrote:

> Update patch contains handling of below Comments

Thanks


> Test results with modified pgbench (1800 record size) on the latest patch:
>
> -Patch-             -tps@-c1-     -WAL@-c1-      -tps@-c2-      -WAL@-c2-
> Head                831           4.17 GB        1416           7.13 GB
> WAL modification    846           2.36 GB        1712           3.31 GB
>
> -Patch-             -tps@-c4-     -WAL@-c4-      -tps@-c8-      -WAL@-c8-
> Head                2196          11.01 GB       2758           13.88 GB
> WAL modification    3295           5.87 GB       5472            9.02 GB

And test results on normal pgbench?

-- Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit Kapila

Date:

09 January 2013, 12:13:20

On Wednesday, January 09, 2013 4:57 PM Simon Riggs wrote:
> On 9 January 2013 08:05, Amit kapila <amit.kapila@huawei.com> wrote:
> 
> > Update patch contains handling of below Comments
> 
> Thanks
> 
> 
> > Test results with modified pgbench (1800 record size) on the latest patch:
> >
> > -Patch-             -tps@-c1-     -WAL@-c1-      -tps@-c2-      -WAL@-c2-
> > Head                831           4.17 GB        1416           7.13 GB
> > WAL modification    846           2.36 GB        1712           3.31 GB
> >
> > -Patch-             -tps@-c4-     -WAL@-c4-      -tps@-c8-      -WAL@-c8-
> > Head                2196          11.01 GB       2758           13.88 GB
> > WAL modification    3295           5.87 GB       5472            9.02 GB
> 
> And test results on normal pgbench?

As there was no gain for original pgbench, as was shown in the performance
readings, I thought it was not mandatory.
However, I shall run normal pgbench as well, since the patch should not cause any further dip
in normal pgbench.
Thanks for pointing this out.

With Regards,
Amit Kapila.

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit Kapila

Date:

11 January 2013, 10:41:17

On Wednesday, January 09, 2013 4:57 PM Simon Riggs wrote:
> On 9 January 2013 08:05, Amit kapila <amit.kapila@huawei.com> wrote:
> 
> > Update patch contains handling of below Comments
> 
> Thanks
> 
> 
> > Test results with modified pgbench (1800 record size) on the latest patch:
> >
> > -Patch-             -tps@-c1-     -WAL@-c1-      -tps@-c2-      -WAL@-c2-
> > Head                831           4.17 GB        1416           7.13 GB
> > WAL modification    846           2.36 GB        1712           3.31 GB
> >
> > -Patch-             -tps@-c4-     -WAL@-c4-      -tps@-c8-      -WAL@-c8-
> > Head                2196          11.01 GB       2758           13.88 GB
> > WAL modification    3295           5.87 GB       5472            9.02 GB
> 
> And test results on normal pgbench?

configuration: 

shared_buffers = 4GB 
wal_buffers = 16MB 
checkpoint_segments = 256 
checkpoint_timeout = 15min
autovacuum = off 
server_encoding = SQL_ASCII 
client_encoding = UTF8 
lc_collate = C 
lc_ctype = C 


init: 

pgbench -s 75 -i -F 80 

run: 

pgbench -T 600 


Test results with original pgbench (synccommit off) on the latest patch: 


-Patch-             -tps@-c1-     -WAL@-c1-      -tps@-c2-      -WAL@-c2-
Head                1459          1.40 GB        2491           1.70 GB
WAL modification    1558          1.38 GB        2441           1.59 GB


-Patch-             -tps@-c4-     -WAL@-c4-      -tps@-c8-      -WAL@-c8-
Head                5139          2.49 GB        10651          4.72 GB
WAL modification    5224          2.28 GB        11329          3.96 GB 



Test results with original pgbench (synccommit on) on the latest patch: 


-Patch-             -tps@-c1-     -WAL@-c1-      -tps@-c2-      -WAL@-c2-
Head                146           0.45 GB        167            0.49 GB
WAL modification    144           0.44 GB        166            0.49 GB


-Patch-             -tps@-c4-     -WAL@-c4-      -tps@-c8-      -WAL@-c8-
Head                325           0.77 GB        603            1.03 GB
WAL modification    321           0.76 GB        604            1.01 GB



The results are similar to those noted by Kyotaro-San. The WAL size is reduced
even for original pgbench.
There is a slight performance dip in some of the cases for original pgbench.

With Regards,
Amit Kapila.

Re: Performance Improvement by reducing WAL for Update Operation

From

Simon Riggs

Date:

11 January 2013, 10:57:57

On 11 January 2013 10:40, Amit Kapila <amit.kapila@huawei.com> wrote:

> Test results with original pgbench (synccommit off) on the latest patch:
>
>
> -Patch-             -tps@-c1-     -WAL@-c1-      -tps@-c2-      -WAL@-c2-
> Head                1459          1.40 GB        2491           1.70 GB
> WAL modification    1558          1.38 GB        2441           1.59 GB
>
>
> -Patch-             -tps@-c4-     -WAL@-c4-      -tps@-c8-      -WAL@-c8-
> Head                5139          2.49 GB        10651          4.72 GB
> WAL modification    5224          2.28 GB        11329          3.96 GB

> There is slight performance dip in some of the cases for original pgbench.

Is this just one run? Can we see 3 runs please?

Can we investigate the performance dip at c=2?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit Kapila

Date:

11 January 2013, 12:31:03

On Friday, January 11, 2013 4:28 PM Simon Riggs wrote:
> On 11 January 2013 10:40, Amit Kapila <amit.kapila@huawei.com> wrote:
> 
> > Test results with original pgbench (synccommit off) on the latest patch:
> >
> >
> > -Patch-             -tps@-c1-     -WAL@-c1-      -tps@-c2-      -WAL@-c2-
> > Head                1459          1.40 GB        2491           1.70 GB
> > WAL modification    1558          1.38 GB        2441           1.59 GB
> >
> >
> > -Patch-             -tps@-c4-     -WAL@-c4-      -tps@-c8-      -WAL@-c8-
> > Head                5139          2.49 GB        10651          4.72 GB
> > WAL modification    5224          2.28 GB        11329          3.96 GB
>
> > There is slight performance dip in some of the cases for original pgbench.
> 
> Is this just one run? Can we see 3 runs please?
This is the average of 3 runs.
-Patch-                 -tps@-c1-     -WAL@-c1-      -tps@-c2-      -WAL@-c2-
Head-1                  1648          1.47 GB        2491           1.69 GB
Head-2                  1538          1.43 GB        2529           1.72 GB
Head-3                  1192          1.31 GB        2453           1.70 GB

AvgHead                 1459          1.40 GB        2491           1.70 GB

WAL modification-1      1618          1.40 GB        2351           1.56 GB
WAL modification-2      1623          1.40 GB        2411           1.59 GB
WAL modification-3      1435          1.34 GB        2562           1.61 GB

WAL modification-Avg    1558          1.38 GB        2441           1.59 GB


-Patch-                 -tps@-c4-     -WAL@-c4-      -tps@-c8-      -WAL@-c8-
Head-1                  5285          2.53 GB        11858          5.43 GB
Head-2                  5105          2.47 GB        10724          4.98 GB
Head-3                  5029          2.46 GB        9372           3.75 GB

AvgHead                 5139          2.49 GB        10651          4.72 GB

WAL modification-1      5117          2.26 GB        12092          4.42 GB
WAL modification-2      5142          2.26 GB        9965           3.48 GB
WAL modification-3      5413          2.33 GB        11930          3.99 GB

WAL modification-Avg    5224          2.28 GB        11329          3.96 GB


> Can we investigate the performance dip at c=2?

Please consider the following points for this dip:
1. For synchronous commit = off, there is always slight variation in the data.
2. The size of WAL is reduced.
3. For small rows (128 bytes), the algorithm sometimes doesn't help performance
   much, as the size is not reduced significantly and there is an equivalent
   overhead for delta compression.
   We could add a check so that this optimization is applied only if the row
   length is greater than some threshold (128 bytes, 200 bytes), but I feel
   that as the performance dip is small and the WAL reduction gain is still
   there, it should be okay without any check as well.

With Regards,
Amit Kapila.

Re: Performance Improvement by reducing WAL for Update Operation

From

Simon Riggs

Date:

11 January 2013, 12:47:46

On 11 January 2013 12:30, Amit Kapila <amit.kapila@huawei.com> wrote:

>> Is this just one run? Can we see 3 runs please?
>
>   This average of 3 runs.

The results are so variable it's almost impossible to draw any
conclusions at all. I think if we did harder stats on those we'd get
nothing.

Can you do something to bring that in? Or just do more tests to get a
better view?
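
To put a rough number on "so variable", here is a minimal sketch that computes the mean and sample variance of the three tps readings at -c1 (synccommit off) quoted in this thread. The helper names `mean` and `sample_variance` are made up for illustration, and three samples are far too few for a real significance test; this only shows the size of the run-to-run spread next to the gap between the averages.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double
mean(const double *x, size_t n)
{
    double      sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double) n;
}

/* Sample variance: squared deviations from the mean, divided by n - 1. */
static double
sample_variance(const double *x, size_t n)
{
    double      m;
    double      ss = 0.0;

    assert(n > 1);
    m = mean(x, n);
    for (size_t i = 0; i < n; i++)
        ss += (x[i] - m) * (x[i] - m);
    return ss / (double) (n - 1);
}

/* tps at -c1 with synccommit off, three runs each, as posted in the thread. */
static const double head_c1[] = {1648.0, 1538.0, 1192.0};
static const double patch_c1[] = {1618.0, 1623.0, 1435.0};
```

With these inputs the two means differ by about 99 tps, while the square roots of the two variances (the standard deviations) come out at roughly 238 tps for Head and 107 tps for the patch, so the difference between the averages sits well inside the noise of a single configuration.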


>  -Patch-               -tps@-c1-     -WAL@-c1-      -tps@-c2-      -WAL@-c2-
>   Head-1                 1648          1.47 GB        2491           1.69 GB
>   Head-2                 1538          1.43 GB        2529           1.72 GB
>   Head-3                 1192          1.31 GB        2453           1.70 GB
>
>   AvgHead                1459          1.40 GB        2491           1.70 GB
>
>   WAL modification-1      1618          1.40 GB        2351           1.56 GB
>   WAL modification-2      1623          1.40 GB        2411           1.59 GB
>   WAL modification-3      1435          1.34 GB        2562           1.61 GB
>
>   WAL modification-Avg    1558          1.38 GB        2441           1.59 GB
>
>
> -Patch-               -tps@-c4-     -WAL@-c4-      -tps@-c8-      -WAL@-c8-
>   Head-1                 5285          2.53 GB        11858           5.43 GB
>   Head-2                 5105          2.47 GB        10724           4.98 GB
>   Head-3                 5029          2.46 GB        9372            3.75 GB
>
>   AvgHead                5139          2.49 GB        10651           4.72 GB
>
>   WAL modification-1      5117          2.26 GB        12092           4.42 GB
>   WAL modification-2      5142          2.26 GB        9965            3.48 GB
>   WAL modification-3      5413          2.33 GB        11930           3.99 GB
>
>   WAL modification-Avg    5224          2.28 GB        11329           3.96 GB
>
>
>> Can we investigate the performance dip at c=2?
>   Please consider following points for this dip:
>   1. For synchronous commit = off, there is always slight variation in data.
>   2. The size of WAL is reduced.
>   3. For small rows (128 bytes), sometimes the performance difference
> created by this algorithm doesn't help much,
>      as the size is not reduced significantly and there is equivalent
> overhead for delta compression.
>      We can put check that this optimization should be applied if row length
> is greater than some
>      threshold(128 bytes, 200 bytes), but I feel as performance dip is not
> much and WAL reduction gain is also
>      there, so it should be okay without any check as well.
>
> With Regards,
> Amit Kapila.
>



--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance Improvement by reducing WAL for Update Operation

From

Simon Riggs

Date:

11 January 2013, 13:15:26

On 28 December 2012 10:21, Simon Riggs <simon@2ndquadrant.com> wrote:

> * There is a fixed 75% heuristic in the patch.

I'm concerned that we're doing extra work while holding the buffer
locked, which will exacerbate any block contention that exists.

We have a list of the columns that the UPDATE is touching since we use
that to check column permissions for the UPDATE. Which means we should
be able to use that list to check only the columns actually changing
in this UPDATE statement.

That will likely save us some time during the compression check.

Can you look into that please? I don't think it will be much work.

I've moved this to the next CF. I'm planning to review this one first.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit Kapila

Date:

11 January 2013, 13:49:43

On Friday, January 11, 2013 6:18 PM Simon Riggs wrote:
> On 11 January 2013 12:30, Amit Kapila <amit.kapila@huawei.com> wrote:
> 
> >> Is this just one run? Can we see 3 runs please?
> >
> >   This average of 3 runs.
> 
> The results are so variable its almost impossible to draw any
> conclusions at all. I think if we did harder stats on those we'd get
> nothing.
> 
> Can you do something to bring that in? Or just do more tests to get a
> better view?

To be honest, I have taken this set of 3 readings twice, and there is
similar fluctuation for sync commit = off.
What I can do early next week is:
a. run this test 10 times to see the results;
b. run the tests with record length 256 instead of 128.

However, I think my results for sync commit = on match those of Kyotaro-San.

Please suggest if you have anything else in mind.

The results above are for sync mode = off; the results for sync mode = on are
comparatively consistent.
I think for sync commit = off there is always fluctuation in the results.
The results for sync mode = on are as below:
-Patch-               -tps@-c1-     -WAL@-c1-      -tps@-c2-      -WAL@-c2-
Head-1                149           0.46 GB        160            0.48 GB
Head-2                145           0.45 GB        180            0.52 GB
Head-3                144           0.45 GB        161            0.48 GB

WAL modification-1    142           0.44 GB        161            0.48 GB
WAL modification-2    146           1.45 GB        162            0.48 GB
WAL modification-3    144           1.44 GB        175            0.51 GB

-Patch-               -tps@-c4-     -WAL@-c4-      -tps@-c8-      -WAL@-c8-
Head-1                325           0.77 GB        602            1.03 GB
Head-2                328           0.77 GB        606            1.03 GB
Head-3                323           0.77 GB        603            1.03 GB

WAL modification-1    324           0.76 GB        604            1.01 GB
WAL modification-2    322           0.76 GB        604            1.01 GB
WAL modification-3    317           0.75 GB        604            1.01 GB
 
> >
> >
> >> Can we investigate the performance dip at c=2?
> >   Please consider following points for this dip:
> >   1. For synchronous commit = off, there is always slight variation in data.
> >   2. The size of WAL is reduced.
> >   3. For small rows (128 bytes), sometimes the performance difference
> > created by this algorithm doesn't help much,
> >      as the size is not reduced significantly and there is equivalent
> > overhead for delta compression.
> >      We can put check that this optimization should be applied if row length
> > is greater than some
> >      threshold(128 bytes, 200 bytes), but I feel as performance dip is not
> > much and WAL reduction gain is also
> >      there, so it should be okay without any check as well.
> >
> > With Regards,
> > Amit Kapila.
> >

With Regards,
Amit Kapila.

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit Kapila

Date:

11 January 2013, 14:25:09

On Friday, January 11, 2013 6:45 PM Simon Riggs wrote:
> On 28 December 2012 10:21, Simon Riggs <simon@2ndquadrant.com> wrote:
> 
> > * There is a fixed 75% heuristic in the patch.
> 
> I'm concerned that we're doing extra work while holding the buffer
> locked, which will exacerbate any block contention that exists.
> 
> We have a list of the columns that the UPDATE is touching since we use
> that to check column permissions for the UPDATE. Which means we should
> be able to use that list to check only the columns actually changing
> in this UPDATE statement.
> 
> That will likely save us some time during the compression check.
> 
> Can you look into that please? I don't think it will be much work.

IIUC, that is how I did it in the initial version of the patch, i.e. encoding
only the modified columns.
For reference, this is from my initial patch:

modifiedCols = (rt_fetch(resultRelInfo->ri_RangeTableIndex,
                         estate->es_range_table)->modifiedCols);

http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382852DE51@szxeml509-mbs

1. However, as Heikki pointed out, this has problems similar to those in the
HOT implementation, which is why memcmp is used for HOT.
2. Also, our initial readings showed no performance difference compared to the
current approach.

> I've moved this to the next CF. I'm planning to review this one first.

Thank you.

With Regards,
Amit Kapila.

Re: Performance Improvement by reducing WAL for Update Operation

From

Alvaro Herrera

Date:

11 January 2013, 14:30:03

Simon Riggs wrote:
> On 28 December 2012 10:21, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> > * There is a fixed 75% heuristic in the patch.
>
> I'm concerned that we're doing extra work while holding the buffer
> locked, which will exacerbate any block contention that exists.
>
> We have a list of the columns that the UPDATE is touching since we use
> that to check column permissions for the UPDATE. Which means we should
> be able to use that list to check only the columns actually changing
> in this UPDATE statement.

But that doesn't include columns changed by triggers, AFAIR, so you
could only use that if there weren't any triggers.

I was also worried about the high variance in the results.  Those
averages look rather meaningless.  Which would be okay, I think, because
it'd mean that performance-wise the patch is a wash, but it is still
achieving a lower WAL volume, which is good.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance Improvement by reducing WAL for Update Operation

From

Simon Riggs

Date:

11 January 2013, 14:35:06

On 11 January 2013 14:29, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> But that doesn't include columns changed by triggers, AFAIR, so you
> could only use that if there weren't any triggers.

True, well spotted

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance Improvement by reducing WAL for Update Operation

From

Simon Riggs

Date:

11 January 2013, 15:58:01

On 11 January 2013 14:24, Amit Kapila <amit.kapila@huawei.com> wrote:

> http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382852DE51@szxeml509-mbs
>
> 1. However Heikki has pointed, it has some problems similar to for HOT
> implementation and that is the reason we have done memcmp for HOT.
> 2. Also we have found in initial readings that this doesn't have any
> performance difference as compare to current Approach.

OK, forget that idea.

>> I've moved this to the next CF. I'm planning to review this one first.
>
> Thank you.

Just reviewing the patch now, making more sense with comments added.

In heap_delta_encode() do we store which columns have changed? Do we
store the whole new column value?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit kapila

Date:

11 January 2013, 17:08:38

On Friday, January 11, 2013 9:27 PM Simon Riggs wrote:
On 11 January 2013 14:24, Amit Kapila <amit.kapila@huawei.com> wrote:

>> http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382852DE51@szxeml509-mbs
>
>> 1. However Heikki has pointed, it has some problems similar to for HOT
>> implementation and that is the reason we have done memcmp for HOT.
>> 2. Also we have found in initial readings that this doesn't have any
>> performance difference as compare to current Approach.

>OK, forget that idea.

>>> I've moved this to the next CF. I'm planning to review this one first.
>
>> Thank you.

> Just reviewing the patch now, making more sense with comments added.

>In heap_delta_encode() do we store which columns have changed?

Not the attribute bumberwise, but offsetwise it is stored.

> Do we store the whole new column value?

Yes, please refer to the else part of the code:

+         else
+         {
+             data_len = new_tup_off - change_off;
+             if ((bp + (2 * data_len)) - bstart >= result_max)
+                 return false;
+
+             /* Copy the modified column data to the output buffer if present */
+             pglz_out_add(ctrlp, ctrlb, ctrl, bp, data_len, dp);
+

With Regards,
Amit Kapila.

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit kapila

Date:

11 January 2013, 17:30:33

On Friday, January 11, 2013 7:59 PM Alvaro Herrera wrote:
Simon Riggs wrote:
> On 28 December 2012 10:21, Simon Riggs <simon@2ndquadrant.com> wrote:
>

> I was also worried about the high variance in the results.  Those
> averages look rather meaningless.  Which would be okay, I think, because
> it'd mean that performance-wise the patch is a wash,

For larger tuple sizes (>1000 && < 1800), the performance gain will be good.
Please refer to the performance results from me and Kyotaro-san at the links below:

http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C383BEAAE32@szxeml509-mbx
http://archives.postgresql.org/message-id/20121228.170748.90887322.horiguchi.kyotaro@lab.ntt.co.jp

In fact, I believe that for all tuples with a length between 200 and 1800 bytes
and around 15~20% of the values changed, there will be both a performance gain
and a WAL reduction.
The reason for keeping the same logic for smaller tuples (<=128 bytes) is
similar: there is not much performance difference, but the WAL reduction gain
is still visible.

> but it is still achieving a lower WAL volume, which is good.

With Regards,
Amit Kapila.

Re: Performance Improvement by reducing WAL for Update Operation

From

Simon Riggs

Date:

11 January 2013, 17:39:35

On 11 January 2013 17:08, Amit kapila <amit.kapila@huawei.com> wrote:

>> Just reviewing the patch now, making more sense with comments added.
>
>>In heap_delta_encode() do we store which columns have changed?
>
> Not the attribute bumberwise, but offsetwise it is stored.

(Does that mean "numberwise"??)

Can we identify which columns have changed? i.e. 1st, 3rd and 12th columns?


>> Do we store the whole new column value?
>
> Yes, please refer else part of code
>
> +               else
> +               {
> +                       data_len = new_tup_off - change_off;
> +                       if ((bp + (2 * data_len)) - bstart >= result_max)
> +                               return false;
> +
> +                       /* Copy the modified column data to the output buffer if present */
> +                       pglz_out_add(ctrlp, ctrlb, ctrl, bp, data_len, dp);
> +
>

"modified column data" could mean either 1) (modified column) data
i.e. the data for the modified column, or 2) modified (column data)
i.e. the modified data in the column. I read that as (2) and didn't
look at the code. ;-)

Happy now that I know it's (1)

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance Improvement by reducing WAL for Update Operation

From

Simon Riggs

Date:

11 January 2013, 17:42:18

On 11 January 2013 17:30, Amit kapila <amit.kapila@huawei.com> wrote:
> On Friday, January 11, 2013 7:59 PM Alvaro Herrera wrote:
> Simon Riggs wrote:
>> On 28 December 2012 10:21, Simon Riggs <simon@2ndquadrant.com> wrote:
>>
>
>> I was also worried about the high variance in the results.  Those
>> averages look rather meaningless.  Which would be okay, I think, because
>> it'd mean that performance-wise the patch is a wash,
>
> For larger tuple sizes (>1000 && < 1800), the performance gain will be good.
> Please refer performance results by me and Kyotaro-san in below links:
>
> http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C383BEAAE32@szxeml509-mbx
> http://archives.postgresql.org/message-id/20121228.170748.90887322.horiguchi.kyotaro@lab.ntt.co.jp

AFAICS your tests are badly variable, but as Alvaro says, they aren't
accurate enough to tell there's a regression.

I'll assume not and carry on.

(BTW the rejection of the null bitmap patch because of a performance
regression may also need to be reconsidered).

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit kapila

Date:

11 January 2013, 18:11:25

On Friday, January 11, 2013 11:09 PM Simon Riggs wrote:
On 11 January 2013 17:08, Amit kapila <amit.kapila@huawei.com> wrote:

>>> Just reviewing the patch now, making more sense with comments added.
>
>>>In heap_delta_encode() do we store which columns have changed?
>
>> Not the attribute bumberwise, but offsetwise it is stored.

> (Does that mean "numberwise"??)

Yes.

> Can we identify which columns have changed? i.e. 1st, 3rd and 12th columns?

As per the current algorithm, we can't, as it is based on offsets.
What I mean to say is that the basic idea for reconstructing the tuple during
recovery is to copy data from the old tuple offset-wise (offsets stored in the
encoded tuple) and to use the new data (modified column data) from the encoded
tuple directly. So we don't need exact column numbers.
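
The reconstruction idea can be sketched as below. This is only an illustration under simplifying assumptions: the `DeltaOp` instruction list and the `delta_decode` function are hypothetical and are not the LZ-style byte format the patch writes to WAL. The sketch only shows that offsets into the old tuple plus the new bytes are enough to rebuild the new tuple, with no column numbers involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical delta instruction; NOT the WAL format used by the patch. */
typedef enum
{
    DELTA_COPY_OLD,             /* copy len bytes from the old tuple */
    DELTA_NEW_DATA              /* copy len bytes carried in the record */
} DeltaKind;

typedef struct
{
    DeltaKind   kind;
    size_t      old_off;        /* DELTA_COPY_OLD: offset into old tuple */
    size_t      len;            /* number of bytes to emit */
    const char *data;           /* DELTA_NEW_DATA: the new bytes */
} DeltaOp;

/*
 * Rebuild the new tuple into out[]. Returns the number of bytes written,
 * or 0 if an instruction would overrun the old tuple or the output buffer.
 */
static size_t
delta_decode(const char *old, size_t old_len,
             const DeltaOp *ops, size_t nops,
             char *out, size_t out_max)
{
    size_t      written = 0;

    assert(old != NULL && ops != NULL && out != NULL);
    for (size_t i = 0; i < nops; i++)
    {
        const DeltaOp *op = &ops[i];

        if (op->len > out_max - written)
            return 0;           /* output buffer too small */
        if (op->kind == DELTA_COPY_OLD)
        {
            if (op->old_off > old_len || op->len > old_len - op->old_off)
                return 0;       /* reference outside the old tuple */
            memcpy(out + written, old + op->old_off, op->len);
        }
        else
            memcpy(out + written, op->data, op->len);
        written += op->len;
    }
    return written;
}
```

For example, with an old tuple of "aaaaBBBBcccc" and the three instructions "copy 4 bytes at offset 0", "new data XY", "copy 4 bytes at offset 8", the decoder produces "aaaaXYcccc" without knowing that the change was in the second column.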

With Regards,
Amit Kapila.

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit kapila

Date:

11 January 2013, 18:15:29

On Friday, January 11, 2013 11:12 PM Simon Riggs wrote:
On 11 January 2013 17:30, Amit kapila <amit.kapila@huawei.com> wrote:
> On Friday, January 11, 2013 7:59 PM Alvaro Herrera wrote:
> Simon Riggs wrote:
>> On 28 December 2012 10:21, Simon Riggs <simon@2ndquadrant.com> wrote:
>>
>
>>> I was also worried about the high variance in the results.  Those
>>> averages look rather meaningless.  Which would be okay, I think, because
>>> it'd mean that performance-wise the patch is a wash,
>
>> For larger tuple sizes (>1000 && < 1800), the performance gain will be good.
>> Please refer performance results by me and Kyotaro-san in below links:
>
>> http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C383BEAAE32@szxeml509-mbx
>> http://archives.postgresql.org/message-id/20121228.170748.90887322.horiguchi.kyotaro@lab.ntt.co.jp

>AFAICS your tests are badly variable, but as Alvaro says, they aren't
>accurate enough to tell there's a regression.

>I'll assume not and carry on.

> (BTW the rejection of the null bitmap patch because of a performance
> regression may also need to be reconsidered).
 I can post detailed numbers during next commit fest.

With Regards,
Amit Kapila.

Re: Performance Improvement by reducing WAL for Update Operation

From

Simon Riggs

Date:

11 January 2013, 18:53:57

On 11 January 2013 18:11, Amit kapila <amit.kapila@huawei.com> wrote:

>> Can we identify which columns have changed? i.e. 1st, 3rd and 12th columns?
>   As per current algorithm, we can't as it is based on offsets.
>   What I mean to say is that the basic idea to reconstruct tuple during recovery
>   is copy data from old tuple offset-wise (offsets stored in encoded tuple) and use new data (modified column data)
>   from encoded tuple directly. So we don't need exact column numbers.

Another patch is going through next CF related to reassembling changes
from WAL records.

To do that efficiently, we would want to store a bitmap showing which
columns had changed in each update. Would that be an easy addition, or
is that blocked by some aspect of the current design?

The idea would be that we could re-construct an UPDATE statement that
would perform exactly the same change, yet without needing to refer to
a base tuple.
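
As a rough sketch of what such a bitmap could look like, the fragment below sets one bit per column whose bytes differ between the old and new tuple. Everything here is an assumption made for illustration: the `TupleView` struct, the per-column offset and length arrays, and the function name are hypothetical, NULL handling is ignored, and nothing like this exists in the patch as posted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-column view of a tuple's data area. */
typedef struct
{
    const char *base;           /* start of the tuple data */
    const size_t *off;          /* off[i]: offset of column i */
    const size_t *len;          /* len[i]: length of column i in bytes */
} TupleView;

/*
 * Set bit i of bitmap[] when column i differs between the two tuples.
 * bitmap must have room for (ncols + 7) / 8 bytes.
 * Returns the number of changed columns.
 */
static int
changed_columns_bitmap(const TupleView *oldt, const TupleView *newt,
                       size_t ncols, uint8_t *bitmap)
{
    int         changed = 0;

    assert(oldt != NULL && newt != NULL && bitmap != NULL);
    memset(bitmap, 0, (ncols + 7) / 8);
    for (size_t i = 0; i < ncols; i++)
    {
        /* A column is unchanged only if both length and bytes match. */
        bool        same = oldt->len[i] == newt->len[i] &&
            memcmp(oldt->base + oldt->off[i],
                   newt->base + newt->off[i],
                   oldt->len[i]) == 0;

        if (!same)
        {
            bitmap[i / 8] |= (uint8_t) (1u << (i % 8));
            changed++;
        }
    }
    return changed;
}
```

The comparison is a per-column memcmp rather than a lookup in the UPDATE's target list, so columns changed by triggers would be caught as well.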

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit kapila

Date:

12 January 2013, 03:50:26

On Saturday, January 12, 2013 12:23 AM Simon Riggs wrote:
On 11 January 2013 18:11, Amit kapila <amit.kapila@huawei.com> wrote:

>>> Can we identify which columns have changed? i.e. 1st, 3rd and 12th columns?
>>   As per current algorithm, we can't as it is based on offsets.
>>   What I mean to say is that the basic idea to reconstruct tuple during recovery
>>   is copy data from old tuple offset-wise (offsets stored in encoded tuple) and use new data (modified column data)
>>   from encoded tuple directly. So we don't need exact column numbers.

> Another patch is going through next CF related to reassembling changes
> from WAL records.

> To do that efficiently, we would want to store a bitmap showing which
> columns had changed in each update. Would that be an easy addition, or
> is that blocked by some aspect of the current design?
I don't think it should be a problem, as it can use the existing way of WAL
tuple construction, as we do in this patch when the old and new buffers are
different. This differentiation is done in log_heap_update.

IMO, for now we can skip this optimization (as we do when the updated tuple is
not on the same page) for the bitmap-storing patch, and later evaluate whether
the optimization can be applied to the feature of that patch.

> The idea would be that we could re-construct an UPDATE statement that
> would perform exactly the same change, yet without needing to refer to
> a base tuple.
I understand that such functionality would be needed by logical replication.


With Regards,
Amit Kapila.

Re: Performance Improvement by reducing WAL for Update Operation

From

Simon Riggs

Date:

12 January 2013, 10:15:58

On 12 January 2013 03:50, Amit kapila <amit.kapila@huawei.com> wrote:
> On Saturday, January 12, 2013 12:23 AM Simon Riggs wrote:
> On 11 January 2013 18:11, Amit kapila <amit.kapila@huawei.com> wrote:
>
>>>> Can we identify which columns have changed? i.e. 1st, 3rd and 12th columns?
>>>   As per current algorithm, we can't as it is based on offsets.
>>>   What I mean to say is that the basic idea to reconstruct tuple during recovery
>>>   is copy data from old tuple offset-wise (offsets stored in encoded tuple) and use new data (modified column data)
>>>   from encoded tuple directly. So we don't need exact column numbers.
>
>> Another patch is going through next CF related to reassembling changes
>> from WAL records.
>
>> To do that efficiently, we would want to store a bitmap showing which
>> columns had changed in each update. Would that be an easy addition, or
>> is that blocked by some aspect of the current design?
>
>   I don't think it should be a problem, as it can go in current way of WAL tuple construction as
>   we do in this patch when old and new buf are different. This differentiation is done in
>   log_heap_update.
>
>   IMO, for now we can avoid this optimization (way we have done incase updated tuple is not on same page)
>   for the bitmap storing patch and later we can evaluate if we can do this optimization for
>   the feature of that patch.

Yes, we can simply disable this feature. But that is just bad planning
and we should give some thought to having new features play nicely
together.

I would like to work out how to modify this so it can work with wal
decoding enabled. I know we can do this, I want to look at how,
because we know we're going to do it.

>> The idea would be that we could re-construct an UPDATE statement that
>> would perform exactly the same change, yet without needing to refer to
>> a base tuple.
>
>   I understood, that such a functionality would be needed by logical replication.

Yes, though the features being added are to allow decoding of WAL for
any purpose.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance Improvement by reducing WAL for Update Operation

From

Simon Riggs

Date:

12 January 2013, 11:06:16

On 11 January 2013 15:57, Simon Riggs <simon@2ndquadrant.com> wrote:

>>> I've moved this to the next CF. I'm planning to review this one first.
>>
>> Thank you.
>
> Just reviewing the patch now, making more sense with comments added.

Making more sense, but not yet making complete sense.

I'd like you to revisit the patch comments since some of them are
completely unreadable.

Examples

"Frames the original tuple which needs to be inserted into the heap by
decoding the WAL tuplewith the help of old Heap tuple."
"The delta tuples for update WAL is to eliminate copying the entire
the new record to WAL for the update operation."

I don't mind rewording the odd line here and there, that's just normal
editing, but this needs extensive work in terms of number of places
requiring change and the level of change at each place. That's not
easy for me to do when I'm trying to understand the patch in the first
place. My own written English isn't that great, so please read some of
the other comments in other parts of the code so you can see the level
of clarity that's needed in PostgreSQL.

Copying chunks of text from other comments doesn't help much either,
especially when you miss out parts of the explanation. You refer to a
"history tag" but don't define it that well, and don't explain why it
might sometimes be 3 bytes, or what that means. pg_lzcompress doesn't
call it that either, which is confusing. If you use a concept from
elsewhere you should either use the same name, or if you rename it,
rename it in both places.

/*
 * Do only the delta encode when the update is going to the same page and
 * buffer doesn't need a backup block in case of full-pagewrite is on.
 */

if ((oldbuf == newbuf) && !XLogCheckBufferNeedsBackup(newbuf))

The comment above says nothing. I can see that oldbuf and newbuf must
be the same and the call to XLogCheckBufferNeedsBackup is clear
because the function is already well named.

What I'd expect to see here is a discussion of why this test is being
applied and maybe why it is applied here. Such an important test
deserves a long discussion, perhaps 10-20 lines of comment.

Thanks

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit kapila

Date:

12 January 2013, 14:07:35

On Saturday, January 12, 2013 3:45 PM Simon Riggs wrote:
On 12 January 2013 03:50, Amit kapila <amit.kapila@huawei.com> wrote:
> On Saturday, January 12, 2013 12:23 AM Simon Riggs wrote:
> On 11 January 2013 18:11, Amit kapila <amit.kapila@huawei.com> wrote:
>
>>>>> Can we identify which columns have changed? i.e. 1st, 3rd and 12th columns?
>>>>   As per current algorithm, we can't as it is based on offsets.
>>>>   What I mean to say is that the basic idea to reconstruct tuple during recovery
>>>>   is copy data from old tuple offset-wise (offsets stored in encoded tuple) and use new data (modified column data)
>>>>   from encoded tuple directly. So we don't need exact column numbers.
>
>>> Another patch is going through next CF related to reassembling changes
>>> from WAL records.
>
>>> To do that efficiently, we would want to store a bitmap showing which
>>> columns had changed in each update. Would that be an easy addition, or
>>> is that blocked by some aspect of the current design?
>
>>   I don't think it should be a problem, as it can go in current way of WAL tuple construction as
>>   we do in this patch when old and new buf are different. This differentiation is done in
>>   log_heap_update.
>
>>   IMO, for now we can avoid this optimization (way we have done incase updated tuple is not on same page)
>>   for the bitmap storing patch and later we can evaluate if we can do this optimization for
>>   the feature of that patch.

> Yes, we can simply disable this feature. But that is just bad planning
> and we should give some thought to having new features play nicely
> together.

> I would like to work out how to modify this so it can work with wal
> decoding enabled. I know we can do this, I want to look at how,
> because we know we're going to do it.
I am sure this can be done, as for WAL decoding we mainly need the new values
and the column numbers. So if we include a bitmap in the WAL tuple and teach the
WAL decoding method how to decode this new WAL tuple format, it can be done.
However, it will need changes in the algorithm for both patches, and that can be
a risk for one or both of them. I am open to discussing how both can work
together, but IMHO at this moment (as this will be the last CF) it will be a
little risky. If there is some way to address this scenario with minor
modifications, I will be happy to see both working together.

With Regards,
Amit Kapila.

Re: Performance Improvement by reducing WAL for Update Operation

From

Amit kapila

Date:

12 January 2013, 14:10:54

On Saturday, January 12, 2013 4:36 PM Simon Riggs wrote:
On 11 January 2013 15:57, Simon Riggs <simon@2ndquadrant.com> wrote:

>>>> I've moved this to the next CF. I'm planning to review this one first.
>>
>>> Thank you.
>
>> Just reviewing the patch now, making more sense with comments added.

> Making more sense, but not yet making complete sense.

> I'd like you to revisit the patch comments since some of them are
> completely unreadable.
 I will once again review all the comments and make them more meaningful.

With Regards,
Amit Kapila.