Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Performance Improvement by reducing WAL for Update Operation
Date
Msg-id 007001ce1a5b$3abb42a0$b031c7e0$@kapila@huawei.com
In response to Re: Performance Improvement by reducing WAL for Update Operation  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Wednesday, March 06, 2013 2:57 AM Heikki Linnakangas wrote:
> On 04.03.2013 06:39, Amit Kapila wrote:
> > On Sunday, March 03, 2013 8:19 PM Craig Ringer wrote:
> >> On 02/05/2013 11:53 PM, Amit Kapila wrote:
> >>>> Performance data for the patch is attached with this mail.
> >>>> Conclusions from the readings (these are same as my previous patch):
> >>>>
> 
> I've been investigating the pglz option further, and doing
> performance comparisons of the pglz approach and this patch. I'll begin
> with some numbers:
> 
> unpatched (63d283ecd0bc5078594a64dfbae29276072cdf45):
> 
>                  testname                 | wal_generated |     duration
> ------------------------------------------+---------------+------------------
>  two short fields, no change              |    1245525360 | 9.94613695144653
>  two short fields, one changed            |    1245536528 | 10.146910905838
>  two short fields, both changed           |    1245523160 | 11.2332470417023
>  one short and one long field, no change  |    1054926504 | 5.90477800369263
>  ten tiny fields, all changed             |    1411774608 | 13.4536008834839
>  hundred tiny fields, all changed         |     635739680 | 7.57448387145996
>  hundred tiny fields, half changed        |     636930560 | 7.56888699531555
>  hundred tiny fields, half nulled         |     573751120 | 6.68991994857788
> 
> Amit's wal_update_changes_v10.patch:
> 
>                  testname                 | wal_generated |     duration
> ------------------------------------------+---------------+------------------
>  two short fields, no change              |    1249722112 | 13.0558869838715
>  two short fields, one changed            |    1246145408 | 12.9947438240051
>  two short fields, both changed           |    1245951056 | 13.0262880325317
>  one short and one long field, no change  |     678480664 | 5.70031690597534
>  ten tiny fields, all changed             |    1328873920 | 20.0167419910431
>  hundred tiny fields, all changed         |     638149416 | 14.4236788749695
>  hundred tiny fields, half changed        |     635560504 | 14.8770561218262
>  hundred tiny fields, half nulled         |     558468352 | 16.2437210083008
> 
> pglz-with-micro-optimizations-1.patch:
> 
>                  testname                 | wal_generated |     duration
> ------------------------------------------+---------------+------------------
>  two short fields, no change              |    1245519008 | 11.6702048778534
>  two short fields, one changed            |    1245756904 | 11.3233819007874
>  two short fields, both changed           |    1249711088 | 11.6836447715759
>  one short and one long field, no change  |     664741392 | 6.44810795783997
>  ten tiny fields, all changed             |    1328085568 | 13.9679481983185
>  hundred tiny fields, all changed         |     635974088 | 9.15514206886292
>  hundred tiny fields, half changed        |     636309040 | 9.13769292831421
>  hundred tiny fields, half nulled         |     496396448 | 8.77351498603821

For some of the tests, it doesn't even execute the main part of the
compression/encoding. The reason is that the tuple length is less than the
strategy's minimum input length, so it returns from the check below in
pglz_delta_encode():

    if (strategy->match_size_good <= 0 ||
        slen < strategy->min_input_size ||
        slen > strategy->max_input_size)
        return false;
 

The tests for which it doesn't execute the encoding are:
two short fields, no change
two short fields, one changed
two short fields, both changed
ten tiny fields, all changed


For the above cases, the difference in timing between both approaches and the
original could be because this check is done only after some preliminary
processing. So I think that if we check the length in log_heap_update()
instead, there should be no timing difference for the above test scenarios. I
can check that once.

This optimization helps only when the tuple length is greater than about
128~200 bytes and up to 1800 bytes (at which point the tuple gets toasted);
otherwise it can add overhead without any major WAL reduction.
In fact, I think one of my initial patches performed the optimization only if
the tuple length was greater than 128 bytes.
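
To make that concrete, here is a minimal sketch of the kind of up-front check
I mean; the helper and threshold names are hypothetical and not taken from
either patch, and the values just reflect the 128~200-byte and toast-limit
range mentioned above:

    /* Hypothetical helper, for illustration only; not from either patch. */
    #include <stdbool.h>
    #include <stdint.h>

    #define WAL_DELTA_MIN_TUPLE_LEN   128    /* below this, overhead outweighs the WAL saving */
    #define WAL_DELTA_MAX_TUPLE_LEN   1800   /* above this, the tuple is toasted anyway */

    static bool
    wal_update_delta_worthwhile(uint32_t old_len, uint32_t new_len)
    {
        /* attempt delta encoding only when both tuple versions fall in range */
        return old_len >= WAL_DELTA_MIN_TUPLE_LEN && old_len <= WAL_DELTA_MAX_TUPLE_LEN &&
               new_len >= WAL_DELTA_MIN_TUPLE_LEN && new_len <= WAL_DELTA_MAX_TUPLE_LEN;
    }

log_heap_update() would consult such a check before doing any encoding work,
and otherwise log the new tuple as it does today.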

I shall try to run both patches for cases where the tuple length is greater
than 128~200 bytes, as this optimization has benefits in those cases.

> In each test, a table is created with a large number of identical rows,
> and fillfactor=50. Then a full-table UPDATE is performed, and the
> UPDATE is timed. Duration is the time spent in the UPDATE (lower is
> better), and wal_generated is the amount of WAL generated by the
> updates (lower is better).
> 
> The summary is that Amit's patch is a small win in terms of CPU usage,
> in the best case where the table has few columns, with one large column
> that is not updated. In all other cases it just adds overhead. In terms
> of WAL size, you get a big gain in the same best case scenario.
> 
> Attached is a different version of this patch, which uses the pglz
> algorithm to spot the similarities between the old and new tuple,
> instead of having explicit knowledge of where the column boundaries
> are.
> This has the advantage that it will spot similarities, and be able to
> compress, in more cases. For example, you can see a reduction in WAL
> size in the "hundred tiny fields, half nulled" test case above.
> 
> The attached patch also just adds overhead in most cases, but the
> overhead is much smaller in the worst case. I think that's the right
> tradeoff here - we want to avoid scenarios where performance falls off
> the cliff. That said, if you usually just get a slowdown, we certainly
> can't make this the default, and if we can't turn it on by default,
> this probably just isn't worth it.

As I mentioned, for smaller tuples it can be overhead without any major
benefit in WAL reduction, so I think that before doing the encoding it should
make sure the tuple length is greater than some threshold.
Yes, it can miss some cases, as your test has shown for "hundred tiny fields,
half nulled", but we might then be able to safely enable it by default.

> The attached patch contains the variable-hash-size changes I posted in
> the "Optimizing pglz compressor". But in the delta encoding function,
> it goes further than that, and contains some further micro-
> optimizations:
> the hash is calculated in a rolling fashion, and it uses a specialized
> version of the pglz_hist_add macro that knows that the input can't
> exceed 4096 bytes. Those changes shaved off some cycles, but you could
> probably do more. 

> One idea is to only add every 10 bytes or so to the
> history lookup table; that would sacrifice some compressibility for
> speed.

Do you mean to say: roll the hash 10 times, then call
pglz_hist_add_no_recycle, and do the same before pglz_find_match?

I shall try doing this for the tests.
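
To check my understanding, here is a small self-contained sketch of that idea.
It is not the patch's code; the helper name and table layout are illustrative
stand-ins for pglz_hist_add_no_recycle()/pglz_find_match(), and it shows only
the two points under discussion: the hash is maintained in a rolling fashion,
and only every HIST_STEP'th position is inserted into the history table:

    #include <stddef.h>
    #include <stdint.h>

    #define HIST_SIZE   4096    /* lookup-table slots; power of two */
    #define HIST_STEP   10      /* remember only every 10th position */

    /*
     * Fill hist[] (HIST_SIZE entries, -1 meaning empty) with the latest
     * sampled position whose 4-byte window hashes to each slot.  The hash
     * mimics pglz's 4-byte hash and is updated incrementally as the window
     * slides, instead of being recomputed from scratch at every byte.
     */
    static void
    build_sampled_history(const uint8_t *src, size_t len, int32_t *hist)
    {
        uint32_t    hash;
        size_t      pos;

        for (pos = 0; pos < HIST_SIZE; pos++)
            hist[pos] = -1;

        if (len < 4)
            return;

        /* hash of the first 4-byte window */
        hash = (uint32_t) ((src[0] << 6) ^ (src[1] << 4) ^ (src[2] << 2) ^ src[3]);

        for (pos = 0; pos + 4 <= len; pos++)
        {
            /* a real encoder would probe the table here for a match at pos */
            if (pos % HIST_STEP == 0)
                hist[hash & (HIST_SIZE - 1)] = (int32_t) pos;

            /* roll the hash: drop src[pos], admit src[pos + 4] */
            if (pos + 4 < len)
                hash = (uint32_t) (((hash ^ (uint32_t) (src[pos] << 6)) << 2) ^ src[pos + 4]);
        }
    }

The sampling cuts the table-maintenance cost to roughly a tenth of the
per-byte cost, at the price of missing matches that start at skipped
positions.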

> If you could squeeze pglz_delta_encode function to be cheap enough that
> we could enable this by default, this would be pretty cool patch. Or at
> least, the overhead in the cases that you get no compression needs to
> be brought down, to about 2-5 % at most I think. If it can't be done
> easily, I feel that this probably needs to be dropped.

Agreed. Though it gives a benefit in some of the cases, it should not degrade
much in any of the other cases.

One more thing: any compression technique has some overhead, so it should be
used selectively rather than in every case. In that regard, I think we should
do this optimization only when it has a better chance of a win (for example,
based on the tuple length or some other criterion); otherwise the WAL tuple
can be logged as-is. What is your opinion?

> PS. I haven't done much testing of WAL redo, so it's quite possible
> that the encoding is actually buggy, or that decoding is slow. But I
> don't think there's anything so fundamentally wrong that it would
> affect the performance results much.

I also don't think it will have any problem, but I can run some tests to
verify.

With Regards,
Amit Kapila.



