Re: Performance Improvement by reducing WAL for Update Operation - Mailing list pgsql-hackers
From: Heikki Linnakangas
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date:
Msg-id: 51366323.8070606@vmware.com
In response to: Re: Performance Improvement by reducing WAL for Update Operation (Amit Kapila <amit.kapila@huawei.com>)
Responses: Re: Performance Improvement by reducing WAL for Update Operation
List: pgsql-hackers
On 04.03.2013 06:39, Amit Kapila wrote:
> On Sunday, March 03, 2013 8:19 PM Craig Ringer wrote:
>> On 02/05/2013 11:53 PM, Amit Kapila wrote:
>>>> Performance data for the patch is attached with this mail.
>>>> Conclusions from the readings (these are same as my previous patch):
>>>>
>>>> 1. With original pgbench there is a max 7% WAL reduction with not much
>>>> performance difference.
>>>> 2. With 250 record pgbench there is a max wal reduction of 35% with not
>>>> much performance difference.
>>>> 3. With 500 and above record size in pgbench there is an improvement in
>>>> the performance and wal reduction both.
>>>>
>>>> If the record size increases there is a gain in performance and wal
>>>> size is reduced as well.
>>>>
>>>> Performance data for synchronous_commit = on is under progress, I shall
>>>> post it once it is done.
>>>> I am expecting it to be same as previous.
>>>
>>> Please find the performance readings for synchronous_commit = on.
>>>
>>> Each run is taken for 20 min.
>>>
>>> Conclusions from the readings with synchronous commit on mode:
>>>
>>> 1. With original pgbench there is a max 2% WAL reduction with not much
>>> performance difference.
>>> 2. With 500 record pgbench there is a max wal reduction of 3% with not
>>> much performance difference.
>>> 3. With 1800 record size in pgbench there is both an improvement in the
>>> performance (approx 3%) as well as wal reduction (44%).
>>>
>>> If the record size increases there is a very good reduction in WAL size.
>>
>> The stats look fairly sane. I'm a little concerned about the apparent
>> trend of falling TPS in the patched vs original tests for the 1-client
>> test as record size increases, but it's only 0.0%->0.2%->0.4%, and the
>> 0.4% case made other config changes too. Nonetheless, it might be wise
>> to check with really big records and see if the trend continues.
>
> For bigger size (~2000) records, it goes into toast, for which we don't do
> this optimization.
> This optimization is mainly for medium size records.

I've been investigating the pglz option further, and doing performance
comparisons of the pglz approach and this patch.
I'll begin with some numbers.

Unpatched (63d283ecd0bc5078594a64dfbae29276072cdf45):

                 testname                  | wal_generated |     duration
-------------------------------------------+---------------+------------------
 two short fields, no change               |    1245525360 | 9.94613695144653
 two short fields, one changed             |    1245536528 | 10.146910905838
 two short fields, both changed            |    1245523160 | 11.2332470417023
 one short and one long field, no change   |    1054926504 | 5.90477800369263
 ten tiny fields, all changed              |    1411774608 | 13.4536008834839
 hundred tiny fields, all changed          |     635739680 | 7.57448387145996
 hundred tiny fields, half changed         |     636930560 | 7.56888699531555
 hundred tiny fields, half nulled          |     573751120 | 6.68991994857788

Amit's wal_update_changes_v10.patch:

                 testname                  | wal_generated |     duration
-------------------------------------------+---------------+------------------
 two short fields, no change               |    1249722112 | 13.0558869838715
 two short fields, one changed             |    1246145408 | 12.9947438240051
 two short fields, both changed            |    1245951056 | 13.0262880325317
 one short and one long field, no change   |     678480664 | 5.70031690597534
 ten tiny fields, all changed              |    1328873920 | 20.0167419910431
 hundred tiny fields, all changed          |     638149416 | 14.4236788749695
 hundred tiny fields, half changed         |     635560504 | 14.8770561218262
 hundred tiny fields, half nulled          |     558468352 | 16.2437210083008

pglz-with-micro-optimizations-1.patch:

                 testname                  | wal_generated |     duration
-------------------------------------------+---------------+------------------
 two short fields, no change               |    1245519008 | 11.6702048778534
 two short fields, one changed             |    1245756904 | 11.3233819007874
 two short fields, both changed            |    1249711088 | 11.6836447715759
 one short and one long field, no change   |     664741392 | 6.44810795783997
 ten tiny fields, all changed              |    1328085568 | 13.9679481983185
 hundred tiny fields, all changed          |     635974088 | 9.15514206886292
 hundred tiny fields, half changed         |     636309040 | 9.13769292831421
 hundred tiny fields, half nulled          |     496396448 | 8.77351498603821

In each test, a table is created with a large number of identical rows, and
fillfactor=50. Then a full-table UPDATE is performed, and the UPDATE is timed.
Duration is the time spent in the UPDATE (lower is better), and wal_generated
is the amount of WAL generated by the updates (lower is better).

The summary is that Amit's patch is a small win in terms of CPU usage, in the
best case where the table has few columns, with one large column that is not
updated. In all other cases it just adds overhead. In terms of WAL size, you
get a big gain in the same best case scenario.

Attached is a different version of this patch, which uses the pglz algorithm
to spot the similarities between the old and new tuple, instead of having
explicit knowledge of where the column boundaries are. This has the advantage
that it will spot similarities, and be able to compress, in more cases. For
example, you can see a reduction in WAL size in the "hundred tiny fields,
half nulled" test case above.

The attached patch also just adds overhead in most cases, but the overhead is
much smaller in the worst case. I think that's the right tradeoff here - we
want to avoid scenarios where performance falls off a cliff. That said, if you
usually just get a slowdown, we certainly can't make this the default, and if
we can't turn it on by default, this probably just isn't worth it.

The attached patch contains the variable-hash-size changes I posted in the
"Optimizing pglz compressor" thread.
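To make the delta-encoding idea concrete, below is a minimal, self-contained
sketch of the general technique: the old tuple is used as the compression
history, and the new tuple is encoded as a mix of literal bytes and
(offset, length) copies that point back into the old tuple. The function
names, hash function, constants and printed output format are all invented
for this illustration; this is not the pglz_delta_encode code in the attached
patch.

/*
 * Toy LZ-style delta encoder, for illustration only: encode 'newp' against
 * 'oldp' by emitting either literal bytes or (offset, length) copies that
 * refer back into the old tuple.  The old tuple acts as the history.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HIST_SIZE 4096			/* lookup table over 4-byte sequences */
#define MIN_MATCH 4

static uint32_t
hash4(const unsigned char *p)
{
	uint32_t	v = p[0] | (p[1] << 8) | (p[2] << 16) | ((uint32_t) p[3] << 24);

	return (v * 2654435761u) >> 20;		/* 12-bit result: 0..4095 */
}

static void
delta_encode(const unsigned char *oldp, int oldlen,
			 const unsigned char *newp, int newlen)
{
	int			hist[HIST_SIZE];
	int			i;

	for (i = 0; i < HIST_SIZE; i++)
		hist[i] = -1;

	/* index every position of the old tuple (the "history") */
	for (i = 0; i + MIN_MATCH <= oldlen; i++)
		hist[hash4(oldp + i)] = i;

	/* walk the new tuple, preferring copies from the old tuple */
	for (i = 0; i < newlen;)
	{
		int			len = 0;
		int			cand = -1;

		if (i + MIN_MATCH <= newlen)
			cand = hist[hash4(newp + i)];

		if (cand >= 0)
		{
			/* verify the candidate (hash collisions are possible) and extend */
			while (cand + len < oldlen && i + len < newlen &&
				   oldp[cand + len] == newp[i + len])
				len++;
		}

		if (len >= MIN_MATCH)
		{
			printf("COPY    off=%d len=%d\n", cand, len);
			i += len;
		}
		else
		{
			printf("LITERAL 0x%02x\n", newp[i]);
			i++;
		}
	}
}

int
main(void)
{
	const char *oldtup = "aaaa bbbb cccc dddd";
	const char *newtup = "aaaa BBBB cccc dddd";

	delta_encode((const unsigned char *) oldtup, (int) strlen(oldtup),
				 (const unsigned char *) newtup, (int) strlen(newtup));
	return 0;
}

For two mostly-identical tuples like the ones in main(), nearly all of the
new tuple comes out as a couple of COPY instructions plus a few literals,
which is where the WAL reduction comes from.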
Beyond those variable-hash-size changes, the delta encoding function contains
some further micro-optimizations: the hash is calculated in a rolling fashion,
and it uses a specialized version of the pglz_hist_add macro that knows that
the input can't exceed 4096 bytes. Those changes shaved off some cycles, but
you could probably do more. One idea is to only add every 10 bytes or so to
the history lookup table; that would sacrifice some compressibility for speed
(a rough sketch of both ideas follows at the end of this mail).

If you could squeeze the pglz_delta_encode function to be cheap enough that we
could enable this by default, this would be a pretty cool patch. Or at least,
the overhead in the cases where you get no compression needs to be brought
down, to about 2-5% at most, I think. If that can't be done easily, I feel
that this probably needs to be dropped.

PS. I haven't done much testing of WAL redo, so it's quite possible that the
encoding is actually buggy, or that decoding is slow. But I don't think
there's anything so fundamentally wrong that it would affect the performance
results much.

- Heikki
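[Sketch referenced above.] To illustrate the two micro-optimization ideas
mentioned in the message (the rolling hash, and adding only every 10th or so
position to the history table), here is a small, self-contained sketch. The
names, the hash multiplier and the stride constant are made up for this
example and are not taken from the patch.

/*
 * Illustration only: build the history lookup table using (a) a rolling
 * hash, updated in O(1) per byte instead of re-hashing a 4-byte window
 * each time, and (b) a sampling stride, so that only every HIST_STRIDE-th
 * position is entered into the table.
 */
#include <stdint.h>

#define HASH_BITS	12
#define HASH_SIZE	(1 << HASH_BITS)
#define HASH_MASK	(HASH_SIZE - 1)
#define HIST_STRIDE	10				/* "add every 10 bytes or so" */
#define K			31u				/* hash multiplier */
#define K3			(K * K * K)		/* K^3, used to drop the outgoing byte */

/* hist[] is assumed to be pre-initialized by the caller (e.g. to -1) */
void
build_history(const unsigned char *src, int len, int hist[HASH_SIZE])
{
	uint32_t	h;
	int			i;

	if (len < 4)
		return;

	/* hash of the first 4-byte window */
	h = src[0] * K3 + src[1] * K * K + src[2] * K + src[3];

	for (i = 0; i + 4 <= len; i++)
	{
		/* sample: record only every HIST_STRIDE-th position */
		if (i % HIST_STRIDE == 0)
			hist[h & HASH_MASK] = i;

		/* roll the hash forward by one byte, if another window remains */
		if (i + 4 < len)
			h = (h - src[i] * K3) * K + src[i + 4];
	}
}

The stride means some positions are never entered into the table, so some
matches will be missed; that is the compressibility-for-speed tradeoff
mentioned above.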
Attachment: pglz-with-micro-optimizations-1.patch