From: Tomas Vondra
Subject: Re: Patch: Write Amplification Reduction Method (WARM)
Msg-id: 6595e618-cbe4-2854-af0a-4a9992293bb7@2ndquadrant.com
In response to: Re: Patch: Write Amplification Reduction Method (WARM) (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses: Re: Patch: Write Amplification Reduction Method (WARM) (Haribabu Kommi <kommi.haribabu@gmail.com>)
List: pgsql-hackers

On 10/06/2016 07:36 AM, Pavan Deolasee wrote:
>
>
> On Wed, Oct 5, 2016 at 1:43 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>
...
>     I can confirm the significant speedup, often by more than 75%
>     (depending on number of indexes, whether the data set fits into RAM,
>     etc.). Similarly for the amount of WAL generated, although that's a
>     bit more difficult to evaluate due to full_page_writes.
>
>     I'm not going to send detailed results, as that probably does not
>     make much sense at this stage of the development - I can repeat the
>     tests once the open questions get resolved.
>
>
> Sure. Anything that stands out? Any regression that you see? I'm not
> sure if your benchmarks exercise the paths which might show overheads
> without any tangible benefits. For example, I wonder if a test with many
> indexes where most of them get updated and then querying the table via
> those updated indexes could be one such test case.
>

No, nothing that would stand out. Let me explain what benchmarks I've 
done. I made some minor mistakes when running them, so I plan to rerun 
them and post the results afterwards - until then, take the data with a 
grain of salt.

My goal was to compare the current non-HOT behavior (updating all 
indexes) with WARM (updating only the indexes on modified columns), and 
I've taken two approaches:

1) fixed number of indexes, update variable number of columns

Create a table with 8 secondary indexes and then run a bunch of 
benchmarks, updating an increasing number of columns. So the first run did
    UPDATE t SET c1 = c1+1 WHERE id = :id;

while the second did
    UPDATE t SET c1 = c1+1, c2 = c2+1 WHERE id = :id;

and so on, up to updating all the columns in the last run. I've used 
multiple scripts to update all the columns / indexes uniformly 
(essentially using multiple "-f" flags with pgbench). The runs were 
fairly long (2h, enough to get stable behavior).
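
To give a better idea of the setup, here's a rough sketch (the table and 
index names are illustrative, not my exact scripts):

    -- one table, 8 independently indexed integer columns
    CREATE TABLE t (
        id serial PRIMARY KEY,
        c1 int, c2 int, c3 int, c4 int,
        c5 int, c6 int, c7 int, c8 int
    );
    CREATE INDEX t_c1_idx ON t (c1);
    CREATE INDEX t_c2_idx ON t (c2);
    -- ... and so on, up to t_c8_idx

    -- one of the pgbench scripts for the "2 columns" run (the other
    -- scripts update different column pairs, so that all columns and
    -- indexes get updated uniformly)
    \set id random(1, 1000000)
    UPDATE t SET c1 = c1 + 1, c2 = c2 + 1 WHERE id = :id;

    -- invoked along the lines of
    --   pgbench -n -T 7200 -f update_1.sql -f update_2.sql ... test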

For a small data set (fits into RAM), the results look like this:
            master   patched   diff
    1         5994      8490   +42%
    2         4347      7903   +81%
    3         4340      7400   +70%
    4         4324      6929   +60%
    5         4256      6495   +52%
    6         4253      5059   +19%
    7         4235      4534    +7%
    8         4194      4237    +1%

and the amount of WAL generated (after correction for the tps 
difference) looks like this (the numbers are MBs):

            master   patched   diff
    1        27257     18508   -32%
    2        21753     14599   -33%
    3        21912     15864   -28%
    4        22021     17135   -22%
    5        21819     18258   -16%
    6        21929     20659    -6%
    7        21994     22234    +1%
    8        21851     23267    +6%

So this is quite a significant difference. I'm pretty sure the minor WAL 
increase in the last two runs is due to full-page writes (which also 
affect the preceding runs, making the WAL reduction smaller than the tps 
increase would suggest).
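
(In case anyone wants to reproduce this, the WAL volume can be measured 
simply as the difference in XLOG position before and after each run, 
e.g. using the 9.6 functions:

    -- before the run
    SELECT pg_current_xlog_location();    -- say it returns '1D/12345678'

    -- after the run
    SELECT pg_size_pretty(
        pg_xlog_location_diff(pg_current_xlog_location(), '1D/12345678'));

where pg_xlog_location_diff() returns the difference in bytes.)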

I do have results for larger data sets (>RAM); the results are very 
similar, although the speedup seems a bit smaller. But I need to rerun 
those.

2) single-row update, adding indexes between runs

This is kinda the opposite of the previous approach, i.e. the 
transactions always update a single column (again multiple scripts, to 
update the columns uniformly), but new indexes are added between runs. 
The results (for a large data set, exceeding RAM) look like this:
            master   patched    diff
    0          954      1404    +47%
    1          701      1045    +49%
    2          484       816    +70%
    3          346       683    +97%
    4          248       608   +145%
    5          190       525   +176%
    6          152       397   +161%
    7          123       315   +156%
    8          123       270   +119%

So this looks really interesting.
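
(The schema and update scripts are the same as in the first benchmark, 
except that each script updates just a single column, and one more 
secondary index gets created between runs - something like:

    CREATE INDEX t_c1_idx ON t (c1);    -- added before run 1
    CREATE INDEX t_c2_idx ON t (c2);    -- added before run 2
    -- ... up to t_c8_idx before run 8

i.e. run 0 has no secondary indexes at all, only the primary key.)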

>
>     There's a lot of useful and important feedback in the thread(s) so
>     far, particularly the descriptions of various failure cases. I think
>     it'd be very useful to collect those examples and turn them into
>     regression tests - that's something the patch should include anyway.
>
>
> Sure. I added only a handful test cases which I knew regression isn't
> covering. But I'll write more of them. One good thing is that the code
> gets heavily exercised even during regression. I caught and fixed
> multiple bugs running regression. I'm not saying that's enough, but it
> certainly gives some confidence.
>

I don't see any changes to src/test in the patch, so I'm not sure what 
you mean when you say you added a handful of test cases?

>
>
>     and update:
>
>         update t set a = a+1, b=b+1;
>
>     which has to update all indexes on the table, but:
>
>         select n_tup_upd, n_tup_hot_upd from pg_stat_user_tables
>
>          n_tup_upd | n_tup_hot_upd
>         -----------+---------------
>               1000 |          1000
>
>     So it's still counted as "WARM" - does it make sense?
>
>
> No, it does not. The code currently just marks any update as a WARM
> update if the table supports it and there is enough free space in the
> page. And yes, you're right. It's worth fixing that because of one-WARM
> update per chain limitation. Will fix.
>

Hmmm, so this makes monitoring of %WARM during benchmarks less reliable 
than I hoped for :-(
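
(FWIW, this is roughly the query I've been using to watch the ratio 
during the benchmarks:

    SELECT relname, n_tup_upd, n_tup_hot_upd,
           round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
      FROM pg_stat_user_tables
     WHERE relname = 't';

so if every eligible update gets flagged as WARM, hot_pct simply sits 
near 100% and says nothing about how many index insertions were actually 
skipped.)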

>
>     The way this is piggy-backed on the current HOT statistics seems a
>     bit strange for another reason,
>
>
> Agree. We could add a similar n_tup_warm_upd counter.
>

Yes, although HOT is a special case of WARM. But it probably makes sense 
to differentiate them, I guess.

>
>     But WARM changes that - it allows adding index entries only to a
>     subset of indexes, which means the "per row" n_tup_hot_upd counter
>     is not sufficient. When you have a table with 10 indexes, and the
>     counter increases by 1, does that mean the update added index tuple
>     to 1 index or 9 of them?
>
>
> How about having counters similar to n_tup_ins/n_tup_del for indexes
> as well? Today it does not make sense because every index gets the
> same number of inserts, but WARM will change that.
>
> For example, we could have idx_tup_insert and idx_tup_delete that shows
> up in pg_stat_user_indexes. I don't know if idx_tup_delete adds any
> value, but one can then look at idx_tup_insert for various indexes to
> get a sense which indexes receives more inserts than others. The indexes
> which receive more inserts are the ones being frequently updated as
> compared to other indexes.
>

Hmmm, I'm not sure that'll work. I mean, those metrics would be useful 
(although I can't think of a use case for idx_tup_delete), but I'm not 
sure they're enough to measure WARM. We need to compute
    index_tuples_inserted / index_tuples_total

where (index_tuples_total - index_tuples_inserted) is the number of 
index tuples we've been able to skip thanks to WARM. So we'd also need 
to track the number of index tuples that we skipped for the index, and 
I'm not sure that's a good idea.

Also, we really don't care about inserted tuples - what matters for WARM 
are updates, so idx_tup_insert is either useless (because it also 
includes non-UPDATE entries) or the naming is misleading.
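
Just to illustrate what I think we'd need (purely hypothetical - neither 
counter exists today, and the names are made up):

    SELECT indexrelname,
           idx_tup_inserted,    -- index tuples actually inserted
           idx_tup_skipped,     -- index insertions skipped thanks to WARM
           round(100.0 * idx_tup_inserted
                 / nullif(idx_tup_inserted + idx_tup_skipped, 0), 1) AS inserted_pct
      FROM pg_stat_user_indexes
     WHERE relname = 't';

i.e. (idx_tup_inserted + idx_tup_skipped) would play the role of 
index_tuples_total in the formula above.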

> This also relates to vacuuming strategies. Today HOT updates do not
> count for triggering vacuum (or to be more precise, HOT pruned tuples
> are discounted while counting dead tuples). WARM tuples get the same
> treatment as far as pruning is concerned, but since they cause fresh
> index inserts, I wonder if we need some mechanism to cleanup the dead
> line pointers and dead index entries. This will become more important if
> we do something to convert WARM chains into HOT chains, something that
> only VACUUM can do in the design I've proposed so far.
>

True.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


