From: Pavan Deolasee
Subject: Re: Patch: Write Amplification Reduction Method (WARM)
Date:
Msg-id: CABOikdMduu9wOhfvNzqVuNW4YdBgbgwv-A=HNFCL7R5Tmbx7JA@mail.gmail.com
In response to: Re: Patch: Write Amplification Reduction Method (WARM) (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Patch: Write Amplification Reduction Method (WARM) (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers


On Wed, Mar 29, 2017 at 4:42 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Mar 29, 2017 at 1:10 PM, Pavan Deolasee
> <pavan.deolasee@gmail.com> wrote:
> >
> > On Wed, Mar 29, 2017 at 12:02 PM, Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> >>
> >> On Wed, Mar 29, 2017 at 11:52 AM, Amit Kapila <amit.kapila16@gmail.com>
> >> wrote:
> >
> > Then during recheck, we pass already compressed values to
> > index_form_tuple(). But my point is, the following code will ensure that we
> > don't compress it again. My reading is that the first check for
> > !VARATT_IS_EXTENDED will return false if the value is already compressed.
> >
>
> You are right.  I was confused with previous check of VARATT_IS_EXTERNAL.


Ok, thanks. 
 
> >
> > TBH I couldn't find why the original index insertion code will always supply
> > uncompressed values.
> >
>
> Just try by inserting large value of text column ('aaaaaa.....bbb')
> upto 2.5K.  Then have a breakpoint in heap_prepare_insert and
> index_form_tuple, and debug both the functions, you can find out that
> even though we compress during insertion in heap, the index will
> compress the original value again.


Ok, tried that. AFAICS index_form_tuple gets compressed values.
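
For anyone who wants to reproduce this outside the benchmark, a minimal standalone example of such a value (toast_check is just a throwaway table for illustration; pg_column_size() reports the stored, possibly compressed, size and octet_length() the raw length):

CREATE TABLE toast_check (id int, val text);
-- ~2.5KB of highly repetitive text, large enough to be compressed inline
INSERT INTO toast_check VALUES (1, repeat('a', 1500) || repeat('b', 1000));
-- stored_bytes should come out much smaller than the ~2500-byte raw length
SELECT pg_column_size(val) AS stored_bytes,
       octet_length(val)   AS raw_bytes
FROM toast_check;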
 

> Yeah probably you are right, but I am not sure if it is good idea to
> compare compressed values.


Again, I don't see a problem there.
 
> I think with this new changes in btrecheck, it would appear to be much
> costlier as compare to what you have few versions back.  I am afraid
> that it can impact performance for cases where there are few WARM
> updates in chain and many HOT updates as it will run recheck for all
> such updates.

My feeling is that the recheck could be costly for very fat indexes, but not doing WARM could be costly too for such indexes. We can construct a worst case roughly as follows:
1. set up a table with a fat index,
2. do a WARM update to a tuple,
3. then do several HOT updates to the same tuple,
4. query the row via the fat index.


Initialisation:

-- Adjust parameters to force index scans:
--   enable_seqscan = false
--   seq_page_cost = 10000

DROP TABLE IF EXISTS pgbench_accounts;

CREATE TABLE pgbench_accounts (
    aid text,
    bid bigint,
    abalance bigint,
    filler1 text DEFAULT md5(random()::text),
    filler2 text DEFAULT md5(random()::text),
    filler3 text DEFAULT md5(random()::text),
    filler4 text DEFAULT md5(random()::text),
    filler5 text DEFAULT md5(random()::text),
    filler6 text DEFAULT md5(random()::text),
    filler7 text DEFAULT md5(random()::text),
    filler8 text DEFAULT md5(random()::text),
    filler9 text DEFAULT md5(random()::text),
    filler10 text DEFAULT md5(random()::text),
    filler11 text DEFAULT md5(random()::text),
    filler12 text DEFAULT md5(random()::text)
) WITH (fillfactor=90);
\set end 0
\set start (:end + 1)
\set end (:start + (:scale * 100))

INSERT INTO pgbench_accounts SELECT generate_series(:start, :end )::text || <2300 chars string>, (random()::bigint) % :scale, 0;

CREATE UNIQUE INDEX pgb_a_aid ON pgbench_accounts(aid);
CREATE INDEX pgb_a_filler1 ON pgbench_accounts(filler1);
CREATE INDEX pgb_a_filler2 ON pgbench_accounts(filler2);
CREATE INDEX pgb_a_filler3 ON pgbench_accounts(filler3);
CREATE INDEX pgb_a_filler4 ON pgbench_accounts(filler4);

-- Force a WARM update on one row
UPDATE pgbench_accounts SET filler1 = 'X' WHERE aid = '100' || <2300 chars string>;
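
-- Optional sanity check, not part of the original scripts: on unpatched
-- master the update above cannot be HOT because filler1 is indexed, so
-- n_tup_hot_upd should stay at 0 while n_tup_upd goes up (the stats view
-- may take a moment to reflect it).
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'pgbench_accounts';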

Test:
-- Fetch the row using the fat index. Since the row was WARM-updated above,
-- this exercises the recheck logic on the fat index.
BEGIN;
SELECT substring(aid, 1, 10) FROM pgbench_accounts WHERE aid = '100' ||  <2300 chars string> ORDER BY aid;
UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = '100' ||  <2300 chars string>;
END;

I did four 5-minute runs each with master and WARM, and there is probably a 2-3% regression.

(Results of the 5-minute runs; txns is the total transactions over 5 minutes, idx_scan is the number of scans on the fat index.)
master:
txns      idx_scan
414117 828233
411109 822217
411848 823695
408424 816847

WARM:
txns       idx_scan
404139 808277
398880 797759
399949 799897
397927 795853

==========

I then repeated the tests, this time using compressible values. The regression in this case is much higher, maybe 15% or more.

INSERT INTO pgbench_accounts SELECT generate_series(:start, :end )::text || repeat('abcdefghij', 20000), (random()::bigint) % :scale, 0;

-- Fetch the row using the fat index. Since the row was WARM-updated,
-- this exercises the recheck logic on the fat index.
BEGIN;
SELECT substring(aid, 1, 10) FROM pgbench_accounts WHERE aid = '100' || repeat('abcdefghij', 20000) ORDER BY aid;
UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = '100' || repeat('abcdefghij', 20000);
END;

(Results of the 5-minute runs; txns is the total transactions over 5 minutes, idx_scan is the number of scans on the fat index.)
master:
txns       idx_scan
56976 113953
56822 113645
56915 113831
56865 113731

WARM:
txns      idx_scan
49044 98087
49020 98039
49007 98013
49006 98011

But TBH I believe this regression is coming from the changes to heap_tuple_attr_equals, where we now decompress both the old and the new values and then compare them. For values 200K bytes long, that is significant. Another reason I think so is that I accidentally did one run which did not use index scans and did not perform any WARM updates, yet the regression was quite similar. That makes me think the regression is coming from somewhere other than the recheck, and the change in heap_tuple_attr_equals seems like a good candidate.
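
To get a feel for how much data that decompression touches here, the stored versus logical size of the aid values can be checked directly against the table from this test (pg_column_size() reports the on-disk, compressed datum size):

SELECT pg_column_size(aid) AS stored_bytes,
       octet_length(aid)   AS uncompressed_bytes
FROM pgbench_accounts
LIMIT 1;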

I think we can fix that by comparing compressed values. I know you had raised concerns, but Robert confirmed that (IIUC) it's not a problem today. We will figure out how to deal with it if we ever add support for different compression algorithms or compression levels. I also think this is a rather synthetic use case, and the fact that there is not much regression with index keys as large as 2K bytes is quite comforting.

===========

Apart from this, I also ran some benchmarks after removing the index on the abalance column from my test suite, so that all updates are HOT updates. I did not find any regression in that scenario. WARM was a percent or so better, but I assume that's just noise. These benchmarks were done at scale factor 100, running for 1 hour each. Headline numbers:

WARM: 5802 txns/sec
master: 5719 txns/sec.

===========

Another workload where WARM could cause a regression is one with many indexes on a table where UPDATEs modify all but one of them. We will do a WARM update in this case, but since the other N-1 indexes get a new index entry anyway, the benefit of WARM will be marginal. There will also be an increased autovacuum cost, because we will scan those N-1 indexes for cleanup.

While this could be an atypical workload, it's probably worth guarding against. I propose that we stop WARM at the source if we detect that more than a certain percentage of indexes will be updated by an UPDATE statement. Of course, we could be fancier and look at each index's structure and arrive at a cost model, but a simple 50% rule seems a good starting point: if an UPDATE is going to modify more than 50% of the indexes, do a non-WARM update. Attached patch adds that support.

I ran tests by modifying the benchmark used for the previous tests, adding the abalance column to all indexes except the one on aid (the index layout is sketched after the numbers). With the patch applied, there are zero WARM updates on the table (as expected). The headline numbers are:

master: 4101 txns/sec
WARM: 4033 txns/sec

So probably within acceptable range.
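
To be concrete, the index set for this run is, schematically, something like this (exact definitions may differ slightly):

CREATE UNIQUE INDEX pgb_a_aid ON pgbench_accounts(aid);
CREATE INDEX pgb_a_filler1 ON pgbench_accounts(filler1, abalance);
CREATE INDEX pgb_a_filler2 ON pgbench_accounts(filler2, abalance);
CREATE INDEX pgb_a_filler3 ON pgbench_accounts(filler3, abalance);
CREATE INDEX pgb_a_filler4 ON pgbench_accounts(filler4, abalance);
-- An UPDATE of abalance now touches key columns of 4 of the 5 indexes
-- (80% > 50%), so with the patch it is done as a regular, non-WARM update.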

============

Finally, I tested another workload with 6 indexes in total, where each UPDATE modifies 3 of them and leaves the other 3 untouched (the index layout is sketched below). I ran it at scale factor 100 for 1 hour each. The headline numbers:

master: 3679 txns/sec (I don't see a reason why master should be worse than in the 5-index-update case, so this probably needs more runs to rule out an aberration)
WARM: 4050 txns/sec (not much different from the zero-WARM-update case, but since master degenerated, it's probably worth doing another round; I am using an AWS instance and it's not the first time I have seen aberrations)
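
Schematically, the index set for this run looks something like this (again, exact definitions may differ):

CREATE UNIQUE INDEX pgb_a_aid ON pgbench_accounts(aid);
CREATE INDEX pgb_a_filler1 ON pgbench_accounts(filler1, abalance);
CREATE INDEX pgb_a_filler2 ON pgbench_accounts(filler2, abalance);
CREATE INDEX pgb_a_filler3 ON pgbench_accounts(filler3, abalance);
CREATE INDEX pgb_a_filler4 ON pgbench_accounts(filler4);
CREATE INDEX pgb_a_filler5 ON pgbench_accounts(filler5);
-- An UPDATE of abalance modifies key columns of exactly 3 of the 6 indexes
-- (50%, not more than 50%), so the update still goes through WARM.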


Thanks,
Pavan 

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
