Re: Patch: Write Amplification Reduction Method (WARM) - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: Patch: Write Amplification Reduction Method (WARM)
Date
Msg-id CABOikdPCbBitxZM+Ek3kXLLua13ogaxhdyY6mw7m+yUFUNus_w@mail.gmail.com
In response to Re: Patch: Write Amplification Reduction Method (WARM)  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-hackers


On Wed, Aug 31, 2016 at 10:38 PM, Claudio Freire <klaussfreire@gmail.com> wrote:

On Wed, Aug 31, 2016 at 1:45 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
We discussed a few ideas to address the "Duplicate Scan" problem. For example, we can teach index AMs to discard any duplicate (key, CTID) insert requests. Or we could guarantee uniqueness by allowing updates in only one lexical order. While the former is a more complete solution for avoiding duplicate entries, searching through a large number of keys for non-unique indexes could be a drag on performance. The latter approach may not be sufficient for many workloads. Also, tracking increment/decrement for many indexes will be non-trivial.

There is another problem with allowing many index entries to point to the same WARM chain. It will be non-trivial to know how many index entries are currently pointing to the WARM chain, and index/heap vacuum will throw up more challenges.

Instead, what I would like to propose, and what the patch currently implements, is to restrict WARM updates to once per chain. So the first non-HOT update to a tuple or a HOT chain can be a WARM update. The chain can then be HOT updated any number of times, but it cannot be WARM updated again. This might look too restrictive, but it can still bring down the number of regular updates by almost 50%. Further, if we devise a strategy to convert a WARM chain back to a HOT chain, it can be WARM updated again. (This part is currently not implemented.) A good side effect of this simple strategy is that we know there can be at most two index entries pointing to any given WARM chain.
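To make the rule concrete, here is a minimal sketch of how an update might be classified under this once-per-WARM-chain restriction. All names here are invented for illustration; the actual patch implements this inside the heap and index AM code, not as standalone logic like this:

```python
# Illustrative sketch of the once-per-chain WARM rule; names are invented
# and this is not the patch's actual code.

REGULAR, HOT, WARM = "regular", "hot", "warm"

class Chain:
    """State tracked for one HOT/WARM update chain."""
    def __init__(self):
        # Has this chain already consumed its single WARM update?
        self.warm_updated = False

def classify_update(chain, changes_indexed_cols, fits_on_same_page):
    if not fits_on_same_page:
        return REGULAR        # new version goes to another page: regular update
    if not changes_indexed_cols:
        return HOT            # no indexed columns change: plain HOT update
    if chain.warm_updated:
        return REGULAR        # only one WARM update allowed per chain
    chain.warm_updated = True # consume the chain's single WARM slot
    return WARM

# Consequence: at most two index entries can point at the chain's root --
# the original entry plus the one added by the single permitted WARM update.
```

The second WARM-eligible update to the same chain falls back to a regular update, which is what bounds the number of index entries per root chain to two.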

We should probably think about coordinating with my btree patch.

From the description above, the strategy is quite readily "upgradable" to one in which the indexam discards duplicate (key,ctid) pairs and that would remove the limitation of only one WARM update... right?

Yes, we should be able to add further optimisations along the lines you're working on, but what I like about the current approach is that (a) it reduces the complexity of the patch, and (b) having thought about cleaning up WARM chains, limiting the number of index entries per root chain to a small number will simplify that aspect too.
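For reference, the duplicate-discard idea could later look something like this sketch. It is purely illustrative (invented names, not PostgreSQL code); a real index AM would perform the check during its descent to the insertion page rather than against an in-memory set:

```python
# Illustrative sketch of an index AM discarding duplicate (key, ctid)
# insert requests, as would arise from repeated WARM updates.

class SimpleIndex:
    def __init__(self):
        self.entries = set()   # set of (key, ctid) pairs already present

    def insert(self, key, ctid):
        """Insert (key, ctid); return False if the pair already exists."""
        if (key, ctid) in self.entries:
            return False       # duplicate request: discard, no new entry
        self.entries.add((key, ctid))
        return True
```

With such a check in place, the once-per-chain restriction could in principle be lifted, since repeated WARM updates could no longer create duplicate index entries for the same (key, CTID) pair.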

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
