Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM) - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Date
Msg-id CABOikdMWMS71HaN4RtRuUehZVGJ8_z_VL6GpkmbNSMfBTyFb+Q@mail.gmail.com
In response to Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)  (Pavan Deolasee <pavan.deolasee@gmail.com>)
List pgsql-hackers


On Thu, Feb 2, 2017 at 6:17 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:

Please see rebased patches attached. There is not much change other than the fact that the patch now uses the new catalog maintenance API.


Another rebase on current master.

This time I am also attaching a proof-of-concept patch to demonstrate chain conversion. The proposed algorithm is described in README.WARM, but I'll briefly explain it here.

Chain conversion works in two phases and requires an additional index pass during vacuum. During the first heap scan, we collect candidate chains for conversion. A chain qualifies for conversion if all of its tuples have matching index keys with respect to all current indexes (i.e. the chain has become HOT). WARM chains become HOT as and when old versions retire (or new versions retire, in the case of aborts). But before we can mark them HOT again, we must first remove duplicate (and potentially wrong) index pointers. This algorithm deals with that.

When a WARM update occurs and we insert a new index entry in one or more indexes, we mark the new index pointer with a special RED flag. The heap tuple created by this UPDATE is also marked as RED. If the tuple is then HOT-updated, subsequent versions will be marked RED as well. IOW each WARM chain has two HOT chains inside it, and these chains are identified as the BLUE and RED chains. The index pointer that satisfies the key in the RED chain is marked RED too.

When we collect candidate WARM chains in the first heap scan, we also remember the color of the chain.

During the first index scan we delete all known dead index pointers (same as lazy_tid_reaped). We also count the number of RED and BLUE pointers to each candidate chain.

The second index scan will either (1) remove an index pointer that is known to be useless or (2) color a RED pointer BLUE:
- A BLUE pointer to a RED chain is removed when there exists a RED pointer to the chain. If there is no RED pointer, we can't remove the BLUE pointer because it is the only path to the heap tuple (the case where WARM does not cause a new index entry). Instead, we color the heap tuples BLUE.
- A BLUE pointer to a BLUE chain is always retained
- A RED pointer to a BLUE chain is always removed (aborted updates)
- A RED pointer to a RED chain is colored BLUE (we will color the heap tuples BLUE in the second heap scan)
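
The four rules above can be written down as a small decision function. This is only an illustrative sketch, not code from the POC patch: the type names, enum values and the function second_pass_action are all made up for this example, and it assumes the number of RED pointers to each candidate chain is already known from the first index scan.

```c
#include <assert.h>

/* Hypothetical names for illustration; the POC patch may differ. */
typedef enum { COLOR_BLUE, COLOR_RED } WarmColor;

typedef enum {
    PTR_KEEP,                     /* leave the index pointer untouched */
    PTR_REMOVE,                   /* delete the index pointer */
    PTR_COLOR_BLUE,               /* clear the RED flag on the pointer */
    PTR_KEEP_AND_COLOR_HEAP_BLUE  /* keep pointer, heap tuples go BLUE */
} PointerAction;

/*
 * Decide what the second index scan does with one index pointer.
 * red_ptrs_to_chain is the count of RED pointers to the chain, as
 * collected during the first index scan (assumed input).
 */
PointerAction
second_pass_action(WarmColor ptr, WarmColor chain, int red_ptrs_to_chain)
{
    assert(red_ptrs_to_chain >= 0);

    if (ptr == COLOR_BLUE && chain == COLOR_RED)
    {
        /* Removable only if a RED pointer still leads to the chain. */
        return (red_ptrs_to_chain > 0) ? PTR_REMOVE
                                       : PTR_KEEP_AND_COLOR_HEAP_BLUE;
    }
    if (ptr == COLOR_BLUE)
        return PTR_KEEP;          /* BLUE pointer to a BLUE chain */
    if (chain == COLOR_BLUE)
        return PTR_REMOVE;        /* RED pointer to a BLUE chain (abort) */
    return PTR_COLOR_BLUE;        /* RED pointer to a RED chain */
}
```

In every branch the outcome leaves a single pointer colored the same as the chain it points to, which is what the conversion step relies on.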

Once the index pointers are taken care of such that there exists exactly one pointer to each chain, the chain can be converted into a HOT chain by clearing the WARM and RED flags.

There is one problem case: aborted vacuums. If a crash happens after coloring a RED pointer BLUE, but before we can clear the flags on the heap tuples, we might end up with two BLUE pointers to a RED chain. This case will require recheck logic and is not yet implemented.

The POC only works with BTREEs because the unused bit in IndexTuple's t_info is already used by HASH indexes. For heap tuples, we can reuse one of the HEAP_MOVED_IN/OFF bits for marking tuples RED, since this is only required for WARM tuples. So the bit can be checked along with the WARM bit.

Unless there is an objection to the design or someone thinks it cannot work, I'll look at some alternative mechanism to free up more bits in the tuple header, or at least in the index tuples. One idea is to free up 3 bits from ip_posid, knowing that OffsetNumber can never really need more than 13 bits with the other constraints in place. We could use some bit-field magic to do that with minimal changes. The thing that concerns me is whether there is a guaranteed way to make that work on all hardware without breaking the on-disk layout.
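
To make the ip_posid idea concrete, here is a minimal standalone sketch of carving 3 flag bits out of a 16-bit offset field. It is not taken from the patch, and all macro and function names below are hypothetical. It uses explicit masks and shifts on a uint16_t rather than C bit-fields, on the assumption that this sidesteps the portability worry, since C leaves the allocation order of bit-fields within a storage unit implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 13 bits hold the offset, top 3 bits are flags. */
#define POSID_OFFSET_BITS  13
#define POSID_OFFSET_MASK  ((uint16_t) ((1u << POSID_OFFSET_BITS) - 1))
#define POSID_MAX_FLAGS    7u

/* Combine an offset number and 3 flag bits into one 16-bit value. */
uint16_t
posid_pack(uint16_t offset, unsigned flags)
{
    assert(offset <= POSID_OFFSET_MASK);   /* must fit in 13 bits */
    assert(flags <= POSID_MAX_FLAGS);      /* must fit in 3 bits */
    return (uint16_t) (offset | (flags << POSID_OFFSET_BITS));
}

/* Extract the offset number, ignoring the flag bits. */
uint16_t
posid_offset(uint16_t raw)
{
    return (uint16_t) (raw & POSID_OFFSET_MASK);
}

/* Extract the 3 flag bits. */
unsigned
posid_flags(uint16_t raw)
{
    return (unsigned) raw >> POSID_OFFSET_BITS;
}
```

With this scheme a value whose flag bits are all zero is numerically identical to the plain offset, so existing entries would read back unchanged as long as every reader of the field goes through the masking accessor.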

Comments/suggestions?

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
