[HACKERS] WARM and indirect indexes - Mailing list pgsql-hackers

From Alvaro Herrera
Subject [HACKERS] WARM and indirect indexes
Date
Msg-id 20170110192442.ocws4pu5wjxcf45b@alvherre.pgsql
Whole thread Raw
Responses Re: [HACKERS] WARM and indirect indexes  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
Two options are on the table to attack the problem of updates causing
write amplification: WARM and indirect indexes.  They are completely
different approaches but have overlapping effects on what scenarios are
improved.  Here's a recap of both features, with the intent that we make
a well-considered decision about each.

The main effect of both features is that an updated tuple doesn't
require updating indexes that are on unmodified columns.  Indirect
indexes are a completely new server feature which may enable other
refinements later on; WARM is a targeted optimization on top of the HOT
optimization.  

The big advantage of WARM is that it works automatically, like HOT: the
user doesn't need to do anything different than today to get the
benefit.  With indirect indexes, the user needs to create the index as
indirect explicitely.

There are two big disadvantages to WARM (as to HOT): it cannot be
applied when the heap page is full; and concurrent long-running
transactions can spoil reclaimability of recently dead tuples in heap
pages.  There's a further disadvantage: currently, there can be only one
WARM update in an update chain.  (Pavan believes it is possible to allow
multiple ones.)  All those cases can benefit from indirect indexes.

Another interesting case is a table with a primary key and a JSON
object, on which you have a GIN index (or an int[] array, or text
search).  What happens if you modify the JSON?  With WARM, this is just
a normal index update.  With indirect indexes, you may be able to skip
inserting index entries for all the JSON elements except those which
changed.  (However, this is not implemented yet.)

- When scanning a WARM-updated block starting from an index, you may
need to do more work to walk the update chain until you find the visible
tuple.  Most of the time, HOT/WARM chains are very short thanks to HOT
pruning, so this shouldn't be a problem.

- Indirect indexes require a primary key to be present.  If the PK is
dropped, the IndIx must be dropped too.

- Indirect indexes become larger if the primary key is wide.

- Indirect indexes are not fully implemented yet (need to remove
restriction of PK value being 6 bytes; also wholesale vacuuming, though
there's no universal agreement that this is strictly necessary.)

- An indirect index takes longer to read, since it needs to descend both
the IndIx itself and the primary key index.

-- 
Álvaro Herrera



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] Cache Hash Index meta page.
Next
From: Magnus Hagander
Date:
Subject: Re: [HACKERS] Replication/backup defaults