Re: Lossy Index Tuple Enhancement (LITE) - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Lossy Index Tuple Enhancement (LITE)
Date
Msg-id 20160803232852.GA1702@momjian.us
Whole thread Raw
In response to Lossy Index Tuple Enhancement (LITE)  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Lossy Index Tuple Enhancement (LITE)  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Wed, Aug  3, 2016 at 08:20:49AM +0100, Simon Riggs wrote:
> == Update Duplicate Removal ==
> 
> We want an optimization that reduces the effects of multiple UPDATEs
> on the same block that have duplicate values caused because another
> index column has been updated and a non-HOT index insert has been
> generated. This optimizes the case where someone has 12 indexes yet
> updates just one of them, so we optimize the 11 unnecessary updates,
> if possible.
> 
> As UPDATEs occur we request inserts into the index. If a lossy index
> pointer already exists that covers the new linepointer then we skip
> the index insert, optimizing writes to the index. If a lossy index
> pointer already exists for that value but doesn't yet cover our
> linepointer then we update the bitmask. If a non-lossy index pointer
> already exists we set the lossy bit and update the bitmask to reflect
> which linepointers the index entry covers. If no index pointer exists
> we insert one. The first update on a row will not be optimized, but
> the 2nd - 16th could be. These actions optimize away most of the
> writes and WAL activity to the index data block since only 1 in every
> 16 updates would cause changes to the block (actually about 1 in 10 on
> average). Most importantly, updates will cause far fewer index block
> splits and reindexing will be less needed.
> 
> The same situation can occur if we have many INSERTs with same values
> on the same block. This could lead to a reduction in size of the btree
> for indexes with many duplicate values.

Let me see if I understand this.  Right now, if no indexed values are
changed and the old and new update rows are in the same page, we don't
create a new index entry for the new row, but rather use redirect tid
pointers.  We can recycle tuple data and tid pointers for frequent
updates because there is only one index entry for the entire update
chain.

This doesn't work if we changed an indexed value.  Your solution is not
to use redirect tid/line pointers at all.  Instead, you create normal
index pointers for the indexes with changed column values.  For the
indexes without changed column values, you create this new LITE index
entry which can point to all the update-chain tuples in the same page.

In Postgres currently, we can remove tuple data when it is expired, and
tids if they are part of redirect chains (no index changes), and need
vacuum to remove non-redirect tids and old index entries.

With LITE, you can avoid the creation of duplicate-value index entries
for indexes without changed column values by using a bitmap in place of
the tid item number (16 bits).  It can't remove dead tids.

I had some ideas of how to handle the bitmap, like doing "mod 16" on the
tid item value, meaning we would have 16 equal-sized buckets.  Assuming
tuples are 100 bytes, that makes each bucket hold five rows on average.

However, I am now wondering why bother with the bitmap at all.  Can't
the LITE item pointer just point to the head of the update chain, and we
can walk the chain to find the tuple that matches our snapshot?  (The
tuple has a pointer to the next entry in the chain.)

Basically, right now with an update that changes values in some indexes
and not others, and the tuples are all on the same page, we are creating
multiple index pointers to point to different tuples on the same page. 
I am suggesting we don't do that, and instead just leave the index tuple
pointing to the head of the chain.

I think the big problem with this is that removing the tuple data will
mark also remove the rows update chain pointer.  Maybe we can modify tid
redirect pointers to handle this.

> == Clustered Index ==

I read about clustered indexes and I am not sure of how it would help in
most cases.  Are there enough index entries with identical or
sequentially-grouped values to be useful?  It kind of feels like a
single-page BRIN index.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Small fix: avoid passing null pointers to memcpy()
Next
From: Bruce Momjian
Date:
Subject: Re: Slowness of extended protocol