Re: Indirect indexes - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Indirect indexes
Date
Msg-id CANP8+jKv9QTYMG5cPW-+U1kYkbg2wf2jRKQ4dVe-JfmjxFcDbQ@mail.gmail.com
Whole thread Raw
In response to Re: Indirect indexes  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses Re: Indirect indexes
List pgsql-hackers
On 18 October 2016 at 23:46, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

> Then vacuum removes (0;1) from heap, reference to (0;1) from tbl_pk_idx.
> But how will it remove (1,1) tuple from tbl_val_indirect_idx?  Thus, before
> vacuuming tbl_val_indirect_idx we should know not only values of id which
> are being removed, but actually (id, val) pairs which are being removed.
> Should we collect those paris while scanning heap?  But we should also take
> into account that multiple heap tuples might have same (id, val) pair values
> (assuming there could be other columns being updated).  Therefore, we should
> take into account when last pair of particular (id, val) pair value was
> deleted from heap.  That would be very huge change to vacuum, may be even
> writing way more complex vacuum algorithm from scratch.  Probably, you see
> the better solution of this problem.

The best way to sum up the problem is to consider how we deal with
repeated updates to a single tuple that flip the value from A to B
then back to A then to B then A etc.. Any value in the index can point
to multiple versions of the same tuple and multiple index values can
point to the same tuple (PK value). This problem behaviour was already
known to me from Claudio's earlier analysis of WARM (thanks Claudio).

Yes, VACUUMing that is likely to be a complex issue, as you say. At
the moment I don't have a plan for that, but am not worried.

Indirect indexes produce less index entries in general than current,
so the problem is by-design much smaller than current situation.
Indirect indexes can support killed-tuple interface, so scanning the
index by users will result in natural index maintenance, further
reducing the problem.

So there will be a much reduced need for bulk maintenance. Bulk
maintainence of the index, when needed, can be performed by scanning
the whole table via the index, after the PK index has been vacuumed.
That can be optimized using an index-only scan of the PK to avoid
touching the heap, which should be effective since the VM has been so
recently refreshed. For correctness it would require the index blocks
to be locked against write while checking for removal, so bulk
collection of values to optimize the underlying index doesn't seem
useful. The index scan could also be further optimized by introducing
a visibility map for the index, which is something that would also
optimize normal index VACUUMs as well, but that is a later project and
not something for 10.x

At this stage, the discussion should be 1) can it work? 2) do we want
it? At present, I see that it can work and we just need to be careful
to measure the effectiveness of it to demonstrate that it really is a
better way of doing things in some cases. Indirect indexes are
definitely not a panacea for all problems, for me it is just another
option to add into the rich world of Postgres indexing. Getting
traction can be difficult if people don't understand the pros and cons
of the different index types, but I'm happy we have a wide spread of
knowledge now.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Jeevan Chalke
Date:
Subject: Re: Aggregate Push Down - Performing aggregation on foreign server
Next
From: Pavan Deolasee
Date:
Subject: Re: FSM corruption leading to errors