Re: Indirect indexes - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Indirect indexes |
Date | |
Msg-id | CANP8+jKv9QTYMG5cPW-+U1kYkbg2wf2jRKQ4dVe-JfmjxFcDbQ@mail.gmail.com Whole thread Raw |
In response to | Re: Indirect indexes (Alexander Korotkov <a.korotkov@postgrespro.ru>) |
Responses |
Re: Indirect indexes
|
List | pgsql-hackers |
On 18 October 2016 at 23:46, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > Then vacuum removes (0;1) from heap, reference to (0;1) from tbl_pk_idx. > But how will it remove (1,1) tuple from tbl_val_indirect_idx? Thus, before > vacuuming tbl_val_indirect_idx we should know not only values of id which > are being removed, but actually (id, val) pairs which are being removed. > Should we collect those paris while scanning heap? But we should also take > into account that multiple heap tuples might have same (id, val) pair values > (assuming there could be other columns being updated). Therefore, we should > take into account when last pair of particular (id, val) pair value was > deleted from heap. That would be very huge change to vacuum, may be even > writing way more complex vacuum algorithm from scratch. Probably, you see > the better solution of this problem. The best way to sum up the problem is to consider how we deal with repeated updates to a single tuple that flip the value from A to B then back to A then to B then A etc.. Any value in the index can point to multiple versions of the same tuple and multiple index values can point to the same tuple (PK value). This problem behaviour was already known to me from Claudio's earlier analysis of WARM (thanks Claudio). Yes, VACUUMing that is likely to be a complex issue, as you say. At the moment I don't have a plan for that, but am not worried. Indirect indexes produce less index entries in general than current, so the problem is by-design much smaller than current situation. Indirect indexes can support killed-tuple interface, so scanning the index by users will result in natural index maintenance, further reducing the problem. So there will be a much reduced need for bulk maintenance. Bulk maintainence of the index, when needed, can be performed by scanning the whole table via the index, after the PK index has been vacuumed. That can be optimized using an index-only scan of the PK to avoid touching the heap, which should be effective since the VM has been so recently refreshed. For correctness it would require the index blocks to be locked against write while checking for removal, so bulk collection of values to optimize the underlying index doesn't seem useful. The index scan could also be further optimized by introducing a visibility map for the index, which is something that would also optimize normal index VACUUMs as well, but that is a later project and not something for 10.x At this stage, the discussion should be 1) can it work? 2) do we want it? At present, I see that it can work and we just need to be careful to measure the effectiveness of it to demonstrate that it really is a better way of doing things in some cases. Indirect indexes are definitely not a panacea for all problems, for me it is just another option to add into the rich world of Postgres indexing. Getting traction can be difficult if people don't understand the pros and cons of the different index types, but I'm happy we have a wide spread of knowledge now. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: