Re: Indirect indexes - Mailing list pgsql-hackers

From Claudio Freire
Subject Re: Indirect indexes
Date
Msg-id CAGTBQpZDDN1DOkNPg0J7Dw0JEpbrNMMn6gtFqQWPz4mQ2U_C-Q@mail.gmail.com
Whole thread Raw
In response to Re: Indirect indexes  (Pavan Deolasee <pavan.deolasee@gmail.com>)
List pgsql-hackers
On Thu, Oct 20, 2016 at 1:08 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> On Thu, Oct 20, 2016 at 9:20 PM, Claudio Freire <klaussfreire@gmail.com>
> wrote:
>>
>>
>>
>> With indirect indexes, since you don't need to insert a tid, you can
>> just "insert on conflict do nothing" on the index.
>
>
> Would that work with non-unique indexes? Anyways, the point I was trying to
> make is that there are a similar technical challenges and we could solve it
> for WARM as well with your work for finding conflicting <key, tid> pairs and
> then not doing inserts. My thinking currently is that it will lead to other
> challenges, especially around vacuum, but I could be wrong.

Consider this:

Have a vacuum_cycle_id field in the metapage, with one bit reserved to
whether there's a vacuum in progress.

While there is a vacuum in progress on the index, all kinds of
modifications will look up the <key, pk> entry, and store the current
vacuum_cycle_id on the unused space for the tid pointer on the index
entry. When not, only new entries will be added (with the current
vacuum cycle id). So, during vacuum, indirect indexes incur a similar
cost to that of regular indexes, but only during vacuum.

When vacuuming, allocate 1/2 maintenance_work_mem for a bloom filter,
and increase all vacuum cycle ids (on the metapage) and mark a vacuum
in progress.

Scan the heap, add <key, pk> pairs of *non-dead* tuples to the bloom
filter. That's one BF per index, sadly, but bear with me.

Then scan the indexes. <key, pk> pairs *not* in the BF that have the
*old* vacuum cycle id get removed.

Clear the vacuum in progress flag on all indexes' metapage.

The only drawback here is that mwm dictates the amount of uncleanable
waste left on the indexes (BF false positives). Surely, the BF could
be replaced with an accurate set rather than an approximate one, but
that could require a lot of memory if keys are big, and a lot of scans
of the indexes. The BF trick bounds the amount of waste left while
minimizing work.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Improve output of BitmapAnd EXPLAIN ANALYZE
Next
From: Michael Paquier
Date:
Subject: Re: Renaming of pg_xlog and pg_clog