Re: [HACKERS] GUC for cleanup indexes threshold. - Mailing list pgsql-hackers

From Claudio Freire
Subject Re: [HACKERS] GUC for cleanup indexes threshold.
Date
Msg-id CAGTBQpbb8NH2XB7mnZTGS-H5tS4nOgt70=WO9uuUcdQv6-khDA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] GUC for cleanup indexes threshold.  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Mon, Mar 19, 2018 at 8:50 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> require the bulk-delete method of scanning whole index and of logging
>>> WAL. But it leads some extra overhead. With this patch we no longer
>>> need to depend on the full scan on b-tree index. This might be useful
>>> for a future when we make the bulk-delete of b-tree index not scan
>>> whole index.
>>
>> Perhaps I'm taking something incorrectly, but is it just the
>> result of skipping 'maybe needed' scans without condiering the
>> actual necessity?
>
> I meant to scan only index pages that are relevant with garbages TIDs
> on a table. The current b-tree index bulk-deletion is very slow and
> heavy because we always scans the whole index even if there is only 1
> dead tuples in a table. To address this problem I'm thinking a way to
> make bulk-delete not scan whole index if there is a few dead tuples in
> a table. That is, we do index scans to collect the stack of btree
> pages and reclaim garbage. Maybe we will full index scan if there are
> a lot of dead tuples, which would be same as what we're doing on
> planning access paths.

In theory it's not very difficult to do. I was pondering doing some
PoC after the other vacuum patches get through.

TL;DR version is, as long as there's enough MWM to fit the keys, they
can be stashed before vacuuming the heap, and used to perform index
scans instead for index cleanup. If MWM runs out, it just goes back to
bulk-delete scans (it would imply there's a lot to clean up anyway, so
index scans wouldn't be worth it). A finer decision can be made with
random_page_cost on which technique is likely faster as well.

So yes, lets not paint ourselves into a corner where full index scans
are required for correct operation, that would make the above
impossible.


pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Problems with Error Messages wrt Domains, Checks
Next
From: Pavan Deolasee
Date:
Subject: Re: Faster inserts with mostly-monotonically increasing values