Re: [HACKERS] GUC for cleanup indexes threshold. - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] GUC for cleanup indexes threshold.
Date
Msg-id 20180320.135719.90053076.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] GUC for cleanup indexes threshold.  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [HACKERS] GUC for cleanup indexes threshold.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
At Mon, 19 Mar 2018 20:50:48 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoCfWXcX-po8Q1r779nyVGzs01pwpSLM=u7Sx3Hv+L+4gg@mail.gmail.com>
> On Mon, Mar 19, 2018 at 2:45 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > At Mon, 19 Mar 2018 11:12:58 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoAB8tQg9xwojupUJjKD=fMhtx6thDEPENDdhftVLWcR8A@mail.gmail.com>
> >> On Wed, Mar 14, 2018 at 9:25 PM, Alexander Korotkov
> >> <a.korotkov@postgrespro.ru> wrote:
> >> > On Wed, Mar 14, 2018 at 7:40 AM, Masahiko Sawada <sawada.mshk@gmail.com>
> >> > wrote:
> >> >>
> >> >> On Sat, Mar 10, 2018 at 3:40 AM, Alexander Korotkov
> >> >> <a.korotkov@postgrespro.ru> wrote:
> >> >> > On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada <sawada.mshk@gmail.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov
> >> >> >> <a.korotkov@postgrespro.ru> wrote:
> >> >> >> > 2) These parameters are reset during btbulkdelete() and set during
> >> >> >> > btvacuumcleanup().
> >> >> >>
> >> >> >> Can't we set these parameters even during btbulkdelete()? By keeping
> >> >> >> them up to date, we will able to avoid an unnecessary cleanup vacuums
> >> >> >> even after index bulk-delete.
> >> >> >
> >> >> >
> >> >> > We certainly can update cleanup-related parameters during
> >> >> > btbulkdelete().
> >> >> > However, in this case we would update B-tree meta-page during each
> >> >> > VACUUM cycle.  That may cause some overhead for non append-only
> >> >> > workloads.  I don't think this overhead would be sensible, because in
> >> >> > non append-only scenarios VACUUM typically writes much more of
> >> >> > information.
> >> >> > But I would like this oriented to append-only workload patch to be
> >> >> > as harmless as possible for other workloads.
> >> >>
> >> >> What overhead are you referring here? I guess the overhead is only the
> >> >> calculating the oldest btpo.xact. And I think it would be harmless.
> >> >
> >> >
> >> > I meant overhead of setting last_cleanup_num_heap_tuples after every
> >> > btbulkdelete with wal-logging of meta-page.  I bet it also would be
> >> > harmless, but I think that needs some testing.
> >>
> >> Agreed.
> >>
> >> After more thought, it might be too late but we can consider the
> >> possibility of another idea proposed by Peter. Attached patch
> >> addresses the original issue of index cleanups by storing the epoch
> >> number of page deletion XID into PageHeader->pd_prune_xid which is
> >> 4byte field.
> >
> > Mmm. It seems to me that the story is returning to the
> > beginning. Could I try retelling the story?
> >
> > I understant that the initial problem was vacuum runs apparently
> > unnecessary full-scan on indexes many times. The reason for that
> > is the fact that a cleanup scan may leave some (or many under
> > certain condition) dead pages not-recycled but we don't know
> > whether a cleanup is needed or not. They will be staying left
> > forever unless we run additional cleanup-scans at the appropriate
> > timing.
> >
> > (If I understand it correctly,) Sawada-san's latest proposal is
> > (fundamentally the same to the first one,) just skipping the
> > cleanup scan if the vacuum scan just before found that the number
> > of *live* tuples are increased. If there where many deletions and
> > insertions but no increase of total number of tuples, we don't
> > have a cleanup. Consequently it had a wraparound problem and it
> > is addressed in this version.
> 
> No, it doesn't have a wraparound problem. The patch based on Peter's
> idea I proposed adds an epoch number of page deletion xid and compare
> them when we judge whether the page is recyclable or not. It's
> something like we create 64-bit xid of deletion xid. Also, if there is
> even one deletion the bulk-delete will be performed instead of the
> index cleanup. So with this patch we do the index cleanup only when
> the reltuple of table is increased by fraction of
> vacuum_index_cleanup_scale_factor from previous stats. It doesn't need
> to do the index cleanup by any xid thresholds.


Perhaps you took me wrong. I know the last patch doesn't have (or
at least intends to get rid of ) the problem, and I wrote that
the problem was introduced by your *first* patch.

> > (ditto.) Alexander proposed to record the oldest xid of
> > recyclable pages in metapage (and the number of tuples at the
> > last cleanup). This prevents needless cleanup scan and surely
> > runs cleanups to remove all recyclable pages.
> 
> Yes, but the concerns we discussed are that we need extra WAL-logging
> for updating the metapage and it works only for append-only case. If
> we also want to support more cases we will need to update the metapage
> during bulk-delete. The overhead of WAL-logging would be harmless but
> should be tested as Alexander mentioned.

Agreed.

> > I think that we can accept Sawada-san's proposal if we accept the
> > fact that indexes can retain recyclable pages for a long
> > time. (Honestly I don't think so.)
> >
> > If (as I might have mentioned as the same upthread for Yura's
> > patch,) we accept to hold the information on index meta page,
> > Alexander's way would be preferable. The difference betwen Yura's
> > and Alexander's is the former runs cleanup scan if a recyclable
> > page is present but the latter avoids that before any recyclable
> > pages are knwon to be removed.
> >
> >>               Comparing to the current proposed patch this patch
> >> doesn't need neither the page upgrade code nor extra WAL-logging. If
> >
> > # By the way, my proposal was storing the information as Yura
> > # proposed into stats collector. The information maybe be
> > # available a bit lately, but it doesn't harm. This doesn't need
> > # extra WAL logging nor the upgrad code:p
> >
> >> we also want to address cases other than append-only case we will
> >
> > I'm afraid that "the problem for the other cases" is a new one
> > that this patch introduces, not an existing one.
> 
> I meant that the current Alexandor's proposal works for append-only
> table. If we want to support other cases we have to update metapage
> during bulk-delete, which assumes that bulk-delete always scans whole
> index.

True. Currently no patches so far gets rid of the whole-cleanup-scan.

> >> require the bulk-delete method of scanning whole index and of logging
> >> WAL. But it leads some extra overhead. With this patch we no longer
> >> need to depend on the full scan on b-tree index. This might be useful
> >> for a future when we make the bulk-delete of b-tree index not scan
> >> whole index.
> >
> > Perhaps I'm taking something incorrectly, but is it just the
> > result of skipping 'maybe needed' scans without condiering the
> > actual necessity?
> 
> I meant to scan only index pages that are relevant with garbages TIDs
> on a table. The current b-tree index bulk-deletion is very slow and
> heavy because we always scans the whole index even if there is only 1
> dead tuples in a table. To address this problem I'm thinking a way to
> make bulk-delete not scan whole index if there is a few dead tuples in
> a table. That is, we do index scans to collect the stack of btree
> pages and reclaim garbage. Maybe we will full index scan if there are
> a lot of dead tuples, which would be same as what we're doing on
> planning access paths.

Yeah, that seems good!  A possible problem of that is that the
pages we want to recycle in a cleanup scan can *not* be only them
that have found to be recyclable in the vacuum-scan just
before. When we leave some recyclable pages in a cleanup scan, we
should do whole-scan at the next chance if we don't have the TID
list (or in other smaller form, or just the number of recyclable
pages?) at the time.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: ON CONFLICT DO UPDATE for partitioned tables
Next
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] Partition-wise aggregation/grouping