Re: [HACKERS] GUC for cleanup indexes threshold. - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] GUC for cleanup indexes threshold.
Msg-id 20180319.145712.28636437.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: [HACKERS] GUC for cleanup indexes threshold.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
Sorry I'd like to make a trivial but critical fix.

At Mon, 19 Mar 2018 14:45:05 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20180319.144505.166111203.horiguchi.kyotaro@lab.ntt.co.jp>
> At Mon, 19 Mar 2018 11:12:58 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoAB8tQg9xwojupUJjKD=fMhtx6thDEPENDdhftVLWcR8A@mail.gmail.com>
> > On Wed, Mar 14, 2018 at 9:25 PM, Alexander Korotkov
> > <a.korotkov@postgrespro.ru> wrote:
> > > On Wed, Mar 14, 2018 at 7:40 AM, Masahiko Sawada <sawada.mshk@gmail.com>
> > > wrote:
> > >>
> > >> On Sat, Mar 10, 2018 at 3:40 AM, Alexander Korotkov
> > >> <a.korotkov@postgrespro.ru> wrote:
> > >> > On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada <sawada.mshk@gmail.com>
> > >> > wrote:
> > >> >>
> > >> >> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov
> > >> >> <a.korotkov@postgrespro.ru> wrote:
> > >> >> > 2) These parameters are reset during btbulkdelete() and set during
> > >> >> > btvacuumcleanup().
> > >> >>
> > >> >> Can't we set these parameters even during btbulkdelete()? By keeping
> > >> >> them up to date, we will be able to avoid unnecessary cleanup vacuums
> > >> >> even after index bulk-delete.
> > >> >
> > >> >
> > >> > We certainly can update cleanup-related parameters during
> > >> > btbulkdelete().
> > >> > However, in this case we would update B-tree meta-page during each
> > >> > VACUUM cycle.  That may cause some overhead for non append-only
> > >> > workloads.  I don't think this overhead would be noticeable, because in
> > >> > non append-only scenarios VACUUM typically writes much more
> > >> > information.
> > >> > But I would like this patch, which is oriented to append-only
> > >> > workloads, to be as harmless as possible for other workloads.
> > >>
> > >> What overhead are you referring to here? I guess the overhead is only that
> > >> of calculating the oldest btpo.xact. And I think it would be harmless.
> > >
> > >
> > > I meant overhead of setting last_cleanup_num_heap_tuples after every
> > > btbulkdelete with wal-logging of meta-page.  I bet it also would be
> > > harmless, but I think that needs some testing.
> > 
> > Agreed.
> > 
> > After more thought, it might be too late but we can consider the
> > possibility of another idea proposed by Peter. Attached patch
> > addresses the original issue of index cleanups by storing the epoch
> > number of the page deletion XID into PageHeader->pd_prune_xid, which is
> > a 4-byte field.
> 
> Mmm. It seems to me that the story is returning to the
> beginning. Could I try retelling the story?
> 
> I understand that the initial problem was that vacuum repeatedly runs
> apparently unnecessary full scans on indexes. The reason for that
> is the fact that a cleanup scan may leave some (or many, under
> certain conditions) dead pages unrecycled, but we don't know
> whether a cleanup is needed or not. They will be left behind
> forever unless we run additional cleanup scans at the appropriate
> time.
> 
> (If I understand it correctly,) Sawada-san's latest proposal is
> (fundamentally the same as the first one,) just skipping the
> cleanup scan unless the vacuum scan just before found that the number
> of *live* tuples has increased. If there were many deletions and
> insertions but no increase in the total number of tuples, we don't
> have a cleanup. Consequently it had a wraparound problem, which
> is addressed in this version.
> 
> (ditto.) Alexander proposed to record the oldest xid of
> recyclable pages in metapage (and the number of tuples at the
> last cleanup). This prevents needless cleanup scans and reliably
> runs cleanups to remove all recyclable pages.
> 
> I think that we can accept Sawada-san's proposal if we accept the
> fact that indexes can retain recyclable pages for a long
> time. (Honestly I don't think so.)
> 
> If (as I might have mentioned upthread for Yura's
> patch) we accept holding the information on the index metapage,
> Alexander's way would be preferable. The difference between Yura's
> and Alexander's is that the former runs a cleanup scan if a recyclable
> page is present, but the latter avoids that before any recyclable

- pages are known to be removed.
+ pages are known to be actually removable
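
To make the condition in Alexander's approach concrete, here is a minimal sketch in C of how such a decision could look. It is only an illustration of the idea discussed above, not code from any of the patches: the struct, its field names, the function names and the scale_factor threshold are all hypothetical, and the XID comparison is a simplified modulo-2^32 check that ignores special XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the values proposed to be kept in the metapage. */
typedef struct CleanupMeta
{
    uint32_t oldest_deleted_xact;          /* 0 = no deleted page recorded */
    double   last_cleanup_num_heap_tuples; /* negative = never cleaned up */
} CleanupMeta;

/* Simplified wraparound-aware "a precedes b" for 32-bit XIDs. */
static bool
xid_precedes(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Decide whether a cleanup scan is worth running.
 * oldest_xmin and scale_factor are assumed inputs supplied by the caller.
 */
static bool
cleanup_scan_needed(const CleanupMeta *meta, uint32_t oldest_xmin,
                    double num_heap_tuples, double scale_factor)
{
    assert(scale_factor >= 0.0);

    /* No previous cleanup recorded, so there is nothing to compare with. */
    if (meta->last_cleanup_num_heap_tuples < 0)
        return true;

    /* The oldest deleted page is now old enough to be actually recycled. */
    if (meta->oldest_deleted_xact != 0 &&
        xid_precedes(meta->oldest_deleted_xact, oldest_xmin))
        return true;

    /* The heap grew by more than the allowed fraction since last cleanup. */
    if (num_heap_tuples >
        meta->last_cleanup_num_heap_tuples * (1.0 + scale_factor))
        return true;

    return false;
}
```

With this shape, a deleted page that is present but not yet removable does not trigger a scan by itself, which is the difference from the "scan if any recyclable page is present" behaviour.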

> >               Compared to the currently proposed patch, this patch
> > needs neither the page upgrade code nor extra WAL-logging. If
> 
> # By the way, my proposal was storing the information that Yura
> # proposed in the stats collector. The information may become
> # available a bit late, but that does no harm. This needs neither
> # extra WAL logging nor the upgrade code :p
> 
> > we also want to address cases other than append-only case we will
> 
> I'm afraid that "the problem for the other cases" is a new one
> that this patch introduces, not an existing one.
> 
> > require the bulk-delete method of scanning whole index and of logging
> > WAL. But it leads to some extra overhead. With this patch we no longer
> > need to depend on the full scan on b-tree index. This might be useful
> > for a future when we make the bulk-delete of b-tree index not scan
> > whole index.
> 
> Perhaps I'm misunderstanding something, but isn't it just the
> result of skipping 'maybe needed' scans without considering the
> actual necessity?
> 
> I also don't like extra WAL logging, but it happens once (or
> twice?) per vacuum cycle (for every index). On the other hand I
> want to put the on-the-fly upgrade path out of the ordinary
> path. (Reviving the pg_upgrade's custom module?)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


