Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

From: Kyotaro HORIGUCHI
Subject: Re: [HACKERS] Block level parallel vacuum
Date: Tue, 19 Mar 2019
Msg-id: 20190319.165932.91309774.horiguchi.kyotaro@lab.ntt.co.jp
In response to: Re: [HACKERS] Block level parallel vacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoD4ivrYqg5tau460zEEcgR0t9cV-UagjJ997OfvP3gsNQ@mail.gmail.com>
> > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> > almost the same. I suspect that the indexes are too small, all the
> > index pages were already in memory, and the CPU was saturated.
> > Maybe you had four cores, and parallel workers beyond that number
> > had no effect. Other normal backends would have been able to do
> > almost nothing meanwhile. Usually the number of parallel workers
> > is determined so that IO capacity is filled up, but under such a
> > situation this feature intermittently saturates CPU capacity
> > instead.
> >
> 
> I'm sorry I didn't make it clear enough. If the parallel degree is
> higher than 'the number of indexes - 1', redundant workers are not
> launched. So for indexes=4, 8, 16 the number of actually launched
> parallel workers is up to 3, 7, 15 respectively. That's why the result
> shows almost the same execution time in the cases where nindexes <=
> parallel_degree.

In the 16-index case, the performance saturated at 4 workers,
which contradicts your explanation.
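
For what it's worth, my understanding of the capping rule you describe
is roughly the following. This is only an illustrative sketch with a
made-up function name, not the patch's code:

static int
compute_parallel_workers(int parallel_degree, int nindexes)
{
    int     nworkers = parallel_degree;

    /* the leader also vacuums one index, so nindexes - 1 is enough */
    if (nworkers > nindexes - 1)
        nworkers = nindexes - 1;

    return (nworkers > 0) ? nworkers : 0;
}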

> I'll share the performance test results for larger tables and indexes.
> 
> > I'm not sure, but what if we do index vacuum in a one-tuple-by-one
> > manner? That is, the heap vacuum passes dead tuples one by one (or
> > buffers a few tuples) to workers, and the workers process them not
> > by bulkdelete but by tuple_delete (which we don't have). That could
> > avoid the heap scan sleeping while index bulkdelete runs.
> >
> 
> Just to be clear, in parallel lazy vacuum all parallel vacuum
> processes, including the leader process, do index vacuuming; no one
> sleeps during index vacuuming. The leader process does the heap
> scan and launches parallel workers before index vacuuming. Each
> process exclusively processes indexes one by one.
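
For other readers of the thread, the claiming scheme described above
is roughly the following. This is only a sketch built around the
nprocessed counter quoted later in this mail; vacuum_one_index is a
placeholder name, not a function in the patch:

for (;;)
{
    /* atomically claim the next unprocessed index */
    int     idx = (int) pg_atomic_fetch_add_u32(&lvshared->nprocessed, 1);

    if (idx >= nindexes)
        break;                  /* every index has been claimed */

    /* placeholder for running bulkdelete on the claimed index */
    vacuum_one_index(Irel[idx], dead_tuples);
}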

The leader doesn't continue the heap scan while index vacuuming is
running, and the index page scan seems to eat up CPU easily. If
index vacuum could run simultaneously with the next heap scan
phase, the index scan would finish at almost the same time as the
next round of heap scan. That would reduce the (possible) CPU
contention, but it requires twice as much shared memory as the
current implementation.
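
To make the idea concrete, very roughly (nothing of this is in the
patch; it only shows why the dead-tuple area would have to be doubled):

LVDeadTuples *dead_tuples[2];   /* two shared dead-tuple buffers */
int           active = 0;

/*
 * Leader: hand dead_tuples[active] to the index-vacuuming workers,
 * then flip to the other buffer and keep heap-scanning into it while
 * the workers drain the first one.
 */
active = 1 - active;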

> Such an index deletion method could be an optimization, but I'm not
> sure that calling tuple_delete many times would be faster than one
> bulkdelete. If there are many dead tuples, vacuum has to call
> tuple_delete once per dead tuple. In general one seqscan is faster
> than tons of index scans. There is a proposal for such one-by-one
> index deletions[1], but it's not a replacement for bulkdelete.

I'm not sure what you mean by 'replacement', but it depends on how
large a part of the table is removed at once, as mentioned in that
thread. Unfortunately it doesn't seem easy to do..
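
Just to spell out the trade-off being discussed (both snippets are
illustrative; as you say, tuple_delete does not exist, and the
LVDeadTuples field names are assumed):

/* current approach: one bulkdelete call scans the whole index once */
stats = index_bulk_delete(&ivinfo, stats, lazy_tid_reaped,
                          (void *) dead_tuples);

/* discussed alternative: one index descent per dead tuple */
for (int i = 0; i < dead_tuples->num_tuples; i++)
    tuple_delete(indrel, &dead_tuples->itemptrs[i]);    /* hypothetical */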

> > > Attached the updated version patches. The patches apply to the current
> > > HEAD cleanly but the 0001 patch still changes the vacuum option to a
> > > Node since it's under the discussion. After the direction has been
> > > decided, I'll update the patches.
> >
> > As for the to-be-or-not-to-be-a-node problem, I don't think it is
> > needed, but from the point of view of consistency it seems
> > reasonable, and it is seen in other nodes that a *Stmt node holds
> > an option node. But makeVacOpt, its usage, and the subsequent
> > operations on the node look somewhat strange.. Why don't you just
> > do "makeNode(VacuumOptions)"?
> 
> Thank you for the comment, but this part has gone away as a recent
> commit changed the grammar production of the vacuum command.

Oops!


> > >+      /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> > >+    maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);
> >
> > If I understand this correctly, nindexes is always > 1 there. At
> > least it should be asserted that it is > 0 there.
> >
> > >+      estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),
> >
> > I don't think the name is good. (dt meant "detach" at first glance to me..)
> 
> Fixed.
> 
> >
> > >+        if (lps->nworkers_requested > 0)
> > >+            appendStringInfo(&buf,
> > >+                             ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",
> >
> > "planned"?
> 
> The 'planned' shows how many parallel workers we planned to launch.
> The degree of parallelism is determined based on either user request
> or the number of indexes that the table has.
> 
> >
> >
> > >+        /* Get the next index to vacuum */
> > >+        if (do_parallel)
> > >+            idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
> > >+        else
> > >+            idx = nprocessed++;
> >
> > It seems that both cases can be handled using LVParallelState,
> > and most of the branches on lps or do_parallel can be removed.
> >
> 
> Sorry, I couldn't get your comment. Did you mean to move nprocessed
> to LVParallelState?

Exactly. I meant letting lvshared point to private memory, but
it might introduce confusion.
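
Something like the following is what I had in mind; it is only a
sketch, assuming an LVShared struct behind lps->lvshared as in the
patch:

if (!do_parallel)
{
    /* fake the "shared" area in backend-private memory */
    lps->lvshared = (LVShared *) palloc0(sizeof(LVShared));
    pg_atomic_init_u32(&lps->lvshared->nprocessed, 0);
}

/* the same code path then serves both cases */
idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);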


> [1] https://www.postgresql.org/message-id/flat/425db134-8bba-005c-b59d-56e50de3b41e%40postgrespro.ru

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


