RE: Parallel heap vacuum - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: Parallel heap vacuum
Date
Msg-id TYAPR01MB5692B24EF8A304FB36206772F5B12@TYAPR01MB5692.jpnprd01.prod.outlook.com
In response to Re: Parallel heap vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Dear Sawada-san,

> Thank you for testing!

I tried to profile the vacuuming with a larger case (40 workers for a 20GB table);
the attached FlameGraph shows the result. IIUC, I could not find any bottlenecks.

> > 2.
> > I compared parallel heap scan and found that it does not have compute_worker
> API.
> > Can you clarify the reason why there is an inconsistency?
> > (I feel it is intentional because the calculation logic seems to depend on the
> heap structure,
> > so should we add the API for table scan as well?)
> 
> There is room to consider a better API design, but yes, the reason is
> that the calculation logic depends on table AM implementation. For
> example, I thought it might make sense to consider taking the number
> of all-visible pages into account for the calculation of the number of
> parallel workers as we don't want to launch many workers on the table
> where most pages are all-visible. Which might not work for other table
> AMs.
>

Okay, thanks for confirming. I wanted to ask others as well.


> I'm updating the patch to implement parallel heap vacuum and will
> share the updated patch. It might take time as it requires to
> implement shared iteration support in radix tree.

Here are some other preliminary comments for the v2 patch. They do not include
cosmetic ones.

1.
The shared data structure PHVShared does not contain a mutex lock. Is this
intentional because its fields are accessed by the leader only after the
parallel workers exit?

2.
Per my understanding, the vacuuming proceeds in the following steps:

a. parallel workers are launched to scan pages
b. the leader waits until the scans are done
c. the leader does the vacuum alone (you may extend here...)
d. parallel workers are launched again to clean up indexes

If so, can we reuse the parallel workers for the cleanup? Or would the
engineering pain outweigh the benefit?

3.
According to LaunchParallelWorkers(), the bgw_name and bgw_type are hardcoded as
"parallel worker ...". Can we extend this to improve trackability in
pg_stat_activity?

4.
I'm not an expert on TidStore, but as you said, TidStoreLockExclusive() might be
a bottleneck when TIDs are added to the shared TidStore. Another primitive idea
is to prepare per-worker TidStores (in PHVScanWorkerState or LVRelCounters?)
and gather them after the heap scanning. If you extend the patch so that
parallel workers also do the vacuuming, the gathering may not be needed: each
worker can access its own TidStore and clean up. One downside is that the
memory consumption may be quite large.


What do you think?

Best regards,
Hayato Kuroda
FUJITSU LIMITED 

