Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [HACKERS] Block level parallel vacuum |
Date | |
Msg-id | CA+fd4k4Gi1yrrSbfb_8gbOLYeMOi7ZaKv1n0c9aUFn8nJo4Wng@mail.gmail.com |
In response to | Re: [HACKERS] Block level parallel vacuum (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Thu, 9 Jan 2020 at 19:33, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jan 9, 2020 at 10:41 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 8 Jan 2020 at 22:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > What do you think of the attached? Sawada-san, kindly verify the
> > > changes and let me know your opinion.
> >
> > I agreed to not include both the FAST option patch and
> > DISABLE_LEADER_PARTICIPATION patch at this stage. It's better to focus
> > on the main part and we can discuss and add them later if want.
> >
> > I've looked at the latest version patch you shared. Overall it looks
> > good and works fine. I have a few small comments:
> >
>
> I have addressed all your comments and slightly change nearby comments
> and ran pgindent. I think we can commit the first two preparatory
> patches now unless you or someone else has any more comments on those.

Yes.

I'd like to briefly summarize
v43-0002-Allow-vacuum-command-to-process-indexes-in-parallel for other
reviewers who want to start reviewing this patch:

Introduce a PARALLEL option to the VACUUM command. Parallel vacuum is
enabled by default. When the user doesn't specify the parallel degree,
or omits the PARALLEL option, the number of parallel workers is
determined based on the number of indexes that support parallel index
vacuum. Specifying PARALLEL 0 disables parallel vacuum.

In parallel vacuum as implemented by this patch, only the leader
process does the heap scan and collects dead tuple TIDs in the DSM
segment. Before starting index vacuum or index cleanup, the leader
launches the parallel workers and performs it together with them. Each
individual index is processed by one vacuum worker process. Therefore
parallel vacuum can be used when the table has at least 2 indexes (the
leader always takes one index). After launching the parallel workers,
the leader process first vacuums the indexes that don't support
parallel index vacuum.
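
As a rough illustration of the worker-count rule summarized above, here
is a toy Python model (not code from the patch). The function name, its
parameters, and the max_workers cap are hypothetical and exist only for
this sketch. It assumes the leader takes one index, so the number of
useful workers is at most the number of parallel-capable indexes minus
one.

```python
from typing import Optional


def planned_workers(n_parallel_indexes: int,
                    requested: Optional[int] = None,
                    max_workers: int = 2) -> int:
    """Toy model of how many parallel vacuum workers to launch.

    n_parallel_indexes: indexes that support parallel index vacuum.
    requested: the PARALLEL degree, or None when the option is omitted.
    max_workers: hypothetical upper bound on workers (an assumption).
    """
    # PARALLEL 0 disables parallel vacuum outright.
    if requested == 0:
        return 0

    # The leader always takes one index, so it is not counted as a worker.
    candidates = n_parallel_indexes - 1
    if candidates <= 0:
        return 0

    # An explicit degree can only lower the count, never raise it.
    if requested is not None:
        candidates = min(candidates, requested)

    return min(candidates, max_workers)
```

In this model a table with a single parallel-capable index never gets a
worker, which matches the "at least 2 indexes" remark above.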

The parallel workers process the indexes that support parallel index
vacuum, and the leader process joins as a worker after completing the
indexes that don't support it. Once all indexes are processed, the
parallel worker processes exit. After that, the leader process
re-initializes the parallel context so that it can use the same DSM for
multiple passes of index vacuum and for performing index cleanup.
Updating the index statistics requires updating the system table, and
since updates are not allowed during parallel mode, we update the index
statistics after exiting parallel mode.

When the cost-based vacuum delay is enabled, parallel vacuum is
throttled as well. The basic idea of the cost-based vacuum delay for
parallel index vacuuming is to allow all parallel vacuum workers,
including the leader process, to have a shared view of the cost-related
parameters (mainly VacuumCostBalance). Each worker updates the shared
balance as and when it incurs any cost and then, based on that, decides
whether it needs to sleep. A worker sleeps in proportion to the work it
has done and reduces VacuumSharedCostBalance by the amount it has
consumed (VacuumCostBalanceLocal). This avoids putting to sleep workers
that have done less or no I/O compared to other workers, and therefore
ensures that workers doing more I/O are throttled more.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
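
The shared cost-balance scheme described above can be sketched as a
small single-threaded Python model. This is not the patch's code: the
class and function names are hypothetical, the "half of a fair share"
threshold for skipping light workers is an assumption made for the
sketch, and a real implementation would need atomic updates of the
shared counter.

```python
from dataclasses import dataclass


@dataclass
class SharedState:
    cost_balance: float = 0.0   # stands in for VacuumSharedCostBalance


@dataclass
class Worker:
    local_balance: float = 0.0  # stands in for VacuumCostBalanceLocal


def charge(shared: SharedState, worker: Worker, cost: float,
           cost_limit: float, cost_delay_ms: float, nworkers: int) -> float:
    """Record `cost` for `worker` and return how long it should sleep (ms)."""
    # Every worker adds the cost it incurred to the shared view
    # and also remembers it locally.
    shared.cost_balance += cost
    worker.local_balance += cost

    sleep_ms = 0.0
    # Assumption: a worker sleeps only if the shared balance is over the
    # limit AND it has done more than half of its fair share of the work.
    fair_share = cost_limit / nworkers
    if (shared.cost_balance >= cost_limit
            and worker.local_balance > 0.5 * fair_share):
        # Sleep in proportion to the work this worker has done.
        sleep_ms = cost_delay_ms * worker.local_balance / cost_limit
        # Give back only what this worker consumed.
        shared.cost_balance -= worker.local_balance
        worker.local_balance = 0.0
    return sleep_ms
```

With a cost limit of 200, a delay of 20 ms, and two workers, a worker
that has accumulated 200 units locally sleeps the full 20 ms, while a
worker holding only 20 units is not put to sleep even when the shared
balance is over the limit.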