Re: Block level parallel vacuum WIP - Mailing list pgsql-hackers

From:           Robert Haas
Subject:        Re: Block level parallel vacuum WIP
Date:
Msg-id:         CA+TgmobV6+ZPTNE3Z+08D9Xp7UK+mSq-rztOW+=RGsr5-pKiUA@mail.gmail.com
In response to: Block level parallel vacuum WIP (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses:      Re: Block level parallel vacuum WIP (Alvaro Herrera <alvherre@2ndquadrant.com>)
                Re: Block level parallel vacuum WIP (Masahiko Sawada <sawada.mshk@gmail.com>)
List:           pgsql-hackers
On Tue, Aug 23, 2016 at 7:02 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I'd like to propose block level parallel VACUUM.
> This feature makes VACUUM possible to use multiple CPU cores.

Great. This is something that I have thought about, too. Andres and
Heikki recommended it as a project to me a few PGCons ago.

> As for PoC, I implemented parallel vacuum so that each worker
> processes both 1 and 2 phases for particular block range.
> Suppose we vacuum 1000 blocks table with 4 workers, each worker
> processes 250 consecutive blocks in phase 1 and then reclaims dead
> tuples from heap and indexes (phase 2).
> To use visibility map efficiency, each worker scan particular block
> range of relation and collect dead tuple locations.
> After each worker finished task, the leader process gathers these
> vacuum statistics information and update relfrozenxid if possible.

This doesn't seem like a good design, because it adds a lot of extra
index scanning work. What I think you should do is:

1. Use a parallel heap scan (heap_beginscan_parallel) to let all
workers scan in parallel. Allocate a DSM segment to store the control
structure for this parallel scan plus an array for the dead tuple IDs
and a lock to protect the array.

2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.

Later, we can try to see if there's a way to have multiple workers
work together to vacuum a single index. But the above seems like a
good place to start.

> I also changed the buffer lock infrastructure so that multiple
> processes can wait for cleanup lock on a buffer.

You won't need this if you proceed as above, which is probably a good
thing.
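To make the two-step design concrete, here is a minimal standalone sketch of the proposed coordination. This is NOT PostgreSQL source: the thread pool stands in for parallel workers, a shared cursor stands in for heap_beginscan_parallel's block allocation, a mutex stands in for the DSM lock, and index_vacuum_cycle() stands in for a pass over the indexes. All identifiers (run_parallel_vacuum, vac_worker, DEAD_CAPACITY, etc.) are illustrative, not real backend symbols, and the real design would use the condition-variable patch (rather than a single coarse mutex) to let idle workers wait during an index cycle.

```c
#include <pthread.h>

#define NBLOCKS       1000   /* pretend the relation has 1000 blocks */
#define NWORKERS      4
#define DEAD_CAPACITY 64     /* small on purpose, to force index cycles */

static pthread_mutex_t vac_lock = PTHREAD_MUTEX_INITIALIZER;
static int  next_block = 0;              /* shared scan cursor */
static int  dead_tids[DEAD_CAPACITY];    /* shared dead-tuple array */
static int  ndead = 0;                   /* entries currently in the array */
static long total_reaped = 0;
int index_cycles = 0;                    /* completed index-vacuum cycles */

/* Stands in for one vacuuming pass over every index: drain the array. */
static void index_vacuum_cycle(void)
{
    total_reaped += ndead;
    ndead = 0;
    index_cycles++;
}

static void *vac_worker(void *arg)
{
    (void) arg;
    for (;;)
    {
        pthread_mutex_lock(&vac_lock);
        if (next_block >= NBLOCKS)
        {
            pthread_mutex_unlock(&vac_lock);
            return NULL;
        }
        int blk = next_block++;      /* claim the next block of the scan */

        /* Pretend every 10th block holds exactly one dead tuple. */
        if (blk % 10 == 0)
        {
            dead_tids[ndead++] = blk;
            if (ndead == DEAD_CAPACITY)
                index_vacuum_cycle();  /* array full: pause scan, do indexes */
        }
        pthread_mutex_unlock(&vac_lock);
    }
}

/* Run the simulated parallel vacuum; returns total dead tuples reaped. */
long run_parallel_vacuum(void)
{
    pthread_t tids[NWORKERS];
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&tids[i], NULL, vac_worker, NULL);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tids[i], NULL);
    if (ndead > 0)
        index_vacuum_cycle();          /* final cycle for leftover TIDs */
    return total_reaped;
}
```

Holding the lock across the whole "scan" of a block is a deliberate simplification; in the real backend only the shared cursor and the dead-tuple array need protection, so heap scanning itself proceeds concurrently.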
> And the new GUC parameter vacuum_parallel_workers controls the number
> of vacuum workers.

I suspect that for autovacuum there is little reason to use parallel
vacuum, since most of the time we are trying to slow vacuum down, not
speed it up. I'd be inclined, for starters, to just add a PARALLEL
option to the VACUUM command, for when people want to speed up manual
vacuums. Perhaps

VACUUM (PARALLEL 4) relation;

...could mean to vacuum the relation with the given number of workers, and:

VACUUM (PARALLEL) relation;

...could mean to vacuum the relation in parallel with the system
choosing the number of workers - 1 worker per index is probably a
good starting formula, though it might need some refinement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
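The "1 worker per index" starting formula could be sketched as below. The function name and the max_workers ceiling are made up for illustration; neither is an existing PostgreSQL symbol, and the clamping policy is just one plausible refinement.

```c
/* Illustrative only: pick a worker count for VACUUM (PARALLEL) with no
 * explicit number -- one worker per index, clamped to a configured
 * ceiling, and never fewer than one. */
int choose_parallel_vacuum_workers(int nindexes, int max_workers)
{
    int nworkers = nindexes;      /* starting formula: 1 worker per index */
    if (nworkers > max_workers)
        nworkers = max_workers;   /* respect the configured ceiling */
    if (nworkers < 1)
        nworkers = 1;             /* a table with no indexes still vacuums */
    return nworkers;
}
```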