Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [HACKERS] Block level parallel vacuum |
Date | |
Msg-id | CAD21AoApAnUCk9oX48t6k+7fn3w=wiWMpcDkkKBXR2khCGkwVg@mail.gmail.com |
In response to | Re: [HACKERS] Block level parallel vacuum (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: [HACKERS] Block level parallel vacuum |
List | pgsql-hackers |
On Sat, Mar 2, 2019 at 3:54 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Mar 1, 2019 at 12:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > I wonder if we really want this behavior. Should a setting that
> > > controls the degree of parallelism when scanning the table also affect
> > > VACUUM? I tend to think that we probably don't ever want VACUUM of a
> > > table to be parallel by default, but rather something that the user
> > > must explicitly request. Happy to hear other opinions. If we do want
> > > this behavior, I think this should be written differently, something
> > > like this: The PARALLEL N option to VACUUM takes precedence over this
> > > option.
> >
> > For example, I can imagine a use case where a batch job does parallel
> > vacuum to some tables in a maintenance window. The batch operation
> > will need to compute and specify the degree of parallelism every time
> > according to for instance the number of indexes, which would be
> > troublesome. But if we can set the degree of parallelism for each
> > tables it can just to do 'VACUUM (PARALLEL)'.
>
> True, but the setting in question would also affect the behavior of
> sequential scans and index scans. TBH, I'm not sure that the
> parallel_workers reloption is really a great design as it is: is
> hard-coding the number of workers really what people want? Do they
> really want the same degree of parallelism for sequential scans and
> index scans? Why should they want the same degree of parallelism also
> for VACUUM? Maybe they do, and maybe somebody explain why they do,
> but as of now, it's not obvious to me why that should be true.

I think there are users who want to specify the degree of parallelism. I think hard-coding the number of workers would be a good design for something like VACUUM, which is a simple operation on a single object: since there are no joins or aggregations, it would be relatively easy to compute.
That's why the patch introduces the PARALLEL N option as well. I think a reloption for parallel vacuum would just be a way to save the degree of parallelism. And I agree that users won't necessarily want the same degree of parallelism for VACUUM as for scans, so maybe it would be better to add a new reloption such as parallel_vacuum_workers. On the other hand, that can be a separate patch: I can remove the reloption part from this patch and propose it when there are requests.

>
> > Since the parallel vacuum uses memory in the same manner as the single
> > process vacuum it's not deteriorated. I'd agree that that patch is
> > more smarter and this patch can be built on top of it but I'm
> > concerned that there two proposals on that thread and the discussion
> > has not been active for 8 months. I wonder if it would be worth to
> > think of improving the memory allocating based on that patch after the
> > parallel vacuum get committed.
>
> Well, I think we can't just say "oh, this patch is going to use twice
> as much memory as before," which is what it looks like it's doing
> right now. If you think it's not doing that, can you explain further?

In the current design, the leader process allocates the whole DSM segment once at startup and records the TIDs of dead tuples in it. This is the same behaviour as before, except that the dead tuple TIDs are recorded in a shared memory segment. Once index vacuuming has finished, the leader process re-initializes the DSM for the next round. So parallel vacuum uses the same amount of memory during execution as before.

>
> > Agreed. I'll separate patches and propose it.
>
> Cool. Probably best to keep that on this thread.

Understood.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center