Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] Block level parallel vacuum |
Date | |
Msg-id | CA+TgmoZs8DN4u64D2pRxg6VKBa1BtBrWZcLSUU3jkQ2wUx0c9g@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Block level parallel vacuum (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: [HACKERS] Block level parallel vacuum
|
List | pgsql-hackers |
On Fri, Mar 1, 2019 at 12:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I wonder if we really want this behavior. Should a setting that > > controls the degree of parallelism when scanning the table also affect > > VACUUM? I tend to think that we probably don't ever want VACUUM of a > > table to be parallel by default, but rather something that the user > > must explicitly request. Happy to hear other opinions. If we do want > > this behavior, I think this should be written differently, something > > like this: The PARALLEL N option to VACUUM takes precedence over this > > option. > > For example, I can imagine a use case where a batch job does parallel > vacuum to some tables in a maintenance window. The batch operation > will need to compute and specify the degree of parallelism every time > according to for instance the number of indexes, which would be > troublesome. But if we can set the degree of parallelism for each > tables it can just to do 'VACUUM (PARALLEL)'. True, but the setting in question would also affect the behavior of sequential scans and index scans. TBH, I'm not sure that the parallel_workers reloption is really a great design as it is: is hard-coding the number of workers really what people want? Do they really want the same degree of parallelism for sequential scans and index scans? Why should they want the same degree of parallelism also for VACUUM? Maybe they do, and maybe somebody explain why they do, but as of now, it's not obvious to me why that should be true. > Since the parallel vacuum uses memory in the same manner as the single > process vacuum it's not deteriorated. I'd agree that that patch is > more smarter and this patch can be built on top of it but I'm > concerned that there two proposals on that thread and the discussion > has not been active for 8 months. I wonder if it would be worth to > think of improving the memory allocating based on that patch after the > parallel vacuum get committed. Well, I think we can't just say "oh, this patch is going to use twice as much memory as before," which is what it looks like it's doing right now. If you think it's not doing that, can you explain further? > Agreed. I'll separate patches and propose it. Cool. Probably best to keep that on this thread. > The main motivations are refactoring and improving readability but > it's mainly for the previous version patch which implements parallel > heap vacuum. It might no longer need here. I'll try to implement > without LVState. Thank you. Oh, OK. > > + /* > > + * If there is already-updated result in the shared memory we use it. > > + * Otherwise we pass NULL to index AMs, meaning it's first time call, > > + * and copy the result to the shared memory segment. > > + */ > > > > I'm probably missing something here, but isn't the intention that we > > only do each index once? If so, how would there be anything there > > already? Once from for_cleanup = false and once for for_cleanup = > > true? > > We call ambulkdelete (for_cleanup = false) 0 or more times for each > index and call amvacuumcleanup (for_cleanup = true) at the end. In the > first time calling either ambulkdelete or amvacuumcleanup the lazy > vacuum must pass NULL to them. They return either palloc'd > IndexBulkDeleteResult or NULL. If they returns the former the lazy > vacuum must pass it to them again at the next time. In current design, > since there is no guarantee that an index is always processed by the > same vacuum process each vacuum processes save the result to DSM in > order to share those results among vacuum processes. The 'updated' > flags indicates that its slot is used. So we can pass the address of > DSM if 'updated' is true, otherwise pass NULL. Ah, OK. Thanks for explaining. > Almost comments I got have been incorporated to the local branch but a > few comments need discussion. I'll submit the updated version patch > once I addressed all of comments. Cool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: