Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Block level parallel vacuum
Date
Msg-id CA+TgmoZs8DN4u64D2pRxg6VKBa1BtBrWZcLSUU3jkQ2wUx0c9g@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Block level parallel vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [HACKERS] Block level parallel vacuum
List pgsql-hackers
On Fri, Mar 1, 2019 at 12:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I wonder if we really want this behavior.  Should a setting that
> > controls the degree of parallelism when scanning the table also affect
> > VACUUM?  I tend to think that we probably don't ever want VACUUM of a
> > table to be parallel by default, but rather something that the user
> > must explicitly request.  Happy to hear other opinions.  If we do want
> > this behavior, I think this should be written differently, something
> > like this: The PARALLEL N option to VACUUM takes precedence over this
> > option.
>
> For example, I can imagine a use case where a batch job does parallel
> vacuum to some tables in a maintenance window. The batch operation
> will need to compute and specify the degree of parallelism every time
> according to for instance the number of indexes, which would be
> troublesome. But if we can set the degree of parallelism for each
> tables it can just to do 'VACUUM (PARALLEL)'.

True, but the setting in question would also affect the behavior of
sequential scans and index scans.  TBH, I'm not sure that the
parallel_workers reloption is really a great design as it is: is
hard-coding the number of workers really what people want?  Do they
really want the same degree of parallelism for sequential scans and
index scans?  Why should they want the same degree of parallelism also
for VACUUM?  Maybe they do, and maybe somebody explain why they do,
but as of now, it's not obvious to me why that should be true.

> Since the parallel vacuum uses memory in the same manner as the single
> process vacuum it's not deteriorated. I'd agree that that patch is
> more smarter and this patch can be built on top of it but I'm
> concerned that there two proposals on that thread and the discussion
> has not been active for 8 months. I wonder if  it would be worth to
> think of improving the memory allocating based on that patch after the
> parallel vacuum get committed.

Well, I think we can't just say "oh, this patch is going to use twice
as much memory as before," which is what it looks like it's doing
right now. If you think it's not doing that, can you explain further?

> Agreed. I'll separate patches and propose it.

Cool.  Probably best to keep that on this thread.

> The main motivations are refactoring and improving readability but
> it's mainly for the previous version patch which implements parallel
> heap vacuum. It might no longer need here. I'll try to implement
> without LVState. Thank you.

Oh, OK.

> > + /*
> > + * If there is already-updated result in the shared memory we use it.
> > + * Otherwise we pass NULL to index AMs, meaning it's first time call,
> > + * and copy the result to the shared memory segment.
> > + */
> >
> > I'm probably missing something here, but isn't the intention that we
> > only do each index once?  If so, how would there be anything there
> > already?  Once from for_cleanup = false and once for for_cleanup =
> > true?
>
> We call ambulkdelete (for_cleanup = false) 0 or more times for each
> index and call amvacuumcleanup (for_cleanup = true) at the end. In the
> first time calling either ambulkdelete or amvacuumcleanup the lazy
> vacuum must pass NULL to them. They return either palloc'd
> IndexBulkDeleteResult or NULL. If they returns the former the lazy
> vacuum must pass it to them again at the next time. In current design,
> since there is no guarantee that an index is always processed by the
> same vacuum process each vacuum processes save the result to DSM in
> order to share those results among vacuum processes. The 'updated'
> flags indicates that its slot is used. So we can pass the address of
> DSM if 'updated' is true, otherwise pass NULL.

Ah, OK.  Thanks for explaining.

> Almost comments I got have been incorporated to the local branch but a
> few comments need discussion. I'll submit the updated version patch
> once I addressed all of comments.

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: New vacuum option to do only freezing
Next
From: Andres Freund
Date:
Subject: Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involvingsystem columns