Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Date
Msg-id CAA4eK1KyPf0BNpTTmWhVSoQkaGgO1LrNjJHsChJFS=AaJqKdvg@mail.gmail.com
In response to Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On Wed, Jul 2, 2014 at 11:45 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Jeff Janes wrote:
>
> > I would only envision using the parallel feature for vacuumdb after a
> > pg_upgrade or some other major maintenance window (that is the only
> > time I ever envision using vacuumdb at all).  I don't think autovacuum
> > can be expected to handle such situations well, as it is designed to
> > be a smooth background process.
>
> That's a fair point.  One thing that would be pretty neat but I don't
> think I would get anyone to implement it, is having the user control the
> autovacuum launcher in some way.  For instance "please vacuum this set
> of tables as quickly as possible", and it would launch as many workers
> as are configured.  It would take months to get a UI settled for this,
> however.

This sounds like a better way to have multiple workers vacuum
tables.  For vacuum we already have some infrastructure (vacuum
workers) to perform tasks in parallel, so why not leverage that
instead of inventing a new one, even if we assume that we can
reduce the duplicated code?
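A rough sketch of the scheduling idea ("vacuum this set of tables as quickly as possible" with as many workers as are configured), assuming a fixed list of tables and a fixed number of workers that each pull the next unclaimed table from a shared queue. It uses POSIX threads purely for illustration; TableQueue, vacuum_table() and vacuum_tables_parallel() are hypothetical names, and a real client would more likely drive one database connection per worker rather than one thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define MAX_WORKERS 64

/* Hypothetical shared queue: the set of tables the user asked to vacuum. */
typedef struct
{
    const char    **tables;     /* table names to process */
    int             ntables;
    int             next;       /* index of the next unclaimed table */
    int             done;       /* tables processed so far */
    pthread_mutex_t lock;       /* protects next and done */
} TableQueue;

/* Hypothetical stand-in for issuing "VACUUM <table>" on a worker's connection. */
static void
vacuum_table(const char *table)
{
    printf("vacuuming %s\n", table);
}

/* Each worker repeatedly claims the next table until the queue is empty. */
static void *
worker_main(void *arg)
{
    TableQueue *q = arg;

    for (;;)
    {
        const char *table;

        pthread_mutex_lock(&q->lock);
        if (q->next >= q->ntables)
        {
            pthread_mutex_unlock(&q->lock);
            break;
        }
        table = q->tables[q->next++];
        pthread_mutex_unlock(&q->lock);

        vacuum_table(table);

        pthread_mutex_lock(&q->lock);
        q->done++;
        pthread_mutex_unlock(&q->lock);
    }
    return NULL;
}

/* Vacuum all tables with up to nworkers workers; returns the number processed. */
static int
vacuum_tables_parallel(const char **tables, int ntables, int nworkers)
{
    pthread_t   threads[MAX_WORKERS];
    TableQueue  q = {tables, ntables, 0, 0};
    int         started = 0;
    int         i;

    assert(nworkers > 0 && nworkers <= MAX_WORKERS);
    pthread_mutex_init(&q.lock, NULL);

    for (i = 0; i < nworkers; i++)
        if (pthread_create(&threads[started], NULL, worker_main, &q) == 0)
            started++;

    /* If no worker could be started, do the work in the calling thread. */
    if (started == 0)
        worker_main(&q);

    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_destroy(&q.lock);
    return q.done;
}
```

Because workers claim tables one at a time, a worker that finishes a small table simply picks up the next one, which keeps all workers busy until the list is exhausted.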

> > I don't know how to calibrate the number of lines that is worthwhile.
> > If you write in C and need to have cross-platform compatibility and
> > robust error handling, it seems to take hundreds of lines to do much
> > of anything.  The code duplication is a problem, but I don't think
> > just raw line count is, especially since it has already been written.
>
> Well, there are (at least) two types of duplicate code: first you have
> these common routines such as pgpipe that are duplicates for no good
> reason.  Just move them to src/port or something and it's all good.  But
> the OP said there is code that cannot be shared even though it's very
> similar in both incarnations.  That means we cannot (or it's difficult
> to) just have one copy, which means as they fix bugs in one copy we need
> to update the other.

I briefly checked the duplicated code between the two versions, and I
think we might be able to reduce it significantly by making common
functions and using AH where it is passed (as an example, I checked
the function ParallelBackupStart(), which is more than 100 lines).  If
you see code duplication as the main reason you don't favor this patch,
then I think that can be ameliorated, or at least it is worth trying.
However, I think it might be better to achieve this in the way you
suggested, using the autovacuum launcher.
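To illustrate the kind of refactoring meant here, a minimal sketch, assuming the tool-specific state can be passed as an opaque pointer: a shared helper takes a callback plus a context, so pg_dump could hand over its ArchiveHandle and vacuumdb its own state. start_workers_common(), WorkerFn, DumpCtx and VacuumCtx are hypothetical names, not existing PostgreSQL functions, and the workers run one after another here rather than in separate processes or threads.

```c
#include <assert.h>

/* Hypothetical per-tool callback: what one worker does with the shared context. */
typedef int (*WorkerFn) (void *ctx, int worker_id);

/*
 * Hypothetical shared helper holding the tool-independent logic.  It never
 * looks inside ctx, so it does not depend on pg_dump's ArchiveHandle.
 * Returns the number of workers whose callback reported success (0).
 */
static int
start_workers_common(int nworkers, WorkerFn fn, void *ctx)
{
    int         ok = 0;
    int         i;

    assert(nworkers >= 0 && fn != 0);
    for (i = 0; i < nworkers; i++)
        if (fn(ctx, i) == 0)
            ok++;
    return ok;
}

/* Hypothetical contexts: DumpCtx stands in for AH, VacuumCtx for vacuumdb state. */
typedef struct
{
    const char *archive_name;
    int         workers_seen;
} DumpCtx;

typedef struct
{
    const char *dbname;
    int         workers_seen;
} VacuumCtx;

static int
dump_worker(void *ctx, int worker_id)
{
    DumpCtx    *d = ctx;

    (void) worker_id;
    d->workers_seen++;
    return 0;
}

static int
vacuum_worker(void *ctx, int worker_id)
{
    VacuumCtx  *v = ctx;

    (void) worker_id;
    v->workers_seen++;
    return 0;
}
```

Only the small callbacks remain tool-specific; bug fixes to the shared start-up logic would then land in one place instead of two.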

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
