Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Date
Msg-id 20140702181519.GF7340@eldon.alvh.no-ip.org
In response to Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Amit Kapila <amit.kapila16@gmail.com>)
Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Dilip kumar <dilip.kumar@huawei.com>)
List pgsql-hackers
Jeff Janes wrote:

> I would only envision using the parallel feature for vacuumdb after a
> pg_upgrade or some other major maintenance window (that is the only
> time I ever envision using vacuumdb at all).  I don't think autovacuum
> can be expected to handle such situations well, as it is designed to
> be a smooth background process.

That's a fair point.  One thing that would be pretty neat, though I don't
think I could get anyone to implement it, is letting the user control the
autovacuum launcher in some way.  For instance "please vacuum this set
of tables as quickly as possible", and it would launch as many workers
as are configured.  It would take months to get a UI settled for this,
however.

> I guess the ideal solution would be for manual VACUUM to have a
> PARALLEL option, then vacuumdb could just invoke that one table at a
> time.  That way you would get within-table parallelism which would be
> important if one table dominates the entire database cluster. But I
> don't foresee that happening any time soon.

I see this as a completely different feature, which might also be pretty
neat, at least if you're open to spending more I/O bandwidth processing
a single table: have several processes scanning the heap simultaneously.
Since I think vacuum is mostly I/O bound at the moment, I'm not sure
there is much point in this currently.

> I don't know how to calibrate the number of lines that is worthwhile.
> If you write in C and need to have cross-platform compatibility and
> robust error handling, it seems to take hundreds of lines to do much
> of anything.  The code duplication is a problem, but I don't think
> just raw line count is, especially since it has already been written.

Well, there are (at least) two types of duplicate code: first you have
these common routines such as pgpipe that are duplicates for no good
reason.  Just move them to src/port or something and it's all good.  But
the OP said there is code that cannot be shared even though it's very
similar in both incarnations.  That means we cannot (or it's difficult
to) just have one copy, which means as they fix bugs in one copy we need
to update the other.  This is bad -- witness the situation with ecpg's
copy of date/time code, where there are bugs fixed in the backend
version but the ecpg version does not have the fix.  It's difficult to
keep track of these things.

> The trend in this project seems to be for shell scripts to eventually
> get converted into C programs.  In fact, src/bin/scripts now has no
> scripts at all.  Also it is important to vacuum/analyze tables in the
> same database at the same time, otherwise you will not get much
> speed-up in the ordinary case where there is only one meaningful
> database.  Doing that in a shell script would be fairly hard.  It
> should be pretty easy in Perl (at least for me--I'm sure others
> disagree), but that also doesn't seem to be the way we do things for
> programs intended for end users.

Yeah, shipping shell scripts doesn't work very well for us.  I'm
thinking perhaps we can have sample scripts in which we show how to use
parallel(1) to run multiple vacuumdb instances in parallel on Unix, and
some similar mechanism on Windows, and that's it.  So we wouldn't provide
the complete toolset, but the platform surely has ways to make it happen.
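
As a rough illustration of the "sample script" idea, here is a minimal
sketch in Python rather than parallel(1), which has the side benefit of
running unchanged on Unix and Windows.  It starts one vacuumdb process per
table and keeps a fixed number running at a time.  The function names, the
database name, the table names and the default of four jobs are all made up
for the example; only the vacuumdb options --dbname, --table and --analyze
are taken from the existing tool.  It is not a proposal for what should
ship, just one way the platform could be used to get the effect.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(dbname, table, analyze=True):
    """Build the argv for one vacuumdb invocation (hypothetical helper)."""
    cmd = ["vacuumdb", "--dbname", dbname, "--table", table]
    if analyze:
        cmd.append("--analyze")
    return cmd


def vacuum_tables(dbname, tables, jobs=4, runner=subprocess.call):
    """Run one vacuumdb per table, at most `jobs` at a time.

    `runner` takes an argv list and returns an exit code; it defaults to
    subprocess.call and can be swapped out for a dry run.
    Returns a dict mapping each table name to its exit code.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Each worker thread just blocks on its child process, so threads
        # are enough here; the real work happens in the server backends.
        codes = list(pool.map(lambda t: runner(build_command(dbname, t)), tables))
    return dict(zip(tables, codes))


if __name__ == "__main__":
    # Example table list; in practice it might come from a catalog query.
    results = vacuum_tables("mydb", ["public.accounts", "public.history"], jobs=2)
    for table, code in results.items():
        print(table, "ok" if code == 0 else "failed (exit %d)" % code)
```

Because every table in the list belongs to the same database, this gives
the same-database parallelism Jeff mentions, which a plain
one-process-per-database loop would not.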

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


