Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Date
Msg-id CAA4eK1Kv70TY7v3kUkPTs-NODXqb-Cwcme0bFcP6hkM0kq2S_w@mail.gmail.com
In response to Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On Fri, Sep 26, 2014 at 7:06 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Amit Kapila wrote:
>
> > Today, while thinking again about the strategy the patch uses to
> > parallelize the operation (vacuuming a database), I think we can
> > improve it for cases where the number of connections is
> > smaller than the number of tables in the database (which I presume
> > will normally be the case).  Currently we send the command
> > to vacuum one table per connection; how about sending multiple
> > commands (for example, VACUUM t1; VACUUM t2) on one connection?
> > It seems to me that every table costs an extra round trip, which
> > matters when there are many small tables and a few large tables in
> > the database.  Do you think we should optimize for such cases?
>
> I don't think this is a good idea; at least not in a first cut of this
> patch.  It's easy to imagine that a table you initially think is small
> enough turns out to have grown much larger since the last analyze.

That is possible, but currently it vacuums even system tables
one by one (where I think the chance of significant growth is
comparatively low), which was the main reason I thought it worth
considering whether the current work distribution strategy is good enough.

> In that
> case, having one worker process that table together with some other
> table could end up hurting parallelism, if it later turns out that
> some other worker has no table left to process.  (Table t2 in your example
> could grow between the time the command is sent and t1 is vacuumed.)
>
> It's simpler to have workers do one thing at a time only.

Yeah, that is probably best, at least for the initial patch.

> I don't think it's a very good idea to call pg_relation_size() on every
> table in the database from vacuumdb.

You might be right; however, I was a bit skeptical about the current
strategy, where the work unit is one object and every object is treated
the same irrespective of its size/bloat.  OTOH, I agree with you that it is
good to keep the first version simple.
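To make the trade-offs in this thread concrete, here is a toy simulation. It is a sketch under stated assumptions, not vacuumdb's actual code: table "costs" are made-up abstract time units, `rtt` is a hypothetical fixed round-trip cost paid once per command sent, and all function names are hypothetical. It compares the current one-table-per-command dispatch, sending several tables per command, and a size-aware variant. That last one is simply the largest-first ordering heuristic, which assumes relation sizes are known up front (for example via pg_relation_size(), the per-table call discussed above).

```python
import heapq

def makespan(costs, workers, rtt=0.0):
    """Greedy dispatch: each unit of work goes to the earliest-idle connection.

    `costs` are hypothetical per-unit vacuum times; `rtt` is an assumed
    fixed round-trip cost paid once per command sent.
    """
    free_at = [0.0] * workers  # time at which each connection becomes idle
    heapq.heapify(free_at)
    for c in costs:
        t = heapq.heappop(free_at)            # earliest-idle connection
        heapq.heappush(free_at, t + rtt + c)  # one round trip plus the work
    return max(free_at)

def one_at_a_time(costs, workers, rtt=0.0):
    # Current patch strategy: one table per command.
    return makespan(costs, workers, rtt)

def batched(costs, workers, batch, rtt=0.0):
    # Proposed variant: `batch` tables per command ("VACUUM t1; VACUUM t2").
    groups = [sum(costs[i:i + batch]) for i in range(0, len(costs), batch)]
    return makespan(groups, workers, rtt)

def largest_first(costs, workers, rtt=0.0):
    # Size-aware variant: assumes sizes are known before dispatching.
    return makespan(sorted(costs, reverse=True), workers, rtt)

# t2 turned out large: one-at-a-time keeps the other connection busy.
print(one_at_a_time([1, 10, 1, 1], 2))        # 10.0
print(batched([1, 10, 1, 1], 2, batch=2))     # 11.0
# Many small tables with a per-command round trip: batching pays off.
print(one_at_a_time([1] * 8, 2, rtt=0.5))     # 6.0
print(batched([1] * 8, 2, batch=4, rtt=0.5))  # 4.5
# Ignoring sizes vs. dispatching the largest table first.
print(one_at_a_time([1, 1, 1, 1, 4], 2))      # 6.0
print(largest_first([1, 1, 1, 1, 4], 2))      # 4.0
```

The toy numbers only show that each side of the argument holds under its own assumptions: batching saves round trips when tables are uniformly small but loses when a table inside a batch turns out larger than expected, and size-aware ordering helps only if the sizes are known cheaply beforehand.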



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
