Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands - Mailing list pgsql-hackers
From | Bossart, Nathan |
---|---|
Subject | Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands |
Date | |
Msg-id | 2EB66FDC-441C-4BE4-8CC2-84C0BEAB17A0@amazon.com Whole thread Raw |
In response to | Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands
|
List | pgsql-hackers |
On 8/28/17, 11:26 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote: > Here is some input for vacuum_multiple_tables_v9, about which I think > that we are getting to something committable. Here are some minor > comments. Thanks for another review. > <para> > - With no parameter, <command>VACUUM</command> processes every table in the > + With no parameters, <command>VACUUM</command> processes every table in the > current database that the current user has permission to vacuum. > - With a parameter, <command>VACUUM</command> processes only that table. > + With parameters, <command>VACUUM</command> processes only the tables > + specified. > </para> > The part about parameters looks fine to me if unchanged. This is reverted in v10. > + foreach(lc, relations) > + { > + VacuumRelation *relation = lfirst_node(VacuumRelation, lc); > + if (relation->va_cols != NIL && (options & VACOPT_ANALYZE) == 0) > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("ANALYZE option must be specified when a > column list is provided"))); > + } > Could you add a hint with the relation name involved here? When many > relations are defined in the VACUUM query this would be useful for the > user. Added in v10. > + relinfo = makeNode(VacuumRelation); > + rel_oid = HeapTupleGetOid(tuple); > + relinfo->oid = rel_oid; > There are 4 patterns like that in the patch. We could make use of a > makeVacuumRelation. Agreed, I've added one. > About the de-duplication patch, I have to admit that I am still not a > fan of doing such a thing. Another road that we could take is to > simply complain with a proper error message if: > - the same column name is specified twice for a relation. > - the same relation is defined twice. In the case of partitions, we > could track the fact that it is already listed as part of a parent, > though perhaps it does not seem worth the extra CPU cost especially > when there are multiple nesting levels with partitions. > Autovacuum has also the advantage, if I recall correctly, to select > all columns for analyze, and skip parent partitions when scanning for > relations so that's a safe bet from this side. Opinions welcome. I lean towards favoring the de-duplication patch, but maybe I am biased as the author. I can see the following advantages: 1. Ease of use. By taking care of de-duplicating on behalf of the user, they needn't worry about inheritance structures or accidentally specifying the same relation or column twice. This might be especially useful if a large number of relations or columns must be specified. 2. Resource conservation. By de-duplicating, VACUUM and ANALYZE are doing roughly the same thing but with less work. 3. The obnoxious errors you were experiencing are resolved. This seems like the strongest argument to me, as it fixes an existing issue. Disadvantages might include: 1. Users cannot schedule repeated VACUUMs on the same relation (e.g. 'VACUUM table, table, table;'). However, I cannot think of a time when I needed this, and it seems like something else is wrong with VACUUM if folks are resorting to this. In the end, you could still achieve this via several VACUUM statements. 2. Any inferred ordering for how the relations are processed will not be accurate if there are duplicates. Ultimately, users might lose some amount of control here, but I am not sure how prevalent this use case might be. In the worst case, you could achieve this via several individual VACUUM statements as well. Your suggestion to ERROR seems like a reasonable compromise, but I could see it causing frustration in some cases, especially with partitioning. Nathan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: