Thread: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

[HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

From
"Bossart, Nathan"
Date:
Hi Hackers,

Currently, VACUUM commands allow you to specify one table or all of the tables in the current database to vacuum.  I’ve
recentlyfound myself wishing I could specify multiple tables in a single VACUUM statement.  For example, this would be
convenientwhen there are several large tables in a database and only a few need cleanup for XID purposes.  Is this a
featurethat the community might be interested in?
 

I’ve attached my first attempt at introducing this functionality.  In the patch, I’ve extended the table_name parameter
inthe VACUUM grammar to a qualified_name_list.  While this fits into the grammar decently well, I suspect that it may
bedesirable to be able to specify a column list for each table as well (e.g. VACUUM foo (a), bar (b)).  The attached
patchrequires that only one table be specified in order to specify a column list.  But before I spend too much more
timeworking on this, I thought I’d see how pgsql-hackers@ felt about this feature.
 

Thanks,
Nathan Bossart


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
On Wed, May 10, 2017 at 08:10:48PM +0000, Bossart, Nathan wrote:
> Hi Hackers,
> 
> Currently, VACUUM commands allow you to specify one table or all of
> the tables in the current database to vacuum.  I’ve recently found
> myself wishing I could specify multiple tables in a single VACUUM
> statement.  For example, this would be convenient when there are
> several large tables in a database and only a few need cleanup for
> XID purposes.  Is this a feature that the community might be
> interested in?

I haven't read the implementation yet, but I've wanted this feature
several times.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



"Bossart, Nathan" <bossartn@amazon.com> writes:
> Currently, VACUUM commands allow you to specify one table or all of the tables in the current database to vacuum.
I’verecently found myself wishing I could specify multiple tables in a single VACUUM statement.  For example, this
wouldbe convenient when there are several large tables in a database and only a few need cleanup for XID purposes.  Is
thisa feature that the community might be interested in? 

I'm a bit surprised to realize that we don't allow that, since the
underlying code certainly can do it.

You realize of course that ANALYZE should grow this capability as well.

> I’ve attached my first attempt at introducing this functionality.  In the patch, I’ve extended the table_name
parameterin the VACUUM grammar to a qualified_name_list.  While this fits into the grammar decently well, I suspect
thatit may be desirable to be able to specify a column list for each table as well (e.g. VACUUM foo (a), bar (b)). 

The column list only matters for ANALYZE (or VACUUM ANALYZE).  But yes,
it should be per-table.
        regards, tom lane



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
Great, I’ll keep working on this patch and will update this thread with a more complete implementation.

Nathan


On Thu, May 11, 2017 at 6:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Bossart, Nathan" <bossartn@amazon.com> writes:
>> Currently, VACUUM commands allow you to specify one table or all of the tables in the current database to vacuum.
I’verecently found myself wishing I could specify multiple tables in a single VACUUM statement.  For example, this
wouldbe convenient when there are several large tables in a database and only a few need cleanup for XID purposes.  Is
thisa feature that the community might be interested in? 
>
> I'm a bit surprised to realize that we don't allow that, since the
> underlying code certainly can do it.
>
> You realize of course that ANALYZE should grow this capability as well.

Yup. It is just a matter of extending ExecVacuum() to handle a list of
qualified names with a quick look at the grammar as we are talking
only about manual commands. One question I am wondering though is do
we want to have everything happening in the same transaction? I would
say yes to that to simplify the code. I think that VERBOSE should also
report the per-table information, so this can be noisy with many
tables but that's more helpful than gathering all the results.

>> I’ve attached my first attempt at introducing this functionality.  In the patch, I’ve extended the table_name
parameterin the VACUUM grammar to a qualified_name_list.  While this fits into the grammar decently well, I suspect
thatit may be desirable to be able to specify a column list for each table as well (e.g. VACUUM foo (a), bar (b)). 
>
> The column list only matters for ANALYZE (or VACUUM ANALYZE).  But yes,
> it should be per-table.

The grammar allows that by the way:
=# VACUUM (full) aa (a);
VACUUM
Perhaps that's an oversight? I don't think it makes much sense.
--
Michael



Michael Paquier <michael.paquier@gmail.com> writes:
> On Thu, May 11, 2017 at 6:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The column list only matters for ANALYZE (or VACUUM ANALYZE).  But yes,
>> it should be per-table.

> The grammar allows that by the way:
> =# VACUUM (full) aa (a);
> VACUUM
> Perhaps that's an oversight? I don't think it makes much sense.

It would be hard to reject at the grammar level, and not very friendly
either because you'd only get "syntax error".  We could certainly make
the runtime code throw an error if you gave a column list without saying
ANALYZE.  But on the other hand, why bother?  I do not remember ever
seeing a question that boiled down to somebody being confused by this.
        regards, tom lane



On Thu, May 11, 2017 at 11:13 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, May 11, 2017 at 6:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Bossart, Nathan" <bossartn@amazon.com> writes:
>>> Currently, VACUUM commands allow you to specify one table or all of the tables in the current database to vacuum.
I’verecently found myself wishing I could specify multiple tables in a single VACUUM statement.  For example, this
wouldbe convenient when there are several large tables in a database and only a few need cleanup for XID purposes.  Is
thisa feature that the community might be interested in? 
>>
>> I'm a bit surprised to realize that we don't allow that, since the
>> underlying code certainly can do it.
>>
>> You realize of course that ANALYZE should grow this capability as well.
>
> Yup. It is just a matter of extending ExecVacuum() to handle a list of
> qualified names with a quick look at the grammar as we are talking
> only about manual commands. One question I am wondering though is do
> we want to have everything happening in the same transaction? I would
> say yes to that to simplify the code. I think that VERBOSE should also
> report the per-table information, so this can be noisy with many
> tables but that's more helpful than gathering all the results.

I agree to report per-table information. Especially In case of one of
tables specified failed during vacuuming, I think we should report at
least information of tables that is done successfully so far.

>
>>> I’ve attached my first attempt at introducing this functionality.  In the patch, I’ve extended the table_name
parameterin the VACUUM grammar to a qualified_name_list.  While this fits into the grammar decently well, I suspect
thatit may be desirable to be able to specify a column list for each table as well (e.g. VACUUM foo (a), bar (b)). 
>>
>> The column list only matters for ANALYZE (or VACUUM ANALYZE).  But yes,
>> it should be per-table.
>
> The grammar allows that by the way:
> =# VACUUM (full) aa (a);
> VACUUM
> Perhaps that's an oversight? I don't think it makes much sense.
> --
> Michael
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 5/10/17, 8:10 PM, "Masahiko Sawada" <sawada.mshk@gmail.com> wrote:
> I agree to report per-table information. Especially In case of one of
> tables specified failed during vacuuming, I think we should report at
> least information of tables that is done successfully so far.

+1

I believe you already get all per-table information when you do not specify a table name (“VACUUM VERBOSE;”), so I
wouldexpect to get this for free as long as we are building this into the existing ExecVacuum(…) and vacuum(…)
functions.

Nathan


On Thu, May 11, 2017 at 11:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It would be hard to reject at the grammar level, and not very friendly
> either because you'd only get "syntax error".  We could certainly make
> the runtime code throw an error if you gave a column list without saying
> ANALYZE.  But on the other hand, why bother?  I do not remember ever
> seeing a question that boiled down to somebody being confused by this.

The docs also say that adding a column list implies an ANALYZE even if
other keywords are added, and that's the case. Sorry for the noise.
-- 
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
Hi again,

Attached is a more complete first attempt at adding this functionality.  I added two node types: one for parsing the
“relationand columns” list in the grammar, and one for holding the relation information we need for each call to
vacuum_rel(…)/analyze_rel(…). I also added assertions and comments for some undocumented assumptions that we currently
relyupon.
 

Adjustments to the documentation for VACUUM/ANALYZE and new checks in the VACUUM regression test are included in this
patchas well.
 

Looking forward to any feedback that you have.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
"Bossart, Nathan" <bossartn@amazon.com> writes:
> Looking forward to any feedback that you have.

You probably won't get much in the near term, because we're in
stabilize-the-release mode not develop-new-features mode.
Please add your patch to the pending commitfest
https://commitfest.postgresql.org/14/
so that we remember to look at it when the time comes.
        regards, tom lane



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 5/11/17, 6:32 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> You probably won't get much in the near term, because we're in
> stabilize-the-release mode not develop-new-features mode.
> Please add your patch to the pending commitfest
> https://commitfest.postgresql.org/14/
> so that we remember to look at it when the time comes.

Understood.  I’ve added it to the commitfest.

Nathan


On Fri, May 12, 2017 at 9:47 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> Attached is a more complete first attempt at adding this functionality.  I added two node types: one for parsing the
“relationand columns” list in the grammar, and one for holding the relation information we need for each call to
vacuum_rel(…)/analyze_rel(…). I also added assertions and comments for some undocumented assumptions that we currently
relyupon. 
>
> Adjustments to the documentation for VACUUM/ANALYZE and new checks in the VACUUM regression test are included in this
patchas well. 
>
> Looking forward to any feedback that you have.

Browsing the code....
<synopsis>
-ANALYZE [ VERBOSE ] [ <replaceable
class="PARAMETER">table_name</replaceable> [ ( <replaceable
class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
+ANALYZE [ VERBOSE ] [ <replaceable
class="PARAMETER">table_name</replaceable> [ [ ( <replaceable
class="PARAMETER">column_name</replaceable> [, ...] ) ] [, ...] ]</synopsis>
It seems to me that you don't need those extra square brackets around
the column list. Same remark for vacuum.sgml.
    <listitem>     <para>
-      The name (possibly schema-qualified) of a specific table to
+      The name (possibly schema-qualified) of the specific tables to      analyze.  If omitted, all regular tables,
partitionedtables, and      materialized views in the current database are analyzed (but not 
-      foreign tables).  If the specified table is a partitioned table, both the
+      foreign tables).  If a specified table is a partitioned table, both the      inheritance statistics of the
partitionedtable as a whole and      statistics of the individual partitions are updated.     </para> 
Don't think that's needed. table_name is still referencing a single
table name. And similar remark for vacuum.sgml.

In short for all that, it is enough to have "[, ... ]" to document
that a list is accepted.
   /* Now go through the common routine */
-   vacuum(vacstmt->options, vacstmt->relation, InvalidOid, ¶ms,
-          vacstmt->va_cols, NULL, isTopLevel);
+   vacuum(vacstmt->options, vacstmt->rels, InvalidOid, ¶ms,
+          NULL, isTopLevel);}
It seems to me that it would have been less invasive to loop through
vacuum() for each relation. Do you foresee advantages in allowing
vacuum() to handle multiple? I am not sure if is is worth complicating
the current logic more considering that you have as well so extra
logic to carry on option values.

+ * used for error messages.  In the case that relid is valid, rels
+ * must have exactly one element corresponding to the specified relid.
s/rels/relations/ as variable name?
--
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 5/11/17, 7:20 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> Browsing the code....

Thanks for taking a look.

> It seems to me that you don't need those extra square brackets around
> the column list. Same remark for vacuum.sgml.

I’ll remove them.  My intent was to ensure that we didn’t accidentally suggest that statements like “VACUUM , foo,
bar;”were accepted, but the extra brackets don’t seem to fix that, and I don’t foresee much confusion anyway.  
 

> In short for all that, it is enough to have "[, ... ]" to document
> that a list is accepted.

That makes sense, I’ll fix it.

> It seems to me that it would have been less invasive to loop through
> vacuum() for each relation. Do you foresee advantages in allowing
> vacuum() to handle multiple? I am not sure if is is worth complicating
> the current logic more considering that you have as well so extra
> logic to carry on option values.

That was the approach I first prototyped.  The main disadvantage that I found was that the command wouldn’t fail-fast
ifone of the tables or columns didn’t exist, and I thought that it might be frustrating to encounter such an error in
themiddle of vacuuming several large tables.  It’s easy enough to change the logic to emit a warning and simply move on
tothe next table, but that seemed like it could be easily missed among the rest of the vacuum log statements
(especiallywith the verbose option specified).  What are your thoughts on this?
 

In the spirit of simplifying things a bit, I do think it is possible to eliminate one of the new node types, since the
fieldsfor each are almost identical.
 

> + * used for error messages.  In the case that relid is valid, rels
> + * must have exactly one element corresponding to the specified relid.
> s/rels/relations/ as variable name?

Agreed, that seems nicer.

Nathan



On Fri, May 12, 2017 at 1:06 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
> On 5/11/17, 7:20 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> It seems to me that it would have been less invasive to loop through
>> vacuum() for each relation. Do you foresee advantages in allowing
>> vacuum() to handle multiple? I am not sure if is is worth complicating
>> the current logic more considering that you have as well so extra
>> logic to carry on option values.
>
> That was the approach I first prototyped.  The main disadvantage that I found was that the command wouldn’t fail-fast
ifone of the tables or columns didn’t exist, and I thought that it might be frustrating to encounter such an error in
themiddle of vacuuming several large tables.  It’s easy enough to change the logic to emit a warning and simply move on
tothe next table, but that seemed like it could be easily missed among the rest of the vacuum log statements
(especiallywith the verbose option specified).  What are your thoughts on this? 

Hm. If multiple tables are specified and that some of them take a long
time, it could be possible that an error still happens if the
definition of one of those tables changes while VACUUM is in the
middle of running. And this makes moot the error checks that happened
at first step. So it seems to me that we definitely should have a
WARNING if multiple tables are defined anyway, and that to avoid code
duplication we may want to just do those checks once, before
processing one of the listed tables. It is true that is would be easy
to miss a WARNING in the VERBOSE logs, but issuing an ERROR would
really be frustrating in the middle of a nightly run of VACUUM.

> In the spirit of simplifying things a bit, I do think it is possible to eliminate one of the new node types, since
thefields for each are almost identical. 

Two looks too much for a code just aiming at scaling up vacuum to
handle N items. It may be possible to make things even more simple,
but I have not put much thoughts into that to be honest.
--
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
I’ve attached a v3 patch that addresses your feedback:

Changes:
  - removed extra square brackets in documentation changes
  - removed unnecessary documentation changes for parameter list
  - eliminated one of the new node types
  - renamed ‘rel’ argument to ‘relations’ in vacuum(…)
  - moved relations list to vacuum memory context in vacuum(…)
  - minor addition to VACUUM regression test
  - rebased with master

On 5/15/17, 11:00 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> Hm. If multiple tables are specified and that some of them take a long
> time, it could be possible that an error still happens if the
> definition of one of those tables changes while VACUUM is in the
> middle of running. And this makes moot the error checks that happened
> at first step. So it seems to me that we definitely should have a
> WARNING if multiple tables are defined anyway, and that to avoid code
> duplication we may want to just do those checks once, before
> processing one of the listed tables. It is true that is would be easy
> to miss a WARNING in the VERBOSE logs, but issuing an ERROR would
> really be frustrating in the middle of a nightly run of VACUUM.

I think this issue already exists, as this comment in get_rel_oids(…) seems to indicate:

    /*
     * Since we don't take a lock here, the relation might be gone, or the
     * RangeVar might no longer refer to the OID we look up here.  In the
     * former case, VACUUM will do nothing; in the latter case, it will
     * process the OID we looked up here, rather than the new one. Neither
     * is ideal, but there's little practical alternative, since we're
     * going to commit this transaction and begin a new one between now
     * and then.
     */
     relid = RangeVarGetRelid(vacrel, NoLock, false);

With the patch applied, I believe this statement still holds true.  So if the relation disappears before we get to
vacuum_rel(…),we will simply skip it and move on to the next one.  The vacuum_rel(…) code provides a WARNING in many
cases(e.g. the user does not have privileges to VACUUM the table), but we seem to silently skip the table when it
disappearsbefore the call to vacuum_rel(…).  If we added a WARNING to the effect of “skipping vacuum of <table_name> —
relationno longer exists” for this case, I think what you are suggesting would be satisfied.
 

However, ANALYZE has a slight caveat.  While analyze_rel(…) silently skips the relation if it no longer exists like
vacuum_rel(…)does, we do not pre-validate the columns list at all.  So, in an ANALYZE statement with multiple tables
andcolumns specified, it’ll only fail once we get to the undefined column.  To fix this, we could add a check for the
columnlists near get_rel_oids(…) and adjust do_analyze_rel(…) to emit a WARNING and skip any columns that vanish in the
meantime.

Does this seem like a sane approach?

  1. Emit WARNING when skipping if relation disappears before we get to it.
  2. Early in vacuum(…), check that the specified columns exist.
  3. Emit WARNING and skip any specified columns that vanish before processing.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
On Wed, May 17, 2017 at 7:56 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> I think this issue already exists, as this comment in get_rel_oids(…) seems to indicate:
>
>     /*
>      * Since we don't take a lock here, the relation might be gone, or the
>      * RangeVar might no longer refer to the OID we look up here.  In the
>      * former case, VACUUM will do nothing; in the latter case, it will
>      * process the OID we looked up here, rather than the new one. Neither
>      * is ideal, but there's little practical alternative, since we're
>      * going to commit this transaction and begin a new one between now
>      * and then.
>      */
>      relid = RangeVarGetRelid(vacrel, NoLock, false);
>
> With the patch applied, I believe this statement still holds true.  So if the relation disappears before we get to
vacuum_rel(…),we will simply skip it and move on to the next one.  The vacuum_rel(…) code provides a WARNING in many
cases(e.g. the user does not have privileges to VACUUM the table), but we seem to silently skip the table when it
disappearsbefore the call to vacuum_rel(…). 

Yes, that's the bits using try_relation_open() which returns NULL if
the relation is gone. I don't think that we want VACUUM to be noisy
about that when running it on a database.

> If we added a WARNING to the effect of “skipping vacuum of <table_name> — relation no longer exists” for this case, I
thinkwhat you are suggesting would be satisfied. 

We would do no favor by reporting nothing to the user. Without any
information, the user triggering a manual vacuum may believe that the
relation has been vacuumed as it was listed but that would not be the
case.

> However, ANALYZE has a slight caveat.  While analyze_rel(…) silently skips the relation if it no longer exists like
vacuum_rel(…)does, we do not pre-validate the columns list at all.  So, in an ANALYZE statement with multiple tables
andcolumns specified, it’ll only fail once we get to the undefined column.  To fix this, we could add a check for the
columnlists near get_rel_oids(…) and adjust do_analyze_rel(…) to emit a WARNING and skip any columns that vanish in the
meantime.
>
> Does this seem like a sane approach?
>
>   1. Emit WARNING when skipping if relation disappears before we get to it.
>   2. Early in vacuum(…), check that the specified columns exist.

And issue an ERROR, right?

>   3. Emit WARNING and skip any specified columns that vanish before processing.

This looks like a sensible approach to me.

+           RelationAndColumns *relinfo = (RelationAndColumns *)
lfirst(relation);
+           int per_table_opts = options | relinfo->options;  /*
VACOPT_ANALYZE may be set per-table */
+           ListCell *oid;
I have just noticed this bit in your patch. So you can basically do
something like that:
VACUUM (ANALYZE) foo, (FULL) bar;
Do we really want to go down to this level of control? I would keep
things a bit more restricted as users may be confused by the different
lock levels involved by each operation, and make use of the same
options for all the relations listed. Opinions from others is welcome.
--
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 5/16/17, 11:21 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> On Wed, May 17, 2017 at 7:56 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
>> I think this issue already exists, as this comment in get_rel_oids(…) seems to indicate:
>>
>>     /*
>>      * Since we don't take a lock here, the relation might be gone, or the
>>      * RangeVar might no longer refer to the OID we look up here.  In the
>>      * former case, VACUUM will do nothing; in the latter case, it will
>>      * process the OID we looked up here, rather than the new one. Neither
>>      * is ideal, but there's little practical alternative, since we're
>>      * going to commit this transaction and begin a new one between now
>>      * and then.
>>      */
>>      relid = RangeVarGetRelid(vacrel, NoLock, false);
>>
>> With the patch applied, I believe this statement still holds true.  So if the relation disappears before we get to
vacuum_rel(…),we will simply skip it and move on to the next one.  The vacuum_rel(…) code provides a WARNING in many
cases(e.g. the user does not have privileges to VACUUM the table), but we seem to silently skip the table when it
disappearsbefore the call to vacuum_rel(…).
 
>
> Yes, that's the bits using try_relation_open() which returns NULL if
> the relation is gone. I don't think that we want VACUUM to be noisy
> about that when running it on a database.

Agreed.

>> If we added a WARNING to the effect of “skipping vacuum of <table_name> — relation no longer exists” for this case,
Ithink what you are suggesting would be satisfied.
 
>
> We would do no favor by reporting nothing to the user. Without any
> information, the user triggering a manual vacuum may believe that the
> relation has been vacuumed as it was listed but that would not be the
> case.

Agreed.  Unfortunately, I think this is already possible when you specify a table to be vacuumed.

>> However, ANALYZE has a slight caveat.  While analyze_rel(…) silently skips the relation if it no longer exists like
vacuum_rel(…)does, we do not pre-validate the columns list at all.  So, in an ANALYZE statement with multiple tables
andcolumns specified, it’ll only fail once we get to the undefined column.  To fix this, we could add a check for the
columnlists near get_rel_oids(…) and adjust do_analyze_rel(…) to emit a WARNING and skip any columns that vanish in the
meantime.
>>
>> Does this seem like a sane approach?
>>
>>   1. Emit WARNING when skipping if relation disappears before we get to it.
>>   2. Early in vacuum(…), check that the specified columns exist.
>
> And issue an ERROR, right?

Correct.  This means that both the relations and columns specified would cause an ERROR if they do not exist and a
WARNINGif they disappear before we can actually process them.
 

> +           RelationAndColumns *relinfo = (RelationAndColumns *)
> lfirst(relation);
> +           int per_table_opts = options | relinfo->options;  /*
> VACOPT_ANALYZE may be set per-table */
> +           ListCell *oid;
> I have just noticed this bit in your patch. So you can basically do
> something like that:
> VACUUM (ANALYZE) foo, (FULL) bar;
> Do we really want to go down to this level of control? I would keep
> things a bit more restricted as users may be confused by the different
> lock levels involved by each operation, and make use of the same
> options for all the relations listed. Opinions from others is welcome.

I agree with you here, too.  I stopped short of allowing customers to explicitly provide per-table options, so the
exampleyou provided wouldn’t work here.  This is more applicable for something like the following:
 
       VACUUM (FREEZE, VERBOSE) foo, bar (a);

In this case, the FREEZE and VERBOSE options are used for both tables.  However, we have a column list specified for
‘bar’,and the ANALYZE option is implied when we specify a column list.  So when we process ‘bar’, we need to apply the
ANALYZEoption, but we do not need it for ‘foo’.  For now, that is all that this per-table options variable is used
for.

Nathan


Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
I’ve attached a v4 of the patch.

Changes:
 - Early in vacuum(…), emit an ERROR if any specified columns do not exist.
 - Emit a WARNING and skip any specified tables or columns that disappear before they are actually processed.
 - Small additions to the VACUUM regression test.
 - Rebased with master.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
On Thu, May 18, 2017 at 12:06 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> I agree with you here, too.  I stopped short of allowing customers to explicitly provide per-table options, so the
exampleyou provided wouldn’t work here.  This is more applicable for something like the following: 
>
>         VACUUM (FREEZE, VERBOSE) foo, bar (a);
>
> In this case, the FREEZE and VERBOSE options are used for both tables.  However, we have a column list specified for
‘bar’,and the ANALYZE option is implied when we specify a column list.  So when we process ‘bar’, we need to apply the
ANALYZEoption, but we do not need it for ‘foo’.  For now, that is all that this per-table options variable is used for. 

Hm. One argument can be made here: having a column list defined in one
of the tables implies that ANALYZE is enforced for all the relations
listed instead of doing that only on the relations listing columns.
--
Michael



On Thu, May 18, 2017 at 7:38 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> I’ve attached a v4 of the patch.
>
> Changes:
>  - Early in vacuum(…), emit an ERROR if any specified columns do not exist.
>  - Emit a WARNING and skip any specified tables or columns that disappear before they are actually processed.
>  - Small additions to the VACUUM regression test.
>  - Rebased with master.
>

Thank you for updating the patch.

Some review comments.

I got the following warning messages when building. You might want to
include parser/parse_relation.h.

vacuum.c: In function ‘check_columns_exist’:
vacuum.c:567: warning: implicit declaration of function ‘attnameAttNum’
--

=# vacuum (verbose) hoge, hoge;
INFO:  vacuuming "public.hoge"
INFO:  "hoge": found 0 removable, 0 nonremovable row versions in 0 out
of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 604
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  vacuuming "public.hoge"
INFO:  "hoge": found 0 removable, 0 nonremovable row versions in 0 out
of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 604
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

Even if the same relation is specified more than once, we should
vacuum the relation only once. Similarly if we specify the parent
table and its child table we want to vacuum each relation only once.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Thu, May 18, 2017 at 11:01 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, May 18, 2017 at 12:06 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
>> I agree with you here, too.  I stopped short of allowing customers to explicitly provide per-table options, so the
exampleyou provided wouldn’t work here.  This is more applicable for something like the following: 
>>
>>         VACUUM (FREEZE, VERBOSE) foo, bar (a);
>>
>> In this case, the FREEZE and VERBOSE options are used for both tables.  However, we have a column list specified for
‘bar’,and the ANALYZE option is implied when we specify a column list.  So when we process ‘bar’, we need to apply the
ANALYZEoption, but we do not need it for ‘foo’.  For now, that is all that this per-table options variable is used for. 
>
> Hm. One argument can be made here: having a column list defined in one
> of the tables implies that ANALYZE is enforced for all the relations
> listed instead of doing that only on the relations listing columns.

It seems to me that it's not good idea to forcibly set ANALYZE in
spite of  ANALYZE option is not specified. One reason is that it would
make us difficult to grep it from such as server log. I think It's
better to use the same vacuum option to the all listed relations.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Thu, May 18, 2017 at 2:59 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> It seems to me that it's not good idea to forcibly set ANALYZE in
> spite of  ANALYZE option is not specified. One reason is that it would
> make us difficult to grep it from such as server log. I think It's
> better to use the same vacuum option to the all listed relations.

Even now, if you use VACUUM without listing ANALYZE directly, with
relation listing a set of columns, then ANALYZE is implied. I agree
with your point that the same options should be used for all the
relations, and it seems to me that if at least one relation listed has
a column list, then ANALYZE should be implied for all relations.
-- 
Michael



On Thu, May 18, 2017 at 3:19 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, May 18, 2017 at 2:59 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> It seems to me that it's not good idea to forcibly set ANALYZE in
>> spite of  ANALYZE option is not specified. One reason is that it would
>> make us difficult to grep it from such as server log. I think It's
>> better to use the same vacuum option to the all listed relations.
>
> Even now, if you use VACUUM without listing ANALYZE directly, with
> relation listing a set of columns, then ANALYZE is implied.

Oh.. I'd missed that behavior. Thanks!

>  I agree
> with your point that the same options should be used for all the
> relations, and it seems to me that if at least one relation listed has
> a column list, then ANALYZE should be implied for all relations.

+1

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Thu, May 18, 2017 at 2:45 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, May 18, 2017 at 3:19 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, May 18, 2017 at 2:59 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> It seems to me that it's not good idea to forcibly set ANALYZE in
>>> spite of  ANALYZE option is not specified. One reason is that it would
>>> make us difficult to grep it from such as server log. I think It's
>>> better to use the same vacuum option to the all listed relations.
>>
>> Even now, if you use VACUUM without listing ANALYZE directly, with
>> relation listing a set of columns, then ANALYZE is implied.
>
> Oh.. I'd missed that behavior. Thanks!
>
>>  I agree
>> with your point that the same options should be used for all the
>> relations, and it seems to me that if at least one relation listed has
>> a column list, then ANALYZE should be implied for all relations.
>
> +1

Ugh, really?  Are we sure that the current behavior is anything other
than a bug?  The idea that VACUUM foo (a) implies ANALYZE doesn't
really sit very well with me in the first place.  I'd be more inclined
to reject that with an ERROR complaining that the column list can't be
specified except for ANALYZE.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> Ugh, really?  Are we sure that the current behavior is anything other
> than a bug?  The idea that VACUUM foo (a) implies ANALYZE doesn't
> really sit very well with me in the first place.  I'd be more inclined
> to reject that with an ERROR complaining that the column list can't be
> specified except for ANALYZE.

Yeah, that's probably more sensible.  I think the rationale was "if you
specify columns you must want the ANALYZE option, so why make you type
that in explicitly?".   But I can see the argument that it's likely to
confuse users who might have a weaker grasp of the semantics.
        regards, tom lane



On Fri, May 19, 2017 at 12:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Ugh, really?  Are we sure that the current behavior is anything other
>> than a bug?  The idea that VACUUM foo (a) implies ANALYZE doesn't
>> really sit very well with me in the first place.  I'd be more inclined
>> to reject that with an ERROR complaining that the column list can't be
>> specified except for ANALYZE.
>
> Yeah, that's probably more sensible.  I think the rationale was "if you
> specify columns you must want the ANALYZE option, so why make you type
> that in explicitly?".   But I can see the argument that it's likely to
> confuse users who might have a weaker grasp of the semantics.
>

I'd not known such VACUUM behavior so I was a bit surprised but
considering consistency with current behavior I thought that is not
bad idea. But complaining with error seems more sensible.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
Thanks all for your feedback.  I’ve attached v5 of the patch.

Changes:
 - fixed implicit declaration of ‘attnameAttNum’ in vacuum.c
 - specified relations and columns are de-duplicated
 - users are now forced to specify ANALYZE if a column list is given
 - added to VACUUM regression test
 - rebased with master

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
On Fri, May 19, 2017 at 12:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Ugh, really?  Are we sure that the current behavior is anything other
>> than a bug?

Point was raised already upthread by me ince, which is what lead me to
the reasoning of my last argument:
https://www.postgresql.org/message-id/31695.1494471378@sss.pgh.pa.us
And, like you, I saw that as an oversight.

> The idea that VACUUM foo (a) implies ANALYZE doesn't
>> really sit very well with me in the first place.  I'd be more inclined
>> to reject that with an ERROR complaining that the column list can't be
>> specified except for ANALYZE.
>
> Yeah, that's probably more sensible.  I think the rationale was "if you
> specify columns you must want the ANALYZE option, so why make you type
> that in explicitly?".   But I can see the argument that it's likely to
> confuse users who might have a weaker grasp of the semantics.

I am fine with an ERROR if a column list is specified without ANALYZE
listed in the options. But that should happen as well for the case
where only one relation is listed.
-- 
Michael



On Fri, May 19, 2017 at 9:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> I am fine with an ERROR if a column list is specified without ANALYZE
> listed in the options. But that should happen as well for the case
> where only one relation is listed.

Perhaps this could be changed for 10? Changing the behavior in
back-branches looks sensitive to me.
-- 
Michael



Michael Paquier <michael.paquier@gmail.com> writes:
> On Fri, May 19, 2017 at 9:06 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> I am fine with an ERROR if a column list is specified without ANALYZE
>> listed in the options. But that should happen as well for the case
>> where only one relation is listed.

> Perhaps this could be changed for 10? Changing the behavior in
> back-branches looks sensitive to me.

It would make more sense to me to change it as part of the feature
addition, when/if this patch gets committed.  Otherwise, we just break
code that works today and we can't point to any solid benefit.
        regards, tom lane



On Fri, May 19, 2017 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Fri, May 19, 2017 at 9:06 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> I am fine with an ERROR if a column list is specified without ANALYZE
>>> listed in the options. But that should happen as well for the case
>>> where only one relation is listed.
>
>> Perhaps this could be changed for 10? Changing the behavior in
>> back-branches looks sensitive to me.
>
> It would make more sense to me to change it as part of the feature
> addition, when/if this patch gets committed.  Otherwise, we just break
> code that works today and we can't point to any solid benefit.

Fine for me as well. I would suggest to split the patch into two parts
to ease review then:
- Rework this error handling for one relation.
- The main patch.
-- 
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 5/18/17, 6:12 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> Fine for me as well. I would suggest to split the patch into two parts
> to ease review then:
> - Rework this error handling for one relation.
> - The main patch.

I’d be happy to do so, but I think part one would be pretty small, and almost all of the same code needs to be changed
inthe main patch anyway.  I do not foresee a huge impact on review-ability either way.  If others disagree, I can split
itup.
 

Nathan


"Bossart, Nathan" <bossartn@amazon.com> writes:
> On 5/18/17, 6:12 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> Fine for me as well. I would suggest to split the patch into two parts
>> to ease review then:
>> - Rework this error handling for one relation.
>> - The main patch.

> I’d be happy to do so, but I think part one would be pretty small, and almost all of the same code needs to be
changedin the main patch anyway.  I do not foresee a huge impact on review-ability either way.  If others disagree, I
cansplit it up. 

Yeah, I'm dubious that that's really necessary.  If the change proves
bigger than you're anticipating, maybe it's worth a two-step approach,
but I share your feeling that it probably isn't.
        regards, tom lane



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 5/18/17, 8:03 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>”Bossart, Nathan" <bossartn@amazon.com> writes:
>> On 5/18/17, 6:12 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>>> Fine for me as well. I would suggest to split the patch into two parts
>>> to ease review then:
>>> - Rework this error handling for one relation.
>>> - The main patch.
>>
>> I’d be happy to do so, but I think part one would be pretty small, and almost all of the same code needs to be
changedin the main patch anyway.  I do not foresee a huge impact on review-ability either way.  If others disagree, I
cansplit it up.
 
>
>Yeah, I'm dubious that that's really necessary.  If the change proves
>bigger than you're anticipating, maybe it's worth a two-step approach,
>but I share your feeling that it probably isn’t.

Just in case it was missed among the discussion, I’d like to point out that v5 of the patch includes the “ERROR if
ANALYZEnot specified” change.
 

Nathan


On Fri, May 19, 2017 at 12:17 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
> Just in case it was missed among the discussion, I’d like to point out that v5 of the patch includes the “ERROR if
ANALYZEnot specified” change. 

As long as I don't forget...

+VACUUM vactst (i);
Looking at the tests of v5, I think that you should as well add a test
that lists multiple relations with one or more relations listing a
column list for a VACUUM query, without ANALYZE specified in the
options as the parsings of VacuumStmt and AnalyzeStmt are two
different code paths, giving something like that:
VACUUM (FREEZE) rel1, rel2(col1,col2); --error
--
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 5/18/17, 8:26 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> +VACUUM vactst (i);
> Looking at the tests of v5, I think that you should as well add a test
> that lists multiple relations with one or more relations listing a
> column list for a VACUUM query, without ANALYZE specified in the
> options as the parsings of VacuumStmt and AnalyzeStmt are two
> different code paths, giving something like that:
> VACUUM (FREEZE) rel1, rel2(col1,col2); --error

Agreed, this seems like a good test case.  I’ve added it in v6 of the patch, which is attached.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
I’ve rebased this patch with master to create v7, which is attached.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Tue, Aug 15, 2017 at 8:28 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> I’ve rebased this patch with master to create v7, which is attached.

Thanks for the rebased patch. I am switching into review mode actively
now, so I'll look at it soon.
--
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Tue, Aug 15, 2017 at 4:05 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Aug 15, 2017 at 8:28 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
>> I’ve rebased this patch with master to create v7, which is attached.
>
> Thanks for the rebased patch. I am switching into review mode actively
> now, so I'll look at it soon.

Another pass for this patch.

+           /*
+            * We have already checked the column list in vacuum(...),
+            * but the columns may have disappeared since then.  If
+            * this happens, emit a nice warning message and skip the
+            * undefined column.
+            */
I think that this would be reworded. "nice" is cute is this context.
Why not just saying something like:
"Do not issue an ERROR if a column is missing but use a WARNING
instead. At the beginning of the VACUUM run, the code already checked
for undefined columns and informed about an ERROR, but we want the
processing to move on for existing columns."

+   /*
+    * Now dedupe the list to avoid any redundant work (e.g. user specifies
+    * the same relation twice).  We also take care of combining any
+    * separate column lists for duplicate relations.
+    *
+    * We do this after resolving the OIDs so that we do not miss entries
+    * that have unequal RangeVars but resolve to the same set of OIDs.
+    * For example, "foo" and "public.foo" could be the same relation.
+    */
+   relations = dedupe_relations(relations);
This has been introduced in v5. If we would want to put some effort
for that, I think that it could be a separate patch and a separate
discussion. This patch does not make things worse than they are, see
HEAD for example with the same column specified twice:
=# create table aa as select generate_series(1, 10000) as a;
SELECT 10000
=# vacuum (analyze) aa (a, a);
ERROR:  23505: duplicate key value violates unique constraint
"pg_statistic_relid_att_inh_index"
DETAIL:  Key (starelid, staattnum, stainherit)=(16385, 1, f) already exists.
SCHEMA NAME:  pg_catalog
TABLE NAME:  pg_statistic
CONSTRAINT NAME:  pg_statistic_relid_att_inh_index
LOCATION:  _bt_check_unique, nbtinsert.c:434

And actually, your patch does not seem to work, and makes things worse:
=# analyze aa (a, a);
ERROR:  XX000: tuple already updated by self
LOCATION:  simple_heap_update, heapam.c:4482

+/*
+ * This is used to keep track of a relation and an optional list of
+ * column names, as may be specified in VACUUM and ANALYZE.
+ */
+typedef struct RelationAndColumns
+{
+   NodeTag      type;
+   RangeVar    *relation;  /* single table to process */
+   List        *va_cols;   /* list of column names, or NIL for all */
+   List        *oids;      /* corresponding OIDs (filled in by
[auto]vacuum.c) */
+} RelationAndColumns;
This name is a bit awkward. Say VacuumRelation? I also don't
understand why there are multiple OIDs here. There should be only one,
referring to the relation of this RangeVar. Even for inherited
relations what should be done is to add one entry RelationAndColumns
(or VacuumRelation) for each child relation.

+   /*
+    * Check that all specified columns exist so that we can fast-fail
+    * commands with multiple tables.  If the column disappears before we
+    * actually process it, we will emit a WARNING and skip it later on.
+    */
+   foreach(relation, relations)
+       check_columns_exist(lfirst(relation));
The full list could be processed in there.

+static void
+check_columns_exist(RelationAndColumns *relation)
[...]
+       Relation rel;
+       ListCell *lc;
+
+       rel = try_relation_open(lfirst_oid(oid), NoLock);
+       if (!rel)
This really meritates a comment. In short: why is it fine to not take
a lock here? The answer I think is that even if the relation does not
exist vacuum would do nothing, but this should be written out.

+   foreach(relation, relations)
+   {
+       if (((RelationAndColumns *) lfirst(relation))->va_cols != NIL &&
+           !(options & VACOPT_ANALYZE))
Saving the data in a variable makes for a better style and easier
debugging. When doing a bitwise operation, please use as well != 0 or
== 0 as you are looking here for a boolean result.

+           relinfo = makeNode(RelationAndColumns);
+           relinfo->oids = list_make1_oid(HeapTupleGetOid(tuple));
+           *vacrels = lappend(*vacrels, relinfo);
Assigning variables even for nothing is good practice for readability,
and shows the intention behind the code.

- * relid, if not InvalidOid, indicate the relation to process; otherwise,
- * the RangeVar is used.  (The latter must always be passed, because it's
- * used for error messages.)
+ * If we intend to process all relations, the 'relations' argument may be
+ * NIL.
This comment actually applies to RelationAndColumns. If the OID is
invalid, then RangeVar is used, and should always be set. You are
breaking that promise actually for autovacuum. The comment here should
say that if relations is NIL all the relations of the database are
processes, and for an ANALYZE all the columns are done.
--
Michael



On Fri, Aug 18, 2017 at 1:56 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> +           /*
> +            * We have already checked the column list in vacuum(...),
> +            * but the columns may have disappeared since then.  If
> +            * this happens, emit a nice warning message and skip the
> +            * undefined column.
> +            */
> I think that this would be reworded. "nice" is cute is this context.
> Why not just saying something like:
> "Do not issue an ERROR if a column is missing but use a WARNING
> instead. At the beginning of the VACUUM run, the code already checked
> for undefined columns and informed about an ERROR, but we want the
> processing to move on for existing columns."

Hmm, I find your (Michael's) suggestion substantially less clear than
the wording to which you are objecting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 8/18/17, 12:56 AM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> Another pass for this patch.

Thanks!  I've attached v8 of the patch.

On 8/18/17, 1:49 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> On Fri, Aug 18, 2017 at 1:56 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> +           /*
>> +            * We have already checked the column list in vacuum(...),
>> +            * but the columns may have disappeared since then.  If
>> +            * this happens, emit a nice warning message and skip the
>> +            * undefined column.
>> +            */
>> I think that this would be reworded. "nice" is cute is this context.
>> Why not just saying something like:
>> "Do not issue an ERROR if a column is missing but use a WARNING
>> instead. At the beginning of the VACUUM run, the code already checked
>> for undefined columns and informed about an ERROR, but we want the
>> processing to move on for existing columns."
>
> Hmm, I find your (Michael's) suggestion substantially less clear than
> the wording to which you are objecting.

I'll leave this as-is for now.

> And actually, your patch does not seem to work, and makes things worse:
> =# analyze aa (a, a);
> ERROR:  XX000: tuple already updated by self
> LOCATION:  simple_heap_update, heapam.c:4482

I think the underlying issue is that dedupe_relations(...) does not
handle duplicate columns correctly.  The attached patch should fix that
issue.  I've also added some test cases to cover this.

> +/*
> + * This is used to keep track of a relation and an optional list of
> + * column names, as may be specified in VACUUM and ANALYZE.
> + */
> +typedef struct RelationAndColumns
> +{
> +   NodeTag      type;
> +   RangeVar    *relation;  /* single table to process */
> +   List        *va_cols;   /* list of column names, or NIL for all */
> +   List        *oids;      /* corresponding OIDs (filled in by
> [auto]vacuum.c) */
> +} RelationAndColumns;
> This name is a bit awkward. Say VacuumRelation? I also don't
> understand why there are multiple OIDs here. There should be only one,
> referring to the relation of this RangeVar. Even for inherited
> relations what should be done is to add one entry RelationAndColumns
> (or VacuumRelation) for each child relation.

I'll admit I struggled with naming this struct, so I'm happy to change it.

According to the docs, VACUUM and ANALYZE "do not support recursing over
inheritance hierarchies" [1].  However, we need a list of OIDs for
partitioned tables.  Namely, this piece of code in get_rel_oids(...):

        if (include_parts)
            oid_list = list_concat(oid_list,
                                   find_all_inheritors(relid, NoLock, NULL));
        else
            oid_list = lappend_oid(oid_list, relid);

Since all of these OIDs should correspond to the same partitioned table,
it makes sense to me to leave them together instead of breaking out each
partition into a VacuumRelation.  If we did so, I think we would also need
to duplicate the va_cols list for each partition.  What do you think?

> +   /*
> +    * Check that all specified columns exist so that we can fast-fail
> +    * commands with multiple tables.  If the column disappears before we
> +    * actually process it, we will emit a WARNING and skip it later on.
> +    */
> +   foreach(relation, relations)
> +       check_columns_exist(lfirst(relation));
> The full list could be processed in there.

Sure.  I've made this change.

> +static void
> +check_columns_exist(RelationAndColumns *relation)
> [...]
> +       Relation rel;
> +       ListCell *lc;
> +
> +       rel = try_relation_open(lfirst_oid(oid), NoLock);
> +       if (!rel)
> This really meritates a comment. In short: why is it fine to not take
> a lock here? The answer I think is that even if the relation does not
> exist vacuum would do nothing, but this should be written out.

Right, we are just checking the existence of columns here.  We lock it
later on with the understanding that the columns may have disappeared in
the meantime, in which case they will simply be skipped.  I've added a
comment.

> +   foreach(relation, relations)
> +   {
> +       if (((RelationAndColumns *) lfirst(relation))->va_cols != NIL &&
> +           !(options & VACOPT_ANALYZE))
> Saving the data in a variable makes for a better style and easier
> debugging. When doing a bitwise operation, please use as well != 0 or
> == 0 as you are looking here for a boolean result.
> 
> +           relinfo = makeNode(RelationAndColumns);
> +           relinfo->oids = list_make1_oid(HeapTupleGetOid(tuple));
> +           *vacrels = lappend(*vacrels, relinfo);
> Assigning variables even for nothing is good practice for readability,
> and shows the intention behind the code.

I've made these changes.

> - * relid, if not InvalidOid, indicate the relation to process; otherwise,
> - * the RangeVar is used.  (The latter must always be passed, because it's
> - * used for error messages.)
> + * If we intend to process all relations, the 'relations' argument may be
> + * NIL.
> This comment actually applies to RelationAndColumns. If the OID is
> invalid, then RangeVar is used, and should always be set. You are
> breaking that promise actually for autovacuum. The comment here should
> say that if relations is NIL all the relations of the database are
> processes, and for an ANALYZE all the columns are done.

Makes sense, I've tried to make this comment clearer.

Nathan

[1] https://www.postgresql.org/docs/current/static/ddl-inherit.html


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Thu, Aug 24, 2017 at 12:38 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
> On 8/18/17, 12:56 AM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> According to the docs, VACUUM and ANALYZE "do not support recursing over
> inheritance hierarchies" [1].  However, we need a list of OIDs for
> partitioned tables.  Namely, this piece of code in get_rel_oids(...):
>
>                 if (include_parts)
>                         oid_list = list_concat(oid_list,
>                                                                    find_all_inheritors(relid, NoLock, NULL));
>                 else
>                         oid_list = lappend_oid(oid_list, relid);
>
> Since all of these OIDs should correspond to the same partitioned table,
> it makes sense to me to leave them together instead of breaking out each
> partition into a VacuumRelation.  If we did so, I think we would also need
> to duplicate the va_cols list for each partition.  What do you think?

Robert, Amit and other folks working on extending the existing
partitioning facility would be in better position to answer that, but
I would think that we should have something as flexible as possible,
and storing a list of relation OID in each VacuumRelation makes it
harder to track the uniqueness of relations vacuumed. I agree that the
concept of a partition with multiple parents induces a lot of
problems. But the patch as proposed worries me as it complicates
vacuum() with a double loop: one for each relation vacuumed, and one
inside it with the list of OIDs. Modules calling vacuum() could also
use flexibility, being able to analyze a custom list of columns for
each relation has value as well.

>> - * relid, if not InvalidOid, indicate the relation to process; otherwise,
>> - * the RangeVar is used.  (The latter must always be passed, because it's
>> - * used for error messages.)
>> + * If we intend to process all relations, the 'relations' argument may be
>> + * NIL.
>> This comment actually applies to RelationAndColumns. If the OID is
>> invalid, then RangeVar is used, and should always be set. You are
>> breaking that promise actually for autovacuum. The comment here should
>> say that if relations is NIL all the relations of the database are
>> processes, and for an ANALYZE all the columns are done.
>
> Makes sense, I've tried to make this comment clearer.

+ * relations is a list of VacuumRelations to process.  If it is NIL, all
+ * relations in the database are processed.
Typo here, VacuumRelation is singular.
-- 
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 8/23/17, 11:59 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> Robert, Amit and other folks working on extending the existing
> partitioning facility would be in better position to answer that, but
> I would think that we should have something as flexible as possible,
> and storing a list of relation OID in each VacuumRelation makes it
> harder to track the uniqueness of relations vacuumed. I agree that the
> concept of a partition with multiple parents induces a lot of
> problems. But the patch as proposed worries me as it complicates
> vacuum() with a double loop: one for each relation vacuumed, and one
> inside it with the list of OIDs. Modules calling vacuum() could also
> use flexibility, being able to analyze a custom list of columns for
> each relation has value as well.

I understand your concern, and I'll look into this for v9.  I think the
majority of this change will go into get_rel_oids(...).  Like you, I am
also curious to what the partitioning folks think.

> + * relations is a list of VacuumRelations to process.  If it is NIL, all
> + * relations in the database are processed.
> Typo here, VacuumRelation is singular.

I'll make this change in v9.

I should also note that the dedupe_relations(...) function needs another
small fix for column lists.  Since the lack of a column list means that we
should ANALYZE all columns, a duplicate table name with an empty column
list should effectively null out any other specified columns.  For example,
"ANALYZE table (a, b), table;" currently dedupes to the equivalent of
"ANALYZE table (a, b);" when it should dedupe to "ANALYZE table;".

Nathan


Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Thu, Aug 24, 2017 at 11:28 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
> I should also note that the dedupe_relations(...) function needs another
> small fix for column lists.  Since the lack of a column list means that we
> should ANALYZE all columns, a duplicate table name with an empty column
> list should effectively null out any other specified columns.  For example,
> "ANALYZE table (a, b), table;" currently dedupes to the equivalent of
> "ANALYZE table (a, b);" when it should dedupe to "ANALYZE table;".

This makes me think that it could be a good idea to revisit this bit
in a separate patch. ANALYZE fails as well now when the same column is
defined multiple times with an incomprehensible error message.
-- 
Michael



On Thu, Aug 24, 2017 at 12:59 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Robert, Amit and other folks working on extending the existing
> partitioning facility would be in better position to answer that, but
> I would think that we should have something as flexible as possible,
> and storing a list of relation OID in each VacuumRelation makes it
> harder to track the uniqueness of relations vacuumed. I agree that the
> concept of a partition with multiple parents induces a lot of
> problems. But the patch as proposed worries me as it complicates
> vacuum() with a double loop: one for each relation vacuumed, and one
> inside it with the list of OIDs. Modules calling vacuum() could also
> use flexibility, being able to analyze a custom list of columns for
> each relation has value as well.

So ... why have a double loop?  I mean, you could just expand this out
to one entry per relation actually being vacuumed, couldn't you?

What happens if you say VACUUM partitioned_table (a), some_partition (b)?

+    oldcontext = MemoryContextSwitchTo(vac_context);
+    foreach(lc, relations)
+        temp_relations = lappend(temp_relations, copyObject(lfirst(lc)));
+    MemoryContextSwitchTo(oldcontext);
+    relations = temp_relations;

Can't we just copyObject() on the whole list?

-        ListCell   *cur;
-

Why change this?  Generally, declaring a separate variable in an inner
scope seems like better style than reusing one that happens to be
lying around in the outer scope.

+            VacuumRelation *relinfo = (VacuumRelation *) lfirst(lc);

Could use lfirst_node.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Sat, Aug 26, 2017 at 8:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Aug 24, 2017 at 12:59 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> Robert, Amit and other folks working on extending the existing
>> partitioning facility would be in better position to answer that, but
>> I would think that we should have something as flexible as possible,
>> and storing a list of relation OID in each VacuumRelation makes it
>> harder to track the uniqueness of relations vacuumed. I agree that the
>> concept of a partition with multiple parents induces a lot of
>> problems. But the patch as proposed worries me as it complicates
>> vacuum() with a double loop: one for each relation vacuumed, and one
>> inside it with the list of OIDs. Modules calling vacuum() could also
>> use flexibility, being able to analyze a custom list of columns for
>> each relation has value as well.
>
> So ... why have a double loop?  I mean, you could just expand this out
> to one entry per relation actually being vacuumed, couldn't you?

Yes, if I understand that correctly. That's the point I am exactly
coming at. My suggestion is to have one VacuumRelation entry per
relation vacuumed, even for partitioned tables, and copy the list of
columns to each one.

> +    oldcontext = MemoryContextSwitchTo(vac_context);
> +    foreach(lc, relations)
> +        temp_relations = lappend(temp_relations, copyObject(lfirst(lc)));
> +    MemoryContextSwitchTo(oldcontext);
> +    relations = temp_relations;
>
> Can't we just copyObject() on the whole list?

Yup.
-- 
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 8/23/17, 11:59 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> + * relations is a list of VacuumRelations to process.  If it is NIL, all
> + * relations in the database are processed.
> Typo here, VacuumRelation is singular.

This should be fixed in v9.

On 8/24/17, 5:45 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> This makes me think that it could be a good idea to revisit this bit
> in a separate patch. ANALYZE fails as well now when the same column is
> defined multiple times with an incomprehensible error message.

The de-duplication code is now in a separate patch,
dedupe_vacuum_relations_v1.patch.  I believe it fixes the incomprehensible
error message you were experiencing, but please let me know if you are
still hitting it.

On 8/25/17, 6:00 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> +    oldcontext = MemoryContextSwitchTo(vac_context);
> +    foreach(lc, relations)
> +        temp_relations = lappend(temp_relations, copyObject(lfirst(lc)));
> +    MemoryContextSwitchTo(oldcontext);
> +    relations = temp_relations;
>
> Can't we just copyObject() on the whole list?

I've made this change.

> -        ListCell   *cur;
> -
>
> Why change this?  Generally, declaring a separate variable in an inner
> scope seems like better style than reusing one that happens to be
> lying around in the outer scope.

I've removed this change.

> +            VacuumRelation *relinfo = (VacuumRelation *) lfirst(lc);
>
> Could use lfirst_node.

Done.

On 8/28/17, 5:28 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> Yes, if I understand that correctly. That's the point I am exactly
> coming at. My suggestion is to have one VacuumRelation entry per
> relation vacuumed, even for partitioned tables, and copy the list of
> columns to each one.

I've made this change in v9.  It does clean up the patch quite a bit.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Sat, Aug 26, 2017 at 8:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> What happens if you say VACUUM partitioned_table (a), some_partition (b)?

Using v9, if you do that:
=# CREATE TABLE parent (id int) PARTITION BY RANGE (id);
CREATE TABLE
=# CREATE TABLE child_1_10001 partition of parent for values from (1)
to (10001);
CREATE TABLE
=# CREATE TABLE child_10001_20001 partition of parent for values from
(10001) to (20001);
CREATE TABLE
=# insert into parent values (generate_series(1,20000));
INSERT 0 20000

Vacuuming the parent vacuums all the children, so any child listed
would get vacuumed twice, still this does not cause an error:
=# vacuum parent, child_10001_20000;
VACUUM
And with the de-duplication patch on top of it, things are vacuumed only once.

On Tue, Aug 29, 2017 at 7:56 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> On 8/23/17, 11:59 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> + * relations is a list of VacuumRelations to process.  If it is NIL, all
>> + * relations in the database are processed.
>> Typo here, VacuumRelation is singular.
>
> This should be fixed in v9.
>
> On 8/24/17, 5:45 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> This makes me think that it could be a good idea to revisit this bit
>> in a separate patch. ANALYZE fails as well now when the same column is
>> defined multiple times with an incomprehensible error message.
>
> The de-duplication code is now in a separate patch,
> dedupe_vacuum_relations_v1.patch.  I believe it fixes the incomprehensible
> error message you were experiencing, but please let me know if you are
> still hitting it.

It looks that problems in this area are fixed using the second patch.

> On 8/25/17, 6:00 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:
>> +    oldcontext = MemoryContextSwitchTo(vac_context);
>> +    foreach(lc, relations)
>> +        temp_relations = lappend(temp_relations, copyObject(lfirst(lc)));
>> +    MemoryContextSwitchTo(oldcontext);
>> +    relations = temp_relations;
>>
>> Can't we just copyObject() on the whole list?
>
> I've made this change.
>
>> -        ListCell   *cur;
>> -
>>
>> Why change this?  Generally, declaring a separate variable in an inner
>> scope seems like better style than reusing one that happens to be
>> lying around in the outer scope.
>
> I've removed this change.
>
>> +            VacuumRelation *relinfo = (VacuumRelation *) lfirst(lc);
>>
>> Could use lfirst_node.
>
> Done.
>
> On 8/28/17, 5:28 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> Yes, if I understand that correctly. That's the point I am exactly
>> coming at. My suggestion is to have one VacuumRelation entry per
>> relation vacuumed, even for partitioned tables, and copy the list of
>> columns to each one.
>
> I've made this change in v9.  It does clean up the patch quite a bit.

Here is some input for vacuum_multiple_tables_v9, about which I think
that we are getting to something committable. Here are some minor
comments.
  <para>
-   With no parameter, <command>VACUUM</command> processes every table in the
+   With no parameters, <command>VACUUM</command> processes every table in the   current database that the current user
haspermission to vacuum.
 
-   With a parameter, <command>VACUUM</command> processes only that table.
+   With parameters, <command>VACUUM</command> processes only the tables
+   specified.  </para>
The part about parameters looks fine to me if unchanged.

+   foreach(lc, relations)
+   {
+       VacuumRelation *relation = lfirst_node(VacuumRelation, lc);
+       if (relation->va_cols != NIL && (options & VACOPT_ANALYZE) == 0)
+           ereport(ERROR,
+                   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                    errmsg("ANALYZE option must be specified when a
column list is provided")));
+   }
Could you add a hint with the relation name involved here? When many
relations are defined in the VACUUM query this would be useful for the
user.

+           tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+           if (!HeapTupleIsValid(tuple))
+               elog(ERROR, "cache lookup failed for relation %u", relid);
+           classForm = (Form_pg_class) GETSTRUCT(tuple);
+           include_parts = (classForm->relkind == RELKIND_PARTITIONED_TABLE);
+           ReleaseSysCache(tuple);
It pains me to see that get_rel_relkind does not return an error if
the relation is missing, we could use it here. I would welcome a
refactoring with a missing_ok argument a lot! Now this patch for
VACUUM does not justify breaking potentially many extensions...

+           relinfo = makeNode(VacuumRelation);
+           rel_oid = HeapTupleGetOid(tuple);
+           relinfo->oid = rel_oid;
There are 4 patterns like that in the patch. We could make use of a
makeVacuumRelation.

About the de-duplication patch, I have to admit that I am still not a
fan of doing such a thing. Another road that we could take is to
simply complain with a proper error message if:
- the same column name is specified twice for a relation.
- the same relation is defined twice. In the case of partitions, we
could track the fact that it is already listed as part of a parent,
though perhaps it does not seem worth the extra CPU cost especially
when there are multiple nesting levels with partitions.
Autovacuum has also the advantage, if I recall correctly, to select
all columns for analyze, and skip parent partitions when scanning for
relations so that's a safe bet from this side. Opinions welcome.
-- 
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 8/28/17, 11:26 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> Here is some input for vacuum_multiple_tables_v9, about which I think
> that we are getting to something committable. Here are some minor
> comments.

Thanks for another review.

>    <para>
> -   With no parameter, <command>VACUUM</command> processes every table in the
> +   With no parameters, <command>VACUUM</command> processes every table in the
>     current database that the current user has permission to vacuum.
> -   With a parameter, <command>VACUUM</command> processes only that table.
> +   With parameters, <command>VACUUM</command> processes only the tables
> +   specified.
>    </para>
> The part about parameters looks fine to me if unchanged.

This is reverted in v10.

> +   foreach(lc, relations)
> +   {
> +       VacuumRelation *relation = lfirst_node(VacuumRelation, lc);
> +       if (relation->va_cols != NIL && (options & VACOPT_ANALYZE) == 0)
> +           ereport(ERROR,
> +                   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                    errmsg("ANALYZE option must be specified when a
> column list is provided")));
> +   }
> Could you add a hint with the relation name involved here? When many
> relations are defined in the VACUUM query this would be useful for the
> user.

Added in v10.

> +           relinfo = makeNode(VacuumRelation);
> +           rel_oid = HeapTupleGetOid(tuple);
> +           relinfo->oid = rel_oid;
> There are 4 patterns like that in the patch. We could make use of a
> makeVacuumRelation.

Agreed, I've added one.

> About the de-duplication patch, I have to admit that I am still not a
> fan of doing such a thing. Another road that we could take is to
> simply complain with a proper error message if:
> - the same column name is specified twice for a relation.
> - the same relation is defined twice. In the case of partitions, we
> could track the fact that it is already listed as part of a parent,
> though perhaps it does not seem worth the extra CPU cost especially
> when there are multiple nesting levels with partitions.
> Autovacuum has also the advantage, if I recall correctly, to select
> all columns for analyze, and skip parent partitions when scanning for
> relations so that's a safe bet from this side. Opinions welcome.

I lean towards favoring the de-duplication patch, but maybe I am biased
as the author.  I can see the following advantages:

1. Ease of use.  By taking care of de-duplicating on behalf of the user,
they needn't worry about inheritance structures or accidentally
specifying the same relation or column twice.  This might be especially
useful if a large number of relations or columns must be specified.
2. Resource conservation.  By de-duplicating, VACUUM and ANALYZE are
doing roughly the same thing but with less work.
3. The obnoxious errors you were experiencing are resolved.  This seems
like the strongest argument to me, as it fixes an existing issue.

Disadvantages might include:

1. Users cannot schedule repeated VACUUMs on the same relation (e.g.
'VACUUM table, table, table;').  However, I cannot think of a time when
I needed this, and it seems like something else is wrong with VACUUM if
folks are resorting to this.  In the end, you could still achieve this
via several VACUUM statements.
2. Any inferred ordering for how the relations are processed will not
be accurate if there are duplicates.  Ultimately, users might lose some
amount of control here, but I am not sure how prevalent this use case
might be.  In the worst case, you could achieve this via several
individual VACUUM statements as well.

Your suggestion to ERROR seems like a reasonable compromise, but I
could see it causing frustration in some cases, especially with
partitioning.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Wed, Aug 30, 2017 at 1:47 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> On 8/28/17, 11:26 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> About the de-duplication patch, I have to admit that I am still not a
>> fan of doing such a thing. Another road that we could take is to
>> simply complain with a proper error message if:
>> - the same column name is specified twice for a relation.
>> - the same relation is defined twice. In the case of partitions, we
>> could track the fact that it is already listed as part of a parent,
>> though perhaps it does not seem worth the extra CPU cost especially
>> when there are multiple nesting levels with partitions.
>> Autovacuum has also the advantage, if I recall correctly, to select
>> all columns for analyze, and skip parent partitions when scanning for
>> relations so that's a safe bet from this side. Opinions welcome.
>
> I lean towards favoring the de-duplication patch, but maybe I am biased
> as the author.

I can be biased as reviewer then.

> I can see the following advantages:
>
> 1. Ease of use.  By taking care of de-duplicating on behalf of the user,
> they needn't worry about inheritance structures or accidentally
> specifying the same relation or column twice.  This might be especially
> useful if a large number of relations or columns must be specified.
> 2. Resource conservation.  By de-duplicating, VACUUM and ANALYZE are
> doing roughly the same thing but with less work.
> 3. The obnoxious errors you were experiencing are resolved.  This seems
> like the strongest argument to me, as it fixes an existing issue.
>
> Disadvantages might include:
>
> 1. Users cannot schedule repeated VACUUMs on the same relation (e.g.
> 'VACUUM table, table, table;').  However, I cannot think of a time when
> I needed this, and it seems like something else is wrong with VACUUM if
> folks are resorting to this.  In the end, you could still achieve this
> via several VACUUM statements.
> 2. Any inferred ordering for how the relations are processed will not
> be accurate if there are duplicates.  Ultimately, users might lose some
> amount of control here, but I am not sure how prevalent this use case
> might be.  In the worst case, you could achieve this via several
> individual VACUUM statements as well.
>
> Your suggestion to ERROR seems like a reasonable compromise, but I
> could see it causing frustration in some cases, especially with
> partitioning.

Yeah... Each approach has its cost and its advantages. It may be
better to wait for more opinions, no many people have complained yet
that for example a list of columns using twice the same one fails.

+VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ <replaceable
class="PARAMETER">table_name</replaceable> ] [, ...]
I just noticed that... But regarding the docs, I think that you have
misplaced the position of "[, ...]", which should be inside the
table_name portion in the case of what I quote here, no?

+VacuumRelation *
+makeVacuumRelation(RangeVar *relation, List *va_cols, Oid oid)
+{
+   VacuumRelation *vacrel = makeNode(VacuumRelation);
+   vacrel->relation = relation;
+   vacrel->va_cols = va_cols;
+   vacrel->oid = oid;
+   return vacrel;
+}
Perhaps in makefuncs.c instead of vacuum.c? That's usually the place
used for node constructions like that.

Those are minor tweaks, I'll be fine to move that as ready for
committer after for those points are addressed for the main patch.
-- 
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 8/30/17, 5:37 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> Yeah... Each approach has its cost and its advantages. It may be
> better to wait for more opinions, no many people have complained yet
> that for example a list of columns using twice the same one fails.

Sounds good to me.

> +VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ <replaceable
> class="PARAMETER">table_name</replaceable> ] [, ...]
> I just noticed that... But regarding the docs, I think that you have
> misplaced the position of "[, ...]", which should be inside the
> table_name portion in the case of what I quote here, no?

I think that's what I had initially, but it was changed somewhere along
the line.  It is a little more complicated for the versions that accept
column lists.

VACUUM ... ANALYZE [ [ table_name [ (column_name [, ...] ) ] ] [, ...] ]

ISTM that we need the extra brackets here to clarify that the table and
column list combination is what can be provided in a list.  Does that
make sense?  Or do you think we can omit the outermost brackets here?

> +VacuumRelation *
> +makeVacuumRelation(RangeVar *relation, List *va_cols, Oid oid)
> +{
> +   VacuumRelation *vacrel = makeNode(VacuumRelation);
> +   vacrel->relation = relation;
> +   vacrel->va_cols = va_cols;
> +   vacrel->oid = oid;
> +   return vacrel;
> +}
> Perhaps in makefuncs.c instead of vacuum.c? That's usually the place
> used for node constructions like that.

Ah, yes.  That is a much better place.  I'll make this change.

Nathan


Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"David G. Johnston"
Date:
On Wed, Aug 30, 2017 at 4:08 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
On 8/30/17, 5:37 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> Yeah... Each approach has its cost and its advantages. It may be
> better to wait for more opinions, no many people have complained yet
> that for example a list of columns using twice the same one fails.

Sounds good to me.

> +VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ <replaceable
> class="PARAMETER">table_name</replaceable> ] [, ...]
> I just noticed that... But regarding the docs, I think that you have
> misplaced the position of "[, ...]", which should be inside the
> table_name portion in the case of what I quote here, no?

I think that's what I had initially, but it was changed somewhere along
the line.  It is a little more complicated for the versions that accept
column lists.

VACUUM ... ANALYZE [ [ table_name [ (column_name [, ...] ) ] ] [, ...] ]

ISTM that we need the extra brackets here to clarify that the table and
column list combination is what can be provided in a list.  Does that
make sense?  Or do you think we can omit the outermost brackets here?

​Inspired by the syntax documentation for EXPLAIN:

VACUUM [ ( option [, ...] ) ] [ table_def [, ...] ]

where option can be one of:
    FULL
    FREEZE
    VERBOSE
    DISABLE_PAGE_SKIPPING

and where table_def is:
    table_name [ ( column_name [, ... ] ) ]

David J.

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Thu, Aug 31, 2017 at 8:35 AM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Inspired by the syntax documentation for EXPLAIN:
>
> VACUUM [ ( option [, ...] ) ] [ table_def [, ...] ]
>
> where option can be one of:
>     FULL
>     FREEZE
>     VERBOSE
>     DISABLE_PAGE_SKIPPING
>
> and where table_def is:
>     table_name [ ( column_name [, ... ] ) ]

Yes, splitting things would be nice with the column list. I need more coffee.
-- 
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 8/30/17, 5:37 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> +VacuumRelation *
> +makeVacuumRelation(RangeVar *relation, List *va_cols, Oid oid)
> +{
> +   VacuumRelation *vacrel = makeNode(VacuumRelation);
> +   vacrel->relation = relation;
> +   vacrel->va_cols = va_cols;
> +   vacrel->oid = oid;
> +   return vacrel;
> +}
> Perhaps in makefuncs.c instead of vacuum.c? That's usually the place
> used for node constructions like that.

This function is moved in v11.

On 8/30/17, 6:52 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> On Thu, Aug 31, 2017 at 8:35 AM, David G. Johnston
> <david.g.johnston@gmail.com> wrote:
>> Inspired by the syntax documentation for EXPLAIN:
>>
>> VACUUM [ ( option [, ...] ) ] [ table_def [, ...] ]
>>
>> where option can be one of:
>>     FULL
>>     FREEZE
>>     VERBOSE
>>     DISABLE_PAGE_SKIPPING
>>
>> and where table_def is:
>>     table_name [ ( column_name [, ... ] ) ]
>
> Yes, splitting things would be nice with the column list. I need more coffee.

I've made this change in v11 as well.

v2 of the de-duplication patch seems to still apply cleanly, so I haven't
made any further changes to it.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Masahiko Sawada
Date:
On Thu, Aug 31, 2017 at 10:52 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> On 8/30/17, 5:37 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> +VacuumRelation *
>> +makeVacuumRelation(RangeVar *relation, List *va_cols, Oid oid)
>> +{
>> +   VacuumRelation *vacrel = makeNode(VacuumRelation);
>> +   vacrel->relation = relation;
>> +   vacrel->va_cols = va_cols;
>> +   vacrel->oid = oid;
>> +   return vacrel;
>> +}
>> Perhaps in makefuncs.c instead of vacuum.c? That's usually the place
>> used for node constructions like that.
>
> This function is moved in v11.
>
> On 8/30/17, 6:52 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> On Thu, Aug 31, 2017 at 8:35 AM, David G. Johnston
>> <david.g.johnston@gmail.com> wrote:
>>> Inspired by the syntax documentation for EXPLAIN:
>>>
>>> VACUUM [ ( option [, ...] ) ] [ table_def [, ...] ]
>>>
>>> where option can be one of:
>>>     FULL
>>>     FREEZE
>>>     VERBOSE
>>>     DISABLE_PAGE_SKIPPING
>>>
>>> and where table_def is:
>>>     table_name [ ( column_name [, ... ] ) ]
>>
>> Yes, splitting things would be nice with the column list. I need more coffee.
>
> I've made this change in v11 as well.
>
> v2 of the de-duplication patch seems to still apply cleanly, so I haven't
> made any further changes to it.
>

I reviewed these patches and found a issue.

autovacuum worker seems not to work fine. I got an error message;

ERROR:  unrecognized node type: 0
CONTEXT:  automatic analyze of table "postgres.public.hoge"

I think we should set T_RangeVar to rangevar.type in
autovacuum_do_vac_analyze function.

Also, there is a small typo in dedupe_vacuum_relations_v2.patch.

+                       /* if already procesed or not equal, skip */
+                       if (list_member_int(duplicates, i) ||
relation->oid != nth_rel->oid)
+                               continue;

s/procesed/processed/g

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 8/31/17, 2:24 AM, "Masahiko Sawada" <sawada.mshk@gmail.com> wrote:
> I reviewed these patches and found a issue.

Thanks for reviewing.

> autovacuum worker seems not to work fine. I got an error message;
>
> ERROR:  unrecognized node type: 0
> CONTEXT:  automatic analyze of table "postgres.public.hoge"
>
> I think we should set T_RangeVar to rangevar.type in
> autovacuum_do_vac_analyze function.

Yes, it looks like the NodeTag is not getting set on the RangeVar.
I went ahead and switched this to makeRangeVar(...) instead of
keeping it manually allocated on the stack.  Autovacuum seems to be
working as usual now.

> Also, there is a small typo in dedupe_vacuum_relations_v2.patch.
>
> +                       /* if already procesed or not equal, skip */
> +                       if (list_member_int(duplicates, i) ||
> relation->oid != nth_rel->oid)
> +                               continue;
>
> s/procesed/processed/g

This should be fixed in v3.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Fri, Sep 1, 2017 at 12:25 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> On 8/31/17, 2:24 AM, "Masahiko Sawada" <sawada.mshk@gmail.com> wrote:
>> I reviewed these patches and found a issue.
>
> Thanks for reviewing.
>
>> autovacuum worker seems not to work fine. I got an error message;
>>
>> ERROR:  unrecognized node type: 0
>> CONTEXT:  automatic analyze of table "postgres.public.hoge"
>>
>> I think we should set T_RangeVar to rangevar.type in
>> autovacuum_do_vac_analyze function.
>
> Yes, it looks like the NodeTag is not getting set on the RangeVar.
> I went ahead and switched this to makeRangeVar(...) instead of
> keeping it manually allocated on the stack.  Autovacuum seems to be
> working as usual now.

Hm. Here is the diff between v11 and v12:static voidautovacuum_do_vac_analyze(autovac_table *tab, BufferAccessStrategy
bstrategy){
-   RangeVar    rangevar;
-   VacuumRelation *rel;
-
-   /* Set up command parameters --- use local variables instead of palloc */
-   MemSet(&rangevar, 0, sizeof(rangevar));
-
-   rangevar.schemaname = tab->at_nspname;
-   rangevar.relname = tab->at_relname;
-   rangevar.location = -1;
+   RangeVar    *rangevar;
+   VacuumRelation  *rel;
   /* Let pgstat know what we're doing */   autovac_report_activity(tab);

-   rel = makeVacuumRelation(&rangevar, NIL, tab->at_relid);
+   rangevar = makeRangeVar(tab->at_nspname, tab->at_relname, -1);
+   rel = makeVacuumRelation(rangevar, NIL, tab->at_relid);
But there is this commit in vacuum.c:* It is the caller's responsibility that all parameters are allocated
in a* memory context that will not disappear at transaction commit.
And I don't think we want to break that promise as newNode() uses
palloc0fast() which allocates data in the current memory context (see
4873c96f). I think that you had better just use NodeSetTag here and be
done with it. Also, it seems to me that this could be fixed as a
separate patch. It is definitely an incorrect pattern...

-                   $$ = (Node *)n;
+                   $$ = (Node *) n;
Spurious noise. And the coding pattern in gram.y is to not add a space
(make new code look like its surroundings as the documentation says).
-- 
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/1/17, 12:11 AM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> Hm. Here is the diff between v11 and v12:
>  static void
>  autovacuum_do_vac_analyze(autovac_table *tab, BufferAccessStrategy bstrategy)
>  {
> -   RangeVar    rangevar;
> -   VacuumRelation *rel;
> -
> -   /* Set up command parameters --- use local variables instead of palloc */
> -   MemSet(&rangevar, 0, sizeof(rangevar));
> -
> -   rangevar.schemaname = tab->at_nspname;
> -   rangevar.relname = tab->at_relname;
> -   rangevar.location = -1;
> +   RangeVar    *rangevar;
> +   VacuumRelation  *rel;
>
>     /* Let pgstat know what we're doing */
>     autovac_report_activity(tab);
>
> -   rel = makeVacuumRelation(&rangevar, NIL, tab->at_relid);
> +   rangevar = makeRangeVar(tab->at_nspname, tab->at_relname, -1);
> +   rel = makeVacuumRelation(rangevar, NIL, tab->at_relid);
> But there is this commit in vacuum.c:
>  * It is the caller's responsibility that all parameters are allocated
> in a
>  * memory context that will not disappear at transaction commit.
> And I don't think we want to break that promise as newNode() uses
> palloc0fast() which allocates data in the current memory context (see
> 4873c96f). I think that you had better just use NodeSetTag here and be
> done with it. Also, it seems to me that this could be fixed as a
> separate patch. It is definitely an incorrect pattern...

Don't we have a similar problem with makeVacuumRelation() and list_make1()?

I went ahead and moved the RangeVar, VacuumRelation, and List into local
variables for now, but I agree that this could be improved in a separate
patch.  Perhaps these could be allocated in AutovacMemCxt.  I see from
4873c96f that autovacuum_do_vac_analyze() used to allocate the list of OIDs
in that "long-lived" memory context.

> -                   $$ = (Node *)n;
> +                   $$ = (Node *) n;
> Spurious noise. And the coding pattern in gram.y is to not add a space
> (make new code look like its surroundings as the documentation says).

I've fixed this in v13.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Sat, Sep 2, 2017 at 3:00 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> Don't we have a similar problem with makeVacuumRelation() and list_make1()?

Yeah, indeed. I forgot about this portion.

> I went ahead and moved the RangeVar, VacuumRelation, and List into local
> variables for now, but I agree that this could be improved in a separate
> patch.  Perhaps these could be allocated in AutovacMemCxt.  I see from
> 4873c96f that autovacuum_do_vac_analyze() used to allocate the list of OIDs
> in that "long-lived" memory context.

Indeed, and this has been removed in 9319fd89 by Álvaro as this API
did not need to be this complicated at this point, but now we have to.

I did not consider first that the list portion also needed to be
modified, perhaps because I am not coding that myself... So now that
it is becoming more complicated what about just using AutovacMemCxt?
This would simplify the list of VacuumRelation entries and the
RangeVar creation as well, and honestly this is ugly and there are no
other similar patterns in the backend code:
+   MemSet(&rel_list, 0, sizeof(rel_list));
+   NodeSetTag(&rel_list, T_List);
+   rel_list.length = 1;
+   rel_list.head = &lc;
+   rel_list.tail = &lc;
+
+   MemSet(&lc, 0, sizeof(lc));
+   lfirst(rel_list.head) = &rel;
This would become way more readable by using makeRangeVar() and the
new makeVacuumRelation. As this is partly my fault that we are at this
state, I am fine as well to remove this burden from you, Nathan, and
fix that myself in a new version. And I don't want to step on your
toes either :)

>> -                   $$ = (Node *)n;
>> +                   $$ = (Node *) n;
>> Spurious noise. And the coding pattern in gram.y is to not add a space
>> (make new code look like its surroundings as the documentation says).
>
> I've fixed this in v13.

Thanks.
--
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/3/17, 11:46 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> I did not consider first that the list portion also needed to be
> modified, perhaps because I am not coding that myself... So now that
> it is becoming more complicated what about just using AutovacMemCxt?
> This would simplify the list of VacuumRelation entries and the
> RangeVar creation as well, and honestly this is ugly and there are no
> other similar patterns in the backend code:

+1

> This would become way more readable by using makeRangeVar() and the
> new makeVacuumRelation. As this is partly my fault that we are at this
> state, I am fine as well to remove this burden from you, Nathan, and
> fix that myself in a new version. And I don't want to step on your
> toes either :)

No worries, I can take care of it.  I appreciate your patience with all
of these reviews.

Nathan


Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/4/17, 7:14 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> On 9/3/17, 11:46 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> I did not consider first that the list portion also needed to be
>> modified, perhaps because I am not coding that myself... So now that
>> it is becoming more complicated what about just using AutovacMemCxt?
>> This would simplify the list of VacuumRelation entries and the
>> RangeVar creation as well, and honestly this is ugly and there are no
>> other similar patterns in the backend code:
>
> +1

I've made this change in v14 of the main patch.

In case others had opinions regarding the de-duplication patch, I've
attached that again as well.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Mon, Sep 4, 2017 at 11:47 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
> I've made this change in v14 of the main patch.
>
> In case others had opinions regarding the de-duplication patch, I've
> attached that again as well.

+   /*
+    * Create the relation list in a long-lived memory context so that it
+    * survives transaction boundaries.
+    */
+   old_cxt = MemoryContextSwitchTo(AutovacMemCxt);
+   rangevar = makeRangeVar(tab->at_nspname, tab->at_relname, -1);
+   rel = makeVacuumRelation(rangevar, NIL, tab->at_relid);
+   rel_list = list_make1(rel);
+   MemoryContextSwitchTo(old_cxt);
That's way better, thanks for the new patch.

So vacuum_multiple_tables_v14.patch is good for a committer in my
opinion. With this patch, if the same relation is specified multiple
times, then it gets vacuum'ed that many times. Using the same column
multi-times results in an error as on HEAD, but that's not a new
problem with this patch.

So I would tend to think that the same column specified multiple times
should cause an error, and that we could let VACUUM run work N times
on a relation if it is specified this much. This feels more natural,
at least to me, and it keeps the code simple.
-- 
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Masahiko Sawada
Date:
On Tue, Sep 5, 2017 at 10:16 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Sep 4, 2017 at 11:47 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
>> I've made this change in v14 of the main patch.
>>
>> In case others had opinions regarding the de-duplication patch, I've
>> attached that again as well.
>
> +   /*
> +    * Create the relation list in a long-lived memory context so that it
> +    * survives transaction boundaries.
> +    */
> +   old_cxt = MemoryContextSwitchTo(AutovacMemCxt);
> +   rangevar = makeRangeVar(tab->at_nspname, tab->at_relname, -1);
> +   rel = makeVacuumRelation(rangevar, NIL, tab->at_relid);
> +   rel_list = list_make1(rel);
> +   MemoryContextSwitchTo(old_cxt);
> That's way better, thanks for the new patch.
>
> So vacuum_multiple_tables_v14.patch is good for a committer in my
> opinion.

In get_rel_oids() we often switch the memory context to vac_context
and switch back. As a result almost code in get_rel_oids() is working
in vac_context. Maybe we can switch memory context before and after
the calling get_rel_oids?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Tue, Sep 5, 2017 at 12:05 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> In get_rel_oids() we often switch the memory context to vac_context
> and switch back. As a result almost code in get_rel_oids() is working
> in vac_context. Maybe we can switch memory context before and after
> the calling get_rel_oids?

I thought about that as well, and it seemed to me that the current
patch approach is less bug-prone for the future if get_rel_oids() gets
called in some future code paths.
-- 
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Simon Riggs
Date:
On 5 September 2017 at 02:16, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Mon, Sep 4, 2017 at 11:47 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
>> I've made this change in v14 of the main patch.
>>
>> In case others had opinions regarding the de-duplication patch, I've
>> attached that again as well.
>
> +   /*
> +    * Create the relation list in a long-lived memory context so that it
> +    * survives transaction boundaries.
> +    */
> +   old_cxt = MemoryContextSwitchTo(AutovacMemCxt);
> +   rangevar = makeRangeVar(tab->at_nspname, tab->at_relname, -1);
> +   rel = makeVacuumRelation(rangevar, NIL, tab->at_relid);
> +   rel_list = list_make1(rel);
> +   MemoryContextSwitchTo(old_cxt);
> That's way better, thanks for the new patch.
>
> So vacuum_multiple_tables_v14.patch is good for a committer in my
> opinion. With this patch, if the same relation is specified multiple
> times, then it gets vacuum'ed that many times. Using the same column
> multi-times results in an error as on HEAD, but that's not a new
> problem with this patch.
>
> So I would tend to think that the same column specified multiple times
> should cause an error, and that we could let VACUUM run work N times
> on a relation if it is specified this much. This feels more natural,
> at least to me, and it keeps the code simple.

ISTM there is no difference between VACUUM a, b
and VACUUM a; VACUUM b;

If we want to keep the code simple we must surely consider whether the
patch has any utility.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Masahiko Sawada
Date:
On Tue, Sep 5, 2017 at 12:24 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Sep 5, 2017 at 12:05 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> In get_rel_oids() we often switch the memory context to vac_context
>> and switch back. As a result almost code in get_rel_oids() is working
>> in vac_context. Maybe we can switch memory context before and after
>> the calling get_rel_oids?
>
> I thought about that as well, and it seemed to me that the current
> patch approach is less bug-prone for the future if get_rel_oids() gets
> called in some future code paths.

Okay, I agree. Also I found that dedupe_relations() eventually
allocates the list in current memory context that may not be
vac_context and set it to *relations at the end of that function. I
think we should switch the memory context to vac_context before doing
that. Or to more simplify the code maybe we can do the all treatment
of the relations list after switching to vac_context. For example,

oldcontext = MemoryContextSwtichTo(vac_context)
relations = copyObject(relations);
get_rel_oids(&relations);
check_colums_exist(relations);
dedupe_relations(&relations);
MemoryContextSwtichTo(oldcontext);

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/4/17, 10:32 PM, "Simon Riggs" <simon@2ndquadrant.com> wrote:
> ISTM there is no difference between
>   VACUUM a, b
> and
>   VACUUM a; VACUUM b;
>
> If we want to keep the code simple we must surely consider whether the
> patch has any utility.

Yes, this is true, but I think the convenience factor is a bit
understated with that example.  For example, if you need to manually
cleanup several tables for XID purposes,VACUUM FREEZE VERBOSE table1;VACUUM FREEZE VERBOSE table2;VACUUM FREEZE VERBOSE
table3;VACUUMFREEZE VERBOSE table4;VACUUM FREEZE VERBOSE table5;
 
becomesVACUUM FREEZE VERBOSE table1, table2, table3, table4, table5;

I would consider even this to be a relatively modest example compared
to the sorts of things users might do.

In addition, I'd argue that this feels like a natural extension of the
VACUUM command, one that I, like others much earlier in this thread,
was surprised to learn wasn't supported.

Nathan


"Bossart, Nathan" <bossartn@amazon.com> writes:
> On 9/4/17, 10:32 PM, "Simon Riggs" <simon@2ndquadrant.com> wrote:
>> If we want to keep the code simple we must surely consider whether the
>> patch has any utility.

> ... I'd argue that this feels like a natural extension of the
> VACUUM command, one that I, like others much earlier in this thread,
> was surprised to learn wasn't supported.

Yeah.  To me, one big argument for allowing multiple target tables is that
we allow it for other common utility commands such as TRUNCATE or LOCK
TABLE.
        regards, tom lane



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/4/17, 8:16 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> So vacuum_multiple_tables_v14.patch is good for a committer in my
> opinion. With this patch, if the same relation is specified multiple
> times, then it gets vacuum'ed that many times. Using the same column
> multi-times results in an error as on HEAD, but that's not a new
> problem with this patch.

Thanks!

> So I would tend to think that the same column specified multiple times
> should cause an error, and that we could let VACUUM run work N times
> on a relation if it is specified this much. This feels more natural,
> at least to me, and it keeps the code simple.

I think that is a reasonable approach.  Another option I was thinking
about was to de-duplicate only the individual column lists.  This
alternative approach might be a bit more user-friendly, but I am
beginning to agree with you that perhaps we should not try to infer
the intent of the user in these "duplicate" scenarios.

I'll work on converting the existing de-duplication patch into
something more like what you suggested.

Nathan


Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Alvaro Herrera
Date:
Tom Lane wrote:
> "Bossart, Nathan" <bossartn@amazon.com> writes:
> > On 9/4/17, 10:32 PM, "Simon Riggs" <simon@2ndquadrant.com> wrote:
> >> If we want to keep the code simple we must surely consider whether the
> >> patch has any utility.
> 
> > ... I'd argue that this feels like a natural extension of the
> > VACUUM command, one that I, like others much earlier in this thread,
> > was surprised to learn wasn't supported.
> 
> Yeah.  To me, one big argument for allowing multiple target tables is that
> we allow it for other common utility commands such as TRUNCATE or LOCK
> TABLE.

TRUNCATE has actual an feature behind its multi-table ability: you can
truncate tables linked by FKs that way, and not otherwise.  VACUUM, like
LOCK TABLE, have no such benefit.

(If one is programatically locking multiple tables, it is easier to do
one table per command than many in one command, anyway.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Wed, Sep 6, 2017 at 2:36 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> On 9/4/17, 8:16 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> So I would tend to think that the same column specified multiple times
>> should cause an error, and that we could let VACUUM run work N times
>> on a relation if it is specified this much. This feels more natural,
>> at least to me, and it keeps the code simple.
>
> I think that is a reasonable approach.  Another option I was thinking
> about was to de-duplicate only the individual column lists.  This
> alternative approach might be a bit more user-friendly, but I am
> beginning to agree with you that perhaps we should not try to infer
> the intent of the user in these "duplicate" scenarios.
>
> I'll work on converting the existing de-duplication patch into
> something more like what you suggested.

Cool. I'll look at anything you have.
-- 
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/5/17, 5:53 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> On Wed, Sep 6, 2017 at 2:36 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
>> On 9/4/17, 8:16 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>>> So I would tend to think that the same column specified multiple times
>>> should cause an error, and that we could let VACUUM run work N times
>>> on a relation if it is specified this much. This feels more natural,
>>> at least to me, and it keeps the code simple.
>>
>> I think that is a reasonable approach.  Another option I was thinking
>> about was to de-duplicate only the individual column lists.  This
>> alternative approach might be a bit more user-friendly, but I am
>> beginning to agree with you that perhaps we should not try to infer
>> the intent of the user in these "duplicate" scenarios.
>>
>> I'll work on converting the existing de-duplication patch into
>> something more like what you suggested.
>
> Cool. I'll look at anything you have.

I've attached v1 of this patch.  I think we might want to refactor the
code for retrieving the relation name from a RangeVar, but it would
probably be better to do that in a separate patch.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Thu, Sep 7, 2017 at 9:46 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> I've attached v1 of this patch.  I think we might want to refactor the
> code for retrieving the relation name from a RangeVar, but it would
> probably be better to do that in a separate patch.

Using the patch checking for duplicate columns:
=# create table aa (a int);
CREATE TABLE
=# vacuum ANALYZE aa(z, z);
ERROR:  0A000: column lists cannot have duplicate entries
HINT:  the column list specified for relation "aa" contains duplicates
LOCATION:  check_column_lists, vacuum.c:619
Shouldn't the priority be given to undefined columns instead of
duplicates? You may want to add a test for that as well.
-- 
Michael



Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/7/17, 2:33 AM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> Using the patch checking for duplicate columns:
> =# create table aa (a int);
> CREATE TABLE
> =# vacuum ANALYZE aa(z, z);
> ERROR:  0A000: column lists cannot have duplicate entries
> HINT:  the column list specified for relation "aa" contains duplicates
> LOCATION:  check_column_lists, vacuum.c:619
> Shouldn't the priority be given to undefined columns instead of
> duplicates? You may want to add a test for that as well.

I agree.  I've fixed this and added a couple relevant tests cases in
v2.

I've also attached a v15 of the main patch.  In check_columns_exist(),
there was a 'return' that should be a 'continue'.  This caused us to
skip the column existence checks for column lists defined after a table
with no column list.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Fri, Sep 8, 2017 at 7:27 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> On 9/7/17, 2:33 AM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> Using the patch checking for duplicate columns:
>> =# create table aa (a int);
>> CREATE TABLE
>> =# vacuum ANALYZE aa(z, z);
>> ERROR:  0A000: column lists cannot have duplicate entries
>> HINT:  the column list specified for relation "aa" contains duplicates
>> LOCATION:  check_column_lists, vacuum.c:619
>> Shouldn't the priority be given to undefined columns instead of
>> duplicates? You may want to add a test for that as well.
>
> I agree.  I've fixed this and added a couple relevant tests cases in
> v2.

Thanks. This looks now correct to me. Except that:
+           ereport(ERROR,
+               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                errmsg("column lists cannot have duplicate entries"),
+                errhint("the column list specified for relation
\"%s\" contains duplicates",
+                   relation->relation->relname)));
This should use ERRCODE_DUPLICATE_COLUMN.

> I've also attached a v15 of the main patch.  In check_columns_exist(),
> there was a 'return' that should be a 'continue'.  This caused us to
> skip the column existence checks for column lists defined after a table
> with no column list.

I can see that. Nicely spotted.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/8/17, 1:27 AM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> Thanks. This looks now correct to me. Except that:
> +           ereport(ERROR,
> +               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                errmsg("column lists cannot have duplicate entries"),
> +                errhint("the column list specified for relation
> \"%s\" contains duplicates",
> +                   relation->relation->relname)));
> This should use ERRCODE_DUPLICATE_COLUMN.

Absolutely.  This is fixed in v3.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Sat, Sep 9, 2017 at 2:05 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> On 9/8/17, 1:27 AM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> Thanks. This looks now correct to me. Except that:
>> +           ereport(ERROR,
>> +               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> +                errmsg("column lists cannot have duplicate entries"),
>> +                errhint("the column list specified for relation
>> \"%s\" contains duplicates",
>> +                   relation->relation->relname)));
>> This should use ERRCODE_DUPLICATE_COLUMN.
>
> Absolutely.  This is fixed in v3.

In the duplicate patch, it seems to me that you can save one lookup at
the list of VacuumRelation items by checking for column duplicates
after checking that all the columns are defined. If you put the
duplicate check before closing the relation you can also use the
schema name associated with the Relation.

+           if (i == InvalidAttrNumber)
+               ereport(ERROR,
+                   (errcode(ERRCODE_UNDEFINED_COLUMN),
+                    errmsg("column \"%s\" of relation \"%s\" does not exist",
+                       col, RelationGetRelationName(rel))));
This could use the schema name unconditionally as you hold a Relation
here, using RelationGetNamespace().
   if (!onerel)
+   {
+       /*
+        * If one of the relations specified by the user has disappeared
+        * since we last looked it up, let them know so that they do not
+        * think it was actually analyzed.
+        */
+       if (!IsAutoVacuumWorkerProcess() && relation)
+           ereport(WARNING,
+                 (errmsg("skipping \"%s\" --- relation no longer exists",
+                         relation->relname)));
+       return;
+   }
Hm. So if the relation with the defined OID is not found, then we'd
use the RangeVar, but this one may not be set here:
+           /*
+            * It is safe to leave everything except the OID empty here.
+            * Since no tables were specified in the VacuumStmt, we know
+            * we don't have any columns to keep track of.  Also, we do
+            * not need the RangeVar, because it is only used for error
+            * messaging when specific relations are chosen.
+            */
+           rel_oid = HeapTupleGetOid(tuple);
+           relinfo = makeVacuumRelation(NULL, NIL, rel_oid);
+           vacrels_tmp = lappend(vacrels_tmp, relinfo);
So if the relation is analyzed but skipped, we would have no idea that
it actually got skipped because there are no reports about it. That's
not really user-friendly. I am wondering if we should not instead have
analyze_rel also enforce the presence of a RangeVar, and adding an
assert at the beginning of the function to undertline that, and also
do the same for vacuum(). It would make things also consistent with
vacuum() which now implies on HEAD that a RangeVar *must* be
specified.

Sorry for noticing that just now, I am switching the patch back to
waiting on author.

Are there opinions about back-patching the patch checking for
duplicate columns? Stable branches now complain about an unhelpful
error message.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/9/17, 7:28 AM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> In the duplicate patch, it seems to me that you can save one lookup at
> the list of VacuumRelation items by checking for column duplicates
> after checking that all the columns are defined. If you put the
> duplicate check before closing the relation you can also use the
> schema name associated with the Relation.

I did this so that the ERROR prioritization would be enforced across all
relations.  For example:
VACUUM ANALYZE table1 (a, a), table2 (does_not_exist);

If I combine the 'for' loops to save a lookup, this example behaves
differently.  Instead of an ERROR for the nonexistent column, you would
hit an ERROR for the duplicate column in table1's list.  However, I do
not mind changing this.

> +           if (i == InvalidAttrNumber)
> +               ereport(ERROR,
> +                   (errcode(ERRCODE_UNDEFINED_COLUMN),
> +                    errmsg("column \"%s\" of relation \"%s\" does not exist",
> +                       col, RelationGetRelationName(rel))));
> This could use the schema name unconditionally as you hold a Relation
> here, using RelationGetNamespace().

Sure, I think this is a good idea.  I'll make this change in the next
version of the patch.

>     if (!onerel)
> +   {
> +       /*
> +        * If one of the relations specified by the user has disappeared
> +        * since we last looked it up, let them know so that they do not
> +        * think it was actually analyzed.
> +        */
> +       if (!IsAutoVacuumWorkerProcess() && relation)
> +           ereport(WARNING,
> +                 (errmsg("skipping \"%s\" --- relation no longer exists",
> +                         relation->relname)));
> +
>         return;
> +   }
> Hm. So if the relation with the defined OID is not found, then we'd
> use the RangeVar, but this one may not be set here:
> +           /*
> +            * It is safe to leave everything except the OID empty here.
> +            * Since no tables were specified in the VacuumStmt, we know
> +            * we don't have any columns to keep track of.  Also, we do
> +            * not need the RangeVar, because it is only used for error
> +            * messaging when specific relations are chosen.
> +            */
> +           rel_oid = HeapTupleGetOid(tuple);
> +           relinfo = makeVacuumRelation(NULL, NIL, rel_oid);
> +           vacrels_tmp = lappend(vacrels_tmp, relinfo);
> So if the relation is analyzed but skipped, we would have no idea that
> it actually got skipped because there are no reports about it. That's
> not really user-friendly. I am wondering if we should not instead have
> analyze_rel also enforce the presence of a RangeVar, and adding an
> assert at the beginning of the function to undertline that, and also
> do the same for vacuum(). It would make things also consistent with
> vacuum() which now implies on HEAD that a RangeVar *must* be
> specified.

I agree that it is nice to see when relations are skipped, but I do not
know if the WARNING messages would provide much value for this
particular use case (i.e. 'VACUUM;').  If a user does not provide a list
of tables to VACUUM, they might not care too much about WARNINGs for
dropped tables.

> Are there opinions about back-patching the patch checking for
> duplicate columns? Stable branches now complain about an unhelpful
> error message.

I wouldn't mind drafting something up for the stable branches.

Nathan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Mon, Sep 11, 2017 at 9:38 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> I agree that it is nice to see when relations are skipped, but I do not
> know if the WARNING messages would provide much value for this
> particular use case (i.e. 'VACUUM;').  If a user does not provide a list
> of tables to VACUUM, they might not care too much about WARNINGs for
> dropped tables.

Some users trigger manual VACUUM with cron jobs in moments of
low-activity as autovacuum may sometimes not be able to keep up with
hte bloat cleanup. It seems to me that getting WARNING messages is
particularly important for partitioned tables.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/9/17, 7:28 AM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> In the duplicate patch, it seems to me that you can save one lookup at
> the list of VacuumRelation items by checking for column duplicates
> after checking that all the columns are defined. If you put the
> duplicate check before closing the relation you can also use the
> schema name associated with the Relation.

I've made these changes in v4 of the duplicate patch.

> +           if (i == InvalidAttrNumber)
> +               ereport(ERROR,
> +                   (errcode(ERRCODE_UNDEFINED_COLUMN),
> +                    errmsg("column \"%s\" of relation \"%s\" does not exist",
> +                       col, RelationGetRelationName(rel))));
> This could use the schema name unconditionally as you hold a Relation
> here, using RelationGetNamespace().

This is added in v16 of the main patch.

> So if the relation is analyzed but skipped, we would have no idea that
> it actually got skipped because there are no reports about it. That's
> not really user-friendly. I am wondering if we should not instead have
> analyze_rel also enforce the presence of a RangeVar, and adding an
> assert at the beginning of the function to undertline that, and also
> do the same for vacuum(). It would make things also consistent with
> vacuum() which now implies on HEAD that a RangeVar *must* be
> specified.

I've made these changes in v16 of the main patch.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Mon, Sep 11, 2017 at 2:05 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
>> +           if (i == InvalidAttrNumber)
>> +               ereport(ERROR,
>> +                   (errcode(ERRCODE_UNDEFINED_COLUMN),
>> +                    errmsg("column \"%s\" of relation \"%s\" does not exist",
>> +                       col, RelationGetRelationName(rel))));
>> This could use the schema name unconditionally as you hold a Relation
>> here, using RelationGetNamespace().
>
> This is added in v16 of the main patch.
>
>> So if the relation is analyzed but skipped, we would have no idea that
>> it actually got skipped because there are no reports about it. That's
>> not really user-friendly. I am wondering if we should not instead have
>> analyze_rel also enforce the presence of a RangeVar, and adding an
>> assert at the beginning of the function to undertline that, and also
>> do the same for vacuum(). It would make things also consistent with
>> vacuum() which now implies on HEAD that a RangeVar *must* be
>> specified.
>
> I've made these changes in v16 of the main patch.

+           if (include_parts)
+           {
+               List *partition_oids = find_all_inheritors(relid, NoLock, NULL);
+               ListCell *part_lc;
+               foreach(part_lc, partition_oids)
+               {
+                   VacuumRelation *tmp = copyObject(relinfo);
+                   Oid part_oid = lfirst_oid(part_lc);
+                   tmp->oid = part_oid;
+                   vacrels_tmp = lappend(vacrels_tmp, tmp);
+               }
+           }
I thought that you would have changed that as well, but that's not
completely complete... In my opinion, HEAD is wrong in using the same
RangeVar for error reporting for a parent table and its partitions, so
that's not completely the fault of your patch. But I think that as
this patch makes vacuum routines smarter, you should create a new
RangeVar using makeRangeVar as you hold the OID of the child partition
in this code path. This would allow error reports to actually use the
data of the partition saved here instead of the parent data.

The rest looks fine to me.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/11/17, 9:28 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> +           if (include_parts)
> +           {
> +               List *partition_oids = find_all_inheritors(relid, NoLock, NULL);
> +               ListCell *part_lc;
> +               foreach(part_lc, partition_oids)
> +               {
> +                   VacuumRelation *tmp = copyObject(relinfo);
> +                   Oid part_oid = lfirst_oid(part_lc);
> +                   tmp->oid = part_oid;
> +                   vacrels_tmp = lappend(vacrels_tmp, tmp);
> +               }
> +           }
> I thought that you would have changed that as well, but that's not
> completely complete... In my opinion, HEAD is wrong in using the same
> RangeVar for error reporting for a parent table and its partitions, so
> that's not completely the fault of your patch. But I think that as
> this patch makes vacuum routines smarter, you should create a new
> RangeVar using makeRangeVar as you hold the OID of the child partition
> in this code path. This would allow error reports to actually use the
> data of the partition saved here instead of the parent data.

Good catch.  I had missed this.  It is added in v17.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
Sorry for the spam.  I am re-sending these patches with modified names so that
the apply order is obvious to the new automated testing framework (and to
everybody else).

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Wed, Sep 13, 2017 at 12:31 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> Sorry for the spam.  I am re-sending these patches with modified names so that
> the apply order is obvious to the new automated testing framework (and to
> everybody else).

- * relid, if not InvalidOid, indicate the relation to process; otherwise,
- * the RangeVar is used.  (The latter must always be passed, because it's
- * used for error messages.)
[...]
+typedef struct VacuumRelation
+{
+   NodeTag      type;
+   RangeVar    *relation;  /* single table to process */
+   List        *va_cols;   /* list of column names, or NIL for all */
+   Oid      oid;       /* corresponding OID (filled in by [auto]vacuum.c) */
+} VacuumRelation;
We lose a bit of information here. I think that it would be good to
mention in the declaration of VacuumRelation that the RangeVar is used
for error processing, and needs to be filled. I have complained about
that upthread already, perhaps this has slipped away when rebasing.

+           int i = attnameAttNum(rel, col, false);
+
+           if (i != InvalidAttrNumber)
+               continue;
Nit: allocating "i" makes little sense here. You are not using it for
any other checks.
/*
- * Build a list of Oids for each relation to be processed
+ * Determine the OID for each relation to be processed * * The list is built in vac_context so that it will survive
acrossour * per-relation transactions. */
 
-static List *
-get_rel_oids(Oid relid, const RangeVar *vacrel)
+static void
+get_rel_oids(List **vacrels)
Yeah, that's not completely correct either. This would be more like
"Fill in the list of VacuumRelation entries with their corresponding
OIDs, adding extra entries for partitioned tables".

Those are minor points. The patch seems to be in good shape, and
passes all my tests, including some pgbench'ing to make sure that
nothing goes weird. So I'll be happy to finally switch both patches to
"ready for committer" once those minor points are addressed.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/12/17, 9:47 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> - * relid, if not InvalidOid, indicate the relation to process; otherwise,
> - * the RangeVar is used.  (The latter must always be passed, because it's
> - * used for error messages.)
> [...]
> +typedef struct VacuumRelation
> +{
> +   NodeTag      type;
> +   RangeVar    *relation;  /* single table to process */
> +   List        *va_cols;   /* list of column names, or NIL for all */
> +   Oid      oid;       /* corresponding OID (filled in by [auto]vacuum.c) */
> +} VacuumRelation;
> We lose a bit of information here. I think that it would be good to
> mention in the declaration of VacuumRelation that the RangeVar is used
> for error processing, and needs to be filled. I have complained about
> that upthread already, perhaps this has slipped away when rebasing.

I've added a comment to this effect in v18 of the main patch.

> +           int i = attnameAttNum(rel, col, false);
> +
> +           if (i != InvalidAttrNumber)
> +               continue;
> Nit: allocating "i" makes little sense here. You are not using it for
> any other checks.

True.  I've removed "i" in v18.

>  /*
> - * Build a list of Oids for each relation to be processed
> + * Determine the OID for each relation to be processed
>   *
>   * The list is built in vac_context so that it will survive across our
>   * per-relation transactions.
>   */
> -static List *
> -get_rel_oids(Oid relid, const RangeVar *vacrel)
> +static void
> +get_rel_oids(List **vacrels)
> Yeah, that's not completely correct either. This would be more like
> "Fill in the list of VacuumRelation entries with their corresponding
> OIDs, adding extra entries for partitioned tables".

I've added some more accurate comments for get_rel_oids() in v18.

> Those are minor points. The patch seems to be in good shape, and
> passes all my tests, including some pgbench'ing to make sure that
> nothing goes weird. So I'll be happy to finally switch both patches to
> "ready for committer" once those minor points are addressed.

Awesome.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Kyotaro HORIGUCHI
Date:
Hello, I began to look on this. (But it seems almost ready for committer..)

At Wed, 13 Sep 2017 11:47:11 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqTYbJRU14SG0qwueTLbZHutZ8OWCV0L9NiK1MQ_nzqCkw@mail.gmail.com>
> On Wed, Sep 13, 2017 at 12:31 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> > Sorry for the spam.  I am re-sending these patches with modified names so that
> > the apply order is obvious to the new automated testing framework (and to
> > everybody else).
> 
> - * relid, if not InvalidOid, indicate the relation to process; otherwise,
> - * the RangeVar is used.  (The latter must always be passed, because it's
> - * used for error messages.)
> [...]
> +typedef struct VacuumRelation
> +{
> +   NodeTag      type;
> +   RangeVar    *relation;  /* single table to process */
> +   List        *va_cols;   /* list of column names, or NIL for all */
> +   Oid      oid;       /* corresponding OID (filled in by [auto]vacuum.c) */
> +} VacuumRelation;
> We lose a bit of information here. I think that it would be good to
> mention in the declaration of VacuumRelation that the RangeVar is used
> for error processing, and needs to be filled. I have complained about
> that upthread already, perhaps this has slipped away when rebasing.
> 
> +           int i = attnameAttNum(rel, col, false);
> +
> +           if (i != InvalidAttrNumber)
> +               continue;
> Nit: allocating "i" makes little sense here. You are not using it for
> any other checks.
> 
>  /*
> - * Build a list of Oids for each relation to be processed
> + * Determine the OID for each relation to be processed
>   *
>   * The list is built in vac_context so that it will survive across our
>   * per-relation transactions.
>   */
> -static List *
> -get_rel_oids(Oid relid, const RangeVar *vacrel)
> +static void
> +get_rel_oids(List **vacrels)
> Yeah, that's not completely correct either. This would be more like
> "Fill in the list of VacuumRelation entries with their corresponding
> OIDs, adding extra entries for partitioned tables".
> 
> Those are minor points. The patch seems to be in good shape, and
> passes all my tests, including some pgbench'ing to make sure that
> nothing goes weird. So I'll be happy to finally switch both patches to
> "ready for committer" once those minor points are addressed.

May I ask one question?

This patch creates a new memory context "Vacuum" under
PortalContext in vacuum.c, but AFAICS the current context there
is PortalHeapMemory, which has the same expected lifetime with
the new context (that is, a child of PotalContext and dropeed in
PortalDrop). On the other hand the PortalMemory's lifetime is not
PortalStart to PortaDrop but the backend lifetime (initialized in
InitPostgres).

>  /*
>   * Create special memory context for cross-transaction storage.
>   *
>   * Since it is a child of PortalContext, it will go away eventually even
>   * if we suffer an error; there's no need for special abort cleanup logic.
>   */
>  vac_context = AllocSetContextCreate(PortalContext,
>                    "Vacuum",
>                    ALLOCSET_DEFAULT_SIZES);

So this seems to work as opposite to the expectation. Am I
missing something?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Wed, Sep 13, 2017 at 1:13 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> This patch creates a new memory context "Vacuum" under
> PortalContext in vacuum.c, but AFAICS the current context there
> is PortalHeapMemory, which has the same expected lifetime with
> the new context (that is, a child of PotalContext and dropeed in
> PortalDrop). On the other hand the PortalMemory's lifetime is not
> PortalStart to PortaDrop but the backend lifetime (initialized in
> InitPostgres).

Which patch are you looking to? This introduces no new memory context,
be it in 0001 or 0002 in its last versions. I don't recall during the
successive reviews seeing that pattern as well.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Wed, Sep 13, 2017 at 1:12 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
> On 9/12/17, 9:47 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> Those are minor points. The patch seems to be in good shape, and
>> passes all my tests, including some pgbench'ing to make sure that
>> nothing goes weird. So I'll be happy to finally switch both patches to
>> "ready for committer" once those minor points are addressed.
>
> Awesome.

OK, let's roll in then. There are a couple of points or comments that
could be tweaked like the name of get_rel_oids, but I usually have bad
taste when it comes to that, and the current things make sense as
well. So both patches are now marked as ready for committer.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Kyotaro HORIGUCHI
Date:
At Wed, 13 Sep 2017 13:16:52 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqSi7N1dVk=sYxoBj-Arkri341ydNO5rdnoCfo1sXmbv_A@mail.gmail.com>
> On Wed, Sep 13, 2017 at 1:13 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > This patch creates a new memory context "Vacuum" under
> > PortalContext in vacuum.c, but AFAICS the current context there
> > is PortalHeapMemory, which has the same expected lifetime with
> > the new context (that is, a child of PotalContext and dropeed in
> > PortalDrop). On the other hand the PortalMemory's lifetime is not
> > PortalStart to PortaDrop but the backend lifetime (initialized in
> > InitPostgres).
> 
> Which patch are you looking to? This introduces no new memory context,
> be it in 0001 or 0002 in its last versions. I don't recall during the
> successive reviews seeing that pattern as well.

Sorry. 0001 patch is just using the context. The context has been
introduced by the commit e415b469 and used only for buffer access
strategy object. I was confused by the following comment in the
patch.

| +     * Move our relation list to our special memory context so that we do
| +     * not lose it among our per-relation transactions.


The context exists there before the patch but anyway using the
context as per-portal context that doesn't need freeing seems to
result in memory leak.

regrds,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Kyotaro HORIGUCHI
Date:
Hello,

At Wed, 13 Sep 2017 17:28:20 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20170913.172820.141647434.horiguchi.kyotaro@lab.ntt.co.jp>
> The context exists there before the patch but anyway using the
> context as per-portal context that doesn't need freeing seems to
> result in memory leak.

It is released at the end of vacuum.
So it's no problem.
Sorry for the noise.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

"Bossart, Nathan" <bossartn@amazon.com> writes:
> [ 0001-vacuum_multiple_tables_v18.patch ]

I started to look at this, but soon became pretty desperately unhappy with
the way that it makes a bunch of tests on the relations and then releases
all locks on them.  That either makes the tests pointless, or requires you
to invent completely bizarre semantics, like this:
 * Check that all specified columns exist so that we can fast-fail * commands with multiple tables.  If the column
disappearsbefore we * actually process it, we will emit a WARNING and skip it later on.
 

I think that the way this ought to work is you process the VacuumRelation
structs purely sequentially, each in its own transaction, so you don't
need leaps of faith about what to do if a relation changes between the
first time you look at it and when you actually come to process it.
The idea of "fast failing" because some later VacuumRelation struct might
contain an error seems like useless complication, both in terms of the
implementation and the user-visible behavior.

It looks like some of this stuff might be the fault of the
partitioned-tables patch more than your own changes, but no time like
the present to try to improve matters.

As for the other patch, I wonder if it might not be better to
silently ignore duplicate column names.  But in either case, we don't
need a pre-check, IMO.  I'd just leave it to the lookup loop in
do_analyze_rel to notice if it's looked up the same column number
twice.  That would be more likely to lead to a back-patchable fix,
too.  0002 as it stands is useless as a back-patch basis, since it
proposes modifying code that doesn't even exist in the back branches.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/20/17, 2:29 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> I started to look at this...

Thanks for taking a look.

> I think that the way this ought to work is you process the VacuumRelation
> structs purely sequentially, each in its own transaction, so you don't
> need leaps of faith about what to do if a relation changes between the
> first time you look at it and when you actually come to process it.

How might this work for VACUUM statements with no tables specified?  It
seems like we must either handle the case when the relation changes before
it is processed or hold a lock for the duration of the vacuuming.

> The idea of "fast failing" because some later VacuumRelation struct might
> contain an error seems like useless complication, both in terms of the
> implementation and the user-visible behavior.

I agree that the patch might be simpler without this, but the user-visible
behavior is the reason I had included it.  In short, my goal was to avoid
errors halfway through a long-running VACUUM statement because the user
misspelled a relation/column name or the relation/column was dropped.
It's true that the tests become mostly pointless if the relations or
columns change before they are processed, but this seems like it would be
a rarer issue in typical use cases.

I'll continue to look into switching to a more sequential approach and
eliminating the leaps of faith.

> As for the other patch, I wonder if it might not be better to
> silently ignore duplicate column names.  But in either case, we don't
> need a pre-check, IMO.  I'd just leave it to the lookup loop in
> do_analyze_rel to notice if it's looked up the same column number
> twice.  That would be more likely to lead to a back-patchable fix,
> too.  0002 as it stands is useless as a back-patch basis, since it
> proposes modifying code that doesn't even exist in the back branches.

I'd be happy to write something up that is more feasibly back-patched.

Nathan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

"Bossart, Nathan" <bossartn@amazon.com> writes:
> On 9/20/17, 2:29 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> I think that the way this ought to work is you process the VacuumRelation
>> structs purely sequentially, each in its own transaction, so you don't
>> need leaps of faith about what to do if a relation changes between the
>> first time you look at it and when you actually come to process it.

> How might this work for VACUUM statements with no tables specified?  It
> seems like we must either handle the case when the relation changes before
> it is processed or hold a lock for the duration of the vacuuming.

I don't see a need to change the way that vacuum-with-no-table operates.
It essentially collects a list of relation OIDs that exist at the start
of vacuuming, and then vacuums all the ones that still exist when their
turn comes.  Since no information beyond the bare OID is saved across the
preceding transactions, there's not really any schema-change risk involved.

We could possibly adapt that concept to the inheritance/partitioning cases
for vacuum with a table name or list of names: when we first come to a
VacuumRelation, collect a list of child table OIDs, and then process each
one unless it's disappeared by the time its turn comes.  But in any case,
we should not be doing any checks on a particular relation until we've got
it open and locked with intent to vacuum.

>> The idea of "fast failing" because some later VacuumRelation struct might
>> contain an error seems like useless complication, both in terms of the
>> implementation and the user-visible behavior.

> I agree that the patch might be simpler without this, but the user-visible
> behavior is the reason I had included it.  In short, my goal was to avoid
> errors halfway through a long-running VACUUM statement because the user
> misspelled a relation/column name or the relation/column was dropped.

I don't particularly buy that argument, because it's not the case that
the preceding processing was wasted when that happens.  We've done and
committed the vacuuming work for the earlier relations.

> It's true that the tests become mostly pointless if the relations or
> columns change before they are processed, but this seems like it would be
> a rarer issue in typical use cases.

Nonetheless, we'd have to explain this behavior to people, and I think
it's mostly useless complication.  With what I'm proposing, if vacuum
complains about the third table in the list, you know it has done the
ones before that.  What what you want to do, maybe it did the ones
before that, or maybe it didn't.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Thu, Sep 21, 2017 at 8:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Bossart, Nathan" <bossartn@amazon.com> writes:
>> On 9/20/17, 2:29 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>>> The idea of "fast failing" because some later VacuumRelation struct might
>>> contain an error seems like useless complication, both in terms of the
>>> implementation and the user-visible behavior.
>
>> I agree that the patch might be simpler without this, but the user-visible
>> behavior is the reason I had included it.  In short, my goal was to avoid
>> errors halfway through a long-running VACUUM statement because the user
>> misspelled a relation/column name or the relation/column was dropped.
>
> I don't particularly buy that argument, because it's not the case that
> the preceding processing was wasted when that happens.  We've done and
> committed the vacuuming work for the earlier relations.

I think that the problem can be seen differently though: the next
relations on the list would not be processed as well. For example in
parallel of a manual VACUUM triggered by a cron job, say that a rogue
admin removes a column for a relation to be VACUUM-ed. The relations
processed before the relation redefined would have been vacuumed and
the transaction doing the vacuum committed, but the ones listed after
would not have been updated in this nightly VACUUM. From this angle
this sounds like a fair concern to me. I agree that the rogue admin
should not have done that to begin with.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Michael Paquier <michael.paquier@gmail.com> writes:
> On Thu, Sep 21, 2017 at 8:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Bossart, Nathan" <bossartn@amazon.com> writes:
>>> I agree that the patch might be simpler without this, but the user-visible
>>> behavior is the reason I had included it.  In short, my goal was to avoid
>>> errors halfway through a long-running VACUUM statement because the user
>>> misspelled a relation/column name or the relation/column was dropped.

>> I don't particularly buy that argument, because it's not the case that
>> the preceding processing was wasted when that happens.  We've done and
>> committed the vacuuming work for the earlier relations.

> I think that the problem can be seen differently though: the next
> relations on the list would not be processed as well. For example in
> parallel of a manual VACUUM triggered by a cron job, say that a rogue
> admin removes a column for a relation to be VACUUM-ed. The relations
> processed before the relation redefined would have been vacuumed and
> the transaction doing the vacuum committed, but the ones listed after
> would not have been updated in this nightly VACUUM.

Um ... so?  With Nathan's proposed behavior, there are two cases depending
on just when the unexpected schema change happens:

1. *None* of the work gets done.

2. The work before the troublesome relation gets done, and the work after
doesn't.

I think it'll be much easier to understand if the behavior is always (2).
And I don't see any particular advantage to (1) anyway, especially not
for an unattended vacuum script.

Keep in mind that there were not-entirely-unjustified complaints upthread
about whether we needed to add any complexity here at all.  I'd just as
soon keep the added complexity to a minimum, especially when it's in
service of behaviors that are not clearly improvements.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Thu, Sep 21, 2017 at 9:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Thu, Sep 21, 2017 at 8:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> "Bossart, Nathan" <bossartn@amazon.com> writes:
>>>> I agree that the patch might be simpler without this, but the user-visible
>>>> behavior is the reason I had included it.  In short, my goal was to avoid
>>>> errors halfway through a long-running VACUUM statement because the user
>>>> misspelled a relation/column name or the relation/column was dropped.
>
>>> I don't particularly buy that argument, because it's not the case that
>>> the preceding processing was wasted when that happens.  We've done and
>>> committed the vacuuming work for the earlier relations.
>
>> I think that the problem can be seen differently though: the next
>> relations on the list would not be processed as well. For example in
>> parallel of a manual VACUUM triggered by a cron job, say that a rogue
>> admin removes a column for a relation to be VACUUM-ed. The relations
>> processed before the relation redefined would have been vacuumed and
>> the transaction doing the vacuum committed, but the ones listed after
>> would not have been updated in this nightly VACUUM.
>
> Um ... so?  With Nathan's proposed behavior, there are two cases depending
> on just when the unexpected schema change happens:
>
> 1. *None* of the work gets done.
>
> 2. The work before the troublesome relation gets done, and the work after
> doesn't.
>
> I think it'll be much easier to understand if the behavior is always (2).
> And I don't see any particular advantage to (1) anyway, especially not
> for an unattended vacuum script.

You may be missing one which is closer to what autovacuum does:
3) Issue a warning for the troublesome relation, and get the work done
a maximum.

> Keep in mind that there were not-entirely-unjustified complaints upthread
> about whether we needed to add any complexity here at all. I'd just as
> soon keep the added complexity to a minimum, especially when it's in
> service of behaviors that are not clearly improvements.

Yeah, I have sympathy for that argument as well. At some point during
the review I am sure that I complained about such things :)
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Michael Paquier <michael.paquier@gmail.com> writes:
> On Thu, Sep 21, 2017 at 9:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Um ... so?  With Nathan's proposed behavior, there are two cases depending
>> on just when the unexpected schema change happens:
>> 1. *None* of the work gets done.
>> 2. The work before the troublesome relation gets done, and the work after
>> doesn't.

> You may be missing one which is closer to what autovacuum does:
> 3) Issue a warning for the troublesome relation, and get the work done
> a maximum.

Well, we could certainly discuss whether the behavior on detecting a
conflict ought to be "error" or "warning and continue".  But I do not buy
the value of "it might be one or the other depending on timing".
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Thu, Sep 21, 2017 at 9:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Thu, Sep 21, 2017 at 9:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Um ... so?  With Nathan's proposed behavior, there are two cases depending
>>> on just when the unexpected schema change happens:
>>> 1. *None* of the work gets done.
>>> 2. The work before the troublesome relation gets done, and the work after
>>> doesn't.
>
>> You may be missing one which is closer to what autovacuum does:
>> 3) Issue a warning for the troublesome relation, and get the work done
>> a maximum.
>
> Well, we could certainly discuss whether the behavior on detecting a
> conflict ought to be "error" or "warning and continue".  But I do not buy
> the value of "it might be one or the other depending on timing".

I definitely agree with that, and I don't want this point to be a
blocker for the proposed patch either. So if you feel that for now the
focus should be on simplicity rather than reliability (my word may be
incorrect here, I mean having a consistent and continuous flow), let's
do so then. We could always change the implemented behavior later on.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/20/17, 7:58 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> On Thu, Sep 21, 2017 at 9:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Michael Paquier <michael.paquier@gmail.com> writes:
>>> On Thu, Sep 21, 2017 at 9:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Um ... so?  With Nathan's proposed behavior, there are two cases depending
>>>> on just when the unexpected schema change happens:
>>>> 1. *None* of the work gets done.
>>>> 2. The work before the troublesome relation gets done, and the work after
>>>> doesn't.
>>
>>> You may be missing one which is closer to what autovacuum does:
>>> 3) Issue a warning for the troublesome relation, and get the work done
>>> a maximum.
>>
>> Well, we could certainly discuss whether the behavior on detecting a
>> conflict ought to be "error" or "warning and continue".  But I do not buy
>> the value of "it might be one or the other depending on timing".
>
> I definitely agree with that, and I don't want this point to be a
> blocker for the proposed patch either. So if you feel that for now the
> focus should be on simplicity rather than reliability (my word may be
> incorrect here, I mean having a consistent and continuous flow), let's
> do so then. We could always change the implemented behavior later on.

Alright, here is something that I think is more like what Tom envisioned.
The duplicate column checks have been moved to do_analyze_rel(), and I am
hoping that it is more feasible to back-patch.  The main path now
processes each specified relation individually, which admittedly makes the
patch a bit simpler.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
"Bossart, Nathan" <bossartn@amazon.com> writes:
> [ 0001-error_on_duplicate_columns_in_analyze_v6.patch ]

I've pushed (and back-patched) the 0001 patch, with some significant
changes:

* The list_concat_unique implementation is O(N^2), and seems a bit obscure
as well.  Perhaps there will never be so many column list entries that
the speed is a big issue, but I'm not sure.  I replaced the list with a
bitmap to avoid the risk.

* I did not see the point of having "ANALYZE t (x,x,x,nosuchcolumn)"
complain about nosuchcolumn rather than the duplicated x entries,
especially not if it meant you couldn't say which column was duplicated.
So I folded the test into the primary loop and made the error look
more like what you get for a nonexistent column.

* I didn't like the test case at all: it was repetitive and it actually
failed to exhibit the originally problematic behavior with unpatched
code, which is generally a minimum expectation for a regression test.
(I think that's because you used an empty table, for which ANALYZE will
not try to make any pg_statistic entries.)  So I rewrote that, and made
it use an existing table rather than create a whole new one just for this.

I'll take a look at the updated 0002 tomorrow, but if anyone living in
a different timezone wants to review it meanwhile, feel free.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Fri, Sep 22, 2017 at 7:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Bossart, Nathan" <bossartn@amazon.com> writes:
>> [ 0001-error_on_duplicate_columns_in_analyze_v6.patch ]
>
> I've pushed (and back-patched) the 0001 patch, with some significant
> changes:

Thanks. Arrived here too late to give feedback on the proposed patch.

> I'll take a look at the updated 0002 tomorrow, but if anyone living in
> a different timezone wants to review it meanwhile, feel free.

Here is some feedback. 0002 fails to apply after 0001 has been
committed in its regression tests, that can easily be resolved.

A couple of tests could be removed:
+VACUUM (FREEZE, ANALYZE) vactst (i);
+VACUUM (FREEZE, ANALYZE) vactst (i, i, i);
+VACUUM vactst (i);
void
-vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
-      List *va_cols, BufferAccessStrategy bstrategy, bool isTopLevel)
+vacuum(int options, VacuumRelation *relation, VacuumParams *params,
+      BufferAccessStrategy bstrategy, bool isTopLevel)
[...]
-   relations = get_rel_oids(relid, relation);
+   relations = get_rel_oids(relation);
I still think that ExecVacuum() should pass a list of VacuumRelation
objects to vacuum(), and get_rel_oids() should take in input a list,
and return a completed lists. This way the decision-making of doing
everything in the same transaction should happens once in vacuum().
And actually, if several relations are defined with VACUUM, your patch
would not use one transaction per table as use_own_xacts would be set
to false. I think that Tom meant that relations whose processed has
finished have to be committed immediately. Per your patch, the commit
happens once all relations are committed.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/21/17, 9:55 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> I still think that ExecVacuum() should pass a list of VacuumRelation
> objects to vacuum(), and get_rel_oids() should take in input a list,
> and return a completed lists. This way the decision-making of doing
> everything in the same transaction should happens once in vacuum().
> And actually, if several relations are defined with VACUUM, your patch
> would not use one transaction per table as use_own_xacts would be set
> to false. I think that Tom meant that relations whose processed has
> finished have to be committed immediately. Per your patch, the commit
> happens once all relations are committed.

Sorry, I must have misunderstood.  I've attached an updated patch that
looks more like what you described.  I also cleaned up the test cases
a bit.

IIUC the only time use_own_xacts will be false is when we are only
doing ANALYZE and at least one of the following is true:

    1. We are in a transaction block.
    2. We are processing only one relation.

From the code, it appears that vacuum_rel() always starts and commits a
new transaction for each relation:

         * vacuum_rel expects to be entered with no transaction active; it will
         * start and commit its own transaction.  But we are called by an SQL

So, by ensuring that get_rel_oids() returns a list whenever multiple
tables are specified, we are making sure that commands like

    ANALYZE table1, table2, table3;

create transactions for each processed relation (as long as they are
not inside a transaction block).  I suppose the alternative would be
to call vacuum() for each relation and to remove the restriction that
we must be processing more than one relation for use_own_xacts to be
true.

Am I understanding this correctly?

Nathan



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/22/17, 10:56 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> Sorry, I must have misunderstood.  I've attached an updated patch that
> looks more like what you described.  I also cleaned up the test cases
> a bit.

Here is a version of the patch without the switch to AutovacMemCxt in
autovacuum_do_vac_analyze(), which should no longer be necessary after
335f3d04.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Sat, Sep 23, 2017 at 12:56 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> On 9/21/17, 9:55 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> I still think that ExecVacuum() should pass a list of VacuumRelation
>> objects to vacuum(), and get_rel_oids() should take in input a list,
>> and return a completed lists. This way the decision-making of doing
>> everything in the same transaction should happens once in vacuum().
>> And actually, if several relations are defined with VACUUM, your patch
>> would not use one transaction per table as use_own_xacts would be set
>> to false. I think that Tom meant that relations whose processed has
>> finished have to be committed immediately. Per your patch, the commit
>> happens once all relations are committed.
>
> Sorry, I must have misunderstood.  I've attached an updated patch that
> looks more like what you described.  I also cleaned up the test cases
> a bit.
>
> IIUC the only time use_own_xacts will be false is when we are only
> doing ANALYZE and at least one of the following is true:
>
>         1. We are in a transaction block.
>         2. We are processing only one relation.

Yes.

> From the code, it appears that vacuum_rel() always starts and commits a
> new transaction for each relation:
>
>          * vacuum_rel expects to be entered with no transaction active; it will
>          * start and commit its own transaction.  But we are called by an SQL

Yes.

> So, by ensuring that get_rel_oids() returns a list whenever multiple
> tables are specified, we are making sure that commands like
>
>         ANALYZE table1, table2, table3;
>
> create transactions for each processed relation (as long as they are
> not inside a transaction block).

Yes.

> I suppose the alternative would be
> to call vacuum() for each relation and to remove the restriction that
> we must be processing more than one relation for use_own_xacts to be
> true.

The main point of my comment is that like ExecVacuum(), vacuum()
should be a high-level function where is decided if multiple
transactions should be run or not. By calling vacuum() multiple times
you break this promise. vacuum_rel should be the one working with
individual transactions.

Here is the diff between v19 and v21 that matters here:   /* Now go through the common routine */
-   if (vacstmt->rels == NIL)
-       vacuum(vacstmt->options, NULL, ¶ms, NULL, isTopLevel);
-   else
-   {
-       ListCell *lc;
-       foreach(lc, vacstmt->rels)
-           vacuum(vacstmt->options, lfirst_node(VacuumRelation, lc),
-                  ¶ms, NULL, isTopLevel);
-   }
+   vacuum(vacstmt->options, vacstmt->rels, ¶ms, NULL, isTopLevel);
If you do so, for an ANALYZE with multiple relations you would finish
by using the same transaction for all relations. I think that we had
better be consistent with VACUUM when not using an outer transaction
so as tables are analyzed and committed one by one. This does not
happen here: a unique transaction is used when using a list of
non-partitioned tables.

On Sun, Sep 24, 2017 at 4:37 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> Here is a version of the patch without the switch to AutovacMemCxt in
> autovacuum_do_vac_analyze(), which should no longer be necessary after
> 335f3d04.

Thanks for the new version.

+       if (!IsAutoVacuumWorkerProcess())
+           ereport(WARNING,
+                 (errmsg("skipping \"%s\" --- relation no longer exists",
+                         relation->relname)));
I like the use of WARNING here, but we could use as well a LOG to be
consistent when a lock obtention is skipped.

+            * going to commit this transaction and begin a new one between now
+            * and then.
+            */
+           relid = RangeVarGetRelid(relinfo->relation, NoLock, false);
We will likely have to wait that the matters discussed in
https://www.postgresql.org/message-id/25023.1506107590@sss.pgh.pa.us
are settled.

+VACUUM FULL vactst, vactst, vactst, vactst;
This is actually a waste of cycles.

I think I don't have much other comments about this patch.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/25/17, 12:42 AM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> +       if (!IsAutoVacuumWorkerProcess())
> +           ereport(WARNING,
> +                 (errmsg("skipping \"%s\" --- relation no longer exists",
> +                         relation->relname)));
> I like the use of WARNING here, but we could use as well a LOG to be
> consistent when a lock obtention is skipped.

It looks like the LOG statement is only emitted for autovacuum, so maybe
we should keep this at WARNING for consistency with the permission checks
below it.

> +            * going to commit this transaction and begin a new one between now
> +            * and then.
> +            */
> +           relid = RangeVarGetRelid(relinfo->relation, NoLock, false);
> We will likely have to wait that the matters discussed in
> https://www.postgresql.org/message-id/25023.1506107590@sss.pgh.pa.us
> are settled.

Makes sense.

> +VACUUM FULL vactst, vactst, vactst, vactst;
> This is actually a waste of cycles.

I'll clean this up in v22.

Nathan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
Here's a v22.  Beyond a rebase, the only real difference is some cleanup
in the test cases.

On 9/26/17, 1:38 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> On 9/25/17, 12:42 AM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> +       if (!IsAutoVacuumWorkerProcess())
>> +           ereport(WARNING,
>> +                 (errmsg("skipping \"%s\" --- relation no longer exists",
>> +                         relation->relname)));
>> I like the use of WARNING here, but we could use as well a LOG to be
>> consistent when a lock obtention is skipped.
>
> It looks like the LOG statement is only emitted for autovacuum, so maybe
> we should keep this at WARNING for consistency with the permission checks
> below it.

I've left this as-is for now.  I considered emitting this statement as a
LOG for autovacuum, but I'm not sure there is terribly much value in
having autovacuum explain that it is skipping a relation because it was
concurrently dropped.  Perhaps this is something we should emit at a
DEBUG level.  What do you think?

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Thu, Sep 28, 2017 at 1:20 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> On 9/26/17, 1:38 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
>> On 9/25/17, 12:42 AM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>>> +       if (!IsAutoVacuumWorkerProcess())
>>> +           ereport(WARNING,
>>> +                 (errmsg("skipping \"%s\" --- relation no longer exists",
>>> +                         relation->relname)));
>>> I like the use of WARNING here, but we could use as well a LOG to be
>>> consistent when a lock obtention is skipped.
>>
>> It looks like the LOG statement is only emitted for autovacuum, so maybe
>> we should keep this at WARNING for consistency with the permission checks
>> below it.
>
> I've left this as-is for now.  I considered emitting this statement as a
> LOG for autovacuum, but I'm not sure there is terribly much value in
> having autovacuum explain that it is skipping a relation because it was
> concurrently dropped.  Perhaps this is something we should emit at a
> DEBUG level.  What do you think?

DEBUG would be fine as well for me. Now that your patch provides a
RangeVar consistently for all code paths, the message could show up
unconditionally.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/28/17, 12:20 AM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> On Thu, Sep 28, 2017 at 1:20 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
>> On 9/26/17, 1:38 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
>>> On 9/25/17, 12:42 AM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>>>> +       if (!IsAutoVacuumWorkerProcess())
>>>> +           ereport(WARNING,
>>>> +                 (errmsg("skipping \"%s\" --- relation no longer exists",
>>>> +                         relation->relname)));
>>>> I like the use of WARNING here, but we could use as well a LOG to be
>>>> consistent when a lock obtention is skipped.
>>>
>>> It looks like the LOG statement is only emitted for autovacuum, so maybe
>>> we should keep this at WARNING for consistency with the permission checks
>>> below it.
>>
>> I've left this as-is for now.  I considered emitting this statement as a
>> LOG for autovacuum, but I'm not sure there is terribly much value in
>> having autovacuum explain that it is skipping a relation because it was
>> concurrently dropped.  Perhaps this is something we should emit at a
>> DEBUG level.  What do you think?
>
> DEBUG would be fine as well for me. Now that your patch provides a
> RangeVar consistently for all code paths, the message could show up
> unconditionally.

Alright, I've added logging for autovacuum in v23.  I ended up needing to
do a little restructuring to handle the case when the relation was skipped
because the lock could not be obtained.  While doing so, I became
convinced that LOG was probably the right level for autovacuum logs.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Fri, Sep 29, 2017 at 2:44 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> Alright, I've added logging for autovacuum in v23.  I ended up needing to
> do a little restructuring to handle the case when the relation was skipped
> because the lock could not be obtained.  While doing so, I became
> convinced that LOG was probably the right level for autovacuum logs.

+       if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+           elevel = LOG;
+       else if (!IsAutoVacuumWorkerProcess())
+           elevel = WARNING;
+       else
+           elevel = 0;
OK, of course let's not change the existing log levels. This could be
always tuned later on depending on feedback from others. I can see
that guc.c also uses elevel == 0 for some logic, so we could rely on
that as you do.

@@ -116,6 +116,9 @@ analyze_rel(Oid relid, RangeVar *relation, int options,   int         elevel;
AcquireSampleRowsFuncacquirefunc = NULL;   BlockNumber relpages = 0;
 
+   bool        rel_lock;
+
+   Assert(relation != NULL);
I can see that this is new in your patch. Definitely adapted.

In short, I am switching it back to "ready for committer". We may want
the locking issues when building the relation list to be settled
first.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Michael Paquier <michael.paquier@gmail.com> writes:
> On Fri, Sep 29, 2017 at 2:44 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
>> Alright, I've added logging for autovacuum in v23.  I ended up needing to
>> do a little restructuring to handle the case when the relation was skipped
>> because the lock could not be obtained.  While doing so, I became
>> convinced that LOG was probably the right level for autovacuum logs.

> OK, of course let's not change the existing log levels. This could be
> always tuned later on depending on feedback from others. I can see
> that guc.c also uses elevel == 0 for some logic, so we could rely on
> that as you do.

FWIW, I don't think this patch should be mucking with logging behavior
at all; that's not within its headline charter, and I doubt many people
are paying attention.  I propose to commit it without that.  If you feel
hot about changing the logging behavior, you can resubmit that as a new
patch in a new thread where it will get some visibility and debate on
its own merits.
        regards, tom lane


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Fri, Sep 29, 2017 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Fri, Sep 29, 2017 at 2:44 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
>>> Alright, I've added logging for autovacuum in v23.  I ended up needing to
>>> do a little restructuring to handle the case when the relation was skipped
>>> because the lock could not be obtained.  While doing so, I became
>>> convinced that LOG was probably the right level for autovacuum logs.
>
>> OK, of course let's not change the existing log levels. This could be
>> always tuned later on depending on feedback from others. I can see
>> that guc.c also uses elevel == 0 for some logic, so we could rely on
>> that as you do.
>
> FWIW, I don't think this patch should be mucking with logging behavior
> at all; that's not within its headline charter, and I doubt many people
> are paying attention.  I propose to commit it without that.  If you feel
> hot about changing the logging behavior, you can resubmit that as a new
> patch in a new thread where it will get some visibility and debate on
> its own merits.

Okay. I am fine with that as well.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/28/17, 8:46 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> On Fri, Sep 29, 2017 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Michael Paquier <michael.paquier@gmail.com> writes:
>>> On Fri, Sep 29, 2017 at 2:44 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
>>>> Alright, I've added logging for autovacuum in v23.  I ended up needing to
>>>> do a little restructuring to handle the case when the relation was skipped
>>>> because the lock could not be obtained.  While doing so, I became
>>>> convinced that LOG was probably the right level for autovacuum logs.
>>
>>> OK, of course let's not change the existing log levels. This could be
>>> always tuned later on depending on feedback from others. I can see
>>> that guc.c also uses elevel == 0 for some logic, so we could rely on
>>> that as you do.
>>
>> FWIW, I don't think this patch should be mucking with logging behavior
>> at all; that's not within its headline charter, and I doubt many people
>> are paying attention.  I propose to commit it without that.  If you feel
>> hot about changing the logging behavior, you can resubmit that as a new
>> patch in a new thread where it will get some visibility and debate on
>> its own merits.
>
> Okay. I am fine with that as well.

Sure, that seems reasonable to me.

Nathan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/28/17, 10:05 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> On 9/28/17, 8:46 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> On Fri, Sep 29, 2017 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Michael Paquier <michael.paquier@gmail.com> writes:
>>>> On Fri, Sep 29, 2017 at 2:44 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
>>>>> Alright, I've added logging for autovacuum in v23.  I ended up needing to
>>>>> do a little restructuring to handle the case when the relation was skipped
>>>>> because the lock could not be obtained.  While doing so, I became
>>>>> convinced that LOG was probably the right level for autovacuum logs.
>>>
>>>> OK, of course let's not change the existing log levels. This could be
>>>> always tuned later on depending on feedback from others. I can see
>>>> that guc.c also uses elevel == 0 for some logic, so we could rely on
>>>> that as you do.
>>>
>>> FWIW, I don't think this patch should be mucking with logging behavior
>>> at all; that's not within its headline charter, and I doubt many people
>>> are paying attention.  I propose to commit it without that.  If you feel
>>> hot about changing the logging behavior, you can resubmit that as a new
>>> patch in a new thread where it will get some visibility and debate on
>>> its own merits.
>>
>> Okay. I am fine with that as well.
>
> Sure, that seems reasonable to me.

Here's a version without the logging changes in vacuum_rel() and
analyze_rel().  I’ll look into submitting those in the next commitfest.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 9/29/17, 9:33 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> Here's a version without the logging changes in vacuum_rel() and
> analyze_rel().  I’ll look into submitting those in the next commitfest.

Since get_rel_oids() was altered in 19de0ab2, here is a new version of
the patch.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Mon, Oct 2, 2017 at 1:43 PM, Bossart, Nathan <bossartn@amazon.com> wrote:
> On 9/29/17, 9:33 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
>> Here's a version without the logging changes in vacuum_rel() and
>> analyze_rel().  I’ll look into submitting those in the next commitfest.
>
> Since get_rel_oids() was altered in 19de0ab2, here is a new version of
> the patch.

Okay, I think that you got that right.
--
Michael


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

"Bossart, Nathan" <bossartn@amazon.com> writes:
> Since get_rel_oids() was altered in 19de0ab2, here is a new version of
> the patch.

I thought it would be a good idea to get this done before walking away
from the commitfest and letting all this info get swapped out of my
head.  So I've reviewed and pushed this.

I took out most of the infrastructure you'd put in for constructing
RangeVars for tables not explicitly named on the command line.  It
was buggy (eg you can't assume a relcache entry will stick around)
and I don't believe it's necessary.  I don't think that warnings
should be issued for any tables not explicitly named.

In any case, though, the extent to which we should add more warning
or log output seems like a fit topic for a new thread and a separate
patch.  Let's call this one done.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
"Bossart, Nathan"
Date:
On 10/3/17, 5:59 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> I thought it would be a good idea to get this done before walking away
> from the commitfest and letting all this info get swapped out of my
> head.  So I've reviewed and pushed this.

Thanks!

> I took out most of the infrastructure you'd put in for constructing
> RangeVars for tables not explicitly named on the command line.  It
> was buggy (eg you can't assume a relcache entry will stick around)
> and I don't believe it's necessary.  I don't think that warnings
> should be issued for any tables not explicitly named.
>
> In any case, though, the extent to which we should add more warning
> or log output seems like a fit topic for a new thread and a separate
> patch.  Let's call this one done.

I'll look into submitting that to the next commitfest.

Nathan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands

From
Michael Paquier
Date:
On Wed, Oct 4, 2017 at 9:52 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> On 10/3/17, 5:59 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> I thought it would be a good idea to get this done before walking away
>> from the commitfest and letting all this info get swapped out of my
>> head.  So I've reviewed and pushed this.
>
> Thanks!

+1.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers