Thread: [Proposal] vacuumdb --schema only

[Proposal] vacuumdb --schema only

From
Gilles Darold
Date:
Hi,


When we want to vacuum and/or analyze all tables in a dedicated schema, 
let's say pg_catalog for example, there is no easy way to do that. The 
VACUUM command doesn't allow it so we have to use \gexec or a SQL script 
to do that. We have an external command vacuumdb that could be used to 
simplify this task. For example the following command can be used to 
clean all tables stored in the pg_catalog schema:

     vacuumdb --schema pg_catalog -d foo

The attached patch implements that. Option -n | --schema can be used 
multiple time and can not be used together with options -a or -t.


Common use cases are an application that creates lot of temporary 
objects then drop them which can bloat a lot the catalog or which have 
heavy work in some schemas only. Of course the good practice is to find 
the bloated tables and execute VACUUM on each table but if most of the 
tables in the schema are regularly bloated the use of the vacuumdb 
--schema script can save time.


I do not propose to extend the VACUUM and ANALYZE commands because their 
current syntax doesn't allow me to see an easy way to do that and also 
because I'm not really in favor of such change. But if there is interest 
in improving these commands I will be pleased to do that, with the 
syntax suggested.


Best regards,

-- 
Gilles Darold

Attachment

Re: [Proposal] vacuumdb --schema only

From
Justin Pryzby
Date:
On Fri, Mar 04, 2022 at 10:11:28AM +0100, Gilles Darold wrote:
> The attached patch implements that. Option -n | --schema can be used
> multiple time and can not be used together with options -a or -t.

Yes, thanks.

I suggest there should also be an --exclude-schema.

> I do not propose to extend the VACUUM and ANALYZE commands because their
> current syntax doesn't allow me to see an easy way to do that

I think this would be easy with the parenthesized syntax.
I'm not suggesting to do it there, though.

> +    /*
> +     * When filtereing on schema name, filter by table is not allowed.
> +     * The schema name can already be set in a fqdn table name.

set *to*

-- 
Justin



Re: [Proposal] vacuumdb --schema only

From
Dinesh Chemuduru
Date:
On Fri, 4 Mar 2022 at 14:41, Gilles Darold <gilles@migops.com> wrote:
Hi,


When we want to vacuum and/or analyze all tables in a dedicated schema,
let's say pg_catalog for example, there is no easy way to do that. The
VACUUM command doesn't allow it so we have to use \gexec or a SQL script
to do that. We have an external command vacuumdb that could be used to
simplify this task. For example the following command can be used to
clean all tables stored in the pg_catalog schema:

     vacuumdb --schema pg_catalog -d foo


+1
This gives much better flexibility to users.


 
The attached patch implements that. Option -n | --schema can be used
multiple time and can not be used together with options -a or -t.


Common use cases are an application that creates lot of temporary
objects then drop them which can bloat a lot the catalog or which have
heavy work in some schemas only. Of course the good practice is to find
the bloated tables and execute VACUUM on each table but if most of the
tables in the schema are regularly bloated the use of the vacuumdb
--schema script can save time.


I do not propose to extend the VACUUM and ANALYZE commands because their
current syntax doesn't allow me to see an easy way to do that and also
because I'm not really in favor of such change. But if there is interest
in improving these commands I will be pleased to do that, with the
syntax suggested.


Best regards,

--
Gilles Darold

Re: [Proposal] vacuumdb --schema only

From
Gilles Darold
Date:
Le 04/03/2022 à 11:56, Justin Pryzby a écrit :
> On Fri, Mar 04, 2022 at 10:11:28AM +0100, Gilles Darold wrote:
>> The attached patch implements that. Option -n | --schema can be used
>> multiple time and can not be used together with options -a or -t.
> Yes, thanks.
>
> I suggest there should also be an --exclude-schema.


Ok, I will add it too.


>
>> I do not propose to extend the VACUUM and ANALYZE commands because their
>> current syntax doesn't allow me to see an easy way to do that
> I think this would be easy with the parenthesized syntax.
> I'm not suggesting to do it there, though.


Yes this is what I've though, something a la EXPLAIN, for example : 
"VACUUM (ANALYZE, SCHEMA foo)" but this is a change in the VACUUM syntax 
that needs to keep the compatibility with the current syntax. We will 
have two syntax something like "VACUUM ANALYZE FULL dbname" and "VACUUM 
(ANALYZE, FULL) dbname". The other syntax "problem" is to be able to use 
multiple schema values in the VACUUM command, perhaps "VACUUM (ANALYZE, 
SCHEMA (foo,bar))".


>> +    /*
>> +     * When filtereing on schema name, filter by table is not allowed.
>> +     * The schema name can already be set in a fqdn table name.
> set *to*

Thanks, will be fixed in next patch version.


-- 
Gilles Darold




Re: [Proposal] vacuumdb --schema only

From
Gilles Darold
Date:
Le 04/03/2022 à 11:56, Justin Pryzby a écrit :
> On Fri, Mar 04, 2022 at 10:11:28AM +0100, Gilles Darold wrote:
>> The attached patch implements that. Option -n | --schema can be used
>> multiple time and can not be used together with options -a or -t.
> Yes, thanks.
>
> I suggest there should also be an --exclude-schema.
>
>> I do not propose to extend the VACUUM and ANALYZE commands because their
>> current syntax doesn't allow me to see an easy way to do that
> I think this would be easy with the parenthesized syntax.
> I'm not suggesting to do it there, though.
>
>> +    /*
>> +     * When filtereing on schema name, filter by table is not allowed.
>> +     * The schema name can already be set in a fqdn table name.
> set *to*
>

Attached a new patch version that adds the -N | --exclude-schema option
to the vacuumdb command as suggested. Documentation updated too.


I will add this patch to the commitfest unless there is cons about
adding these options.


-- 
Gilles Darold

Attachment

Re: [Proposal] vacuumdb --schema only

From
Justin Pryzby
Date:
On Sun, Mar 06, 2022 at 09:39:37AM +0100, Gilles Darold wrote:
> Attached a new patch version that adds the -N | --exclude-schema option
> to the vacuumdb command as suggested. Documentation updated too.
> 
> +        pg_log_error("cannot vacuum all tables in schema(s) and and exclude specific schema(s) at the same time");

and and

It's odd that schema_exclusion is a global var, but schemas/excluded are not.

Also, it seems unnecessary to have two schemas vars, since they can't be used
together.  Maybe there's a better way than what I did in 003.

> +        for (cell = schemas ? schemas->head : NULL; cell; cell = cell->next)

It's preferred to write cell != NULL

> +       bool            schemas_listed = false;
...
> +        for (cell = schemas ? schemas->head : NULL; cell; cell = cell->next)
> +        {
> +            if (!schemas_listed) {
> +                appendPQExpBufferStr(&catalog_query,
> +                                     " AND pg_catalog.quote_ident(ns.nspname)");
> +                if (schema_exclusion)
> +                    appendPQExpBufferStr(&catalog_query, " NOT IN (");
> +                else
> +                    appendPQExpBufferStr(&catalog_query, " IN (");
> +
> +                schemas_listed = true;
> +            }
> +            else
> +                appendPQExpBufferStr(&catalog_query, ", ");
> +
> +            appendStringLiteralConn(&catalog_query, cell->val, conn);
> +            appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace::pg_catalog.name");
> +
> +        }
> +        /* Finish formatting schema filter */
> +        if (schemas_listed)
> +            appendPQExpBufferStr(&catalog_query, ")\n");
>      }

Maybe it's clearer to write this with =ANY() / != ALL() ?
See 002.

-- 
Justin

Attachment

Re: [Proposal] vacuumdb --schema only

From
Gilles Darold
Date:
Le 06/03/2022 à 16:04, Justin Pryzby a écrit :
> On Sun, Mar 06, 2022 at 09:39:37AM +0100, Gilles Darold wrote:
>> Attached a new patch version that adds the -N | --exclude-schema option
>> to the vacuumdb command as suggested. Documentation updated too.
>>
>> +        pg_log_error("cannot vacuum all tables in schema(s) and and exclude specific schema(s) at the same time");
> and and
>
> It's odd that schema_exclusion is a global var, but schemas/excluded are not.
>
> Also, it seems unnecessary to have two schemas vars, since they can't be used
> together.  Maybe there's a better way than what I did in 003.
>
>> +        for (cell = schemas ? schemas->head : NULL; cell; cell = cell->next)
> It's preferred to write cell != NULL
>
>> +       bool            schemas_listed = false;
> ...
>> +        for (cell = schemas ? schemas->head : NULL; cell; cell = cell->next)
>> +        {
>> +            if (!schemas_listed) {
>> +                appendPQExpBufferStr(&catalog_query,
>> +                                     " AND pg_catalog.quote_ident(ns.nspname)");
>> +                if (schema_exclusion)
>> +                    appendPQExpBufferStr(&catalog_query, " NOT IN (");
>> +                else
>> +                    appendPQExpBufferStr(&catalog_query, " IN (");
>> +
>> +                schemas_listed = true;
>> +            }
>> +            else
>> +                appendPQExpBufferStr(&catalog_query, ", ");
>> +
>> +            appendStringLiteralConn(&catalog_query, cell->val, conn);
>> +            appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace::pg_catalog.name");
>> +
>> +        }
>> +        /* Finish formatting schema filter */
>> +        if (schemas_listed)
>> +            appendPQExpBufferStr(&catalog_query, ")\n");
>>      }
> Maybe it's clearer to write this with =ANY() / != ALL() ?
> See 002.
>

I have applied your changes and produced a new version v3 of the patch,
thanks for the improvements. The patch have been added to commitfest
interface, see here https://commitfest.postgresql.org/38/3587/


-- 
Gilles Darold

Attachment

Re: [Proposal] vacuumdb --schema only

From
Gilles Darold
Date:
Hi,

New version v4 of the patch to fix a typo in a comment.

-- 
Gilles Darold

Attachment

Re: [Proposal] vacuumdb --schema only

From
Justin Pryzby
Date:
On Mon, Mar 07, 2022 at 08:38:04AM +0100, Gilles Darold wrote:
> > Maybe it's clearer to write this with =ANY() / != ALL() ?
> > See 002.
> 
> I have applied your changes and produced a new version v3 of the patch,
> thanks for the improvements. The patch have been added to commitfest
> interface, see here https://commitfest.postgresql.org/38/3587/

I wondered whether my patches were improvements, and it occurred to me that
your patch didn't fail if the specified schema didn't exist.  That's arguably
preferable, but that's the pre-existing behavior for tables.  So I think the
behavior of my patch is more consistent.

$ ./src/bin/scripts/vacuumdb -h /tmp -d postgres --table foo
vacuumdb: vacuuming database "postgres"
2022-03-09 15:04:06.922 CST client backend[25540] vacuumdb ERROR:  relation "foo" does not exist at character 60

$ ./src/bin/scripts/vacuumdb -h /tmp -d postgres --schema foo
vacuumdb: vacuuming database "postgres"
2022-03-09 15:02:59.926 CST client backend[23516] vacuumdb ERROR:  schema "foo" does not exist at character 335



Re: [Proposal] vacuumdb --schema only

From
Gilles Darold
Date:
Le 09/03/2022 à 22:10, Justin Pryzby a écrit :
On Mon, Mar 07, 2022 at 08:38:04AM +0100, Gilles Darold wrote:
Maybe it's clearer to write this with =ANY() / != ALL() ?
See 002.
I have applied your changes and produced a new version v3 of the patch,
thanks for the improvements. The patch have been added to commitfest
interface, see here https://commitfest.postgresql.org/38/3587/
I wondered whether my patches were improvements, and it occurred to me that
your patch didn't fail if the specified schema didn't exist.  That's arguably
preferable, but that's the pre-existing behavior for tables.  So I think the
behavior of my patch is more consistent.

+1

-- 
Gilles Darold

Re: [Proposal] vacuumdb --schema only

From
Robert Treat
Date:
On Thu, Mar 10, 2022 at 1:32 AM Gilles Darold <gilles@migops.com> wrote:
>
> Le 09/03/2022 à 22:10, Justin Pryzby a écrit :
>
> On Mon, Mar 07, 2022 at 08:38:04AM +0100, Gilles Darold wrote:
>
> Maybe it's clearer to write this with =ANY() / != ALL() ?
> See 002.
>
> I have applied your changes and produced a new version v3 of the patch,
> thanks for the improvements. The patch have been added to commitfest
> interface, see here https://commitfest.postgresql.org/38/3587/
>
> I wondered whether my patches were improvements, and it occurred to me that
> your patch didn't fail if the specified schema didn't exist.  That's arguably
> preferable, but that's the pre-existing behavior for tables.  So I think the
> behavior of my patch is more consistent.
>
> +1
>

+1 for consistency.

Robert Treat
https://xzilla.net



Re: [Proposal] vacuumdb --schema only

From
Nathan Bossart
Date:
I took a look at the v4 patch.

'git-apply' complains about whitespace errors:

    0001-vacuumdb-schema-only-v4.patch:17: tab in indent.
           <arg choice="plain">
    0001-vacuumdb-schema-only-v4.patch:18: tab in indent.
            <arg choice="opt">
    0001-vacuumdb-schema-only-v4.patch:19: tab in indent.
             <group choice="plain">
    0001-vacuumdb-schema-only-v4.patch:20: tab in indent.
              <arg choice="plain"><option>-n</option></arg>
    0001-vacuumdb-schema-only-v4.patch:21: tab in indent.
              <arg choice="plain"><option>--schema</option></arg>
    warning: squelched 13 whitespace errors
    warning: 18 lines add whitespace errors.

+    printf(_("  -N, --exclude-schema=PATTERN    do NOT vacuum tables in the specified schema(s)\n"));

I'm personally -1 for the --exclude-schema option.  I don't see any
existing "exclude" options in vacuumdb, and the uses for such an option
seem rather limited.  If we can point to specific use-cases for this
option, I might be willing to change my vote.

+   <para>
+    To clean all tables in the <literal>Foo</literal> and <literal>bar</literal> schemas
+    only in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>vacuumdb --schema='"Foo"' --schema='bar' xyzzy</userinput>
+</screen></para>

nitpicks: I think the phrasing should be "To only clean tables in the...".
Also, is there any reason to use a schema name with a capital letter as an
example?  IMO that just adds unnecessary complexity to the example.

+$node->issues_sql_like(
+    [ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+    qr/VACUUM "Foo".*/,
+    'vacuumdb --schema schema only');

IIUC there should only be one table in the schema.  Can we avoid matching
"*" and check for the exact command instead?

I think there should be a few more test cases.  For example, we should test
using -n and -N at the same time, and we should test what happens when
those options are used for missing schemas.

+    /*
+     * When filtering on schema name, filter by table is not allowed.
+     * The schema name can already be set to a fqdn table name.
+     */
+    if (tbl_count && (schemas.head != NULL))
+    {
+        pg_log_error("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
+        exit(1);
+    }

I think there might be some useful refactoring we can do that would
simplify adding similar options in the future.  Specifically, can we have a
global variable that stores the type of vacuumdb command (e.g., all,
tables, or schemas)?  If so, perhaps the tables list could be renamed and
reused for schemas (and any other objects that need listing in the future).

+        if (schemas != NULL && schemas->head != NULL)
+        {
+            appendPQExpBufferStr(&catalog_query,
+                                 " AND c.relnamespace");
+            if (schema_is_exclude == TRI_YES)
+                appendPQExpBufferStr(&catalog_query,
+                                    " OPERATOR(pg_catalog.!=) ALL (ARRAY[");
+            else if (schema_is_exclude == TRI_NO)
+                appendPQExpBufferStr(&catalog_query,
+                                    " OPERATOR(pg_catalog.=) ANY (ARRAY[");
+
+            for (cell = schemas->head; cell != NULL; cell = cell->next)
+            {
+                appendStringLiteralConn(&catalog_query, cell->val, conn);
+
+                if (cell->next != NULL)
+                    appendPQExpBufferStr(&catalog_query, ", ");
+            }
+
+            /* Finish formatting schema filter */
+            appendPQExpBufferStr(&catalog_query, "]::pg_catalog.regnamespace[])\n");
+        }

IMO we should use a CTE for specified schemas like we do for the specified
tables.  I wonder if we could even have a mostly-shared CTE code path for
all vacuumdb commands with a list of names.

-    /*
-     * If no tables were listed, filter for the relevant relation types.  If
-     * tables were given via --table, don't bother filtering by relation type.
-     * Instead, let the server decide whether a given relation can be
-     * processed in which case the user will know about it.
-     */
-    if (!tables_listed)
+    else
     {
+        /*
+         * If no tables were listed, filter for the relevant relation types.  If
+         * tables were given via --table, don't bother filtering by relation type.
+         * Instead, let the server decide whether a given relation can be
+         * processed in which case the user will know about it.
+         */

nitpick: This change seems unnecessary.

I noticed upthread that there was some discussion around adding a way to
specify a schema in VACUUM and ANALYZE commands.  I think this patch is
useful even if such an option is eventually added, as we'll still want
vacuumdb to obtain the full list of tables to process so that it can
effectively parallelize.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: [Proposal] vacuumdb --schema only

From
Justin Pryzby
Date:
On Wed, Mar 30, 2022 at 02:22:58PM -0700, Nathan Bossart wrote:
> I'm personally -1 for the --exclude-schema option.  I don't see any
> existing "exclude" options in vacuumdb, and the uses for such an option
> seem rather limited.  If we can point to specific use-cases for this
> option, I might be willing to change my vote.

I suggested it because I would consider using it, even though I don't currently
use the vacuumdb script at all.  I think this would allow partially
retiring/simplifying our existing vacuum script.

We 1) put all our partitions in a separate "child" schema (so \d is more
usable), and also 2) put some short-lived tables into their own schemas.  Some
of those tables may only exist for ~1 day so I'd perfer to neither vacuum nor
analyze them (they're only used for SELECT *).  But there can be a lot of them,
so a nightly job could do something like vacuumdb --schema public or vacuumdb
--exclude-schema ephemeral.

Everything would be processed nightly using vacuumdb --min-xid (to keep the
monitoring system happy).

The non-partitioned tables could be vacuumed nightly (without min-xid), with
--exclude ephemeral.

The partitioned tables could be processed monthly with vacuumdb --analyze.

I'd also want to be able to run vacuumdb --analyze nightly, but I'd want to
exclude the schema with short-lived tables.   I'd also need a way to exclude
our partitioned tables from nightly analyze (they should run monthly only).

Maybe this could share something with this patch:
https://commitfest.postgresql.org/37/2573/
pg_dump - read data for some options from external file

The goal of that patch was to put it in a file, which isn't really needed here.
But if there were common infrastructure for matching tables, it could be
shared.  The interesting part for this patch is to avoid adding separate
commandline arguments for --include-table, --exclude-table, --include-schema,
--exclude-schema (and anything else?)



Re: [Proposal] vacuumdb --schema only

From
Nathan Bossart
Date:
On Fri, Apr 01, 2022 at 10:01:28AM -0500, Justin Pryzby wrote:
> On Wed, Mar 30, 2022 at 02:22:58PM -0700, Nathan Bossart wrote:
>> I'm personally -1 for the --exclude-schema option.  I don't see any
>> existing "exclude" options in vacuumdb, and the uses for such an option
>> seem rather limited.  If we can point to specific use-cases for this
>> option, I might be willing to change my vote.
> 
> I suggested it because I would consider using it, even though I don't currently
> use the vacuumdb script at all.  I think this would allow partially
> retiring/simplifying our existing vacuum script.
> 
> We 1) put all our partitions in a separate "child" schema (so \d is more
> usable), and also 2) put some short-lived tables into their own schemas.  Some
> of those tables may only exist for ~1 day so I'd perfer to neither vacuum nor
> analyze them (they're only used for SELECT *).  But there can be a lot of them,
> so a nightly job could do something like vacuumdb --schema public or vacuumdb
> --exclude-schema ephemeral.
> 
> Everything would be processed nightly using vacuumdb --min-xid (to keep the
> monitoring system happy).
> 
> The non-partitioned tables could be vacuumed nightly (without min-xid), with
> --exclude ephemeral.
> 
> The partitioned tables could be processed monthly with vacuumdb --analyze.
> 
> I'd also want to be able to run vacuumdb --analyze nightly, but I'd want to
> exclude the schema with short-lived tables.   I'd also need a way to exclude
> our partitioned tables from nightly analyze (they should run monthly only).

Thanks for elaborating.  I retract my -1 vote.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: [Proposal] vacuumdb --schema only

From
Gilles Darold
Date:
Le 30/03/2022 à 23:22, Nathan Bossart a écrit :
> I took a look at the v4 patch.
>
> 'git-apply' complains about whitespace errors:

Fixed.


> +   <para>
> +    To clean all tables in the <literal>Foo</literal> and <literal>bar</literal> schemas
> +    only in a database named <literal>xyzzy</literal>:
> +<screen>
> +<prompt>$ </prompt><userinput>vacuumdb --schema='"Foo"' --schema='bar' xyzzy</userinput>
> +</screen></para>
>
> nitpicks: I think the phrasing should be "To only clean tables in the...".
> Also, is there any reason to use a schema name with a capital letter as an
> example?  IMO that just adds unnecessary complexity to the example.

I have though that an example of a schema with case sensitivity was
missing in the documentation but I agree with your comment, this is
probably not he best place to do that. Fixed.


> +$node->issues_sql_like(
> +    [ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
> +    qr/VACUUM "Foo".*/,
> +    'vacuumdb --schema schema only');
>
> IIUC there should only be one table in the schema.  Can we avoid matching
> "*" and check for the exact command instead?

Fixed.


> I think there should be a few more test cases.  For example, we should test
> using -n and -N at the same time, and we should test what happens when
> those options are used for missing schemas.

Fixed


> +    /*
> +     * When filtering on schema name, filter by table is not allowed.
> +     * The schema name can already be set to a fqdn table name.
> +     */
> +    if (tbl_count && (schemas.head != NULL))
> +    {
> +        pg_log_error("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
> +        exit(1);
> +    }
>
> I think there might be some useful refactoring we can do that would
> simplify adding similar options in the future.  Specifically, can we have a
> global variable that stores the type of vacuumdb command (e.g., all,
> tables, or schemas)?  If so, perhaps the tables list could be renamed and
> reused for schemas (and any other objects that need listing in the future).

I don't think there will be much more options like this one that will be
added to this command but anyway I have changed the patch that way.


> +        if (schemas != NULL && schemas->head != NULL)
> +        {
> +            appendPQExpBufferStr(&catalog_query,
> +                                 " AND c.relnamespace");
> +            if (schema_is_exclude == TRI_YES)
> +                appendPQExpBufferStr(&catalog_query,
> +                                    " OPERATOR(pg_catalog.!=) ALL (ARRAY[");
> +            else if (schema_is_exclude == TRI_NO)
> +                appendPQExpBufferStr(&catalog_query,
> +                                    " OPERATOR(pg_catalog.=) ANY (ARRAY[");
> +
> +            for (cell = schemas->head; cell != NULL; cell = cell->next)
> +            {
> +                appendStringLiteralConn(&catalog_query, cell->val, conn);
> +
> +                if (cell->next != NULL)
> +                    appendPQExpBufferStr(&catalog_query, ", ");
> +            }
> +
> +            /* Finish formatting schema filter */
> +            appendPQExpBufferStr(&catalog_query, "]::pg_catalog.regnamespace[])\n");
> +        }
>
> IMO we should use a CTE for specified schemas like we do for the specified
> tables.  I wonder if we could even have a mostly-shared CTE code path for
> all vacuumdb commands with a list of names.

Fixed


> -    /*
> -     * If no tables were listed, filter for the relevant relation types.  If
> -     * tables were given via --table, don't bother filtering by relation type.
> -     * Instead, let the server decide whether a given relation can be
> -     * processed in which case the user will know about it.
> -     */
> -    if (!tables_listed)
> +    else
>      {
> +        /*
> +         * If no tables were listed, filter for the relevant relation types.  If
> +         * tables were given via --table, don't bother filtering by relation type.
> +         * Instead, let the server decide whether a given relation can be
> +         * processed in which case the user will know about it.
> +         */
> nitpick: This change seems unnecessary.

Fixed


Thanks for the review, all these changes are available in new version v6
of the patch and attached here.


Best regards,

-- 
Gilles Darold

Attachment

Re: [Proposal] vacuumdb --schema only

From
Justin Pryzby
Date:
On Wed, Apr 06, 2022 at 07:43:42PM +0200, Gilles Darold wrote:
> Thanks for the review, all these changes are available in new version v6
> of the patch and attached here.

This is failing in CI (except on macos, which is strangely passing).
http://cfbot.cputube.org/gilles-darold.html

https://api.cirrus-ci.com/v1/artifact/task/5379693443547136/log/src/bin/scripts/tmp_check/log/regress_log_100_vacuumdb

not ok 59 - vacuumdb --schema "Foo" postgres exit code 0

#   Failed test 'vacuumdb --schema "Foo" postgres exit code 0'
#   at t/100_vacuumdb.pl line 151.
not ok 60 - vacuumdb --schema schema only: SQL found in server log

#   Failed test 'vacuumdb --schema schema only: SQL found in server log'
#   at t/100_vacuumdb.pl line 151.
#                   '2022-04-06 18:15:36.313 UTC [34857][not initialized] [[unknown]][:0] LOG:  connection received:
host=[local]
# 2022-04-06 18:15:36.314 UTC [34857][client backend] [[unknown]][3/2801:0] LOG:  connection authorized: user=postgres
database=postgresapplication_name=100_vacuumdb.pl
 
# 2022-04-06 18:15:36.318 UTC [34857][client backend] [100_vacuumdb.pl][3/2802:0] LOG:  statement: SELECT
pg_catalog.set_config('search_path','', false);
 
# 2022-04-06 18:15:36.586 UTC [34857][client backend] [100_vacuumdb.pl][:0] LOG:  disconnection: session time:
0:00:00.273user=postgres database=postgres host=[local]
 
# '
#     doesn't match '(?^:VACUUM "Foo".bar)'



Re: [Proposal] vacuumdb --schema only

From
Gilles Darold
Date:
Le 08/04/2022 à 02:46, Justin Pryzby a écrit :
> On Wed, Apr 06, 2022 at 07:43:42PM +0200, Gilles Darold wrote:
>> Thanks for the review, all these changes are available in new version v6
>> of the patch and attached here.
> This is failing in CI (except on macos, which is strangely passing).
> http://cfbot.cputube.org/gilles-darold.html
>
>
https://api.cirrus-ci.com/v1/artifact/task/5379693443547136/log/src/bin/scripts/tmp_check/log/regress_log_100_vacuumdb
>
> not ok 59 - vacuumdb --schema "Foo" postgres exit code 0
>
> #   Failed test 'vacuumdb --schema "Foo" postgres exit code 0'
> #   at t/100_vacuumdb.pl line 151.
> not ok 60 - vacuumdb --schema schema only: SQL found in server log
>
> #   Failed test 'vacuumdb --schema schema only: SQL found in server log'
> #   at t/100_vacuumdb.pl line 151.
> #                   '2022-04-06 18:15:36.313 UTC [34857][not initialized] [[unknown]][:0] LOG:  connection received:
host=[local]
> # 2022-04-06 18:15:36.314 UTC [34857][client backend] [[unknown]][3/2801:0] LOG:  connection authorized:
user=postgresdatabase=postgres application_name=100_vacuumdb.pl
 
> # 2022-04-06 18:15:36.318 UTC [34857][client backend] [100_vacuumdb.pl][3/2802:0] LOG:  statement: SELECT
pg_catalog.set_config('search_path','', false);
 
> # 2022-04-06 18:15:36.586 UTC [34857][client backend] [100_vacuumdb.pl][:0] LOG:  disconnection: session time:
0:00:00.273user=postgres database=postgres host=[local]
 
> # '
> #     doesn't match '(?^:VACUUM "Foo".bar)'

I'm surprised because make check do do not reports errors running on an 
Ubuntu 20.04 and CentOs 8:


t/010_clusterdb.pl ........ ok
t/011_clusterdb_all.pl .... ok
t/020_createdb.pl ......... ok
t/040_createuser.pl ....... ok
t/050_dropdb.pl ........... ok
t/070_dropuser.pl ......... ok
t/080_pg_isready.pl ....... ok
t/090_reindexdb.pl ........ ok
t/091_reindexdb_all.pl .... ok
t/100_vacuumdb.pl ......... ok
t/101_vacuumdb_all.pl ..... ok
t/102_vacuumdb_stages.pl .. ok
t/200_connstr.pl .......... ok
All tests successful.
Files=13, Tests=233, 17 wallclock secs ( 0.09 usr  0.02 sys + 6.63 cusr  
2.68 csys =  9.42 CPU)
Result: PASS


In tmp_check/log/regress_log_100_vacuumdb:

# Running: vacuumdb --schema "Foo" postgres
vacuumdb: vacuuming database "postgres"
ok 59 - vacuumdb --schema "Foo" postgres exit code 0
ok 60 - vacuumdb --schema schema only: SQL found in server log

In PG log:

2022-04-08 11:01:44.519 CEST [17223] 100_vacuumdb.pl LOG: statement: 
RESET search_path;
2022-04-08 11:01:44.519 CEST [17223] 100_vacuumdb.pl LOG: statement: 
WITH listed_objects (object_oid, column_list) AS (
           VALUES ('"Foo"'::pg_catalog.regnamespace::pg_catalog.oid, 
NULL::pg_catalog.text)
         )
         SELECT c.relname, ns.nspname, listed_objects.column_list FROM 
pg_catalog.pg_class c
          JOIN pg_catalog.pg_namespace ns ON c.relnamespace 
OPERATOR(pg_catalog.=) ns.oid
          LEFT JOIN pg_catalog.pg_class t ON c.reltoastrelid 
OPERATOR(pg_catalog.=) t.oid
          JOIN listed_objects ON listed_objects.object_oid 
OPERATOR(pg_catalog.=) ns.oid
          WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array['r', 'm'])
          ORDER BY c.relpages DESC;
2022-04-08 11:01:44.521 CEST [17223] 100_vacuumdb.pl LOG: statement: 
SELECT pg_catalog.set_config('search_path', '', false);
2022-04-08 11:01:44.521 CEST [17223] 100_vacuumdb.pl LOG: statement: 
VACUUM "Foo".bar;

And if I run the command manually:

$ /usr/local/pgsql/bin/vacuumdb -e -h localhost --schema '"Foo"' -d 
contrib_regress -U postgres
SELECT pg_catalog.set_config('search_path', '', false);
vacuumdb: vacuuming database "contrib_regress"
RESET search_path;
WITH listed_objects (object_oid, column_list) AS (
   VALUES ('"Foo"'::pg_catalog.regnamespace::pg_catalog.oid, 
NULL::pg_catalog.text)
)
SELECT c.relname, ns.nspname, listed_objects.column_list FROM 
pg_catalog.pg_class c
  JOIN pg_catalog.pg_namespace ns ON c.relnamespace 
OPERATOR(pg_catalog.=) ns.oid
  LEFT JOIN pg_catalog.pg_class t ON c.reltoastrelid 
OPERATOR(pg_catalog.=) t.oid
  JOIN listed_objects ON listed_objects.object_oid 
OPERATOR(pg_catalog.=) ns.oid
  WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array['r', 'm'])
  ORDER BY c.relpages DESC;
SELECT pg_catalog.set_config('search_path', '', false);

VACUUM "Foo".bar;


$ echo $?
0

I don't know what happen on cfbot, investigating...


-- 
Gilles Darold




Re: [Proposal] vacuumdb --schema only

From
Gilles Darold
Date:
Le 08/04/2022 à 02:46, Justin Pryzby a écrit :
> On Wed, Apr 06, 2022 at 07:43:42PM +0200, Gilles Darold wrote:
>> Thanks for the review, all these changes are available in new version v6
>> of the patch and attached here.
> This is failing in CI (except on macos, which is strangely passing).
> http://cfbot.cputube.org/gilles-darold.html
>
>
https://api.cirrus-ci.com/v1/artifact/task/5379693443547136/log/src/bin/scripts/tmp_check/log/regress_log_100_vacuumdb
>
> not ok 59 - vacuumdb --schema "Foo" postgres exit code 0
>
> #   Failed test 'vacuumdb --schema "Foo" postgres exit code 0'
> #   at t/100_vacuumdb.pl line 151.
> not ok 60 - vacuumdb --schema schema only: SQL found in server log
>
> #   Failed test 'vacuumdb --schema schema only: SQL found in server log'
> #   at t/100_vacuumdb.pl line 151.
> #                   '2022-04-06 18:15:36.313 UTC [34857][not initialized] [[unknown]][:0] LOG:  connection received:
host=[local]
> # 2022-04-06 18:15:36.314 UTC [34857][client backend] [[unknown]][3/2801:0] LOG:  connection authorized:
user=postgresdatabase=postgres application_name=100_vacuumdb.pl
 
> # 2022-04-06 18:15:36.318 UTC [34857][client backend] [100_vacuumdb.pl][3/2802:0] LOG:  statement: SELECT
pg_catalog.set_config('search_path','', false);
 
> # 2022-04-06 18:15:36.586 UTC [34857][client backend] [100_vacuumdb.pl][:0] LOG:  disconnection: session time:
0:00:00.273user=postgres database=postgres host=[local]
 
> # '
> #     doesn't match '(?^:VACUUM "Foo".bar)'


Ok, got it with the help of rjuju. Actually it was compiling well using 
gcc but clang give some warnings. A fix of these warning makes CI happy.


Attached v7 of the patch that should pass cfbot.

-- 
Gilles Darold

Attachment

Re: [Proposal] vacuumdb --schema only

From
Nathan Bossart
Date:
On Fri, Apr 08, 2022 at 05:16:06PM +0200, Gilles Darold wrote:
> Attached v7 of the patch that should pass cfbot.

Thanks for the new patch!  Unfortunately, it looks like some recent changes
have broken it again.

> +enum trivalue schema_is_exclude = TRI_DEFAULT;
> +
> +/*
> + * The kind of object filter to use. '0': none, 'n': schema, 't': table
> + * these values correspond to the -n | -N and -t command line options.
> + */
> +char objfilter = '0';

I think these should be combined into a single enum for simplicity and
readability (e.g., OBJFILTER_NONE, OBJFILTER_INCLUDE_SCHEMA,
OBJFILTER_EXCLUDE_SCHEMA, OBJFILTER_TABLE).

>       * Instead, let the server decide whether a given relation can be
>       * processed in which case the user will know about it.
>       */
> -    if (!tables_listed)
> +    if (!objects_listed || objfilter == 'n')
>      {
>          appendPQExpBufferStr(&catalog_query, " WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array["
>                               CppAsString2(RELKIND_RELATION) ", "

I think this deserveѕ a comment.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: [Proposal] vacuumdb --schema only

From
Gilles Darold
Date:
Le 11/04/2022 à 20:37, Nathan Bossart a écrit :
> On Fri, Apr 08, 2022 at 05:16:06PM +0200, Gilles Darold wrote:
>> Attached v7 of the patch that should pass cfbot.
> Thanks for the new patch!  Unfortunately, it looks like some recent changes
> have broken it again.
>
>> +enum trivalue schema_is_exclude = TRI_DEFAULT;
>> +
>> +/*
>> + * The kind of object filter to use. '0': none, 'n': schema, 't': table
>> + * these values correspond to the -n | -N and -t command line options.
>> + */
>> +char objfilter = '0';
> I think these should be combined into a single enum for simplicity and
> readability (e.g., OBJFILTER_NONE, OBJFILTER_INCLUDE_SCHEMA,
> OBJFILTER_EXCLUDE_SCHEMA, OBJFILTER_TABLE).
>
>>       * Instead, let the server decide whether a given relation can be
>>       * processed in which case the user will know about it.
>>       */
>> -    if (!tables_listed)
>> +    if (!objects_listed || objfilter == 'n')
>>      {
>>          appendPQExpBufferStr(&catalog_query, " WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array["
>>                               CppAsString2(RELKIND_RELATION) ", "
> I think this deserveѕ a comment.
>

Attached v8 of the patch that tries to address the remarks above, fixes
patch apply failure to master and replace calls to pg_log_error+exit
with pg_fatal.


.Thanks.

-- 
Gilles Darold

Attachment

Re: [Proposal] vacuumdb --schema only

From
Nathan Bossart
Date:
On Thu, Apr 14, 2022 at 10:27:46PM +0200, Gilles Darold wrote:
> Attached v8 of the patch that tries to address the remarks above, fixes
> patch apply failure to master and replace calls to pg_log_error+exit
> with pg_fatal.

Thanks for the new patch.

> +enum trivalue schema_is_exclude = TRI_DEFAULT;
> +
> +/*
> + * The kind of object use in the command line filter.
> + *   OBJFILTER_NONE: no filter used
> + *   OBJFILTER_SCHEMA: -n | --schema or -N | --exclude-schema
> + *   OBJFILTER_TABLE: -t | --table
> + */
> +enum VacObjectFilter
> +{
> +    OBJFILTER_NONE,
> +    OBJFILTER_TABLE,
> +    OBJFILTER_SCHEMA
> +};
> +
> +enum VacObjectFilter objfilter = OBJFILTER_NONE;

I still think we ought to remove schema_is_exclude in favor of adding
OBJFILTER_SCHEMA_EXCLUDE to the enum.  I think that would simplify some of
the error handling and improve readability.  IMO we should add
OBJFILTER_ALL, too.

> -                    simple_string_list_append(&tables, optarg);
> +                    /* When filtering on schema name, filter by table is not allowed. */
> +                    if (schema_is_exclude != TRI_DEFAULT)
> +                        pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
> +                    simple_string_list_append(&objects, optarg);
> +                    objfilter = OBJFILTER_TABLE;
>                      tbl_count++;
>                      break;
>                  }
> @@ -202,6 +224,32 @@ main(int argc, char *argv[])
>                                        &vacopts.parallel_workers))
>                      exit(1);
>                  break;
> +            case 'n':            /* include schema(s) */
> +                {
> +                    /* When filtering on schema name, filter by table is not allowed. */
> +                    if (tbl_count)
> +                        pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
> +
> +                    if (schema_is_exclude == TRI_YES)
> +                        pg_fatal("cannot vacuum all tables in schema(s) and exclude specific schema(s) at the same
time");
> +                    simple_string_list_append(&objects, optarg);
> +                    objfilter = OBJFILTER_SCHEMA;
> +                    schema_is_exclude = TRI_NO;
> +                    break;
> +                }
> +            case 'N':            /* exclude schema(s) */
> +                {
> +                    /* When filtering on schema name, filter by table is not allowed. */
> +                    if (tbl_count)
> +                        pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
> +                    if (schema_is_exclude == TRI_NO)
> +                        pg_fatal("cannot vacuum all tables in schema(s) and exclude specific schema(s) at the same
time");
> +
> +                    simple_string_list_append(&objects, optarg);
> +                    objfilter = OBJFILTER_SCHEMA;
> +                    schema_is_exclude = TRI_YES;
> +                    break;

I was expecting these to check objfilter.  For example:

    case 'N':
        {
            if (objfilter == OBJFILTER_TABLE)
                pg_fatal("...");
            else if (objfilter == OBJFILTER_SCHEMA)
                pg_fatal("...");
            else if (objfilter == OBJFILTER_ALL)
                pg_fatal("...");

            simple_string_list_append(&objects, optarg);
            objfilter = OBJFILTER_SCHEMA_EXCLUDE;
            break;
        }

Another possible improvement could be to move the pg_fatal() calls to a
helper function that generates the message based on the current objfilter
setting and the current option.  I'm envisioning something like
check_objfilter(VacObjFilter curr_objfilter, VacObjFilter curr_option).

> +    /*
> +     * When filtering on schema name, filter by table is not allowed.
> +     * The schema name can already be set to a fqdn table name.
> +     */
> +    if (tbl_count && objfilter == OBJFILTER_SCHEMA && objects.head != NULL)
> +        pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");

Isn't this redundant with the error in the option handling?

> -        if (tables.head != NULL)
> +        if (objfilter == OBJFILTER_SCHEMA && objects.head != NULL)
> +        {
> +            if (schema_is_exclude == TRI_YES)
> +                pg_fatal("cannot exclude from vacuum specific schema(s) in all databases");
> +            else if (schema_is_exclude == TRI_NO)
> +                pg_fatal("cannot vacuum specific schema(s) in all databases");
> +        }
> +
> +        if (objfilter == OBJFILTER_TABLE && objects.head != NULL)
>              pg_fatal("cannot vacuum specific table(s) in all databases");

I think we could move all these into check_objfilter(), too.

nitpick: Why do we need to check that objects.head is not NULL?  Isn't the
objfilter check enough?

>      /*
> -     * If no tables were listed, filter for the relevant relation types.  If
> -     * tables were given via --table, don't bother filtering by relation type.
> -     * Instead, let the server decide whether a given relation can be
> -     * processed in which case the user will know about it.
> +     * If no tables were listed or that a filter by schema is used, filter
> +     * for the relevant relation types.  If tables were given via --table,
> +     * don't bother filtering by relation type.  Instead, let the server
> +     * decide whether a given relation can be processed in which case the
> +     * user will know about it.  If there is a filter by schema the use of
> +     * --table is not possible so we have to filter by relation type too.
>       */
> -    if (!tables_listed)
> +    if (!objects_listed || objfilter == OBJFILTER_SCHEMA)

Do we need to check for objects_listed here?  IIUC we can just check for
objfilter != OBJFILTER_TABLE.

Unless I'm missing something, schema_is_exclude appears to only be used for
error checking and doesn't impact the generated catalog query.  It looks
like the relevant logic disappeared after v4 of the patch.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: [Proposal] vacuumdb --schema only

From
Gilles Darold
Date:
Le 18/04/2022 à 23:56, Nathan Bossart a écrit :
> On Thu, Apr 14, 2022 at 10:27:46PM +0200, Gilles Darold wrote:
>> Attached v8 of the patch that tries to address the remarks above, fixes
>> patch apply failure to master and replace calls to pg_log_error+exit
>> with pg_fatal.
> Thanks for the new patch.
>
>> +enum trivalue schema_is_exclude = TRI_DEFAULT;
>> +
>> +/*
>> + * The kind of object use in the command line filter.
>> + *   OBJFILTER_NONE: no filter used
>> + *   OBJFILTER_SCHEMA: -n | --schema or -N | --exclude-schema
>> + *   OBJFILTER_TABLE: -t | --table
>> + */
>> +enum VacObjectFilter
>> +{
>> +    OBJFILTER_NONE,
>> +    OBJFILTER_TABLE,
>> +    OBJFILTER_SCHEMA
>> +};
>> +
>> +enum VacObjectFilter objfilter = OBJFILTER_NONE;
> I still think we ought to remove schema_is_exclude in favor of adding
> OBJFILTER_SCHEMA_EXCLUDE to the enum.  I think that would simplify some of
> the error handling and improve readability.  IMO we should add
> OBJFILTER_ALL, too.

Fixed.


>> -                    simple_string_list_append(&tables, optarg);
>> +                    /* When filtering on schema name, filter by table is not allowed. */
>> +                    if (schema_is_exclude != TRI_DEFAULT)
>> +                        pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
>> +                    simple_string_list_append(&objects, optarg);
>> +                    objfilter = OBJFILTER_TABLE;
>>                       tbl_count++;
>>                       break;
>>                   }
>> @@ -202,6 +224,32 @@ main(int argc, char *argv[])
>>                                         &vacopts.parallel_workers))
>>                       exit(1);
>>                   break;
>> +            case 'n':            /* include schema(s) */
>> +                {
>> +                    /* When filtering on schema name, filter by table is not allowed. */
>> +                    if (tbl_count)
>> +                        pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
>> +
>> +                    if (schema_is_exclude == TRI_YES)
>> +                        pg_fatal("cannot vacuum all tables in schema(s) and exclude specific schema(s) at the same
time");
>> +                    simple_string_list_append(&objects, optarg);
>> +                    objfilter = OBJFILTER_SCHEMA;
>> +                    schema_is_exclude = TRI_NO;
>> +                    break;
>> +                }
>> +            case 'N':            /* exclude schema(s) */
>> +                {
>> +                    /* When filtering on schema name, filter by table is not allowed. */
>> +                    if (tbl_count)
>> +                        pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
>> +                    if (schema_is_exclude == TRI_NO)
>> +                        pg_fatal("cannot vacuum all tables in schema(s) and exclude specific schema(s) at the same
time");
>> +
>> +                    simple_string_list_append(&objects, optarg);
>> +                    objfilter = OBJFILTER_SCHEMA;
>> +                    schema_is_exclude = TRI_YES;
>> +                    break;
> I was expecting these to check objfilter.

Fixed.


> Another possible improvement could be to move the pg_fatal() calls to a
> helper function that generates the message based on the current objfilter
> setting and the current option.  I'm envisioning something like
> check_objfilter(VacObjFilter curr_objfilter, VacObjFilter curr_option).

I agree, done.


>> +    /*
>> +     * When filtering on schema name, filter by table is not allowed.
>> +     * The schema name can already be set to a fqdn table name.
>> +     */
>> +    if (tbl_count && objfilter == OBJFILTER_SCHEMA && objects.head != NULL)
>> +        pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
> Isn't this redundant with the error in the option handling?

Fixed.


>> -        if (tables.head != NULL)
>> +        if (objfilter == OBJFILTER_SCHEMA && objects.head != NULL)
>> +        {
>> +            if (schema_is_exclude == TRI_YES)
>> +                pg_fatal("cannot exclude from vacuum specific schema(s) in all databases");
>> +            else if (schema_is_exclude == TRI_NO)
>> +                pg_fatal("cannot vacuum specific schema(s) in all databases");
>> +        }
>> +
>> +        if (objfilter == OBJFILTER_TABLE && objects.head != NULL)
>>               pg_fatal("cannot vacuum specific table(s) in all databases");
> I think we could move all these into check_objfilter(), too.
>
> nitpick: Why do we need to check that objects.head is not NULL?  Isn't the
> objfilter check enough?

Done.


>>       /*
>> -     * If no tables were listed, filter for the relevant relation types.  If
>> -     * tables were given via --table, don't bother filtering by relation type.
>> -     * Instead, let the server decide whether a given relation can be
>> -     * processed in which case the user will know about it.
>> +     * If no tables were listed or that a filter by schema is used, filter
>> +     * for the relevant relation types.  If tables were given via --table,
>> +     * don't bother filtering by relation type.  Instead, let the server
>> +     * decide whether a given relation can be processed in which case the
>> +     * user will know about it.  If there is a filter by schema the use of
>> +     * --table is not possible so we have to filter by relation type too.
>>        */
>> -    if (!tables_listed)
>> +    if (!objects_listed || objfilter == OBJFILTER_SCHEMA)
> Do we need to check for objects_listed here?  IIUC we can just check for
> objfilter != OBJFILTER_TABLE.

Yes we need it otherwise test 'vacuumdb with view' fail because we are 
not trying to vacuum the view so the PG doesn't report:

     WARNING:  cannot vacuum non-tables or special system tables


> Unless I'm missing something, schema_is_exclude appears to only be used for
> error checking and doesn't impact the generated catalog query.  It looks
> like the relevant logic disappeared after v4 of the patch.

Right, removed.


New patch attached v9.


-- 
Gilles Darold

Attachment

Re: [Proposal] vacuumdb --schema only

From
Nathan Bossart
Date:
Thanks for the new patch!  I think this is on the right track.

On Wed, Apr 20, 2022 at 05:15:02PM +0200, Gilles Darold wrote:
> Le 18/04/2022 à 23:56, Nathan Bossart a écrit :
>> > -    if (!tables_listed)
>> > +    if (!objects_listed || objfilter == OBJFILTER_SCHEMA)
>> Do we need to check for objects_listed here?  IIUC we can just check for
>> objfilter != OBJFILTER_TABLE.
> 
> Yes we need it otherwise test 'vacuumdb with view' fail because we are not
> trying to vacuum the view so the PG doesn't report:
> 
>     WARNING:  cannot vacuum non-tables or special system tables

My point is that the only time we don't want to filter for relevant
relation types is when the user provides a list of tables.  So my
suggestion would be to simplify this to the following:

    if (objfilter != OBJFILTER_TABLE)
    {
        appendPQExpBufferStr(...);
        has_where = true;
    }

>> Unless I'm missing something, schema_is_exclude appears to only be used for
>> error checking and doesn't impact the generated catalog query.  It looks
>> like the relevant logic disappeared after v4 of the patch.
> 
> Right, removed.

I don't think -N works at the moment.  I tested it out, and vacuumdb was
still processing tables in schemas I excluded.  Can we add a test case for
this, too?

> +/*
> + * Verify that the filters used at command line are compatible
> + */
> +void
> +check_objfilter(VacObjectFilter curr_objfilter, VacObjectFilter curr_option)
> +{
> +    switch (curr_option)
> +    {
> +        case OBJFILTER_NONE:
> +            break;
> +        case OBJFILTER_DATABASE:
> +            /* When filtering on database name, vacuum on all database is not allowed. */
> +            if (curr_objfilter == OBJFILTER_ALL)
> +                pg_fatal("cannot vacuum all databases and a specific one at the same time");
> +            break;
> +        case OBJFILTER_ALL:
> +            /* When vacuuming all database, filter on database name is not allowed. */
> +            if (curr_objfilter == OBJFILTER_DATABASE)
> +                pg_fatal("cannot vacuum all databases and a specific one at the same time");
> +            /* When vacuuming all database, filter on schema name is not allowed. */
> +            if (curr_objfilter == OBJFILTER_SCHEMA)
> +                pg_fatal("cannot vacuum specific schema(s) in all databases");
> +            /* When vacuuming all database, schema exclusion is not allowed. */
> +            if (curr_objfilter == OBJFILTER_SCHEMA_EXCLUDE)
> +                pg_fatal("cannot exclude from vacuum specific schema(s) in all databases");
> +            /* When vacuuming all database, filter on table name is not allowed. */
> +            if (curr_objfilter == OBJFILTER_TABLE)
> +                pg_fatal("cannot vacuum specific table(s) in all databases");
> +            break;
> +        case OBJFILTER_TABLE:
> +            /* When filtering on table name, filter by schema is not allowed. */
> +            if (curr_objfilter == OBJFILTER_SCHEMA)
> +                pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
> +            /* When filtering on table name, schema exclusion is not allowed. */
> +            if (curr_objfilter == OBJFILTER_SCHEMA_EXCLUDE)
> +                pg_fatal("cannot vacuum specific table(s) and exclude specific schema(s) at the same time");
> +            break;
> +        case OBJFILTER_SCHEMA:
> +            /* When filtering on schema name, filter by table is not allowed. */
> +            if (curr_objfilter == OBJFILTER_TABLE)
> +                pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
> +            /* When filtering on schema name, schema exclusion is not allowed. */
> +            if (curr_objfilter == OBJFILTER_SCHEMA_EXCLUDE)
> +                pg_fatal("cannot vacuum all tables in schema(s) and exclude specific schema(s) at the same time");
> +            /* filtering on schema name can not be use on all database. */
> +            if (curr_objfilter == OBJFILTER_ALL)
> +                pg_fatal("cannot vacuum specific schema(s) in all databases");
> +            break;
> +        case OBJFILTER_SCHEMA_EXCLUDE:
> +            /* When filtering on schema exclusion, filter by table is not allowed. */
> +            if (curr_objfilter == OBJFILTER_TABLE)
> +                pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
> +            /* When filetring on schema exclusion, filter by schema is not allowed. */
> +            if (curr_objfilter == OBJFILTER_SCHEMA)
> +                pg_fatal("cannot vacuum all tables in schema(s) and exclude specific schema(s) at the same time");
> +            break;
> +    }
> +}

I don't think this handles all combinations.  For example, the following
command does not fail:

    vacuumdb -a -N test postgres

Furthermore, do you think it'd be possible to dynamically generate the
message?  If it doesn't add too much complexity, this might be a nice way
to simplify this function.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: [Proposal] vacuumdb --schema only

From
Justin Pryzby
Date:
On Wed, Apr 20, 2022 at 10:38:46AM -0700, Nathan Bossart wrote:
> > +void
> > +check_objfilter(VacObjectFilter curr_objfilter, VacObjectFilter curr_option)
> > +{
> > +    switch (curr_option)
> > +    {
> > +        case OBJFILTER_NONE:
> > +            break;
> > +        case OBJFILTER_DATABASE:
> > +            /* When filtering on database name, vacuum on all database is not allowed. */
> > +            if (curr_objfilter == OBJFILTER_ALL)
> > +                pg_fatal("cannot vacuum all databases and a specific one at the same time");
> > +            break;
> > +        case OBJFILTER_ALL:
> > +            /* When vacuuming all database, filter on database name is not allowed. */
> > +            if (curr_objfilter == OBJFILTER_DATABASE)
> > +                pg_fatal("cannot vacuum all databases and a specific one at the same time");
> > +            /* When vacuuming all database, filter on schema name is not allowed. */
> > +            if (curr_objfilter == OBJFILTER_SCHEMA)
> > +                pg_fatal("cannot vacuum specific schema(s) in all databases");
> > +            /* When vacuuming all database, schema exclusion is not allowed. */
> > +            if (curr_objfilter == OBJFILTER_SCHEMA_EXCLUDE)
> > +                pg_fatal("cannot exclude from vacuum specific schema(s) in all databases");
> > +            /* When vacuuming all database, filter on table name is not allowed. */
> > +            if (curr_objfilter == OBJFILTER_TABLE)
> > +                pg_fatal("cannot vacuum specific table(s) in all databases");
> > +            break;
> > +        case OBJFILTER_TABLE:
> > +            /* When filtering on table name, filter by schema is not allowed. */
> > +            if (curr_objfilter == OBJFILTER_SCHEMA)
> > +                pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
> > +            /* When filtering on table name, schema exclusion is not allowed. */
> > +            if (curr_objfilter == OBJFILTER_SCHEMA_EXCLUDE)
> > +                pg_fatal("cannot vacuum specific table(s) and exclude specific schema(s) at the same time");
> > +            break;
> > +        case OBJFILTER_SCHEMA:
> > +            /* When filtering on schema name, filter by table is not allowed. */
> > +            if (curr_objfilter == OBJFILTER_TABLE)
> > +                pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
> > +            /* When filtering on schema name, schema exclusion is not allowed. */
> > +            if (curr_objfilter == OBJFILTER_SCHEMA_EXCLUDE)
> > +                pg_fatal("cannot vacuum all tables in schema(s) and exclude specific schema(s) at the same
time");
> > +            /* filtering on schema name can not be use on all database. */
> > +            if (curr_objfilter == OBJFILTER_ALL)
> > +                pg_fatal("cannot vacuum specific schema(s) in all databases");
> > +            break;
> > +        case OBJFILTER_SCHEMA_EXCLUDE:
> > +            /* When filtering on schema exclusion, filter by table is not allowed. */
> > +            if (curr_objfilter == OBJFILTER_TABLE)
> > +                pg_fatal("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
> > +            /* When filetring on schema exclusion, filter by schema is not allowed. */
> > +            if (curr_objfilter == OBJFILTER_SCHEMA)
> > +                pg_fatal("cannot vacuum all tables in schema(s) and exclude specific schema(s) at the same
time");
> > +            break;
> > +    }
> > +}
> 
> I don't think this handles all combinations.  For example, the following
> command does not fail:
> 
>     vacuumdb -a -N test postgres
> 
> Furthermore, do you think it'd be possible to dynamically generate the
> message?

Not in the obvious way, because that breaks translatability.

-- 
Justin



Re: [Proposal] vacuumdb --schema only

From
Nathan Bossart
Date:
On Wed, Apr 20, 2022 at 12:40:52PM -0500, Justin Pryzby wrote:
> On Wed, Apr 20, 2022 at 10:38:46AM -0700, Nathan Bossart wrote:
>> Furthermore, do you think it'd be possible to dynamically generate the
>> message?
> 
> Not in the obvious way, because that breaks translatability.

Ah, right.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: [Proposal] vacuumdb --schema only

From
Gilles Darold
Date:
Le 20/04/2022 à 19:38, Nathan Bossart a écrit :
Thanks for the new patch!  I think this is on the right track.

On Wed, Apr 20, 2022 at 05:15:02PM +0200, Gilles Darold wrote:
Le 18/04/2022 à 23:56, Nathan Bossart a écrit :
-	if (!tables_listed)
+	if (!objects_listed || objfilter == OBJFILTER_SCHEMA)
Do we need to check for objects_listed here?  IIUC we can just check for
objfilter != OBJFILTER_TABLE.
Yes we need it otherwise test 'vacuumdb with view' fail because we are not
trying to vacuum the view so the PG doesn't report:

    WARNING:  cannot vacuum non-tables or special system tables
My point is that the only time we don't want to filter for relevant
relation types is when the user provides a list of tables.  So my
suggestion would be to simplify this to the following:
	if (objfilter != OBJFILTER_TABLE)	{		appendPQExpBufferStr(...);		has_where = true;	}


Right, I must have gotten mixed up in the test results. Fixed.


Unless I'm missing something, schema_is_exclude appears to only be used for
error checking and doesn't impact the generated catalog query.  It looks
like the relevant logic disappeared after v4 of the patch.
Right, removed.
I don't think -N works at the moment.  I tested it out, and vacuumdb was
still processing tables in schemas I excluded.  Can we add a test case for
this, too?


Fixed and regression tests added as well as some others to test the filter options compatibility.


+/*
+ * Verify that the filters used at command line are compatible
+ */
+void
+check_objfilter(VacObjectFilter curr_objfilter, VacObjectFilter curr_option)
+{
+	switch (curr_option)
+	{
+		case OBJFILTER_NONE:
+			break;
+		case OBJFILTER_DATABASE:
+			/* When filtering on database name, vacuum on all database is not allowed. */
+			if (curr_objfilter == OBJFILTER_ALL)
+				pg_fatal("cannot vacuum all databases and a specific one at the same time");
+			break;
[...]
+	}
+}
I don't think this handles all combinations.  For example, the following
command does not fail:
	vacuumdb -a -N test postgres


Right, I have fix them all in this new patch.


Furthermore, do you think it'd be possible to dynamically generate the
message?  If it doesn't add too much complexity, this might be a nice way
to simplify this function.


I have tried to avoid reusing the same error message several time by using a new enum and function filter_error(). I also use the same messages with --schema and --exclude-schema related errors.


Patch v10 attached.


-- 
Gilles Darold
Attachment

Re: [Proposal] vacuumdb --schema only

From
Nathan Bossart
Date:
On Fri, Apr 22, 2022 at 11:57:05AM +0200, Gilles Darold wrote:
> Patch v10 attached.

Thanks!  I've attached a v11 with some minor editorialization.  I think I
was able to improve the error handling for invalid combinations of
command-line options a bit, but please let me know what you think.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: [Proposal] vacuumdb --schema only

From
Nathan Bossart
Date:
On Fri, Apr 22, 2022 at 10:57:46PM -0700, Nathan Bossart wrote:
> On Fri, Apr 22, 2022 at 11:57:05AM +0200, Gilles Darold wrote:
>> Patch v10 attached.
> 
> Thanks!  I've attached a v11 with some minor editorialization.  I think I
> was able to improve the error handling for invalid combinations of
> command-line options a bit, but please let me know what you think.

I've attached a v12 of the patch that further simplifieѕ check_objfilter().
Apologies for the noise.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: [Proposal] vacuumdb --schema only

From
Gilles Darold
Date:
goo
Le 25/04/2022 à 03:27, Nathan Bossart a écrit :
> On Fri, Apr 22, 2022 at 10:57:46PM -0700, Nathan Bossart wrote:
>> On Fri, Apr 22, 2022 at 11:57:05AM +0200, Gilles Darold wrote:
>>> Patch v10 attached.
>> Thanks!  I've attached a v11 with some minor editorialization.  I think I
>> was able to improve the error handling for invalid combinations of
>> command-line options a bit, but please let me know what you think.
> I've attached a v12 of the patch that further simplifieѕ check_objfilter().
> Apologies for the noise.
>

Looks good for me, there is a failure on cfbot on FreeBSD but I have run
a CI with latest master and it pass.


Can I change the commitfest status to ready for committers?


-- 
Gilles Darold




Re: [Proposal] vacuumdb --schema only

From
Nathan Bossart
Date:
On Mon, Apr 25, 2022 at 08:50:09AM +0200, Gilles Darold wrote:
> Can I change the commitfest status to ready for committers?

I've marked it as ready-for-committer.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: [Proposal] vacuumdb --schema only

From
Michael Paquier
Date:
On Mon, Apr 25, 2022 at 09:18:53AM -0700, Nathan Bossart wrote:
> I've marked it as ready-for-committer.

The refactoring logic to build the queries is clear to follow.  I have
a few comments about the shape of the patch, though.

            case 'a':
-               alldb = true;
+               check_objfilter(OBJFILTER_ALL_DBS);
                break;
The cross-option checks are usually done after all the options
switches are check.  Why does this need to be different?  It does not
strike me as a huge problem to do one filter check at the end.

+void
+check_objfilter(VacObjFilter curr_option)
+{
+   objfilter |= curr_option;
+
+   if ((objfilter & OBJFILTER_ALL_DBS) &&
+       (objfilter & OBJFILTER_DATABASE))
+       pg_fatal("cannot vacuum all databases and a specific one at the same time");
The addition of more OBJFILTER_* (unlikely going to happen, but who
knows) would make it hard to know which option should not interact
with each other.  Wouldn't it be better to use a kind of compatibility
table for that?  As one OBJFILTER_* matches with one option, you could
simplify the number of strings in need of translation by switching to
an error message like "cannot use options %s and %s together", or
something like that?

+$node->command_fails(
+   [ 'vacuumdb', '-a', '-d', 'postgres' ],
+   'cannot use options -a and -d at the same time');
This set of tests had better use command_fails_like() to make sure
that the correct error patterns from check_objfilter() show up?
--
Michael

Attachment

Re: [Proposal] vacuumdb --schema only

From
Nathan Bossart
Date:
On Tue, Apr 26, 2022 at 11:36:02AM +0900, Michael Paquier wrote:
> The refactoring logic to build the queries is clear to follow.  I have
> a few comments about the shape of the patch, though.

Thanks for taking a look!

>             case 'a':
> -               alldb = true;
> +               check_objfilter(OBJFILTER_ALL_DBS);
>                 break;
> The cross-option checks are usually done after all the options
> switches are check.  Why does this need to be different?  It does not
> strike me as a huge problem to do one filter check at the end.

Makes sense.  I fixed this in v13.

> +void
> +check_objfilter(VacObjFilter curr_option)
> +{
> +   objfilter |= curr_option;
> +
> +   if ((objfilter & OBJFILTER_ALL_DBS) &&
> +       (objfilter & OBJFILTER_DATABASE))
> +       pg_fatal("cannot vacuum all databases and a specific one at the same time");
> The addition of more OBJFILTER_* (unlikely going to happen, but who
> knows) would make it hard to know which option should not interact
> with each other.  Wouldn't it be better to use a kind of compatibility
> table for that?  As one OBJFILTER_* matches with one option, you could
> simplify the number of strings in need of translation by switching to
> an error message like "cannot use options %s and %s together", or
> something like that?

I think this might actually make things more complicated.  In addition to
the compatibility table, we'd need to define the strings to use in the
error message somewhere.  I can give this a try if you feel strongly about
it.

> +$node->command_fails(
> +   [ 'vacuumdb', '-a', '-d', 'postgres' ],
> +   'cannot use options -a and -d at the same time');
> This set of tests had better use command_fails_like() to make sure
> that the correct error patterns from check_objfilter() show up?

Yes.  I did this in v13.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: [Proposal] vacuumdb --schema only

From
Andrew Dunstan
Date:
On 2022-04-26 Tu 00:46, Nathan Bossart wrote:
> On Tue, Apr 26, 2022 at 11:36:02AM +0900, Michael Paquier wrote:
>> The refactoring logic to build the queries is clear to follow.  I have
>> a few comments about the shape of the patch, though.
> Thanks for taking a look!
>
>>             case 'a':
>> -               alldb = true;
>> +               check_objfilter(OBJFILTER_ALL_DBS);
>>                 break;
>> The cross-option checks are usually done after all the options
>> switches are check.  Why does this need to be different?  It does not
>> strike me as a huge problem to do one filter check at the end.
> Makes sense.  I fixed this in v13.
>
>> +void
>> +check_objfilter(VacObjFilter curr_option)
>> +{
>> +   objfilter |= curr_option;
>> +
>> +   if ((objfilter & OBJFILTER_ALL_DBS) &&
>> +       (objfilter & OBJFILTER_DATABASE))
>> +       pg_fatal("cannot vacuum all databases and a specific one at the same time");
>> The addition of more OBJFILTER_* (unlikely going to happen, but who
>> knows) would make it hard to know which option should not interact
>> with each other.  Wouldn't it be better to use a kind of compatibility
>> table for that?  As one OBJFILTER_* matches with one option, you could
>> simplify the number of strings in need of translation by switching to
>> an error message like "cannot use options %s and %s together", or
>> something like that?
> I think this might actually make things more complicated.  In addition to
> the compatibility table, we'd need to define the strings to use in the
> error message somewhere.  I can give this a try if you feel strongly about
> it.
>
>> +$node->command_fails(
>> +   [ 'vacuumdb', '-a', '-d', 'postgres' ],
>> +   'cannot use options -a and -d at the same time');
>> This set of tests had better use command_fails_like() to make sure
>> that the correct error patterns from check_objfilter() show up?
> Yes.  I did this in v13.



committed.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com