Re: [Proposal] vacuumdb --schema only - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: [Proposal] vacuumdb --schema only
Date
Msg-id 20220330212258.GA2254@nathanxps13
Whole thread Raw
In response to Re: [Proposal] vacuumdb --schema only  (Robert Treat <rob@xzilla.net>)
Responses Re: [Proposal] vacuumdb --schema only
Re: [Proposal] vacuumdb --schema only
List pgsql-hackers
I took a look at the v4 patch.

'git-apply' complains about whitespace errors:

    0001-vacuumdb-schema-only-v4.patch:17: tab in indent.
           <arg choice="plain">
    0001-vacuumdb-schema-only-v4.patch:18: tab in indent.
            <arg choice="opt">
    0001-vacuumdb-schema-only-v4.patch:19: tab in indent.
             <group choice="plain">
    0001-vacuumdb-schema-only-v4.patch:20: tab in indent.
              <arg choice="plain"><option>-n</option></arg>
    0001-vacuumdb-schema-only-v4.patch:21: tab in indent.
              <arg choice="plain"><option>--schema</option></arg>
    warning: squelched 13 whitespace errors
    warning: 18 lines add whitespace errors.

+    printf(_("  -N, --exclude-schema=PATTERN    do NOT vacuum tables in the specified schema(s)\n"));

I'm personally -1 for the --exclude-schema option.  I don't see any
existing "exclude" options in vacuumdb, and the uses for such an option
seem rather limited.  If we can point to specific use-cases for this
option, I might be willing to change my vote.

+   <para>
+    To clean all tables in the <literal>Foo</literal> and <literal>bar</literal> schemas
+    only in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>vacuumdb --schema='"Foo"' --schema='bar' xyzzy</userinput>
+</screen></para>

nitpicks: I think the phrasing should be "To only clean tables in the...".
Also, is there any reason to use a schema name with a capital letter as an
example?  IMO that just adds unnecessary complexity to the example.

+$node->issues_sql_like(
+    [ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+    qr/VACUUM "Foo".*/,
+    'vacuumdb --schema schema only');

IIUC there should only be one table in the schema.  Can we avoid matching
"*" and check for the exact command instead?

I think there should be a few more test cases.  For example, we should test
using -n and -N at the same time, and we should test what happens when
those options are used for missing schemas.

+    /*
+     * When filtering on schema name, filter by table is not allowed.
+     * The schema name can already be set to a fqdn table name.
+     */
+    if (tbl_count && (schemas.head != NULL))
+    {
+        pg_log_error("cannot vacuum all tables in schema(s) and specific table(s) at the same time");
+        exit(1);
+    }

I think there might be some useful refactoring we can do that would
simplify adding similar options in the future.  Specifically, can we have a
global variable that stores the type of vacuumdb command (e.g., all,
tables, or schemas)?  If so, perhaps the tables list could be renamed and
reused for schemas (and any other objects that need listing in the future).

+        if (schemas != NULL && schemas->head != NULL)
+        {
+            appendPQExpBufferStr(&catalog_query,
+                                 " AND c.relnamespace");
+            if (schema_is_exclude == TRI_YES)
+                appendPQExpBufferStr(&catalog_query,
+                                    " OPERATOR(pg_catalog.!=) ALL (ARRAY[");
+            else if (schema_is_exclude == TRI_NO)
+                appendPQExpBufferStr(&catalog_query,
+                                    " OPERATOR(pg_catalog.=) ANY (ARRAY[");
+
+            for (cell = schemas->head; cell != NULL; cell = cell->next)
+            {
+                appendStringLiteralConn(&catalog_query, cell->val, conn);
+
+                if (cell->next != NULL)
+                    appendPQExpBufferStr(&catalog_query, ", ");
+            }
+
+            /* Finish formatting schema filter */
+            appendPQExpBufferStr(&catalog_query, "]::pg_catalog.regnamespace[])\n");
+        }

IMO we should use a CTE for specified schemas like we do for the specified
tables.  I wonder if we could even have a mostly-shared CTE code path for
all vacuumdb commands with a list of names.

-    /*
-     * If no tables were listed, filter for the relevant relation types.  If
-     * tables were given via --table, don't bother filtering by relation type.
-     * Instead, let the server decide whether a given relation can be
-     * processed in which case the user will know about it.
-     */
-    if (!tables_listed)
+    else
     {
+        /*
+         * If no tables were listed, filter for the relevant relation types.  If
+         * tables were given via --table, don't bother filtering by relation type.
+         * Instead, let the server decide whether a given relation can be
+         * processed in which case the user will know about it.
+         */

nitpick: This change seems unnecessary.

I noticed upthread that there was some discussion around adding a way to
specify a schema in VACUUM and ANALYZE commands.  I think this patch is
useful even if such an option is eventually added, as we'll still want
vacuumdb to obtain the full list of tables to process so that it can
effectively parallelize.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Higher level questions around shared memory stats
Next
From: Robert Haas
Date:
Subject: Re: Correct docs re: rewriting indexes when table rewrite is skipped