Re: A few new options for vacuumdb - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: A few new options for vacuumdb
Date
Msg-id 20190107071146.GD22498@paquier.xyz
Whole thread Raw
In response to Re: A few new options for vacuumdb  ("Bossart, Nathan" <bossartn@amazon.com>)
List pgsql-hackers
On Fri, Jan 04, 2019 at 11:49:46PM +0000, Bossart, Nathan wrote:
> Here's an updated set of patches with the following changes:
>
>  - 0002 adds the parenthesized syntax for ANALYZE.
>  - 0003 adds DISABLE_PAGE_SKIPPING for VACUUM.
>  - 0003 also ensures SKIP_LOCKED is applied for --analyze-only.
>  - 0004 alters vacuumdb to always use the catalog query to discover
>    the list of tables to process.
>  - 0003, 0005, and 0006 now fail in vacuum_one_database() if a
>    specified option is not available on the server version.
>  - If tables are listed along with --min-xid-age, --min-mxid-age, or
>    --min-relation-size, each table is processed only if it satisfies
>    the provided options.

I have been looking at the patch set, and 0001 can actually happen
only once 0005 is applied because this modifies the query doing on
HEAD a full scan of pg_class which would include at least catalog
tables so it can never be empty.  For this reason it seems to me that
0001 and 0004 should be merged together as common refactoring, making
sure on the way that all relations exist before processing anything.
As 0005 and 0006 change similar portions of the code, we could also
merge these together in a second patch introducing the new options.
Keeping 0001, 0004, 0005 and 0006 separated eases the review, but I
would likely merge things when they make sense together to reduce
diff chunks.

0002 removes some carefully-written query generation, so as it is
never possible to generate something like ANALYZE FULL.  HEAD splits
ANALYZE and VACUUM clearly, but 0002 merges them together so as
careless coding in vacuumdb.c makes it easier to generate command
strings which would fail in the backend relying on the option checks
to make sure that for example combining --full and --analyze-only
never happens.  Introducing 0002 is mainly here for 0003, so let's
merge them together.

> 0004 introduces a slight change to existing behavior.  Currently, if
> you specify a missing table, vacuumdb will process each table until
> it reaches the nonexistent one, at which point it will fail.  After
> 0004 is applied, vacuumdb will fail during the catalog query, and no
> tables will be processed.  I considered avoiding this change in
> behavior by doing an additional catalog lookup for each specified
> table to see whether it satisfies --min-xid-age, etc.  However, this
> introduced quite a bit more complexity and increased the number of
> catalog queries needed.

Simplicity is always welcome, knowing that tables are missing before
doing any operations is more consistent with plain VACUUM/ANALYZE.

From patch 0004:
+       executeCommand(conn, "RESET search_path;", progname, echo);
+       res = executeQuery(conn, catalog_query.data, progname, echo);
+       termPQExpBuffer(&catalog_query);
+       PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL,
+                               progname, echo));
We should really avoid that.  There are other things like casts which
tend to be forgotten, and if the catalog lookup query gets more
complicated, I feel that this would again be forgotten, reintroducing
the issues that ALWAYS_SECURE_SEARCH_PATH_SQL is here to fix.

I have put my hands into what you sent, and worked on putting
0002/0003 first into shape, finishing with the attached.  I have
simplified the query construction a bit, making it IMO easier to read
and to add new options, with comments documenting what's supported
across versions.  I have also added a regression test for
--analyze-only --skip-locked.  Then I have done some edit of the docs.
What do you think for this portion?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: "Tsunakawa, Takayuki"
Date:
Subject: RE: [Proposal] Add accumulated statistics
Next
From: Michael Paquier
Date:
Subject: Re: [Sender Address Forgery]Re: error message when subscriptiontarget is a partitioned table