Thread: vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases

While working on some other patches, I found myself wanting to use the
following command to vacuum the catalogs in all databases in a cluster:

    vacuumdb --all --schema pg_catalog

However, this presently fails with the following error:

    cannot vacuum specific schema(s) in all databases

AFAICT there no technical reason to block this, and the resulting behavior
feels intuitive to me, so I wrote 0001 to allow it.  0002 allows specifying
tables to process in all databases in clusterdb, and 0003 allows specifying
tables, indexes, schemas, or the system catalogs to process in all
databases in reindexdb.

I debated also allowing users to specify different types of objects in the
same command (e.g., "vacuumdb --schema myschema --table mytable"), but it
looked like this would require a more substantial rewrite, and I didn't
feel that the behavior was intuitive.  For the example I just gave, does
the user expect us to process both the "myschema" schema and the "mytable"
table, or does the user want us to process the "mytable" table in the
"myschema" schema?  In vacuumdb, this is already blocked, but reindexdb
accepts combinations of tables, schemas, and indexes (yet disallows
specifying --system along with other types of objects).  Since this is
inconsistent with vacuumdb and IMO ambiguous, I've restricted such
combinations in 0003.

Thoughts?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment
At Wed, 28 Jun 2023 16:24:02 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in 
> While working on some other patches, I found myself wanting to use the
> following command to vacuum the catalogs in all databases in a cluster:
> 
>     vacuumdb --all --schema pg_catalog
> 
> However, this presently fails with the following error:
> 
>     cannot vacuum specific schema(s) in all databases
> 
> AFAICT there no technical reason to block this, and the resulting behavior
> feels intuitive to me, so I wrote 0001 to allow it.  0002 allows specifying
> tables to process in all databases in clusterdb, and 0003 allows specifying
> tables, indexes, schemas, or the system catalogs to process in all
> databases in reindexdb.

It seems like useful.

> I debated also allowing users to specify different types of objects in the
> same command (e.g., "vacuumdb --schema myschema --table mytable"), but it
> looked like this would require a more substantial rewrite, and I didn't
> feel that the behavior was intuitive.  For the example I just gave, does
> the user expect us to process both the "myschema" schema and the "mytable"
> table, or does the user want us to process the "mytable" table in the
> "myschema" schema?  In vacuumdb, this is already blocked, but reindexdb

I think spcyfying the two at once is inconsistent if we maintain the
current behavior of those options.

It seems to me that that change clearly modifies the functionality of
the options. As a result, those options look like restriction
filters. For example, "vacuumdb -s s1_* -t t1" will vacuum all table
named "t1" in all schemas matches "s1_*".

> accepts combinations of tables, schemas, and indexes (yet disallows
> specifying --system along with other types of objects).  Since this is
> inconsistent with vacuumdb and IMO ambiguous, I've restricted such
> combinations in 0003.
> 
> Thoughts?

While I think this is useful, primarily for system catalogs, I'm not
entirely convinced about its practicality to user objects. It's
difficult for me to imagine that a situation where all databases share
the same schema would be major.

Assuming this is used for user objects, it may be necessary to safely
exclude databases that lack the specified schema or table, provided
the object present in at least one other database. But the exclusion
should be done with printing some warnings.  It could also be
necessary to safely move to the next object when reindex or cluster
operation fails on a single object due to missing prerequisite
situations. But I don't think we might want to add such complexity to
these "script" tools.

So.. an alternative path might be to introduce a new option like
--syscatalog to specify system catalogs as the only option that can be
combined with --all. In doing so, we can leave the --table and
--schema options untouched.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Thanks for taking a look.

On Thu, Jun 29, 2023 at 02:16:26PM +0900, Kyotaro Horiguchi wrote:
> At Wed, 28 Jun 2023 16:24:02 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in 
>> I debated also allowing users to specify different types of objects in the
>> same command (e.g., "vacuumdb --schema myschema --table mytable"), but it
>> looked like this would require a more substantial rewrite, and I didn't
>> feel that the behavior was intuitive.  For the example I just gave, does
>> the user expect us to process both the "myschema" schema and the "mytable"
>> table, or does the user want us to process the "mytable" table in the
>> "myschema" schema?  In vacuumdb, this is already blocked, but reindexdb
> 
> I think spcyfying the two at once is inconsistent if we maintain the
> current behavior of those options.
> 
> It seems to me that that change clearly modifies the functionality of
> the options. As a result, those options look like restriction
> filters. For example, "vacuumdb -s s1_* -t t1" will vacuum all table
> named "t1" in all schemas matches "s1_*".

Sorry, I'm not following.  I intentionally avoided allowing combinations of
--schema and --table in the patches I sent.  This is the current behavior
of vacuumdb.  Are you suggesting that they should be treated as restriction
filters?

> While I think this is useful, primarily for system catalogs, I'm not
> entirely convinced about its practicality to user objects. It's
> difficult for me to imagine that a situation where all databases share
> the same schema would be major.
> 
> Assuming this is used for user objects, it may be necessary to safely
> exclude databases that lack the specified schema or table, provided
> the object present in at least one other database. But the exclusion
> should be done with printing some warnings.  It could also be
> necessary to safely move to the next object when reindex or cluster
> operation fails on a single object due to missing prerequisite
> situations. But I don't think we might want to add such complexity to
> these "script" tools.

Perhaps we could add something like a --skip-missing option.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



At Thu, 29 Jun 2023 13:56:38 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in 
> Thanks for taking a look.
> 
> On Thu, Jun 29, 2023 at 02:16:26PM +0900, Kyotaro Horiguchi wrote:
> > At Wed, 28 Jun 2023 16:24:02 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in 
> >> I debated also allowing users to specify different types of objects in the
> >> same command (e.g., "vacuumdb --schema myschema --table mytable"), but it
> >> looked like this would require a more substantial rewrite, and I didn't
> >> feel that the behavior was intuitive.  For the example I just gave, does
> >> the user expect us to process both the "myschema" schema and the "mytable"
> >> table, or does the user want us to process the "mytable" table in the
> >> "myschema" schema?  In vacuumdb, this is already blocked, but reindexdb
> > 
> > I think spcyfying the two at once is inconsistent if we maintain the
> > current behavior of those options.
> > 
> > It seems to me that that change clearly modifies the functionality of
> > the options. As a result, those options look like restriction
> > filters. For example, "vacuumdb -s s1_* -t t1" will vacuum all table
> > named "t1" in all schemas matches "s1_*".
> 
> Sorry, I'm not following.  I intentionally avoided allowing combinations of
> --schema and --table in the patches I sent.  This is the current behavior
> of vacuumdb.  Are you suggesting that they should be treated as restriction
> filters?

No. I'm not suggesting. Just saying that they would look appear to
work as a restriction filters if those two options can be specified at
once.

> > While I think this is useful, primarily for system catalogs, I'm not
> > entirely convinced about its practicality to user objects. It's
> > difficult for me to imagine that a situation where all databases share
> > the same schema would be major.
> > 
> > Assuming this is used for user objects, it may be necessary to safely
> > exclude databases that lack the specified schema or table, provided
> > the object present in at least one other database. But the exclusion
> > should be done with printing some warnings.  It could also be
> > necessary to safely move to the next object when reindex or cluster
> > operation fails on a single object due to missing prerequisite
> > situations. But I don't think we might want to add such complexity to
> > these "script" tools.
> 
> Perhaps we could add something like a --skip-missing option.

But isn't it a bit too complicated for the gain?

I don't have a strong objection if we're fine with just allowing
"--all --schema=xxx", knowing that it will works cleanly only for
system catalogs.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Fri, Jun 30, 2023 at 12:05:17PM +0900, Kyotaro Horiguchi wrote:
> At Thu, 29 Jun 2023 13:56:38 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in 
>> Sorry, I'm not following.  I intentionally avoided allowing combinations of
>> --schema and --table in the patches I sent.  This is the current behavior
>> of vacuumdb.  Are you suggesting that they should be treated as restriction
>> filters?
> 
> No. I'm not suggesting. Just saying that they would look appear to
> work as a restriction filters if those two options can be specified at
> once.

Got it, thanks for clarifying.

>> Perhaps we could add something like a --skip-missing option.
> 
> But isn't it a bit too complicated for the gain?
> 
> I don't have a strong objection if we're fine with just allowing
> "--all --schema=xxx", knowing that it will works cleanly only for
> system catalogs.

Okay.  I haven't scoped out what would be required to support a
--skip-missing option, but it doesn't sound too terribly complicated to me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On Mon, Oct 23, 2023 at 03:25:42PM -0500, Nathan Bossart wrote:
> rebased

I saw that this thread was referenced elsewhere [0], so I figured I'd take
a fresh look.  From a quick glance, I'd say 0001 and 0002 are pretty
reasonable and could probably be committed for v17.  0003 probably requires
some more attention.  If there is indeed interest in these changes, I'll
try to spend some more time on it.

[0] https://postgr.es/m/E0D2F0CE-D27C-49B1-902B-AD8D2427F07E%40yandex-team.ru

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On Tue, 5 Mar 2024 at 02:22, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> I saw that this thread was referenced elsewhere [0], so I figured I'd take
> a fresh look.  From a quick glance, I'd say 0001 and 0002 are pretty
> reasonable and could probably be committed for v17.
>

I'm not sure how useful these changes are, but I don't really object.
You need to update the synopsis section of the docs though.

Regards,
Dean



On Tue, Mar 05, 2024 at 11:20:13PM +0000, Dean Rasheed wrote:
> I'm not sure how useful these changes are, but I don't really object.
> You need to update the synopsis section of the docs though.

Thanks for taking a look.  I updated the synopsis sections in v3.

I also spent some more time on the reindexdb patch (0003).  I previously
had decided to restrict combinations of tables, schemas, and indexes
because I felt it was "ambiguous and inconsistent with vacuumdb," but
looking closer, I think that's the wrong move.  reindexdb already supports
such combinations, which it interprets to mean it should reindex each
listed object.  So, I removed that change in v3.

Even though reindexdb allows combinations of tables, schema, and indexes,
it doesn't allow combinations of "system catalogs" and other objects, and
it's not clear why.  In v3, I've removed this restriction, which ended up
simplifying the 0003 patch a bit.  Like combinations of tables, schemas,
and indexes, reindexdb will now interpret combinations that include
--system to mean it should reindex each listed object as well as the system
catalogs.

Ideally, we'd allow similar combinations in vacuumdb, but I believe that
would require a much more invasive patch, and I've already spent far more
time on this change than I wanted to.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment
On Wed, 6 Mar 2024 at 22:22, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> Thanks for taking a look.  I updated the synopsis sections in v3.

OK, that looks good. The vacuumdb synopsis in particular looks a lot
better now that "-N | --exclude-schema" is on its own line, because it
was hard to read previously, and easy to mistakenly think that -n
could be combined with -N.

If I'm nitpicking, "[--verbose | -v]" in the clusterdb synopsis should
be replaced with "[option...]", like the other commands, because there
are other general-purpose options like --quiet and --echo.

> I also spent some more time on the reindexdb patch (0003).  I previously
> had decided to restrict combinations of tables, schemas, and indexes
> because I felt it was "ambiguous and inconsistent with vacuumdb," but
> looking closer, I think that's the wrong move.  reindexdb already supports
> such combinations, which it interprets to mean it should reindex each
> listed object.  So, I removed that change in v3.

Makes sense.

> Even though reindexdb allows combinations of tables, schema, and indexes,
> it doesn't allow combinations of "system catalogs" and other objects, and
> it's not clear why.  In v3, I've removed this restriction, which ended up
> simplifying the 0003 patch a bit.  Like combinations of tables, schemas,
> and indexes, reindexdb will now interpret combinations that include
> --system to mean it should reindex each listed object as well as the system
> catalogs.

OK, that looks useful, especially given that most people will still
probably use this against a single database, and it's making that more
flexible.

I think this is good to go.

Regards,
Dean



On Fri, Mar 08, 2024 at 09:33:19AM +0000, Dean Rasheed wrote:
> If I'm nitpicking, "[--verbose | -v]" in the clusterdb synopsis should
> be replaced with "[option...]", like the other commands, because there
> are other general-purpose options like --quiet and --echo.

Good catch.  I fixed that in v4.  We could probably back-patch this
particular change, but since it's been this way for a while, I don't think
it's terribly important to do so.

> I think this is good to go.

Thanks.  In v4, I've added a first draft of the commit messages, and I am
planning to commit this early next week.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment
On Fri, Mar 08, 2024 at 04:03:22PM -0600, Nathan Bossart wrote:
> On Fri, Mar 08, 2024 at 09:33:19AM +0000, Dean Rasheed wrote:
>> I think this is good to go.
> 
> Thanks.  In v4, I've added a first draft of the commit messages, and I am
> planning to commit this early next week.

Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com