Thread: vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases
vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases
From
Nathan Bossart
Date:
While working on some other patches, I found myself wanting to use the following command to vacuum the catalogs in all databases in a cluster: vacuumdb --all --schema pg_catalog However, this presently fails with the following error: cannot vacuum specific schema(s) in all databases AFAICT there no technical reason to block this, and the resulting behavior feels intuitive to me, so I wrote 0001 to allow it. 0002 allows specifying tables to process in all databases in clusterdb, and 0003 allows specifying tables, indexes, schemas, or the system catalogs to process in all databases in reindexdb. I debated also allowing users to specify different types of objects in the same command (e.g., "vacuumdb --schema myschema --table mytable"), but it looked like this would require a more substantial rewrite, and I didn't feel that the behavior was intuitive. For the example I just gave, does the user expect us to process both the "myschema" schema and the "mytable" table, or does the user want us to process the "mytable" table in the "myschema" schema? In vacuumdb, this is already blocked, but reindexdb accepts combinations of tables, schemas, and indexes (yet disallows specifying --system along with other types of objects). Since this is inconsistent with vacuumdb and IMO ambiguous, I've restricted such combinations in 0003. Thoughts? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases
From
Kyotaro Horiguchi
Date:
At Wed, 28 Jun 2023 16:24:02 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in > While working on some other patches, I found myself wanting to use the > following command to vacuum the catalogs in all databases in a cluster: > > vacuumdb --all --schema pg_catalog > > However, this presently fails with the following error: > > cannot vacuum specific schema(s) in all databases > > AFAICT there no technical reason to block this, and the resulting behavior > feels intuitive to me, so I wrote 0001 to allow it. 0002 allows specifying > tables to process in all databases in clusterdb, and 0003 allows specifying > tables, indexes, schemas, or the system catalogs to process in all > databases in reindexdb. It seems like useful. > I debated also allowing users to specify different types of objects in the > same command (e.g., "vacuumdb --schema myschema --table mytable"), but it > looked like this would require a more substantial rewrite, and I didn't > feel that the behavior was intuitive. For the example I just gave, does > the user expect us to process both the "myschema" schema and the "mytable" > table, or does the user want us to process the "mytable" table in the > "myschema" schema? In vacuumdb, this is already blocked, but reindexdb I think spcyfying the two at once is inconsistent if we maintain the current behavior of those options. It seems to me that that change clearly modifies the functionality of the options. As a result, those options look like restriction filters. For example, "vacuumdb -s s1_* -t t1" will vacuum all table named "t1" in all schemas matches "s1_*". > accepts combinations of tables, schemas, and indexes (yet disallows > specifying --system along with other types of objects). Since this is > inconsistent with vacuumdb and IMO ambiguous, I've restricted such > combinations in 0003. > > Thoughts? While I think this is useful, primarily for system catalogs, I'm not entirely convinced about its practicality to user objects. It's difficult for me to imagine that a situation where all databases share the same schema would be major. Assuming this is used for user objects, it may be necessary to safely exclude databases that lack the specified schema or table, provided the object present in at least one other database. But the exclusion should be done with printing some warnings. It could also be necessary to safely move to the next object when reindex or cluster operation fails on a single object due to missing prerequisite situations. But I don't think we might want to add such complexity to these "script" tools. So.. an alternative path might be to introduce a new option like --syscatalog to specify system catalogs as the only option that can be combined with --all. In doing so, we can leave the --table and --schema options untouched. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases
From
Nathan Bossart
Date:
Thanks for taking a look. On Thu, Jun 29, 2023 at 02:16:26PM +0900, Kyotaro Horiguchi wrote: > At Wed, 28 Jun 2023 16:24:02 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in >> I debated also allowing users to specify different types of objects in the >> same command (e.g., "vacuumdb --schema myschema --table mytable"), but it >> looked like this would require a more substantial rewrite, and I didn't >> feel that the behavior was intuitive. For the example I just gave, does >> the user expect us to process both the "myschema" schema and the "mytable" >> table, or does the user want us to process the "mytable" table in the >> "myschema" schema? In vacuumdb, this is already blocked, but reindexdb > > I think spcyfying the two at once is inconsistent if we maintain the > current behavior of those options. > > It seems to me that that change clearly modifies the functionality of > the options. As a result, those options look like restriction > filters. For example, "vacuumdb -s s1_* -t t1" will vacuum all table > named "t1" in all schemas matches "s1_*". Sorry, I'm not following. I intentionally avoided allowing combinations of --schema and --table in the patches I sent. This is the current behavior of vacuumdb. Are you suggesting that they should be treated as restriction filters? > While I think this is useful, primarily for system catalogs, I'm not > entirely convinced about its practicality to user objects. It's > difficult for me to imagine that a situation where all databases share > the same schema would be major. > > Assuming this is used for user objects, it may be necessary to safely > exclude databases that lack the specified schema or table, provided > the object present in at least one other database. But the exclusion > should be done with printing some warnings. It could also be > necessary to safely move to the next object when reindex or cluster > operation fails on a single object due to missing prerequisite > situations. But I don't think we might want to add such complexity to > these "script" tools. Perhaps we could add something like a --skip-missing option. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases
From
Kyotaro Horiguchi
Date:
At Thu, 29 Jun 2023 13:56:38 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in > Thanks for taking a look. > > On Thu, Jun 29, 2023 at 02:16:26PM +0900, Kyotaro Horiguchi wrote: > > At Wed, 28 Jun 2023 16:24:02 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in > >> I debated also allowing users to specify different types of objects in the > >> same command (e.g., "vacuumdb --schema myschema --table mytable"), but it > >> looked like this would require a more substantial rewrite, and I didn't > >> feel that the behavior was intuitive. For the example I just gave, does > >> the user expect us to process both the "myschema" schema and the "mytable" > >> table, or does the user want us to process the "mytable" table in the > >> "myschema" schema? In vacuumdb, this is already blocked, but reindexdb > > > > I think spcyfying the two at once is inconsistent if we maintain the > > current behavior of those options. > > > > It seems to me that that change clearly modifies the functionality of > > the options. As a result, those options look like restriction > > filters. For example, "vacuumdb -s s1_* -t t1" will vacuum all table > > named "t1" in all schemas matches "s1_*". > > Sorry, I'm not following. I intentionally avoided allowing combinations of > --schema and --table in the patches I sent. This is the current behavior > of vacuumdb. Are you suggesting that they should be treated as restriction > filters? No. I'm not suggesting. Just saying that they would look appear to work as a restriction filters if those two options can be specified at once. > > While I think this is useful, primarily for system catalogs, I'm not > > entirely convinced about its practicality to user objects. It's > > difficult for me to imagine that a situation where all databases share > > the same schema would be major. > > > > Assuming this is used for user objects, it may be necessary to safely > > exclude databases that lack the specified schema or table, provided > > the object present in at least one other database. But the exclusion > > should be done with printing some warnings. It could also be > > necessary to safely move to the next object when reindex or cluster > > operation fails on a single object due to missing prerequisite > > situations. But I don't think we might want to add such complexity to > > these "script" tools. > > Perhaps we could add something like a --skip-missing option. But isn't it a bit too complicated for the gain? I don't have a strong objection if we're fine with just allowing "--all --schema=xxx", knowing that it will works cleanly only for system catalogs. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases
From
Nathan Bossart
Date:
On Fri, Jun 30, 2023 at 12:05:17PM +0900, Kyotaro Horiguchi wrote: > At Thu, 29 Jun 2023 13:56:38 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in >> Sorry, I'm not following. I intentionally avoided allowing combinations of >> --schema and --table in the patches I sent. This is the current behavior >> of vacuumdb. Are you suggesting that they should be treated as restriction >> filters? > > No. I'm not suggesting. Just saying that they would look appear to > work as a restriction filters if those two options can be specified at > once. Got it, thanks for clarifying. >> Perhaps we could add something like a --skip-missing option. > > But isn't it a bit too complicated for the gain? > > I don't have a strong objection if we're fine with just allowing > "--all --schema=xxx", knowing that it will works cleanly only for > system catalogs. Okay. I haven't scoped out what would be required to support a --skip-missing option, but it doesn't sound too terribly complicated to me. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases
From
Nathan Bossart
Date:
rebased -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases
From
Nathan Bossart
Date:
On Mon, Oct 23, 2023 at 03:25:42PM -0500, Nathan Bossart wrote: > rebased I saw that this thread was referenced elsewhere [0], so I figured I'd take a fresh look. From a quick glance, I'd say 0001 and 0002 are pretty reasonable and could probably be committed for v17. 0003 probably requires some more attention. If there is indeed interest in these changes, I'll try to spend some more time on it. [0] https://postgr.es/m/E0D2F0CE-D27C-49B1-902B-AD8D2427F07E%40yandex-team.ru -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases
From
Dean Rasheed
Date:
On Tue, 5 Mar 2024 at 02:22, Nathan Bossart <nathandbossart@gmail.com> wrote: > > I saw that this thread was referenced elsewhere [0], so I figured I'd take > a fresh look. From a quick glance, I'd say 0001 and 0002 are pretty > reasonable and could probably be committed for v17. > I'm not sure how useful these changes are, but I don't really object. You need to update the synopsis section of the docs though. Regards, Dean
Re: vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases
From
Nathan Bossart
Date:
On Tue, Mar 05, 2024 at 11:20:13PM +0000, Dean Rasheed wrote: > I'm not sure how useful these changes are, but I don't really object. > You need to update the synopsis section of the docs though. Thanks for taking a look. I updated the synopsis sections in v3. I also spent some more time on the reindexdb patch (0003). I previously had decided to restrict combinations of tables, schemas, and indexes because I felt it was "ambiguous and inconsistent with vacuumdb," but looking closer, I think that's the wrong move. reindexdb already supports such combinations, which it interprets to mean it should reindex each listed object. So, I removed that change in v3. Even though reindexdb allows combinations of tables, schema, and indexes, it doesn't allow combinations of "system catalogs" and other objects, and it's not clear why. In v3, I've removed this restriction, which ended up simplifying the 0003 patch a bit. Like combinations of tables, schemas, and indexes, reindexdb will now interpret combinations that include --system to mean it should reindex each listed object as well as the system catalogs. Ideally, we'd allow similar combinations in vacuumdb, but I believe that would require a much more invasive patch, and I've already spent far more time on this change than I wanted to. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases
From
Dean Rasheed
Date:
On Wed, 6 Mar 2024 at 22:22, Nathan Bossart <nathandbossart@gmail.com> wrote: > > Thanks for taking a look. I updated the synopsis sections in v3. OK, that looks good. The vacuumdb synopsis in particular looks a lot better now that "-N | --exclude-schema" is on its own line, because it was hard to read previously, and easy to mistakenly think that -n could be combined with -N. If I'm nitpicking, "[--verbose | -v]" in the clusterdb synopsis should be replaced with "[option...]", like the other commands, because there are other general-purpose options like --quiet and --echo. > I also spent some more time on the reindexdb patch (0003). I previously > had decided to restrict combinations of tables, schemas, and indexes > because I felt it was "ambiguous and inconsistent with vacuumdb," but > looking closer, I think that's the wrong move. reindexdb already supports > such combinations, which it interprets to mean it should reindex each > listed object. So, I removed that change in v3. Makes sense. > Even though reindexdb allows combinations of tables, schema, and indexes, > it doesn't allow combinations of "system catalogs" and other objects, and > it's not clear why. In v3, I've removed this restriction, which ended up > simplifying the 0003 patch a bit. Like combinations of tables, schemas, > and indexes, reindexdb will now interpret combinations that include > --system to mean it should reindex each listed object as well as the system > catalogs. OK, that looks useful, especially given that most people will still probably use this against a single database, and it's making that more flexible. I think this is good to go. Regards, Dean
Re: vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases
From
Nathan Bossart
Date:
On Fri, Mar 08, 2024 at 09:33:19AM +0000, Dean Rasheed wrote: > If I'm nitpicking, "[--verbose | -v]" in the clusterdb synopsis should > be replaced with "[option...]", like the other commands, because there > are other general-purpose options like --quiet and --echo. Good catch. I fixed that in v4. We could probably back-patch this particular change, but since it's been this way for a while, I don't think it's terribly important to do so. > I think this is good to go. Thanks. In v4, I've added a first draft of the commit messages, and I am planning to commit this early next week. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases
From
Nathan Bossart
Date:
On Fri, Mar 08, 2024 at 04:03:22PM -0600, Nathan Bossart wrote: > On Fri, Mar 08, 2024 at 09:33:19AM +0000, Dean Rasheed wrote: >> I think this is good to go. > > Thanks. In v4, I've added a first draft of the commit messages, and I am > planning to commit this early next week. Committed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com