Re: Adding REPACK [concurrently] - Mailing list pgsql-hackers
From | Álvaro Herrera |
---|---|
Subject | Re: Adding REPACK [concurrently] |
Date | |
Msg-id | 202510070909.biaseppvydvg@alvherre.pgsql Whole thread Raw |
In response to | Re: Adding REPACK [concurrently] (Mihail Nikalayeu <mihailnikalayeu@gmail.com>) |
List | pgsql-hackers |
On 2025-Sep-26, Mihail Nikalayeu wrote: > Should we rename it to repack_context to be aligned with the calling side? Sure, done. > > cmd == REPACK_COMMAND_CLUSTER ? "CLUSTER" : "REPACK", > > May be changed to RepackCommandAsString Oh, of course. > Documentation of pg_repackdb contains a lot of "analyze" and even > "--analyze" parameter - but I can't see anything related in the code. Hmm, yeah, that was missing. I added it. In doing so I noticed that because vacuumdb allows a column list to be given, then we should do likewise here, both in pg_repackdb and in the REPACK command, so I added support for that. This changed the grammar a little bit. Note that we still don't allow multiple tables to be given to the SQL command REPACK, so if you want to repack multiple tables, you need to call it without giving a name or give the name of a partitioned table. The pg_repackdb utility allows you to give multiple -t switches, and in that case it calls REPACK once for each name. Also, if you give a column list to pg_repackdb, then you must pass -z. This is consistent with vacuumdb via VACUUM ANALYZE. On 2025-Sep-26, Robert Treat wrote: > #1 > "pg_repackdb --help" does not mention the --index option, although the > flag is accepted. I'm not sure if this is meant to match clusterdb, > but since we need the index option to invoke the clustering behavior, > I think it needs to be there. Oops, yes, added. > #2 > [xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t customer > --index=idx_last_name > pg_repackdb: repacking database "pagila" > INFO: clustering "public.customer" using sequential scan and sort > > [xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t customer > pg_repackdb: repacking database "pagila" > INFO: vacuuming "public.customer" > > This was less confusing once I figured out we could pass the --index > option, but even with that it is a little confusing, I think mostly > because it looks like we are "vacuuming" the table, which in a world > of repack and vacuum (ie. no vacuum full) doesn't make sense. I think > the right thing to do here would be to modify it to be "repacking %s" > in both cases, with the "using sequential scan and sort" as the means > to understand which version of repack is being executed. I changed these messages to always say "repacking", but it will say "using sequential scan and sort", or "using index", or "following physical order", respectively. That said, on this topic, I've always been bothered by our usage of command names as verbs, because they are (IMO) horrible for translation. For instance, in this version of the patch I am making this change: if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared) ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot cluster a shared catalog"))); + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot run %s on a shared catalog", + RepackCommandAsString(cmd))); In the old version, the message is not very translatable because you have to find a native word to say "to cluster" or "to vacuum", and that doesn't always work very well in a direct translation. For instance, in the Spanish message catalog you find this sort of thing: msgid "vacuuming \"%s.%s.%s\"" msgstr "haciendo vacuum a «%s.%s.%s»" which is pretty clear ... but the reason it works, is that I have turned the phrase around before translating it. I would struggle if I had to find a Spanish verb that means "to repack" without contorting the message or saying something absurd and/or against Spanish language rules, such as "ejecutando repack en table XYZ" or "repaqueando tabl XYZ" (that's not a word!) or "reempaquetando tabla XYZ" (this is correct, but far enough from "repack" that it's annoying and potentially confusing). So I would rather the original used "running REPACK on table using method XYZ", which is very very easy to translate, and then the translator doesn't have to editorialize. > #3 > pg_repackdb does not offer an --analyze option, which istm it should > to match the REPACK command Added, as mentioned above. > #4 Fixed. > #5 > [xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t film --index > pg_repackdb: repacking database "pagila" > > In the above scenario, I am repacking without having previously > specified an index. At the SQL level this would throw an error, at the > command line it gives me a heart attack. :-) > It's actually not that bad, because we don't actually do anything, but > maybe we should throw an error? Yeah, I think this is confusing. I think we should make pg_repackdb explicitly indicate what has been done, in all cases, without requiring -v. Otherwise it's too confusing, particularly for the using-index mode which determines which tables to process based on the existance of an index marked indiscluster. > #6 > On the individual command pages (like sql-repack.html), I think there > should be more cross-linking, ie. repack should probably say "see also > cluster" and vice versa. Likely similarly with vacuum and repack. Hmm, I don't necessarily agree -- I think the sql-cluster page should be mostly empty and reference the sql-repack page. We don't need any incoming links to sql-cluster, I think. All the useful info should be in the sql-repack page only. The same applies for VACUUM FULL: an outgoing link in sql-vacuum to sql-repack is good to have, but we don't need links from sql-repack to sql-vacuum. > #7 > Is there some reason you chose to intermingle the repack regression > tests with the existing tests? I feel like it'd be easier to > differentiate potential regressions and new functionality if these > were separated. I admit I haven't paid too much attention to these tests. I think I would rather create a separate src/test/regress/sql/repack.sql file with the tests for this command. Let's consider this part a WIP for now -- clearly more tests are needed both for the SQL command CLUSTER and for pg_repackdb. In the meantime, this version has been rebased to current sources. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Attachment
pgsql-hackers by date: