Re: Adding REPACK [concurrently] - Mailing list pgsql-hackers

From Álvaro Herrera
Subject Re: Adding REPACK [concurrently]
Date
Msg-id 202510070909.biaseppvydvg@alvherre.pgsql
Whole thread Raw
In response to Re: Adding REPACK [concurrently]  (Mihail Nikalayeu <mihailnikalayeu@gmail.com>)
List pgsql-hackers
On 2025-Sep-26, Mihail Nikalayeu wrote:

> Should we rename it to repack_context to be aligned with the calling side?

Sure, done.

> > cmd == REPACK_COMMAND_CLUSTER ? "CLUSTER" : "REPACK",
> 
> May be changed to RepackCommandAsString

Oh, of course.

> Documentation of pg_repackdb contains a lot of "analyze" and even
> "--analyze" parameter - but I can't see anything related in the code.

Hmm, yeah, that was missing.  I added it.  In doing so I noticed that
because vacuumdb allows a column list to be given, then we should do
likewise here, both in pg_repackdb and in the REPACK command, so I added
support for that.  This changed the grammar a little bit.  Note that we
still don't allow multiple tables to be given to the SQL command REPACK,
so if you want to repack multiple tables, you need to call it without
giving a name or give the name of a partitioned table.  The pg_repackdb
utility allows you to give multiple -t switches, and in that case it
calls REPACK once for each name.

Also, if you give a column list to pg_repackdb, then you must pass -z.
This is consistent with vacuumdb via VACUUM ANALYZE.

On 2025-Sep-26, Robert Treat wrote:

> #1
> "pg_repackdb --help" does not mention the --index option, although the
> flag is accepted. I'm not sure if this is meant to match clusterdb,
> but since we need the index option to invoke the clustering behavior,
> I think it needs to be there.

Oops, yes, added.

> #2
> [xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t customer
> --index=idx_last_name
> pg_repackdb: repacking database "pagila"
> INFO:  clustering "public.customer" using sequential scan and sort
> 
> [xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t customer
> pg_repackdb: repacking database "pagila"
> INFO:  vacuuming "public.customer"
> 
> This was less confusing once I figured out we could pass the --index
> option, but even with that it is a little confusing, I think mostly
> because it looks like we are "vacuuming" the table, which in a world
> of repack and vacuum (ie. no vacuum full) doesn't make sense. I think
> the right thing to do here would be to modify it to be "repacking %s"
> in both cases, with the "using sequential scan and sort" as the means
> to understand which version of repack is being executed.

I changed these messages to always say "repacking", but it will say
"using sequential scan and sort", or "using index", or "following
physical order", respectively.

That said, on this topic, I've always been bothered by our usage of
command names as verbs, because they are (IMO) horrible for translation.
For instance, in this version of the patch I am making this change:

    if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
        ereport(ERROR,
-               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                errmsg("cannot cluster a shared catalog")));
+               errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+               errmsg("cannot run %s on a shared catalog",
+                      RepackCommandAsString(cmd)));

In the old version, the message is not very translatable because you
have to find a native word to say "to cluster" or "to vacuum", and that
doesn't always work very well in a direct translation.  For instance, in
the Spanish message catalog you find this sort of thing:

msgid "vacuuming \"%s.%s.%s\""
msgstr "haciendo vacuum a «%s.%s.%s»"

which is pretty clear ... but the reason it works, is that I have turned
the phrase around before translating it.  I would struggle if I had to
find a Spanish verb that means "to repack" without contorting the
message or saying something absurd and/or against Spanish language
rules, such as "ejecutando repack en table XYZ" or "repaqueando tabl
XYZ" (that's not a word!) or "reempaquetando tabla XYZ" (this is
correct, but far enough from "repack" that it's annoying and potentially
confusing).  So I would rather the original used "running REPACK on
table using method XYZ", which is very very easy to translate, and then
the translator doesn't have to editorialize.

> #3
> pg_repackdb does not offer an --analyze option, which istm it should
> to match the REPACK command

Added, as mentioned above.

> #4

Fixed.

> #5
> [xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t film --index
> pg_repackdb: repacking database "pagila"
> 
> In the above scenario, I am repacking without having previously
> specified an index. At the SQL level this would throw an error, at the
> command line it gives me a heart attack. :-)
> It's actually not that bad, because we don't actually do anything, but
> maybe we should throw an error?

Yeah, I think this is confusing.  I think we should make pg_repackdb
explicitly indicate what has been done, in all cases, without requiring
-v.  Otherwise it's too confusing, particularly for the using-index mode
which determines which tables to process based on the existance of an
index marked indiscluster.

> #6
> On the individual command pages (like sql-repack.html), I think there
> should be more cross-linking, ie. repack should probably say "see also
> cluster" and vice versa. Likely similarly with vacuum and repack.

Hmm, I don't necessarily agree -- I think the sql-cluster page should be
mostly empty and reference the sql-repack page.  We don't need any
incoming links to sql-cluster, I think.  All the useful info should be
in the sql-repack page only.  The same applies for VACUUM FULL: an
outgoing link in sql-vacuum to sql-repack is good to have, but we don't
need links from sql-repack to sql-vacuum.

> #7
> Is there some reason you chose to intermingle the repack regression
> tests with the existing tests? I feel like it'd be easier to
> differentiate potential regressions and new functionality if these
> were separated.

I admit I haven't paid too much attention to these tests.  I think I
would rather create a separate src/test/regress/sql/repack.sql file with
the tests for this command.  Let's consider this part a WIP for now --
clearly more tests are needed both for the SQL command CLUSTER and for
pg_repackdb.

In the meantime, this version has been rebased to current sources.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/

Attachment

pgsql-hackers by date:

Previous
From: Aleksander Alekseev
Date:
Subject: [PATCH] ecpg: check return value of replace_variables()
Next
From: Tomas Vondra
Date:
Subject: Re: Should we update the random_page_cost default value?