Re: Adding REPACK [concurrently] - Mailing list pgsql-hackers

From Robert Treat
Subject Re: Adding REPACK [concurrently]
Date
Msg-id CABV9wwOo=wvq1hwTRK6HgBWUB=ekzsEebY30EWoc1V9UJQrrrw@mail.gmail.com
Whole thread Raw
In response to Re: Adding REPACK [concurrently]  (Álvaro Herrera <alvherre@kurilemu.de>)
List pgsql-hackers
On Tue, Oct 7, 2025 at 10:05 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> On 2025-Sep-26, Robert Treat wrote:
<snip>
> That said, on this topic, I've always been bothered by our usage of
> command names as verbs, because they are (IMO) horrible for translation.
> For instance, in this version of the patch I am making this change:
>
>     if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
>         ereport(ERROR,
> -               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> -                errmsg("cannot cluster a shared catalog")));
> +               errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +               errmsg("cannot run %s on a shared catalog",
> +                      RepackCommandAsString(cmd)));
>
> In the old version, the message is not very translatable because you
> have to find a native word to say "to cluster" or "to vacuum", and that
> doesn't always work very well in a direct translation.  For instance, in
> the Spanish message catalog you find this sort of thing:
>
> msgid "vacuuming \"%s.%s.%s\""
> msgstr "haciendo vacuum a «%s.%s.%s»"
>
> which is pretty clear ... but the reason it works, is that I have turned
> the phrase around before translating it.  I would struggle if I had to
> find a Spanish verb that means "to repack" without contorting the
> message or saying something absurd and/or against Spanish language
> rules, such as "ejecutando repack en table XYZ" or "repaqueando tabl
> XYZ" (that's not a word!) or "reempaquetando tabla XYZ" (this is
> correct, but far enough from "repack" that it's annoying and potentially
> confusing).  So I would rather the original used "running REPACK on
> table using method XYZ", which is very very easy to translate, and then
> the translator doesn't have to editorialize.
>

I see you didn't do this in the current patch, but +1 for this idea
from me. And if you think it'd help, I'm also +1 on the idea for the
main docs as well, for example doing something like

+  <para>
-   <application>pg_repackdb</application> is a utility for repacking a
+   <application>pg_repackdb</application> is a utility for running REPACK on a
+   <productname>PostgreSQL</productname> database.

I'd be inclined to leave the internal comments alone though, since
they aren't translated.

> > #5
> > [xzilla@zebes] pgsql/bin/pg_repackdb -d pagila -v -t film --index
> > pg_repackdb: repacking database "pagila"
> >
> > In the above scenario, I am repacking without having previously
> > specified an index. At the SQL level this would throw an error, at the
> > command line it gives me a heart attack. :-)
> > It's actually not that bad, because we don't actually do anything, but
> > maybe we should throw an error?
>
> Yeah, I think this is confusing.  I think we should make pg_repackdb
> explicitly indicate what has been done, in all cases, without requiring
> -v.  Otherwise it's too confusing, particularly for the using-index mode
> which determines which tables to process based on the existance of an
> index marked indiscluster.
>

At the moment, clusterdb runs silently, but vacuumdb emits output, so
there is an argument for either way as default behavior. That said, I
think the current behavior of vacuum, which is what we are currently
following in pg_repackdb, is the worst of the two:

 [xzilla@zebes]  pgsql/bin/vacuumdb -t actor  pagila
vacuumdb: vacuuming database "pagila"

Without any additional information, the information we do give is
misleading; I would rather not say anything. We could of course try to
make this more verbose, but I think clusterdb actually gets this
right...
- say nothing by default (follow the "rule of silence.")
- if we want to see commands, pass -e
- if we want to see the details, pass -v
- if we do something that causes an error, return the error
- if we don't want errors, pass -q

This is also how reindexdb works, and I think most of the other
utilities, and I'd argue this is how vacuumdb should work... to the
extent I almost consider it a bug that it doesn't (I leave a little
room since I am not sure why it doesn't operate like the other
utilities). vacuum is a bit outside the purview of what we are doing
here, but I do think following clusterdb/reindexdb is the behavior we
should follow for pg_repackdb.

> I admit I haven't paid too much attention to these tests.  I think I
> would rather create a separate src/test/regress/sql/repack.sql file with
> the tests for this command.  Let's consider this part a WIP for now --
> clearly more tests are needed both for the SQL command CLUSTER and for
> pg_repackdb.

Yeah, istm as long as we have all 3 commands (repack, cluster, vacuum
full) we need regression tests for all 3.

> - pg_stat_progress_cluster is no longer a view on top of the low-level
>   pg_stat_get_progress_info() function.  Instead, it's a view on top of
>   pg_stat_progress_repack.  The only change it applies on top of that
>   one is change the command from REPACK to one of VACUUM FULL or
>   CLUSTER, depending on whether an index is being used or not.  This
>   should keep the behavior identical to previous versions.
>   Alternatively we could just hide rows where the command is REPACK, but
>   I don't think that would be any better.  This way, we maintain
>   compatibility with tools reading pg_stat_progress_cluster.  Maybe this
>   is useless and we should just drop the view, not sure, we can discuss
>   separately.
>

I think this mostly depends on how aggressive you want to be in moving
people away from cluster and toward repack. If we remove
_progress_cluster, it will force people to update monitoring which
probably encourages people to switch to pg_repackdb. We probably need
to have at least one "bridge" release though, and I think you've got
the right balance for that.

> - I noticed that you can do "CLUSTER pg_class ON some_index" and it will
>   happily modify pg_index.indisclustered, which is a bit weird
>   considering that allow_system_table_mods is off -- if you later try
>   ALTER TABLE .. SET WITHOUT CLUSTER, it won't let you.  I think this is
>   bogus and we should change it so that CLUSTER refuses to change the
>   clustered index on a system catalog, unless allow_system_table_mods is
>   on.  However, that would be a change from longstanding behavior which
>   is specifically tested for in regression tests, so I didn't do it.
>   We can discuss such a change separately.  But I did make REPACK refuse
>   to do that, because we don't need to propagate bogus historical
>   behavior.  So REPACK will fail if you try to change the indisclustered
>   index, but it will work fine if you repack based on the same index as
>   before, or repack with no index.
>

Since cluster will presumably be deprecated with this release, I'd
leave the existing behavior and move forward with repack as you've
laid out.

> - pg_repackdb: if you try with a non-superuser without specifying a
>   table name, it will fail as soon as it hits the first catalog table or
>   whatever with "ERROR: cannot lock this table".  This is sorta fine for
>   vacuumdb, but only because VACUUM itself will instead say "WARNING:
>   cannot lock table XYZ, skipping", so it's not an error and vacuumdb
>   keeps running.  IMO this is bogus: vacuumdb should not try to process
>   tables that it doesn't have privileges to.  However, not wanting to
>   change longstanding behavior, I left that alone.  For pg_repackdb, I
>   added a condition in the WHERE clause there to only fetch tables that
>   the current user has MAINTAIN privilege over.  Then you can do a
>   "pg_repackdb -U foobar" and it will nicely process the tables that
>   that user is allowed to process.  We can discuss changing the vacuumdb
>   behavior separately.

Again, vacuumdb seems to be a good example of what not to do, but I'll
leave that for another thread. In general I like this idea, but it
does make for a weird corner case where if I specify a table with -t
that I don't have permission to repack, repack returns silently whilst
doing nothing. I suppose one way to handle that would be to check if
the table passed in -t is found in the list of tables with MAINTAIN
privileges, and if not to issue a WARNING like "%s not found. Make
sure that the table exists and that you have MAINTAIN privileges".

Robert Treat
https://xzilla.net



pgsql-hackers by date:

Previous
From: Alexey Makhmutov
Date:
Subject: Re: Adding basic NUMA awareness
Next
From: Peter Smith
Date:
Subject: Re: Logical Replication of sequences