Thread: Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

From
Alexey Kondratov
Date:
On 2020-09-09 18:36, Justin Pryzby wrote:
> Rebased on a6642b3ae: Add support for partitioned tables and indexes in 
> REINDEX
> 
> So now this includes the new functionality and test for reindexing a
> partitioned table onto a new tablespace.  That part could use some 
> additional
> review.
> 

I have finally had a look on your changes regarding partitioned tables.

+set_rel_tablespace(Oid indexid, char *tablespace)
+{
+    Oid tablespaceOid = tablespace ? get_tablespace_oid(tablespace, false) 
:
+        InvalidOid;

You pass a tablespace name to set_rel_tablespace(), but it is already 
parsed into the Oid before. So I do not see why we need this extra work 
here instead of just passing Oid directly.

Also set_rel_tablespace() does not check for a no-op case, i.e. if 
requested tablespace is the same as before.

+    /*
+     * Set the new tablespace for the relation.  Do that only in the
+     * case where the reindex caller wishes to enforce a new tablespace.
+     */
+    if (set_tablespace &&
+        tablespaceOid != iRel->rd_rel->reltablespace)

Just noticed that this check is not completely correct as well, since it 
does not check for MyDatabaseTableSpace (stored as InvalidOid) logic.

I put these small fixes directly into the attached 0003.

Also, I thought about your comment above set_rel_tablespace() and did a 
bit 'extreme' refactoring, which is attached as a separated patch 0004. 
The only one doubtful change IMO is reordering of RelationDropStorage() 
operation inside reindex_index(). However, it only schedules unlinking 
of physical storage at transaction commit, so this refactoring seems to 
be safe.

If there will be no objections I would merge it with 0003.

On 2020-09-09 16:03, Alexey Kondratov wrote:
> On 2020-09-09 15:22, Michael Paquier wrote:
>> 
>> By the way, skimming through the patch set, I was wondering if we
>> could do the refactoring of patch 0005 as a first step
>> 
> 
> Yes, I did it with intention to put as a first patch, but wanted to
> get some feedback. It's easier to refactor the last patch without
> rebasing others.
> 
>> 
>> until I
>> noticed this part:
>> +common_option_name:
>>         NonReservedWord { $$ = $1; }
>>     | analyze_keyword { $$ = "analyze"; }
>> This is not a good idea as you make ANALYZE an option available for
>> all the commands involved in the refactoring.  A portion of that could
>> be considered though, like the use of common_option_arg.
>> 
> 
> From the grammar perspective ANY option is available for any command
> that uses parenthesized option list. All the checks and validations
> are performed at the corresponding command code.
> This analyze_keyword is actually doing only an ANALYZE word
> normalization if it's used as an option. Why it could be harmful?
> 

Michael has not replied since then, but he was relatively positive about 
0005 initially, so I put it as a first patch now.


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company
Attachment

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

From
Alexey Kondratov
Date:
On 2020-11-30 14:33, Michael Paquier wrote:
> On Tue, Nov 24, 2020 at 09:31:23AM -0600, Justin Pryzby wrote:
>> @cfbot: rebased
> 
> Catching up with the activity here, I can see four different things in
> the patch set attached:
> 1) Refactoring of the grammar of CLUSTER, VACUUM, ANALYZE and REINDEX
> to support values in parameters.
> 2) Tablespace change for REINDEX.
> 3) Tablespace change for VACUUM FULL/CLUSTER.
> 4) Tablespace change for indexes with VACUUM FULL/CLUSTER.
> 
> I am not sure yet about the last three points, so let's begin with 1)
> that is dealt with in 0001 and 0002.  I have spent some time on 0001,
> renaming the rule names to be less generic than "common", and applied
> it.  0002 looks to be in rather good shape, still there are a few
> things that have caught my eyes.  I'll look at that more closely
> tomorrow.
> 

Thanks. I have rebased the remaining patches on top of 873ea9ee to use 
'utility_option_list' instead of 'common_option_list'.


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company
Attachment

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

From
Peter Eisentraut
Date:
A side comment on this patch:  I think using enums as bit mask values is 
bad style.  So changing this:

-/* Reindex options */
-#define REINDEXOPT_VERBOSE (1 << 0) /* print progress info */
-#define REINDEXOPT_REPORT_PROGRESS (1 << 1) /* report pgstat progress */
-#define REINDEXOPT_MISSING_OK (1 << 2) /* skip missing relations */
-#define REINDEXOPT_CONCURRENTLY (1 << 3)   /* concurrent mode */

to this:

+typedef enum ReindexOption
+{
+   REINDEXOPT_VERBOSE = 1 << 0,    /* print progress info */
+   REINDEXOPT_REPORT_PROGRESS = 1 << 1,    /* report pgstat progress */
+   REINDEXOPT_MISSING_OK = 1 << 2, /* skip missing relations */
+   REINDEXOPT_CONCURRENTLY = 1 << 3    /* concurrent mode */
+} ReindexOption;

seems wrong.

There are a couple of more places like this, including the existing 
ClusterOption that this patched moved around, but we should be removing 
those.

My reasoning is that if you look at an enum value of this type, either 
say in a switch statement or a debugger, the enum value might not be any 
of the defined symbols.  So that way you lose all the type checking that 
an enum might give you.

Let's just keep the #define's like it is done in almost all other places.



Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

From
Alexey Kondratov
Date:
On 2020-12-04 04:25, Justin Pryzby wrote:
> On Thu, Dec 03, 2020 at 04:12:53PM +0900, Michael Paquier wrote:
>> > +typedef struct ReindexParams {
>> > +    bool concurrently;
>> > +    bool verbose;
>> > +    bool missingok;
>> > +
>> > +    int options;    /* bitmask of lowlevel REINDEXOPT_* */
>> > +} ReindexParams;
>> > +
>> 
>> By moving everything into indexcmds.c, keeping ReindexParams within it
>> makes sense to me.  Now, there is no need for the three booleans
>> because options stores the same information, no?
> 
>  I liked the bools, but dropped them so the patch is smaller.
> 

I had a look on 0001 and it looks mostly fine to me except some strange 
mixture of tabs/spaces in the ExecReindex(). There is also a couple of 
meaningful comments:

-    options =
-        (verbose ? REINDEXOPT_VERBOSE : 0) |
-        (concurrently ? REINDEXOPT_CONCURRENTLY : 0);
+    if (verbose)
+        params.options |= REINDEXOPT_VERBOSE;

Why do we need this intermediate 'verbose' variable here? We only use it 
once to set a bitmask. Maybe we can do it like this:

params.options |= defGetBoolean(opt) ?
    REINDEXOPT_VERBOSE : 0;

See also attached txt file with diff (I wonder can I trick cfbot this 
way, so it does not apply the diff).

+    int options;    /* bitmask of lowlevel REINDEXOPT_* */

I would prefer if the comment says '/* bitmask of ReindexOption */' as 
in the VacuumOptions, since citing the exact enum type make it easier to 
navigate source code.

> 
> Regarding the REINDEX patch, I think this comment is misleading:
> 
> |+                * Even if table was moved to new tablespace,
> normally toast cannot move.
> |                 */
> |+               Oid toasttablespaceOid = allowSystemTableMods ?
> tablespaceOid : InvalidOid;
> |                result |= reindex_relation(toast_relid, flags,
> 
> I think it ought to say "Even if a table's indexes were moved to a new
> tablespace, its toast table's index is not normally moved"
> Right ?
> 

Yes, I think so, we are dealing only with index tablespace changing 
here. Thanks for noticing.

> 
> Also, I don't know whether we should check for GLOBALTABLESPACE_OID 
> after
> calling get_tablespace_oid(), or in the lowlevel routines.  Note that
> reindex_relation is called during cluster/vacuum, and in the later 
> patches, I
> moved the test from from cluster() and ExecVacuum() to 
> rebuild_relation().
> 

IIRC, I wanted to do GLOBALTABLESPACE_OID check as early as possible 
(just after getting Oid), since it does not make sense to proceed 
further if tablespace is set to that value. So initially there were a 
lot of duplicative GLOBALTABLESPACE_OID checks, since there were a lot 
of reindex entry-points (index, relation, concurrently, etc.). Now we 
are going to have ExecReindex(), so there are much less entry-points and 
in my opinion it is fine to keep this validation just after 
get_tablespace_oid().

However, this is mostly a sanity check. I can hardly imagine a lot of 
users trying to constantly move indexes to the global tablespace, so it 
is also OK to put this check deeper into guts.


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company
Attachment