Thread: Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
From
Alexey Kondratov
Date:
On 2021-01-20 18:54, Alvaro Herrera wrote: > On 2021-Jan-20, Alvaro Herrera wrote: > >> On 2021-Jan-20, Michael Paquier wrote: >> >> > +/* >> > + * This is mostly duplicating ATExecSetTableSpaceNoStorage, >> > + * which should maybe be factored out to a library function. >> > + */ >> > Wouldn't it be better to do first the refactoring of 0002 and then >> > 0001 so as REINDEX can use the new routine, instead of putting that >> > into a comment? >> >> I think merging 0001 and 0002 into a single commit is a reasonable >> approach. > > ... except it doesn't make a lot of sense to have set_rel_tablespace in > either indexcmds.c or index.c. I think tablecmds.c is a better place > for it. (I would have thought catalog/storage.c, but that one's not > the > right abstraction level it seems.) > I did a refactoring of ATExecSetTableSpaceNoStorage() in the 0001. New function SetRelTablesapce() is placed into the tablecmds.c. Following 0002 gets use of it. Is it close to what you and Michael suggested? > > But surely ATExecSetTableSpaceNoStorage should be using this new > routine. (I first thought 0002 was doing that, since that commit is > calling itself a "refactoring", but now that I look closer, it's not.) > Yeah, this 'refactoring' was initially referring to refactoring of what Justin added to one of the previous 0001. And it was meant to be merged with 0001, once agreed, but we got distracted by other stuff. I have not yet addressed Michael's concerns regarding reindex of partitions. I am going to look closer on it tomorrow. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
From
Alexey Kondratov
Date:
On 2021-01-20 21:08, Alexey Kondratov wrote: > On 2021-01-20 18:54, Alvaro Herrera wrote: >> On 2021-Jan-20, Alvaro Herrera wrote: >> >>> On 2021-Jan-20, Michael Paquier wrote: >>> >>> > +/* >>> > + * This is mostly duplicating ATExecSetTableSpaceNoStorage, >>> > + * which should maybe be factored out to a library function. >>> > + */ >>> > Wouldn't it be better to do first the refactoring of 0002 and then >>> > 0001 so as REINDEX can use the new routine, instead of putting that >>> > into a comment? >>> >>> I think merging 0001 and 0002 into a single commit is a reasonable >>> approach. >> >> ... except it doesn't make a lot of sense to have set_rel_tablespace >> in >> either indexcmds.c or index.c. I think tablecmds.c is a better place >> for it. (I would have thought catalog/storage.c, but that one's not >> the >> right abstraction level it seems.) >> > > I did a refactoring of ATExecSetTableSpaceNoStorage() in the 0001. New > function SetRelTablesapce() is placed into the tablecmds.c. Following > 0002 gets use of it. Is it close to what you and Michael suggested? > Ugh, forgot to attach the patches. Here they are. -- Alexey
Attachment
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
From
Alexey Kondratov
Date:
On 2021-01-21 04:41, Michael Paquier wrote: > On Wed, Jan 20, 2021 at 03:34:39PM -0300, Alvaro Herrera wrote: >> On 2021-Jan-20, Alexey Kondratov wrote: >>> Ugh, forgot to attach the patches. Here they are. >> >> Yeah, looks reasonable. > >>> + >>> + if (changed) >>> + /* Record dependency on tablespace */ >>> + changeDependencyOnTablespace(RelationRelationId, >>> + reloid, rd_rel->reltablespace); >> >> Why have a separate "if (changed)" block here instead of merging with >> the above? > > Yep. > Sure, this is a refactoring artifact. > + if (SetRelTablespace(reloid, newTableSpace)) > + /* Make sure the reltablespace change is visible */ > + CommandCounterIncrement(); > At quick glance, I am wondering why you just don't do a CCI within > SetRelTablespace(). > I did it that way for a better readability at first, since it looks more natural when you do some change (SetRelTablespace) and then make them visible with CCI. Second argument was that in the case of reindex_index() we have to also call RelationAssumeNewRelfilenode() and RelationDropStorage() before doing CCI and making the new tablespace visible. And this part is critical, I guess. > > + This specifies that indexes will be rebuilt on a new tablespace. > + Cannot be used with "mapped" relations. If > <literal>SCHEMA</literal>, > + <literal>DATABASE</literal> or <literal>SYSTEM</literal> is > specified, then > + all unsuitable relations will be skipped and a single > <literal>WARNING</literal> > + will be generated. > What is an unsuitable relation? How can the end user know that? > This was referring to mapped relations mentioned in the previous sentence. I have tried to rewrite this part and make it more specific in my current version. Also added Justin's changes to the docs and comment. > This is missing ACL checks when moving the index into a new location, > so this requires some pg_tablespace_aclcheck() calls, and the other > patches share the same issue. > I added proper pg_tablespace_aclcheck()'s into the reindex_index() and ReindexPartitions(). > + else if (partkind == RELKIND_PARTITIONED_TABLE) > + { > + Relation rel = table_open(partoid, ShareLock); > + List *indexIds = RelationGetIndexList(rel); > + ListCell *lc; > + > + table_close(rel, NoLock); > + foreach (lc, indexIds) > + { > + Oid indexid = lfirst_oid(lc); > + (void) set_rel_tablespace(indexid, > params->tablespaceOid); > + } > + } > This is really a good question. ReindexPartitions() would trigger one > transaction per leaf to work on. Changing the tablespace of the > partitioned table(s) before doing any work has the advantage to tell > any new partition to use the new tablespace. Now, I see a struggling > point here: what should we do if the processing fails in the middle of > the move, leaving a portion of the leaves in the previous tablespace? > On a follow-up reindex with the same command, should the command force > a reindex even on the partitions that have been moved? Or could there > be a point in skipping the partitions that are already on the new > tablespace and only process the ones on the previous tablespace? It > seems to me that the first scenario makes the most sense as currently > a REINDEX works on all the relations defined, though there could be > use cases for the second case. This should be documented, I think. > I agree that follow-up REINDEX should also reindex moved partitions, since REINDEX (TABLESPACE ...) is still reindex at first. I will try to put something about this part into the docs. Also I think that we cannot be sure that nothing happened with already reindexed partitions between two consequent REINDEX calls. > There are no tests for partitioned tables, aka we'd want to make sure > that the new partitioned index is on the correct tablespace, as well > as all its leaves. It may be better to have at least two levels of > partitioned tables, as well as a partitioned table with no leaves in > the cases dealt with. > Yes, sure, it makes sense. > + * > + * Even if a table's indexes were moved to a new tablespace, > the index > + * on its toast table is not normally moved. > */ > Still, REINDEX (TABLESPACE) TABLE should move all of them to be > consistent with ALTER TABLE SET TABLESPACE, but that's not the case > with this code, no? This requires proper test coverage, but there is > nothing of the kind in this patch. You are right, we do not move TOAST indexes now, since IsSystemRelation() is true for TOAST indexes, so I thought that we should not allow moving them without allow_system_table_mods=true. Now I wonder why ALTER TABLE does that. I am going to attach the new version of patch set today or tomorrow. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
From
Alexey Kondratov
Date:
On 2021-01-21 17:06, Alexey Kondratov wrote: > On 2021-01-21 04:41, Michael Paquier wrote: > >> There are no tests for partitioned tables, aka we'd want to make sure >> that the new partitioned index is on the correct tablespace, as well >> as all its leaves. It may be better to have at least two levels of >> partitioned tables, as well as a partitioned table with no leaves in >> the cases dealt with. >> > > Yes, sure, it makes sense. > >> + * >> + * Even if a table's indexes were moved to a new tablespace, >> the index >> + * on its toast table is not normally moved. >> */ >> Still, REINDEX (TABLESPACE) TABLE should move all of them to be >> consistent with ALTER TABLE SET TABLESPACE, but that's not the case >> with this code, no? This requires proper test coverage, but there is >> nothing of the kind in this patch. > > You are right, we do not move TOAST indexes now, since > IsSystemRelation() is true for TOAST indexes, so I thought that we > should not allow moving them without allow_system_table_mods=true. Now > I wonder why ALTER TABLE does that. > > I am going to attach the new version of patch set today or tomorrow. > Attached is a new patch set of first two patches, that should resolve all the issues raised before (ACL, docs, tests) excepting TOAST. Double thanks for suggestion to add more tests with nested partitioning. I have found and squashed a huge bug related to the returning back to the default tablespace using newly added tests. Regarding TOAST. Now we skip moving toast indexes or throw error if someone wants to move TOAST index directly. I had a look on ALTER TABLE SET TABLESPACE and it has a bit complicated logic: 1) You cannot move TOAST table directly. 2) But if you move basic relation that TOAST table belongs to, then they are moved altogether. 3) Same logic as 2) happens if one does ALTER TABLE ALL IN TABLESPACE ... That way, ALTER TABLE allows moving TOAST tables (with indexes) implicitly, but does not allow doing that explicitly. In the same time I found docs to be vague about such behavior it only says: All tables in the current database in a tablespace can be moved by using the ALL IN TABLESPACE ... Note that system catalogs are not moved by this command Changing any part of a system catalog table is not permitted. So actually ALTER TABLE treats TOAST relations as system sometimes, but sometimes not. From the end user perspective it makes sense to move TOAST with main table when doing ALTER TABLE SET TABLESPACE. But should we touch indexes on TOAST table with REINDEX? We cannot move TOAST relation itself, since we are doing only a reindex, so we end up in the state when TOAST table and its index are placed in the different tablespaces. This state is not reachable with ALTER TABLE/INDEX, so it seem we should not allow it with REINDEX as well, should we? Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
Attachment
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
From
Alexey Kondratov
Date:
On 2021-01-22 00:26, Justin Pryzby wrote: > On Thu, Jan 21, 2021 at 11:48:08PM +0300, Alexey Kondratov wrote: >> Attached is a new patch set of first two patches, that should resolve >> all >> the issues raised before (ACL, docs, tests) excepting TOAST. Double >> thanks >> for suggestion to add more tests with nested partitioning. I have >> found and >> squashed a huge bug related to the returning back to the default >> tablespace >> using newly added tests. >> >> Regarding TOAST. Now we skip moving toast indexes or throw error if >> someone >> wants to move TOAST index directly. I had a look on ALTER TABLE SET >> TABLESPACE and it has a bit complicated logic: >> >> 1) You cannot move TOAST table directly. >> 2) But if you move basic relation that TOAST table belongs to, then >> they are >> moved altogether. >> 3) Same logic as 2) happens if one does ALTER TABLE ALL IN TABLESPACE >> ... >> >> That way, ALTER TABLE allows moving TOAST tables (with indexes) >> implicitly, >> but does not allow doing that explicitly. In the same time I found >> docs to >> be vague about such behavior it only says: >> >> All tables in the current database in a tablespace can be moved >> by using the ALL IN TABLESPACE ... Note that system catalogs are >> not moved by this command >> >> Changing any part of a system catalog table is not permitted. >> >> So actually ALTER TABLE treats TOAST relations as system sometimes, >> but >> sometimes not. >> >> From the end user perspective it makes sense to move TOAST with main >> table >> when doing ALTER TABLE SET TABLESPACE. But should we touch indexes on >> TOAST >> table with REINDEX? We cannot move TOAST relation itself, since we are >> doing >> only a reindex, so we end up in the state when TOAST table and its >> index are >> placed in the different tablespaces. This state is not reachable with >> ALTER >> TABLE/INDEX, so it seem we should not allow it with REINDEX as well, >> should >> we? > >> + * Even if a table's indexes were moved to a new tablespace, the >> index >> + * on its toast table is not normally moved. >> */ >> ReindexParams newparams = *params; >> >> newparams.options &= ~(REINDEXOPT_MISSING_OK); >> + if (!allowSystemTableMods) >> + newparams.tablespaceOid = InvalidOid; > > I think you're right. So actually TOAST should never move, even if > allowSystemTableMods, right ? > I think so. I would prefer to do not move TOAST indexes implicitly at all during reindex. > >> @@ -292,7 +315,11 @@ REINDEX [ ( <replaceable >> class="parameter">option</replaceable> [, ...] ) ] { IN >> with <command>REINDEX INDEX</command> or <command>REINDEX >> TABLE</command>, >> respectively. Each partition of the specified partitioned relation >> is >> reindexed in a separate transaction. Those commands cannot be used >> inside >> - a transaction block when working on a partitioned table or index. >> + a transaction block when working on a partitioned table or index. >> If >> + <command>REINDEX</command> with <literal>TABLESPACE</literal> >> executed >> + on partitioned relation fails it may have moved some partitions to >> the new >> + tablespace. Repeated command will still reindex all partitions >> even if they >> + are already in the new tablespace. > > Minor corrections here: > > If a <command>REINDEX</command> command fails when run on a partitioned > relation, and <literal>TABLESPACE</literal> was specified, then it may > have > moved indexes on some partitions to the new tablespace. Re-running the > command > will reindex all partitions and move previously-unprocessed indexes to > the new > tablespace. Sounds good to me. I have updated patches accordingly and also simplified tablespaceOid checks and assignment in the newly added SetRelTableSpace(). Result is attached as two separate patches for an ease of review, but no objections to merge them and apply at once if everything is fine. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
Attachment
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
From
Alexey Kondratov
Date:
On 2021-01-25 11:07, Michael Paquier wrote: > On Fri, Jan 22, 2021 at 05:07:02PM +0300, Alexey Kondratov wrote: >> I have updated patches accordingly and also simplified tablespaceOid >> checks >> and assignment in the newly added SetRelTableSpace(). Result is >> attached as >> two separate patches for an ease of review, but no objections to merge >> them >> and apply at once if everything is fine. > > extern void SetRelationHasSubclass(Oid relationId, bool > relhassubclass); > +extern bool SetRelTableSpace(Oid reloid, Oid tablespaceOid); > Seeing SetRelationHasSubclass(), wouldn't it be more consistent to use > SetRelationTableSpace() as routine name? > > I think that we should document that the caller of this routine had > better do a CCI once done to make the tablespace chage visible. > Except for those two nits, the patch needs an indentation run and some > style tweaks but its logic looks fine. So I'll apply that first > piece. > I updated comment with CCI info, did pgindent run and renamed new function to SetRelationTableSpace(). New patch is attached. > +INSERT INTO regress_tblspace_test_tbl (num1, num2, t) > + SELECT round(random()*100), random(), repeat('text', 1000000) > + FROM generate_series(1, 10) s(i); > Repeating 1M times a text value is too costly for such a test. And as > even for empty tables there is one page created for toast indexes, > there is no need for that? > Yes, TOAST relation is created anyway. I just wanted to put some data into a TOAST index, so REINDEX did some meaningful work there, not only a new relfilenode creation. However you are right and this query increases tablespace tests execution for more for more than 2 times on my machine. I think that it is not really required. > > This patch is introducing three new checks for system catalogs: > - don't use tablespace for mapped relations. > - don't use tablespace for system relations, except if > allowSystemTableMods. > - don't move non-shared relation to global tablespace. > For the non-concurrent case, all three checks are in reindex_index(). > For the concurrent case, the two first checks are in > ReindexMultipleTables() and the third one is in > ReindexRelationConcurrently(). That's rather tricky to follow because > CONCURRENTLY is not allowed on system relations. I am wondering if it > would be worth an extra comment effort, or if there is a way to > consolidate that better. > Yeah, all these checks we complicated from the beginning. I will try to find a better place tomorrow or put more info into the comments at least. I am also going to check/fix the remaining points regarding 002 tomorrow. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
Attachment
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
From
Alexey Kondratov
Date:
On 2021-01-26 09:58, Michael Paquier wrote: > On Mon, Jan 25, 2021 at 11:11:38PM +0300, Alexey Kondratov wrote: >> I updated comment with CCI info, did pgindent run and renamed new >> function >> to SetRelationTableSpace(). New patch is attached. >> >> [...] >> >> Yeah, all these checks we complicated from the beginning. I will try >> to find >> a better place tomorrow or put more info into the comments at least. > > I was reviewing that, and I think that we can do a better > consolidation on several points that will also help the features > discussed on this thread for VACUUM, CLUSTER and REINDEX. > > If you look closely, ATExecSetTableSpace() uses the same logic as the > code modified here to check if a relation can be moved to a new > tablespace, with extra checks for mapped relations, > GLOBALTABLESPACE_OID or if attempting to manipulate a temp relation > from another session. There are two differences though: > - Custom actions are taken between the phase where we check if a > relation can be moved to a new tablespace, and the update of > pg_class. > - ATExecSetTableSpace() needs to be able to set a given relation > relfilenode on top of reltablespace, the newly-created one. > > So I think that the heart of the problem is made of two things here: > - We should have one common routine for the existing code paths and > the new code paths able to check if a tablespace move can be done or > not. The case of a cluster, reindex or vacuum on a list of relations > extracted from pg_class would still require a different handling > as incorrect relations have to be skipped, but the case of individual > relations can reuse the refactoring pieces done here > (see CheckRelationTableSpaceMove() in the attached). > - We need to have a second routine able to update reltablespace and > optionally relfilenode for a given relation's pg_class entry, once the > caller has made sure that CheckRelationTableSpaceMove() validates a > tablespace move. > I think that I got your idea. One comment: +bool +CheckRelationTableSpaceMove(Relation rel, Oid newTableSpaceId) +{ + Oid oldTableSpaceId; + Oid reloid = RelationGetRelid(rel); + + /* + * No work if no change in tablespace. Note that MyDatabaseTableSpace + * is stored as 0. + */ + oldTableSpaceId = rel->rd_rel->reltablespace; + if (newTableSpaceId == oldTableSpaceId || + (newTableSpaceId == MyDatabaseTableSpace && oldTableSpaceId == 0)) + { + InvokeObjectPostAlterHook(RelationRelationId, reloid, 0); + return false; + } CheckRelationTableSpaceMove() does not feel like a right place for invoking post alter hooks. It is intended only to check for tablespace change possibility. Anyway, ATExecSetTableSpace() and ATExecSetTableSpaceNoStorage() already do that in the no-op case. > Please note that was a bug in your previous patch 0002: shared > dependencies need to be registered if reltablespace is updated of > course, but also iff the relation has no physical storage. So > changeDependencyOnTablespace() requires a check based on > RELKIND_HAS_STORAGE(), or REINDEX would have registered shared > dependencies even for relations with storage, something we don't > want per the recent work done by Alvaro in ebfe2db. > Yes, thanks. I have removed this InvokeObjectPostAlterHook() from your 0001 and made 0002 to work on top of it. I think that now it should look closer to what you described above. In the new 0002 I moved ACL check to the upper level, i.e. ExecReindex(), and removed expensive text generation in test. Not touched yet some of your previously raised concerns. Also, you made SetRelationTableSpace() to accept Relation instead of Oid, so now we have to open/close indexes in the ReindexPartitions(), I am not sure that I use proper locking there, but it works. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
Attachment
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
From
Alexey Kondratov
Date:
On 2021-01-27 06:14, Michael Paquier wrote: > On Wed, Jan 27, 2021 at 01:00:50AM +0300, Alexey Kondratov wrote: >> In the new 0002 I moved ACL check to the upper level, i.e. >> ExecReindex(), >> and removed expensive text generation in test. Not touched yet some of >> your >> previously raised concerns. Also, you made SetRelationTableSpace() to >> accept >> Relation instead of Oid, so now we have to open/close indexes in the >> ReindexPartitions(), I am not sure that I use proper locking there, >> but it >> works. > > Passing down Relation to the new routines makes the most sense to me > because we force the callers to think about the level of locking > that's required when doing any tablespace moves. > > + Relation iRel = index_open(partoid, ShareLock); > + > + if (CheckRelationTableSpaceMove(iRel, > params->tablespaceOid)) > + SetRelationTableSpace(iRel, > + params->tablespaceOid, > + InvalidOid); > Speaking of which, this breaks the locking assumptions of > SetRelationTableSpace(). I feel that we should think harder about > this part for partitioned indexes and tables because this looks rather > unsafe in terms of locking assumptions with partition trees. If we > cannot come up with a safe solution, I would be fine with disallowing > TABLESPACE in this case, as a first step. Not all problems have to be > solved at once, and even without this part the feature is still > useful. > I have read more about lock levels and ShareLock should prevent any kind of physical modification of indexes. We already hold ShareLock doing find_all_inheritors(), which is higher than ShareUpdateExclusiveLock, so using ShareLock seems to be safe here, but I will look on it closer. > > + /* It's not a shared catalog, so refuse to move it to shared > tablespace */ > + if (params->tablespaceOid == GLOBALTABLESPACE_OID) > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("cannot move non-shared relation to tablespace > \"%s\"", > + get_tablespace_name(params->tablespaceOid)))); > Why is that needed if CheckRelationTableSpaceMove() is used? > This is from ReindexRelationConcurrently() where we do not use CheckRelationTableSpaceMove(). For me it makes sense to add only this GLOBALTABLESPACE_OID check there, since before we already check for system catalogs and after for temp relations, so adding CheckRelationTableSpaceMove() will be a double-check. > > - indexRelation->rd_rel->reltablespace, > + OidIsValid(tablespaceOid) ? > + tablespaceOid : > indexRelation->rd_rel->reltablespace, > Let's remove this logic from index_concurrently_create_copy() and let > the caller directly decide the tablespace to use, without a dependency > on InvalidOid in the inner routine. A share update exclusive lock is > already hold on the old index when creating the concurrent copy, so > there won't be concurrent schema changes. > Changed. Also added tests for ACL checks, relfilenode changes. Added ACL recheck for multi-transactional case. Added info about TOAST index reindexing. Changed some comments. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company