Re: 回复:how to create index concurrently on partitioned table - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: 回复:how to create index concurrently on partitioned table |
Date | |
Msg-id | 20200812045438.GA11663@paquier.xyz Whole thread Raw |
In response to | Re: 回复:how to create index concurrently on partitioned table (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: 回复:how to create index concurrently on partitioned table
|
List | pgsql-hackers |
On Sun, Aug 09, 2020 at 06:44:23PM -0500, Justin Pryzby wrote: > On Sun, Aug 09, 2020 at 02:00:09PM +0900, Michael Paquier wrote: >> For now, I would recommend to focus first on 0001 to add support for >> partitioned tables and indexes to REINDEX. CIC is much more >> complicated btw, but I am not entering in the details now. >> >> + /* Avoid erroring out */ >> if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) >> { >> This comment does not help, and actually this becomes incorrect as >> reindex for this relkind becomes supported once 0001 is done. > > I made a minimal change to avoid forgetting to eventually change > that part. Why not changing it then? We already filter out per relkind in all the code paths calling reindex_relation(), be it in indexcmds.c for schema-level reindex or even tablecmds.c, so I have switched this part to an elog(). >> - We should *not* handle directly partitioned index and/or table in >> ReindexRelationConcurrently() to not complicate the logic where we >> gather all the indexes of a table/matview. So I think that the list >> of partition indexes/tables to work on should be built directly in >> ReindexIndex() and ReindexTable(), and then this should call the >> second part of ReindexMultipleTables() refactored in the previous >> point. > > I think I addressed these mostly as you intended. Mostly. I have been hacking on this patch, and basically rewrote it as the attached. The handling of the memory context used to keep the list of partitions intact across transactions was rather clunky: the context was not reset when we are done, and we would call more APIs than necessary while switching to it, like find_all_inheritors() which could do much more allocations. I have fixed that by minimizing the areas where the private context is used, switching to it only when saving a new OID in the list of partitions, or a session lock (see below for this part). While on it, I found that the test coverage was not enough, so I have extended the set of tests to make sure any concurrent and non-concurrent operation for partitioned tables and indexes change the correct set of relfilenodes for each operation. I have written some custom functions to minimize the duplication (the whole thing cannot be grouped as those commands cannot run in a transaction block). Speaking of which, the patch missed that REINDEX INDEX/TABLE should not run in a transaction block when working on a partitioned relation. And the documentation needs to be clear about the limitation of each operation, so I have written more about all that. The patch also has commented out areas with slashes or such, and I have added some elog() and some asserts to make sure that we don't cross any areas that should not work with partitioned relations. While hacking on this patch, I have found an old bug in the REINDEX logic: we build a list of relations to reindex in ReindexMultipleTables() for schema and database reindexes, but it happens that we don't recheck if the relations listed actually exists or not, so dropping a relation during a large reindex can cause sparse failures because of relations that cannot be found anymore. In the case of this thread, the problem is different though (the proposed patch was full of holes regarding that) and we need to use session locks on the parent *table* partitions (not the indexes) to avoid any issues within the first transaction building the list of relations to work on, similarly to REINDEX CONCURRENTLY. So I fixed this problem this way. For the schema and database cases, I think that we would need to do something similar to VACUUM, aka have an extra code path to skip relations not defined. I'll leave that for another thread. One last thing. I think that the patch is in a rather good shape, but there is one error message I am not happy with when running some commands in a transaction block. Say, this sequence: CREATE TABLE parent_tab (id int) PARTITION BY RANGE (id); CREATE INDEX parent_index ON parent_tab (id); BEGIN; REINDEX INDEX parent_index; -- error ERROR: 25001: REINDEX INDEX cannot run inside a transaction block LOCATION: PreventInTransactionBlock, xact.c:3386 This error can be confusing, because we don't tell directly that the relation involved here is partitioned, and REINDEX INDEX/TABLE are fine when doing their stuff on non-partitions. For other code paths, we have leveraged such errors to use the grammar specific to partitions, for example "CREATE TABLE .. PARTITION OF" or such as these don't cause translation issues, but we don't have a specific syntax of REINDEX for partitioned relations, and I don't think that we need more grammar just for that. The simplest idea I have here is to just use an error callback to set an errcontext(), saying roughly: "while reindexing partitioned table/index %s" while we go through PreventInTransactionBlock(). I have done nothing about that yet but adding an errcallback is simple enough. Perhaps somebody has a different idea here? -- Michael
Attachment
pgsql-hackers by date: