Re: [HACKERS] Proposal: Local indexes for partitioned table - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: [HACKERS] Proposal: Local indexes for partitioned table |
Date | |
Msg-id | 20180110153707.x4uklha5qupwh6qb@alvherre.pgsql Whole thread Raw |
In response to | Re: [HACKERS] Proposal: Local indexes for partitioned table (David Rowley <david.rowley@2ndquadrant.com>) |
Responses |
Re: [HACKERS] Proposal: Local indexes for partitioned table
Re: [HACKERS] Proposal: Local indexes for partitioned table |
List | pgsql-hackers |
David Rowley wrote: > Hi Álvaro, Hi David, Thanks for the review. Attached is a delta patch that fixes most things, except your item 14 below. Before getting into the details of the items you list, in this version I also fixed a couple of issues noted by Jaime Casanova; namely a) ALTER INDEX SET TABLESPACE threw an error for partitioned indexes. Handled the same was as for partitioned tables: silently do nothing. b) Expression indexes not considered at all. (You also commented on this point). Jaime reported it for this case: create index on t_part USING gin (string_to_array(btrim((t)::text, '-'::text), '--'::text)); I dealt with this by adding code to CompareIndexInfo that runs equal() on the expression lists, after applying map_variable_attnos() (to support the case of attaching a table with dropped or reordered columns). As far as I can see, this works correctly for very simple expressions, though I didn't test Jaime's case specifically; I suppose I should add a few more tests. On to your notes: > 1. "either partition" -> "either the partition" > 4. Extra newline > 8. Missing if (attachInfos) pfree(attachInfos); > 10. lock -> locks: > 11. "already the right" -> "already in the right" > 16. It's not exactly a problem, but I had a quick look and I don't see > any other uses of list_free() checking for a non-NIL list first: Fixed. > 2. In findDependentObjects(), should the following if test be below > the ReleaseDeletionLock call? > > + if (foundDep->deptype == DEPENDENCY_INTERNAL_AUTO) > + break; > ReleaseDeletionLock(object); No, as far as I understand we should keep that lock if we break out of the loop -- it's the lock on the object being deleted (the index partition in this case). I did break the comment in two, so that now the "First," paragraph appears below the "if () break" test. That seems to make more sense. It looks like this now: /* * 3. Not all the owning objects have been visited, so * transform this deletion request into a delete of this * owning object. * * For INTERNAL_AUTO dependencies, we don't enforce this; * in other words, we don't follow the links back to the * owning object. */ if (foundDep->deptype == DEPENDENCY_INTERNAL_AUTO) break; /* * First, release caller's lock on this object and get * deletion lock on the owning object. (We must release * caller's lock to avoid deadlock against a concurrent * deletion of the owning object.) */ ReleaseDeletionLock(object); AcquireDeletionLock(&otherObject, 0); > 3. The following existing comment indicates that we'll be creating a > disk file for the index. Should we update this comment to say that > this is the case only for RELKIND_INDEX? > Maybe: > > /* > * create the index relation's relcache entry and for non-partitioned > * indexes, a physical disk file. (If we fail further down, it's the > * smgr's responsibility to remove the disk file again.) > */ I used this: /* * create the index relation's relcache entry and, if necessary, the * physical disk file. (If we fail further down, it's the smgr's * responsibility to remove the disk file again, if any.) */ > 5. Do you think it's out of scope to support expression indexes? Answered above. > 6. pg_inherits.inhseqno is int, not smallint. I understand you copied > this from StoreCatalogInheritance1(), so maybe a fix should be put in > before this patch is? Hmm, yeah, will fix. > 7. Is the following a bug fix? If so, should it not be fixed and > backpatched, rather than sneaked in here? > > @@ -10119,6 +10195,7 @@ ATExecChangeOwner(Oid relationOid, Oid > newOwnerId, bool recursing, LOCKMODE lock > * relation, as well as its toast table (if it has one). > */ > if (tuple_class->relkind == RELKIND_RELATION || > + tuple_class->relkind == RELKIND_PARTITIONED_TABLE || No, it's not a bug fix -- this block is only concerned with searching index for this relkind, and prior to this patch PARTITIONED_TABLEs do not have indexes, so it's ok not to consider the case. > 9. The first OR branch of the Assert will always be false, so might as > well get rid of it. True. Removed. > 12. It seems to be RangeVarGetRelidExtended() that locks the partition > index, not the callback. > > /* no deadlock risk: our callback above already acquired the lock */ Hmm ... yeah, you're right. Fixed. > 13. We seem to only be holding an AccessShareLock on the partitioned > table. What happens if someone concurrently does ALTER TABLE > partitioned_table DETACH our_partition;? > > parentTbl = relation_open(parentIdx->rd_index->indrelid, AccessShareLock); > > and; > /* Make sure it indexes a partition of the other index's table */ > partDesc = RelationGetPartitionDesc(parentTbl); > found = false; > for (i = 0; i < partDesc->nparts; i++) > { > if (partDesc->oids[i] == state.partitionOid) > { > found = true; > break; > } > } > > Wouldn't you also need an AccessExclusiveLock to prevent another > session trying to ALTER INDEX part_index ATTACH PARTITION leaf_index; > at the same time? The concurrent session could be trying to ATTACH it > to a subpartition and this comment could be attaching to the parent. But AccessShare does conflict with AccessExclusive. So even with just AccessShare we've already prevented others from detaching/attaching partitions. I think I should spend a bit of time going over the locking considerations again and make sure it is all sound. > 14. validatePartitionedIndex does not verify that the index it is > checking is valid. Consider the following: > > create table part (a int, b int) partition by list(a); > create table part1 partition of part for values in(1) partition by list(b) > create table part1 partition of part for values in(1) partition by list(b); > create table part2 partition of part1 for values in (1); > create index on only part1 (a); > create index on only part (a); > alter index part_a_idx attach partition part1_a_idx; > \d part > Table "public.part" > Column | Type | Collation | Nullable | Default > --------+---------+-----------+----------+--------- > a | integer | | | > b | integer | | | > Partition key: LIST (a) > Indexes: > "part_a_idx" btree (a) <--- should be INVALID > Number of partitions: 1 (Use \d+ to list them.) > > \d part1 > Table "public.part1" > Column | Type | Collation | Nullable | Default > --------+---------+-----------+----------+--------- > a | integer | | | > b | integer | | | > Partition of: part FOR VALUES IN (1) > Partition key: LIST (b) > Indexes: > "part1_a_idx" btree (a) INVALID > Number of partitions: 1 (Use \d+ to list them.) > > But that probably means you also need more code to search up to parent > partitions and try and validate any invalid indexes once a > sub-partition index is made valid. > > As, if the part_a_idx had correctly been marked as INVALID, and then we do: > > CREATE INDEX part2_a_idx ON part2 (a); > ALTER INDEX part1_a_idx ATTACH PARTITION part2_a_idx; > > This would validate part1_a_idx, but we'd also then need to attempt > validation on part_a_idx. (I'm still reading, so perhaps you've > already thought of that.) In prior incarnations of the patch, I had an if-test to prevent attaching invalid indexes, but I decided to remove it at some point -- mainly thinking of attaching a partition for which a CREATE INDEX CONCURRENTLY was running which already had the index as invalid and was later expected to become valid. I suppose that doesn't really work anyway because of locking considerations (you can't attach a partition in which CIC is concurrently running, can you). I'll think some more about this case and post an updated version later. > 15. Can you explain why you do the following for CREATE INDEX, but not > for CREATE INDEX IF NOT EXISTS? I cannot -- just an oversight. Fixed. It's true that makeNode() fixes the hole, but we're supposed to be strict about initializing structures. > 17. Just a stylistic thing; I think it would be neater to set isindex > using the existing switch(). It would just need some case shuffling. > Or maybe it's even better to set relation->rd_amroutine->amoptions > into a local variable and use it unconditionally in the call to > extractRelOptions(). Yeah, that last suggestion leads to simpler code, so I did it that way. > 18. At most it can't do any harm, but is the following still needed? I > originally thought this would have been for pg_index changes. What's > changed? > > -#define CATALOG_VERSION_NO 201712251 > +#define CATALOG_VERSION_NO 201712291 Yeah, probably not needed anymore. Will lose it. > The tests seem fine, but I'd like to see a test for #14 once that's fixed. Sure thing. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: