Re: [HACKERS] Proposal: Local indexes for partitioned table - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: [HACKERS] Proposal: Local indexes for partitioned table
Date
Msg-id 20180110153707.x4uklha5qupwh6qb@alvherre.pgsql
Whole thread Raw
In response to Re: [HACKERS] Proposal: Local indexes for partitioned table  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: [HACKERS] Proposal: Local indexes for partitioned table
Re: [HACKERS] Proposal: Local indexes for partitioned table
List pgsql-hackers
David Rowley wrote:

> Hi Álvaro,

Hi David,

Thanks for the review.  Attached is a delta patch that fixes most
things, except your item 14 below.

Before getting into the details of the items you list, in this version I
also fixed a couple of issues noted by Jaime Casanova; namely

a) ALTER INDEX SET TABLESPACE threw an error for partitioned indexes.
Handled the same was as for partitioned tables: silently do nothing.

b) Expression indexes not considered at all.  (You also commented on
this point).  Jaime reported it for this case:
  create index on t_part USING gin (string_to_array(btrim((t)::text, '-'::text), '--'::text));

I dealt with this by adding code to CompareIndexInfo that runs equal()
on the expression lists, after applying map_variable_attnos() (to
support the case of attaching a table with dropped or reordered
columns).  As far as I can see, this works correctly for very simple
expressions, though I didn't test Jaime's case specifically; I suppose I
should add a few more tests.

On to your notes:

> 1. "either partition" -> "either the partition"
> 4. Extra newline
> 8. Missing if (attachInfos) pfree(attachInfos);
> 10. lock -> locks:
> 11. "already the right" -> "already in the right"
> 16. It's not exactly a problem, but I had a quick look and I don't see
> any other uses of list_free() checking for a non-NIL list first:

Fixed.

> 2. In findDependentObjects(), should the following if test be below
> the ReleaseDeletionLock call?
> 
> + if (foundDep->deptype == DEPENDENCY_INTERNAL_AUTO)
> + break;
>   ReleaseDeletionLock(object);

No, as far as I understand we should keep that lock if we break out of
the loop -- it's the lock on the object being deleted (the index
partition in this case).

I did break the comment in two, so that now the "First," paragraph
appears below the "if () break" test.  That seems to make more sense.
It looks like this now:

                /*
                 * 3. Not all the owning objects have been visited, so
                 * transform this deletion request into a delete of this
                 * owning object.
                 *
                 * For INTERNAL_AUTO dependencies, we don't enforce this;
                 * in other words, we don't follow the links back to the
                 * owning object.
                 */
                if (foundDep->deptype == DEPENDENCY_INTERNAL_AUTO)
                    break;

                /*
                 * First, release caller's lock on this object and get
                 * deletion lock on the owning object.  (We must release
                 * caller's lock to avoid deadlock against a concurrent
                 * deletion of the owning object.)
                 */
                ReleaseDeletionLock(object);
                AcquireDeletionLock(&otherObject, 0);


> 3. The following existing comment indicates that we'll be creating a
> disk file for the index. Should we update this comment to say that
> this is the case only for RELKIND_INDEX?

> Maybe:
> 
> /*
> * create the index relation's relcache entry and for non-partitioned
> * indexes, a physical disk file. (If we fail further down, it's the
> * smgr's responsibility to remove the disk file again.)
> */

I used this:
    /*
     * create the index relation's relcache entry and, if necessary, the
     * physical disk file. (If we fail further down, it's the smgr's
     * responsibility to remove the disk file again, if any.)
     */

> 5. Do you think it's out of scope to support expression indexes?

Answered above.

> 6. pg_inherits.inhseqno is int, not smallint. I understand you copied
> this from StoreCatalogInheritance1(), so maybe a fix should be put in
> before this patch is?

Hmm, yeah, will fix.

> 7. Is the following a bug fix? If so, should it not be fixed and
> backpatched, rather than sneaked in here?
> 
> @@ -10119,6 +10195,7 @@ ATExecChangeOwner(Oid relationOid, Oid
> newOwnerId, bool recursing, LOCKMODE lock
>   * relation, as well as its toast table (if it has one).
>   */
>   if (tuple_class->relkind == RELKIND_RELATION ||
> + tuple_class->relkind == RELKIND_PARTITIONED_TABLE ||

No, it's not a bug fix -- this block is only concerned with searching
index for this relkind, and prior to this patch PARTITIONED_TABLEs do
not have indexes, so it's ok not to consider the case.

> 9. The first OR branch of the Assert will always be false, so might as
> well get rid of it.

True.  Removed.

> 12. It seems to be RangeVarGetRelidExtended() that locks the partition
> index, not the callback.
> 
> /* no deadlock risk: our callback above already acquired the lock */

Hmm ... yeah, you're right.  Fixed.

> 13. We seem to only be holding an AccessShareLock on the partitioned
> table. What happens if someone concurrently does ALTER TABLE
> partitioned_table DETACH our_partition;?
> 
> parentTbl = relation_open(parentIdx->rd_index->indrelid, AccessShareLock);
> 
> and;
> /* Make sure it indexes a partition of the other index's table */
> partDesc = RelationGetPartitionDesc(parentTbl);
> found = false;
> for (i = 0; i < partDesc->nparts; i++)
> {
> if (partDesc->oids[i] == state.partitionOid)
> {
> found = true;
> break;
> }
> }
> 
> Wouldn't you also need an AccessExclusiveLock to prevent another
> session trying to ALTER INDEX part_index ATTACH PARTITION leaf_index;
> at the same time? The concurrent session could be trying to ATTACH it
> to a subpartition and this comment could be attaching to the parent.

But AccessShare does conflict with AccessExclusive.  So even with just
AccessShare we've already prevented others from detaching/attaching
partitions.

I think I should spend a bit of time going over the locking
considerations again and make sure it is all sound.

> 14. validatePartitionedIndex does not verify that the index it is
> checking is valid. Consider the following:
> 
> create table part (a int, b int) partition by list(a);
> create table part1 partition of part for values in(1) partition by list(b)
> create table part1 partition of part for values in(1) partition by list(b);
> create table part2 partition of part1 for values in (1);
> create index on only part1 (a);
> create index on only part (a);
> alter index part_a_idx attach partition part1_a_idx;
> \d part
>                 Table "public.part"
>  Column |  Type   | Collation | Nullable | Default
> --------+---------+-----------+----------+---------
>  a      | integer |           |          |
>  b      | integer |           |          |
> Partition key: LIST (a)
> Indexes:
>     "part_a_idx" btree (a) <--- should be INVALID
> Number of partitions: 1 (Use \d+ to list them.)
> 
> \d part1
>                Table "public.part1"
>  Column |  Type   | Collation | Nullable | Default
> --------+---------+-----------+----------+---------
>  a      | integer |           |          |
>  b      | integer |           |          |
> Partition of: part FOR VALUES IN (1)
> Partition key: LIST (b)
> Indexes:
>     "part1_a_idx" btree (a) INVALID
> Number of partitions: 1 (Use \d+ to list them.)
> 
> But that probably means you also need more code to search up to parent
> partitions and try and validate any invalid indexes once a
> sub-partition index is made valid.
> 
> As, if the part_a_idx had correctly been marked as INVALID, and then we do:
> 
> CREATE INDEX part2_a_idx ON part2 (a);
> ALTER INDEX part1_a_idx ATTACH PARTITION part2_a_idx;
> 
> This would validate part1_a_idx, but we'd also then need to attempt
> validation on part_a_idx. (I'm still reading, so perhaps you've
> already thought of that.)

In prior incarnations of the patch, I had an if-test to prevent
attaching invalid indexes, but I decided to remove it at some point --
mainly thinking of attaching a partition for which a CREATE INDEX
CONCURRENTLY was running which already had the index as invalid and was
later expected to become valid.  I suppose that doesn't really work
anyway because of locking considerations (you can't attach a partition
in which CIC is concurrently running, can you).  I'll think some more
about this case and post an updated version later.

> 15. Can you explain why you do the following for CREATE INDEX, but not
> for CREATE INDEX IF NOT EXISTS?

I cannot -- just an oversight.  Fixed.  It's true that makeNode() fixes
the hole, but we're supposed to be strict about initializing structures.

> 17. Just a stylistic thing; I think it would be neater to set isindex
> using the existing switch(). It would just need some case shuffling.

> Or maybe it's even better to set relation->rd_amroutine->amoptions
> into a local variable and use it unconditionally in the call to
> extractRelOptions().

Yeah, that last suggestion leads to simpler code, so I did it that way.

> 18. At most it can't do any harm, but is the following still needed? I
> originally thought this would have been for pg_index changes. What's
> changed?
> 
> -#define CATALOG_VERSION_NO 201712251
> +#define CATALOG_VERSION_NO 201712291

Yeah, probably not needed anymore.  Will lose it.

> The tests seem fine, but I'd like to see a test for #14 once that's fixed.

Sure thing.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: CUBE seems a bit confused about ORDER BY
Next
From: Simon Riggs
Date:
Subject: Re: Rangejoin rebased