Thread: [HACKERS] Proposal: Local indexes for partitioned table

[HACKERS] Proposal: Local indexes for partitioned table

From
Maksim Milyutin
Date:
Hi hackers!

As I've understood from thread [1] the main issue of creating local 
indexes for partitions is supporting REINDEX and DROP INDEX operations 
on parent partitioned tables. Furthermore Robert Haas mentioned the 
problem of creating index on key that is represented in partitions with 
single value (or primitive interval) [1] i.e. under the 
list-partitioning or range-partitioning with unit interval.

I would like to propose the following solution:

1. Create index for hierarchy of partitioned tables and partitions 
recursively. Don't create relfilenode for indexes on parents, only 
entries in catalog (much like the partitioned table's storage 
elimination in [2]). Abstract index for partitioned tables is only for 
the reference on indexes of child tables to perform REINDEX and DROP 
INDEX operations.

2. Specify created indexes in pg_depend table so that indexes of child 
tables depend on corresponding indexes of parent tables with type of 
dependency DEPENDENCY_NORMAL so that index could be removed separately 
for partitions and recursively/separately for partitioned tables.

3. REINDEX on index of partitioned table would perform this operation on 
existing indexes of corresponding partitions. In this case it is 
necessary to consider such operations as REINDEX SCHEMA | DATABASE | 
SYSTEM so that partitions' indexes wouldn't be re-indexed multiple times 
in a row.

Any thoughts?

1. 
https://www.postgresql.org/message-id/CA+TgmoZUwj=QYnaK+F7xEf4w_e2g3XxdMnSNZMZjuinHRcOB8A@mail.gmail.com
2. 
https://www.postgresql.org/message-id/2b0d42f2-3a53-763b-c9c2-47139e4b1c2e%40lab.ntt.co.jp

-- 
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Wed, Mar 1, 2017 at 4:23 PM, Maksim Milyutin
<m.milyutin@postgrespro.ru> wrote:
> As I've understood from thread [1] the main issue of creating local indexes
> for partitions is supporting REINDEX and DROP INDEX operations on parent
> partitioned tables. Furthermore Robert Haas mentioned the problem of
> creating index on key that is represented in partitions with single value
> (or primitive interval) [1] i.e. under the list-partitioning or
> range-partitioning with unit interval.
>
> I would like to propose the following solution:
>
> 1. Create index for hierarchy of partitioned tables and partitions
> recursively. Don't create relfilenode for indexes on parents, only entries
> in catalog (much like the partitioned table's storage elimination in [2]).
> Abstract index for partitioned tables is only for the reference on indexes
> of child tables to perform REINDEX and DROP INDEX operations.
>
> 2. Specify created indexes in pg_depend table so that indexes of child
> tables depend on corresponding indexes of parent tables with type of
> dependency DEPENDENCY_NORMAL so that index could be removed separately for
> partitions and recursively/separately for partitioned tables.
>
> 3. REINDEX on index of partitioned table would perform this operation on
> existing indexes of corresponding partitions. In this case it is necessary
> to consider such operations as REINDEX SCHEMA | DATABASE | SYSTEM so that
> partitions' indexes wouldn't be re-indexed multiple times in a row.
>
> Any thoughts?

Sounds generally good.  One thing to keep in mind is that - in this
system - a UNIQUE index on the parent doesn't actually guarantee
uniqueness across the whole partitioning hierarchy unless it so
happens that the index columns or expressions are the same as the
partitioning columns or expressions.  That's a little a
counterintuitive, and people have already been complaining that a
partitioned table + partitions doesn't look enough like a plain table.
However, I'm not sure there's a better alternative, because somebody
might want partition-wise unique indexes even though that doesn't
guarantee global uniqueness.  So I think if someday we have global
indexes, then we can plan to use some other syntax for that, like
CREATE GLOBAL [ UNIQUE ] INDEX.

But, of course, that's just my opinion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Maksim Milyutin
Date:
On 02.03.2017 11:41, Robert Haas wrote:
> Sounds generally good.  One thing to keep in mind is that - in this
> system - a UNIQUE index on the parent doesn't actually guarantee
> uniqueness across the whole partitioning hierarchy unless it so
> happens that the index columns or expressions are the same as the
> partitioning columns or expressions.  That's a little a
> counterintuitive, and people have already been complaining that a
> partitioned table + partitions doesn't look enough like a plain table.
> However, I'm not sure there's a better alternative, because somebody
> might want partition-wise unique indexes even though that doesn't
> guarantee global uniqueness.  So I think if someday we have global
> indexes, then we can plan to use some other syntax for that, like
> CREATE GLOBAL [ UNIQUE ] INDEX.

Yes, I absolutely agree with your message that cross-partition 
uniqueness is guaranteed through global index on partitioned table apart 
from the case when the index key are the same as partitioning key (or 
index comprises partitioning key in general).

Thanks for your comment. I'll try to propose the first patches as soon 
as possible.

-- 
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: Proposal: Local indexes for partitioned table

From
Maksim Milyutin
Date:
On 01.03.2017 13:53, Maksim Milyutin wrote:
> Hi hackers!
>
> As I've understood from thread [1] the main issue of creating local
> indexes for partitions is supporting REINDEX and DROP INDEX operations
> on parent partitioned tables. Furthermore Robert Haas mentioned the
> problem of creating index on key that is represented in partitions with
> single value (or primitive interval) [1] i.e. under the
> list-partitioning or range-partitioning with unit interval.
>
> I would like to propose the following solution:
>
> 1. Create index for hierarchy of partitioned tables and partitions
> recursively. Don't create relfilenode for indexes on parents, only
> entries in catalog (much like the partitioned table's storage
> elimination in [2]). Abstract index for partitioned tables is only for
> the reference on indexes of child tables to perform REINDEX and DROP
> INDEX operations.
>
> 2. Specify created indexes in pg_depend table so that indexes of child
> tables depend on corresponding indexes of parent tables with type of
> dependency DEPENDENCY_NORMAL so that index could be removed separately
> for partitions and recursively/separately for partitioned tables.
>
> 3. REINDEX on index of partitioned table would perform this operation on
> existing indexes of corresponding partitions. In this case it is
> necessary to consider such operations as REINDEX SCHEMA | DATABASE |
> SYSTEM so that partitions' indexes wouldn't be re-indexed multiple times
> in a row.
>
> Any thoughts?
>
> 1.
> https://www.postgresql.org/message-id/CA+TgmoZUwj=QYnaK+F7xEf4w_e2g3XxdMnSNZMZjuinHRcOB8A@mail.gmail.com
>
> 2.
> https://www.postgresql.org/message-id/2b0d42f2-3a53-763b-c9c2-47139e4b1c2e%40lab.ntt.co.jp
>
>

I want to present the first version of patches that implement local 
indexes for partitioned tables and discuss some technical details of 
that implementation.


1. I have added a new relkind for local indexes named 
RELKIND_LOCAL_INDEX (literal 'l').

This was done because physical storage is created in the 'heap_create' 
function and we need to revoke the creating storage as with partitioned 
tables. Since information that this index belongs to partitioned tables 
is not available in 'heap_create' function (pg_index entry on the index 
is not created yet) I chose the least painful way - added a specific 
relkind for index on partitioned table.
I suppose that this act will require the integrating new relkind to 
different places of source code so I'm ready to consider another 
proposals on this point.

2. My implementation doesn't support the concurrent creating of local 
index (CREATE INDEX CONCURRENTLY). As I understand, this operation 
involves nontrivial manipulation with snapshots and I don't know how to 
implement concurrent creating of multiple indexes. In this point I ask 
help from community.

3. As I noticed early pg_depend table is used for cascade deleting 
indexes on partitioned table and its children. I also use pg_depend to 
determine relationship between parent and child indexes when reindex 
executes recursively on child indexes.

Perhaps, it's not good way to use pg_depend to determine the 
relationship between parent and child indexes because the kind of this 
relationship is not defined. I could propose to add into pg_index table 
specific field of 'oidvector' type that specify oids of dependent 
indexes for the current local index.


On this stage I want to discuss only technical details of local indexes' 
implementation. The problems related to merging existing indexes of 
partitions within local index tree, determination uniqueness of field in 
global sense through local index and syntax notes I want to arise later.


CC welcome!

-- 
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Greg Stark
Date:
On 4 April 2017 at 17:10, Maksim Milyutin <m.milyutin@postgrespro.ru> wrote:
>
> 3. As I noticed early pg_depend table is used for cascade deleting indexes
> on partitioned table and its children. I also use pg_depend to determine
> relationship between parent and child indexes when reindex executes
> recursively on child indexes.
>
> Perhaps, it's not good way to use pg_depend to determine the relationship
> between parent and child indexes because the kind of this relationship is
> not defined. I could propose to add into pg_index table specific field of
> 'oidvector' type that specify oids of dependent indexes for the current
> local index.


Alternately you could have an single oid in pg_index on each of the
children that specifies which local index is its parent. That would
probably require a new index on that column so you could look up all
the children efficiently.

I think it would behave more sensibly when you're adding or removing a
partition, especially if you want to add many partitions in parallel
using multiple transactions. An oidvector of children would
effectively mean you could only be doing one partition creation or
deletion at a time.

-- 
greg



Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Tue, Apr 4, 2017 at 12:10 PM, Maksim Milyutin
<m.milyutin@postgrespro.ru> wrote:
> 1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX
> (literal 'l').

Seems like it should maybe be RELKIND_PARTITIONED_INDEX.  There's
nothing particularly "local" about it.  I suppose what you're going
for is that it's not global, but in a way it *is* global to the
partitioning hierarchy.  That's the point.  It's just that it's
partitioned.

> Perhaps, it's not good way to use pg_depend to determine the relationship
> between parent and child indexes because the kind of this relationship is
> not defined. I could propose to add into pg_index table specific field of
> 'oidvector' type that specify oids of dependent indexes for the current
> local index.

I agree with Greg's comment on this point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Maksim Milyutin
Date:
On 10.04.2017 13:46, Greg Stark wrote:
> On 4 April 2017 at 17:10, Maksim Milyutin <m.milyutin@postgrespro.ru> wrote:
>>
>> 3. As I noticed early pg_depend table is used for cascade deleting indexes
>> on partitioned table and its children. I also use pg_depend to determine
>> relationship between parent and child indexes when reindex executes
>> recursively on child indexes.
>>
>> Perhaps, it's not good way to use pg_depend to determine the relationship
>> between parent and child indexes because the kind of this relationship is
>> not defined. I could propose to add into pg_index table specific field of
>> 'oidvector' type that specify oids of dependent indexes for the current
>> local index.
>
>
> Alternately you could have an single oid in pg_index on each of the
> children that specifies which local index is its parent. That would
> probably require a new index on that column so you could look up all
> the children efficiently.
>
> I think it would behave more sensibly when you're adding or removing a
> partition, especially if you want to add many partitions in parallel
> using multiple transactions. An oidvector of children would
> effectively mean you could only be doing one partition creation or
> deletion at a time.
>

Thanks for your comment. Your approach sounds better than mine. I'll try it.

-- 
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Maksim Milyutin
Date:
On 10.04.2017 14:20, Robert Haas wrote:
> On Tue, Apr 4, 2017 at 12:10 PM, Maksim Milyutin
> <m.milyutin@postgrespro.ru> wrote:
>> 1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX
>> (literal 'l').
>
> Seems like it should maybe be RELKIND_PARTITIONED_INDEX.  There's
> nothing particularly "local" about it.  I suppose what you're going
> for is that it's not global, but in a way it *is* global to the
> partitioning hierarchy.  That's the point.  It's just that it's
> partitioned.
>

Ok, thanks for the note.

But I want to discuss the relevancy of introduction of a new relkind for 
partitioned index. I could to change the control flow in partitioned 
index creation (specify conditional statement in the 'index_create' 
routine in attached patch) and not enter to the 'heap_create' routine. 
This case releases us from integrating new relkind into different places 
of Postgres code. But we have to copy-paste some specific code from 
'heap_create' function, e.g., definition of relfilenode and tablespaceid 
for the new index and perhaps something more when 'heap_create' routine 
will be extended.

What do you think about this way?


-- 
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Amit Langote
Date:
Hi,

On 2017/04/17 23:00, Maksim Milyutin wrote:
> On 10.04.2017 14:20, Robert Haas wrote:
>> On Tue, Apr 4, 2017 at 12:10 PM, Maksim Milyutin
>> <m.milyutin@postgrespro.ru> wrote:
>>> 1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX
>>> (literal 'l').
>>
>> Seems like it should maybe be RELKIND_PARTITIONED_INDEX.  There's
>> nothing particularly "local" about it.  I suppose what you're going
>> for is that it's not global, but in a way it *is* global to the
>> partitioning hierarchy.  That's the point.  It's just that it's
>> partitioned.
>>
> 
> Ok, thanks for the note.
> 
> But I want to discuss the relevancy of introduction of a new relkind for
> partitioned index. I could to change the control flow in partitioned index
> creation (specify conditional statement in the 'index_create' routine in
> attached patch) and not enter to the 'heap_create' routine. This case
> releases us from integrating new relkind into different places of Postgres
> code. But we have to copy-paste some specific code from 'heap_create'
> function, e.g., definition of relfilenode and tablespaceid for the new
> index and perhaps something more when 'heap_create' routine will be extended.

I may be missing something, but isn't it that a new relkind will be needed
anyway?  How does the rest of the code distinguish such index objects once
they are created?  Is it possible that some other code may try to access
the storage for an index whose indrelid is a partitioned table?

Thanks,
Amit




Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Maksim Milyutin
Date:
On 18.04.2017 13:08, Amit Langote wrote:
> Hi,
>

Hi, Amit!

> On 2017/04/17 23:00, Maksim Milyutin wrote:
>>
>> Ok, thanks for the note.
>>
>> But I want to discuss the relevancy of introduction of a new relkind for
>> partitioned index. I could to change the control flow in partitioned index
>> creation (specify conditional statement in the 'index_create' routine in
>> attached patch) and not enter to the 'heap_create' routine. This case
>> releases us from integrating new relkind into different places of Postgres
>> code. But we have to copy-paste some specific code from 'heap_create'
>> function, e.g., definition of relfilenode and tablespaceid for the new
>> index and perhaps something more when 'heap_create' routine will be extended.
>
> I may be missing something, but isn't it that a new relkind will be needed
> anyway?  How does the rest of the code distinguish such index objects once
> they are created?

Local partitioned indexes can be recognized through the check on the 
relkind of table to which the index refers. Something like this:

heap = relation_open(IndexGetRelation(indexid, false), heapLockmode);
if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)    /* indexid is local index on partitioned table */

> Is it possible that some other code may try to access
> the storage for an index whose indrelid is a partitioned table?
>

Thеsе cases must be caught. But as much as partitioned tables doesn't 
participate in query plans their indexes are unaccessible by executor. 
Reindex operation is overloaded with my patch.


-- 
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Ashutosh Bapat
Date:
On Tue, Apr 18, 2017 at 4:43 PM, Maksim Milyutin
<m.milyutin@postgrespro.ru> wrote:
> On 18.04.2017 13:08, Amit Langote wrote:
>>
>> Hi,
>>
>
> Hi, Amit!
>
>> On 2017/04/17 23:00, Maksim Milyutin wrote:
>>>
>>>
>>> Ok, thanks for the note.
>>>
>>> But I want to discuss the relevancy of introduction of a new relkind for
>>> partitioned index. I could to change the control flow in partitioned
>>> index
>>> creation (specify conditional statement in the 'index_create' routine in
>>> attached patch) and not enter to the 'heap_create' routine. This case
>>> releases us from integrating new relkind into different places of
>>> Postgres
>>> code. But we have to copy-paste some specific code from 'heap_create'
>>> function, e.g., definition of relfilenode and tablespaceid for the new
>>> index and perhaps something more when 'heap_create' routine will be
>>> extended.
>>
>>
>> I may be missing something, but isn't it that a new relkind will be needed
>> anyway?  How does the rest of the code distinguish such index objects once
>> they are created?
>
>
> Local partitioned indexes can be recognized through the check on the relkind
> of table to which the index refers. Something like this:
>
> heap = relation_open(IndexGetRelation(indexid, false), heapLockmode);
> if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>     /* indexid is local index on partitioned table */

An index on partitioned table can be global index (yet to be
implemented) or a local index. We can not differentiate between those
just by looking at the relation on which they are built.

>
>> Is it possible that some other code may try to access
>> the storage for an index whose indrelid is a partitioned table?
>>
>
> Thеsе cases must be caught. But as much as partitioned tables doesn't
> participate in query plans their indexes are unaccessible by executor.
> Reindex operation is overloaded with my patch.
>

A global index would have storage for a partitioned table whereas a
local index wouldn't have any storage for a partitioned table.

I agree with Amit that we need new relkinds for local as well as global indexes.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Maksim Milyutin
Date:
On 19.04.2017 11:42, Ashutosh Bapat wrote:
> On Tue, Apr 18, 2017 at 4:43 PM, Maksim Milyutin
> <m.milyutin@postgrespro.ru> wrote:
>>
>> Local partitioned indexes can be recognized through the check on the relkind
>> of table to which the index refers. Something like this:
>>
>> heap = relation_open(IndexGetRelation(indexid, false), heapLockmode);
>> if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>>     /* indexid is local index on partitioned table */
>
> An index on partitioned table can be global index (yet to be
> implemented) or a local index. We can not differentiate between those
> just by looking at the relation on which they are built.
>

We could to refine the criteria for the local partitioned index later 
encapsulating it in a macro, e.g., adding a new flag from pg_index that 
differentiate the type of index on partitioned table.


>> Thеsе cases must be caught. But as much as partitioned tables doesn't
>> participate in query plans their indexes are unaccessible by executor.
>> Reindex operation is overloaded with my patch.
>>
>
> A global index would have storage for a partitioned table whereas a
> local index wouldn't have any storage for a partitioned table.
>
> I agree with Amit that we need new relkinds for local as well as global indexes.
>

Ok, thanks for the feedback. Then I'll use a new relkind for local 
partitioned index in further development.



-- 
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Wed, Apr 19, 2017 at 5:25 AM, Maksim Milyutin
<m.milyutin@postgrespro.ru> wrote:
> Ok, thanks for the feedback. Then I'll use a new relkind for local
> partitioned index in further development.

Any update on this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Maksim Milyutin
Date:
10.08.17 23:01, Robert Haas wrote:

> On Wed, Apr 19, 2017 at 5:25 AM, Maksim Milyutin
> <m.milyutin@postgrespro.ru> wrote:
>> Ok, thanks for the feedback. Then I'll use a new relkind for local
>> partitioned index in further development.
> Any update on this?
>

I'll continue to work soon. Sorry for so long delay.

-- 
Regards,
Maksim Milyutin




Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Hello

I've been thinking about this issue too.  I think your patch is not
ambitious enough.  Here's my brain dump on the issue, to revive
discussion.

As you propose, IMO this new feature would use the standard index
creation syntax:  CREATE [UNIQUE] INDEX someindex ON parted_table (a, b);

This command currently throws an error.  We'd have it do two things:

1. create catalog rows in pg_class and pg_index for the main table,  indicating the existance of this partitioned
index. These would not  point to an actual index (since the main table itself is empty), but  instead to an "abstract"
index. This abstract index can be used by  various operations; see below.
 

2. create one index for each existing partition.  These would be  identical to what would happen if you created the
indexdirectly on  each partition, except that there is an additional dependency to the  parent's abstract index.
 

If partitions are themselves partitioned, we would recursively apply the
indexes to the sub-partitions by doing (1) above for the partitioned
partition.

Once the index has been created for all existing partitions, the
hierarchy-wide index becomes valid and can be used normally by the
planner/executor.  I think we could use the pg_index.indisvalid property
for the abstract index for this.

I propose that an index declared UNIQUE throws an error for now.  We can
implement uniqueness (for the case where the indexed columns match the
partitioning key) later, once we sort out all the issues here first.  I
think unique indexes would be very useful even with that limitation, but
let's have it as a separate project.

I think using pg_depend as the mechanism to link partition indexes to
parent is a bad idea.  How about pg_inherits instead?  Seems more
appropriate.


Creating hierachy-wide indexes for existing partitioned tables is likely
to take a long time, so we must include CONCURRENTLY as an option.  This
will need some transaction machinery support in order to ensure that
each partition gets its index created at some point in a long chain of
transactions, and that the whole thing is marked valid only at the end.
Also, if the creation is interrupted (because of a crash or a regular
shutdown), it'll be useful to be able to continue rather than being
forced to start from scratch.

When a new partition is added, indexes satisfying the partitioned
table's abstract indexes are created automatically.

During ALTER TABLE ... ATTACH PARTITION, we check that an indexing
satisfying the abstract index exist (and we create a pg_inherit link).
If not, the command is aborted.

REINDEX is easily supported (just reindex each partition's index
individually), but that command is blocking, which is not good.  For
concurrent operation for tables not partitioned, a typical pattern to
avoid blocking the table is to suggest creation of an identical index
using CREATE INDEX CONCURRENTLY, then drop the original one.  That's not
going to work with these partitioned indexes, which is going to be a
problem.  I don't have any great ideas about this part yet.

We need to come up with some way to generate names for each partition
index.


I am going to work on this now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Maksim Milyutin
Date:

Hi!

On 06.10.2017 19:37, Alvaro Herrera wrote:
As you propose, IMO this new feature would use the standard index
creation syntax:  CREATE [UNIQUE] INDEX someindex ON parted_table (a, b);

This command currently throws an error.  We'd have it do two things:

1. create catalog rows in pg_class and pg_index for the main table,  indicating the existance of this partitioned index.  These would not  point to an actual index (since the main table itself is empty), but  instead to an "abstract" index.  This abstract index can be used by  various operations; see below.

Robert Haas proposed[1] to use the name "partitioned index" (instead of abstract) that have to be reflected in 'relkind' field of pg_class as the RELKIND_PARTITIONED_INDEX value.

I propose that an index declared UNIQUE throws an error for now.  We can
implement uniqueness (for the case where the indexed columns match the
partitioning key) later, once we sort out all the issues here first.  I
think unique indexes would be very useful even with that limitation, but
let's have it as a separate project.

Yes, global uniqueness through local unique indexes causes further work related with foreign keys on partitioned table, full support of INSERT OF CONFLICT, etc. It make sense to implement after the current stage of work.

I think using pg_depend as the mechanism to link partition indexes to
parent is a bad idea.  How about pg_inherits instead?  Seems more
appropriate.

Greg Stark proposed[2] to use new pg_index field of oid type that refers to the parent pg_index item. AFAIC pg_inherits also makes sense but semantically it deals with inheriting tables. IMHO the using of this catalog table to define relation between partitioned table and partitions looks like a hack to make use of constraint exclusion logic for partition pruning.

Creating hierachy-wide indexes for existing partitioned tables is likely
to take a long time, so we must include CONCURRENTLY as an option.  This
will need some transaction machinery support in order to ensure that
each partition gets its index created at some point in a long chain of
transactions, and that the whole thing is marked valid only at the end.
Also, if the creation is interrupted (because of a crash or a regular
shutdown), it'll be useful to be able to continue rather than being
forced to start from scratch.

This option was very difficult for me. I would be interested to see the implementation.

During ALTER TABLE ... ATTACH PARTITION, we check that an indexing
satisfying the abstract index exist (and we create a pg_inherit link).
If not, the command is aborted.

We could create necessary index for partition, not abort ALTER TABLE ... ATTACH PARTITION statement.

We need to come up with some way to generate names for each partition
index.

I think the calling 'ChooseIndexName(RelationGetRelationName(childrel), namespaceId, indexColNames, ...)' resolves this problem.

I am going to work on this now.

It will be great! I think this project is difficult for me on the part of integration described above functionality with the legacy postgres code. Also IMO this project is very important because it opens the way for such feature as global uniqueness of fields of partitioned tables. And any protraction in implementation is bad.

I would like to review and test your intermediate results.


1. https://www.postgresql.org/message-id/CA%2BTgmoY5UOUnW%3DMcwT7xUB_2W5dAkvOg5kD20Spx5gF-Ad47cA%40mail.gmail.com
2. https://www.postgresql.org/message-id/CAM-w4HOVftuv5RVi3a%2BsRV6nBpg204w7%3DL8MwPXVvYBFo1uM1Q%40mail.gmail.com
-- 
Regards,
Maksim Milyutin

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> 2. create one index for each existing partition.  These would be
>    identical to what would happen if you created the index directly on
>    each partition, except that there is an additional dependency to the
>    parent's abstract index.

One thing I'm a bit worried about is how to name these subordinate
indexes.  They have to have names because that's how pg_class works,
and those names can't all be the same, again because that's how
pg_class works.  There's no problem right away when you first create
the partitioned index, because you can just pick names out of a hat
using whatever name-generating algorithm seems best.  However, when
you dump-and-restore (including but not limited to the pg_upgrade
case) you've got to preserve those names.  If you just generate a new
name that may or may not be the same as the old one, then it may
collide with a user-specified name that only occurs later in the dump.
Also, you'll have trouble if the user has applied a COMMENT or a
SECURITY LABEL to the index because that command works by name, or if
the user has a reference to the index name inside a function or
whatever.

These are pretty annoying corner-case bugs because they're not likely
to come up very often.  Most people won't notice or care if the index
name changes.  But I don't think it's acceptable to just ignore the
problem.  An idea I had was to treat the abstract index - to use your
term - sort of the way we treat an extension.  Normally, when you
create an index on a partitioned table, it cascades, but for dump and
restore purpose, we tag on some syntax that says "well, don't actually
create the subordinate indexes, i'll tell you about those later".
Then for each subordinate index we issue a separate CREATE INDEX
command followed by ALTER INDEX abstract_index ATTACH PARTITION
concrete_index or something of that sort.  That means you can't
absolutely count on the parent index to have all of the children it's
supposed to have but maybe that's OK.

Another thing that would let you do is CREATE INDEX CONCURRENTLY
replacement_concrete_index; ALTER INDEX abstract_index DETACH
PARTITION old_concrete_index, ATTACH PARTITION
replacement_concrete_index; DROP INDEX CONCURRENTLY
old_concrete_index, which seems like a thing someone might want to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Maksim Milyutin
Date:
07.10.17 16:34, Robert Haas wrote:

> On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> One thing I'm a bit worried about is how to name these subordinate
> indexes.  They have to have names because that's how pg_class works,
> and those names can't all be the same, again because that's how
> pg_class works.  There's no problem right away when you first create
> the partitioned index, because you can just pick names out of a hat
> using whatever name-generating algorithm seems best.  However, when
> you dump-and-restore (including but not limited to the pg_upgrade
> case) you've got to preserve those names.  If you just generate a new
> name that may or may not be the same as the old one, then it may
> collide with a user-specified name that only occurs later in the dump.
> Also, you'll have trouble if the user has applied a COMMENT or a
> SECURITY LABEL to the index because that command works by name, or if
> the user has a reference to the index name inside a function or
> whatever.
>
> These are pretty annoying corner-case bugs because they're not likely
> to come up very often.  Most people won't notice or care if the index
> name changes.  But I don't think it's acceptable to just ignore the
> problem.  An idea I had was to treat the abstract index - to use your
> term - sort of the way we treat an extension.  Normally, when you
> create an index on a partitioned table, it cascades, but for dump and
> restore purpose, we tag on some syntax that says "well, don't actually
> create the subordinate indexes, i'll tell you about those later".
> Then for each subordinate index we issue a separate CREATE INDEX
> command followed by ALTER INDEX abstract_index ATTACH PARTITION
> concrete_index or something of that sort.  That means you can't
> absolutely count on the parent index to have all of the children it's
> supposed to have but maybe that's OK.

AFAICS, the main problem with naming is generating new unique names for 
subordinate indexes on the stage of migrating data scheme (pg_dump, 
pg_upgrade, etc). And we cannot specify these names in the 'CREATE INDEX 
partitioned_index' statement therefore we have to regenerate their.

In this case I propose to restore index names' hierarchy *bottom-up*, 
i.e. first of all create indexes for the leaf partitions and then create 
ones for parents up to root explicitly specifying names. When creating 
index on parent table we have to check is there exist any index on child 
table that could be child index (identical criteria). If so, not 
generate new index but implicitly attach that index into parent one.
If we have incomplete index hierarchy, e.g. we dropped some indexes of 
partitions previously, then recreating of parent's index would 
regenerate (not attach) indexes for those partitions. We could drop 
those odd generated indexes after building of parent's index. This 
decision is not straightforward but provides to consider 'CREATE INDEX 
paritioned_table' statement as a cascade operation.
As a result, we can specify name for each concrete index while 
recreating a whole hierarchy.

-- 
Regards,
Maksim Milyutin



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Ashutosh Bapat
Date:
On Sat, Oct 7, 2017 at 7:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> 2. create one index for each existing partition.  These would be
>>    identical to what would happen if you created the index directly on
>>    each partition, except that there is an additional dependency to the
>>    parent's abstract index.
>
> One thing I'm a bit worried about is how to name these subordinate
> indexes.  They have to have names because that's how pg_class works,
> and those names can't all be the same, again because that's how
> pg_class works.  There's no problem right away when you first create
> the partitioned index, because you can just pick names out of a hat
> using whatever name-generating algorithm seems best.  However, when
> you dump-and-restore (including but not limited to the pg_upgrade
> case) you've got to preserve those names.  If you just generate a new
> name that may or may not be the same as the old one, then it may
> collide with a user-specified name that only occurs later in the dump.
> Also, you'll have trouble if the user has applied a COMMENT or a
> SECURITY LABEL to the index because that command works by name, or if
> the user has a reference to the index name inside a function or
> whatever.
>
> These are pretty annoying corner-case bugs because they're not likely
> to come up very often.  Most people won't notice or care if the index
> name changes.  But I don't think it's acceptable to just ignore the
> problem.  An idea I had was to treat the abstract index - to use your
> term - sort of the way we treat an extension.  Normally, when you
> create an index on a partitioned table, it cascades, but for dump and
> restore purpose, we tag on some syntax that says "well, don't actually
> create the subordinate indexes, i'll tell you about those later".
> Then for each subordinate index we issue a separate CREATE INDEX
> command followed by ALTER INDEX abstract_index ATTACH PARTITION
> concrete_index or something of that sort.  That means you can't
> absolutely count on the parent index to have all of the children it's
> supposed to have but maybe that's OK.

+1.

How about CREATE INDEX ... PARTITION OF ... FOR TABLE ...? to create
the index and attach it?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > 2. create one index for each existing partition.  These would be
> >    identical to what would happen if you created the index directly on
> >    each partition, except that there is an additional dependency to the
> >    parent's abstract index.
> 
> One thing I'm a bit worried about is how to name these subordinate
> indexes.  They have to have names because that's how pg_class works,
> and those names can't all be the same, again because that's how
> pg_class works.  There's no problem right away when you first create
> the partitioned index, because you can just pick names out of a hat
> using whatever name-generating algorithm seems best.  However, when
> you dump-and-restore (including but not limited to the pg_upgrade
> case) you've got to preserve those names.  If you just generate a new
> name that may or may not be the same as the old one, then it may
> collide with a user-specified name that only occurs later in the dump.
> Also, you'll have trouble if the user has applied a COMMENT or a
> SECURITY LABEL to the index because that command works by name, or if
> the user has a reference to the index name inside a function or
> whatever.
> 
> These are pretty annoying corner-case bugs because they're not likely
> to come up very often.  Most people won't notice or care if the index
> name changes.  But I don't think it's acceptable to just ignore the
> problem.

I agree it's a problem that needs to be addressed directly.

> An idea I had was to treat the abstract index - to use your
> term - sort of the way we treat an extension.  Normally, when you
> create an index on a partitioned table, it cascades, but for dump and
> restore purpose, we tag on some syntax that says "well, don't actually
> create the subordinate indexes, i'll tell you about those later".
> Then for each subordinate index we issue a separate CREATE INDEX
> command followed by ALTER INDEX abstract_index ATTACH PARTITION
> concrete_index or something of that sort.  That means you can't
> absolutely count on the parent index to have all of the children it's
> supposed to have but maybe that's OK.

Hmm ... yeah, ATTACH and DETACH sound acceptable to me.  On DETACH, the
abstract index should be marked indisvalid=false unless a substitute
index already exists; and on ATTACH when indisvalid=false we verify that
all local indexes exist, and if so we can flip indisvalid.  That way, we
can continue to rely on the parent index always having all its children
when the flag is set.

I'm not clear on a syntax that creates the main index and hopes to later
have the sub-indexes created.  Another approach is to do it the other
way around, i.e. create the children first, then once they're all in
place create the main one normally, which merely verifies that all the
requisite children exist.  This is related to what I proposed to occur
when a regular table is joined as a partition of the partitioned table:
we run a verification that an index matching the parent's abstract
indexes exists, and if not we raise an error.  (Alternatively we could
allow the case, and mark the abstract index as indisvalid=false, but
that seems to violate POLA).

> Another thing that would let you do is CREATE INDEX CONCURRENTLY
> replacement_concrete_index; ALTER INDEX abstract_index DETACH
> PARTITION old_concrete_index, ATTACH PARTITION
> replacement_concrete_index; DROP INDEX CONCURRENTLY
> old_concrete_index, which seems like a thing someone might want to do.

Yeah, this is a point I explicitly mentioned, and this proposal seems
like a good way.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Tue, Oct 10, 2017 at 11:22 AM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
> Hmm ... yeah, ATTACH and DETACH sound acceptable to me.  On DETACH, the
> abstract index should be marked indisvalid=false unless a substitute
> index already exists; and on ATTACH when indisvalid=false we verify that
> all local indexes exist, and if so we can flip indisvalid.  That way, we
> can continue to rely on the parent index always having all its children
> when the flag is set.

True.  I don't see an immediate use for that guarantee, actually,
because the index can't be "scanned" either way. But it might be
useful someday, if for example we wanted to try plan large numbers of
children symmetrically rather than (as we do currently) planning each
one individually, or if we've got an idea about making foreign keys
work.  I think you could ignore this for now, though, if you want.

> I'm not clear on a syntax that creates the main index and hopes to later
> have the sub-indexes created.  Another approach is to do it the other
> way around, i.e. create the children first, then once they're all in
> place create the main one normally, which merely verifies that all the
> requisite children exist.  This is related to what I proposed to occur
> when a regular table is joined as a partition of the partitioned table:
> we run a verification that an index matching the parent's abstract
> indexes exists, and if not we raise an error.  (Alternatively we could
> allow the case, and mark the abstract index as indisvalid=false, but
> that seems to violate POLA).

It's not much different than
pg_catalog.binary_upgrade_create_empty_extension but I don't really
care which way pg_dump emits it.

>> Another thing that would let you do is CREATE INDEX CONCURRENTLY
>> replacement_concrete_index; ALTER INDEX abstract_index DETACH
>> PARTITION old_concrete_index, ATTACH PARTITION
>> replacement_concrete_index; DROP INDEX CONCURRENTLY
>> old_concrete_index, which seems like a thing someone might want to do.
>
> Yeah, this is a point I explicitly mentioned, and this proposal seems
> like a good way.

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Hello

I started with Maksim's submitted code, and developed according to the
ideas discussed in this thread.  Attached is a very WIP patch series for
this feature.

Many things remain to be done before this is committable:  pg_dump
support needs to be written.  ALTER INDEX ATTACH/DETACH not yet
implemented.  No REINDEX support yet.  Docs not updated (but see the
regression test as a guide for how this is supposed to work; see patch
0005).  CREATE INDEX CONCURRENTLY not done yet.

I'm now working on the ability to build unique indexes (and unique
constraints) on top of this.

The docs have not been updated yet, but the new regression test file
illustrates how this works.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Mon, Oct 23, 2017 at 11:12 AM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
> I started with Maksim's submitted code, and developed according to the
> ideas discussed in this thread.  Attached is a very WIP patch series for
> this feature.
>
> Many things remain to be done before this is committable:  pg_dump
> support needs to be written.  ALTER INDEX ATTACH/DETACH not yet
> implemented.  No REINDEX support yet.  Docs not updated (but see the
> regression test as a guide for how this is supposed to work; see patch
> 0005).  CREATE INDEX CONCURRENTLY not done yet.
>
> I'm now working on the ability to build unique indexes (and unique
> constraints) on top of this.

Cool.  Are you planning to do that by (a) only allowing the special
case where the partition key columns/expressions are included in the
indexed columns/expressions, (b) trying to make every insert to any
index check all of the indexes for uniqueness conflicts, or (c)
implementing global indexes?  Because (b) sounds complex - think about
attach operations, for example - and (c) sounds super-hard.  I'd
suggest doing (a) first, just on the basis of complexity.

I hope that you don't get so involved in making this unique index
stuff work that we don't get the cascading index feature, at least,
committed to v11.  That's already a considerable step forward in terms
of ease of use, and I'd really like to have it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Mon, Oct 23, 2017 at 11:12 AM, Alvaro Herrera
> <alvherre@alvh.no-ip.org> wrote:
> > I started with Maksim's submitted code, and developed according to the
> > ideas discussed in this thread.  Attached is a very WIP patch series for
> > this feature.
> >
> > Many things remain to be done before this is committable:  pg_dump
> > support needs to be written.  ALTER INDEX ATTACH/DETACH not yet
> > implemented.  No REINDEX support yet.  Docs not updated (but see the
> > regression test as a guide for how this is supposed to work; see patch
> > 0005).  CREATE INDEX CONCURRENTLY not done yet.
> >
> > I'm now working on the ability to build unique indexes (and unique
> > constraints) on top of this.
> 
> Cool.  Are you planning to do that by (a) only allowing the special
> case where the partition key columns/expressions are included in the
> indexed columns/expressions, (b) trying to make every insert to any
> index check all of the indexes for uniqueness conflicts, or (c)
> implementing global indexes?  Because (b) sounds complex - think about
> attach operations, for example - and (c) sounds super-hard.  I'd
> suggest doing (a) first, just on the basis of complexity.

Yes, I think (a) is a valuable thing to have -- not planning on doing
(c) at all because I fear it'll be a huge time sink.  I'm not sure about
(b), but it's not currently on my plan.

> I hope that you don't get so involved in making this unique index
> stuff work that we don't get the cascading index feature, at least,
> committed to v11.  That's already a considerable step forward in terms
> of ease of use, and I'd really like to have it.

Absolutely -- I do plan to get this one finished regardless of unique
indexes.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Amit Langote
Date:
On 2017/10/24 1:15, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Mon, Oct 23, 2017 at 11:12 AM, Alvaro Herrera
>> <alvherre@alvh.no-ip.org> wrote:
>>> I started with Maksim's submitted code, and developed according to the
>>> ideas discussed in this thread.  Attached is a very WIP patch series for
>>> this feature.

Nice!

>>> Many things remain to be done before this is committable:  pg_dump
>>> support needs to be written.  ALTER INDEX ATTACH/DETACH not yet
>>> implemented.  No REINDEX support yet.  Docs not updated (but see the
>>> regression test as a guide for how this is supposed to work; see patch
>>> 0005).  CREATE INDEX CONCURRENTLY not done yet.
>>>
>>> I'm now working on the ability to build unique indexes (and unique
>>> constraints) on top of this.
>>
>> Cool.  Are you planning to do that by (a) only allowing the special
>> case where the partition key columns/expressions are included in the
>> indexed columns/expressions, (b) trying to make every insert to any
>> index check all of the indexes for uniqueness conflicts, or (c)
>> implementing global indexes?  Because (b) sounds complex - think about
>> attach operations, for example - and (c) sounds super-hard.  I'd
>> suggest doing (a) first, just on the basis of complexity.
> 
> Yes, I think (a) is a valuable thing to have -- not planning on doing
> (c) at all because I fear it'll be a huge time sink.  I'm not sure about
> (b), but it's not currently on my plan.

+1 to proceeding with (a) first.

>> I hope that you don't get so involved in making this unique index
>> stuff work that we don't get the cascading index feature, at least,
>> committed to v11.  That's already a considerable step forward in terms
>> of ease of use, and I'd really like to have it.
> 
> Absolutely -- I do plan to get this one finished regardless of unique
> indexes.

+1

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> I'm now working on the ability to build unique indexes (and unique
> constraints) on top of this.

So I think there's not a lot of additional code required to support
unique indexes with the restrictions mentioned; proof-of-concept (with
several holes still) attached.

As before, this is not finished, as there a few things that are wrong
(such as generateClonedIndexStmt taking a RangeVar), and others that I
don't understand (such as why is rd_partkey NULL for partitioned
partitions when DefineIndex cascades to them and the columns are
checked).

I noticed that RelationBuildPartitionKey is generating a partition key
in a temp context, then creating a private context and copying the key
into that.  That seems leftover from some previous iteration of some
other patch; I think it's pretty reasonable to create the new context
right from the start and allocate the key there directly instead.  Then
there's no need for copy_partition_key at all.

Anyway, here is a preliminary submission before I close shop for the
week.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Fri, Oct 27, 2017 at 7:27 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> I noticed that RelationBuildPartitionKey is generating a partition key
> in a temp context, then creating a private context and copying the key
> into that.  That seems leftover from some previous iteration of some
> other patch; I think it's pretty reasonable to create the new context
> right from the start and allocate the key there directly instead.  Then
> there's no need for copy_partition_key at all.

We could do that, but the motivation for the current system was to
avoid leaking memory in a long-lived context.  I think the same
technique is used elsewhere for similar reasons.  I admit I haven't
checked whether there would actually be a leak here if we did it as
you propose, but I wouldn't find it at all surprising.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Oct 27, 2017 at 7:27 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> I noticed that RelationBuildPartitionKey is generating a partition key
>> in a temp context, then creating a private context and copying the key
>> into that.  That seems leftover from some previous iteration of some
>> other patch; I think it's pretty reasonable to create the new context
>> right from the start and allocate the key there directly instead.  Then
>> there's no need for copy_partition_key at all.

> We could do that, but the motivation for the current system was to
> avoid leaking memory in a long-lived context.  I think the same
> technique is used elsewhere for similar reasons.  I admit I haven't
> checked whether there would actually be a leak here if we did it as
> you propose, but I wouldn't find it at all surprising.

Another key point is to avoid leaving a corrupted relcache entry behind
if you fail partway through.  I think that the current coding is
RelationBuildPartitionKey is safe, although it's far too undercommented
for my taste.  (The ordering of the last few steps is *critical*.)

It might work to build the new key in a context that's initially a
child of CurrentMemoryContext, then reparent it to be a child of
CacheMemoryContext when done.  But you'd have to look at whether that
would end up wasting long-lived memory, if the build process generates
any cruft in the same context.
        regards, tom lane


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
> Robert Haas <robertmhaas@gmail.com> writes:

> > We could do that, but the motivation for the current system was to
> > avoid leaking memory in a long-lived context.

Yeah, my approach here is to use a CATCH block that deletes the memory
context just created, thus avoiding a long-lived leak.

Tom Lane wrote:

> Another key point is to avoid leaving a corrupted relcache entry behind
> if you fail partway through.

Sure ... in the code as I have it we only assign the local variable to
the relcache entry if everything is succesful.  So no relcache
corruption should result.

> It might work to build the new key in a context that's initially a
> child of CurrentMemoryContext, then reparent it to be a child of
> CacheMemoryContext when done. 

That's another way (than the PG_TRY block), but I think it's more
complicated with no gain.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> It might work to build the new key in a context that's initially a
>> child of CurrentMemoryContext, then reparent it to be a child of
>> CacheMemoryContext when done. 

> That's another way (than the PG_TRY block), but I think it's more
> complicated with no gain.

I disagree: PG_TRY blocks are pretty expensive.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Here's a third iteration of this work; I've polished a few rough edges
and made sure that the behavior on ALTER TABLE ATTACH is consistent with
what you get when creating a new partition via CREATE TABLE .. PARTITION OF,
and with what happens when you do CREATE INDEX on an existing
partitioning tree, and added some docs.

The main big item missing is the CONCURRENTLY option to CREATE INDEX.
With that and once I review a couple of minor items marked XXX, I'm
about to call committable.

I also added ONLY to create index: "CREATE INDEX .. ON ONLY tab".
Normally if you create an index on a partitioned table, it recurses;
with this option, it doesn't.  This is used by pg_dump to ensure that
restore doesn't create indexes on partitions that didn't have them on
the original database.  Then the others are attached using the new
command ALTER INDEX ATTACH, as discussed.

For symmetry there's also ALTER INDEX DETACH.  Can be used to replace an
index on a partition with a fresh version, for cases of bloat.


The last patch is in WIP state yet, and mostly untouched since I last
posted it.  It allows creation of unique constraints (and PKs).  But any
interaction with the rest of the system is untested (such as FKs, ON
CONFLICT, etc).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Somehow I managed to include an unrelated patch as attachment.  Here's
another attempt (on which I also lightly touched ddl.sgml with relevant
changes).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Simon Riggs
Date:
On 13 November 2017 at 12:55, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Somehow I managed to include an unrelated patch as attachment.  Here's
> another attempt (on which I also lightly touched ddl.sgml with relevant
> changes).

Looks good. Some minor comments below.

0001- Simplify
Seems useful as separate step; agree with everything, no further comments
Why uint16? Why not just uint?

0003-export
I don't like Assert((heapRel != NULL) ^ OidIsValid(heapRelid));
Standard boolean logic is clearer
I couldn't see why this was a separate patch

0004-Allow
If we do ATTACH PARTITION, do we also need to do a separate ATTACH
INDEX step to allow an index to join the main index? Hopefully not.
The tests seem to indicate to me that it does work that way, just no
docs in altertable.sgml to describe that
typo "they all have a matching indexes" - no plural needed
typo "whether equivalent" - insert "an"
unsure what "regardless of whether this option was specified" means,
probably just remove

0005-Allow
RelationCacheInitializePhase3() asserts that rd_partkey is not NULL
for a partitoned table after RelationBuildPartitionKey() runs

You removed the test to check whether "create unique index" works, not
sure if that was intentional
There is no test for attach index to a unique index

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 8 October 2017 at 02:34, Robert Haas <robertmhaas@gmail.com> wrote:
> However, when
> you dump-and-restore (including but not limited to the pg_upgrade
> case) you've got to preserve those names.  If you just generate a new
> name that may or may not be the same as the old one, then it may
> collide with a user-specified name that only occurs later in the dump.
> Also, you'll have trouble if the user has applied a COMMENT or a
> SECURITY LABEL to the index because that command works by name, or if
> the user has a reference to the index name inside a function or
> whatever.

I might be arriving late for the party here, but I see mention on here
of when we ATTACH a partition that we'd check to see if any of that
table's indexes match the index definition required for the
partitioned table's "partitioned index", and just use those indexes
and incorporate them into the partitioned index by setting the new OID
column in pg_index (or whatever way was decided to record the
dependency)... So, if that works for ATTACH, then can't pg_dump just
create each of the partition's indexes first, then create the
partitioned table's partition index, and that command would just go
through checking each partition similar to how ATTACH would work. That
way we get to keep all the names as pg_dump would just dump the
indexes belonging to each partition like it were any other index.
Those indexes would only become a bit more special once the
partitioned index is created on the parent.

I've not read the patch yet, but reading the thread it does not seem
to have gone in this direction. Maybe I'm overlooking something?

-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Thanks for looking.

David Rowley wrote:

> So, if that works for ATTACH, then can't pg_dump just
> create each of the partition's indexes first, then create the
> partitioned table's partition index, and that command would just go
> through checking each partition similar to how ATTACH would work.

That was my first thought too (and I actually implemented it like this
before the current approach).  But there's a small problem: if a
partition exists which *doesn't* have the index, restoring things this
way would create the index in that partition too, which is unwanted
because the end state is different to what was in the dumped database.
This is because the command is either recursive (finds a matching index
and attachs it, or create a new one) or not recursive (does not look at
children at all).  But a command that is partially recursive (attach any
matching index, but do nothing if none exists) would be way too
confusing, ISTM.

Another small issue is that each partition might have two matching
indexes, maybe because one was in the process of being removed; how
would CREATE INDEX at restore time know which one to attach?  It could
do so by choosing arbitrarily, but that doesn't seem reliable.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 15 November 2017 at 01:09, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> if a
> partition exists which *doesn't* have the index, restoring things this
> way would create the index in that partition too, which is unwanted
> because the end state is different to what was in the dumped database.

hmm, but surely the all those indexes must already exist if the
partitioned index exists. Won't we be disallowing DROP INDEX of the
leaf partition indexes if that index is marked as being part of the
partitioned index?

If so, then surely this could only happen if someone manually edited
the pg_dump file to remove the CREATE INDEX statement for the leaf
partition, and if they do that, then maybe they won't be so surprised
that CREATE INDEX has to create some indexes.

-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
David Rowley wrote:
> On 15 November 2017 at 01:09, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > if a
> > partition exists which *doesn't* have the index, restoring things this
> > way would create the index in that partition too, which is unwanted
> > because the end state is different to what was in the dumped database.
> 
> hmm, but surely the all those indexes must already exist if the
> partitioned index exists. Won't we be disallowing DROP INDEX of the
> leaf partition indexes if that index is marked as being part of the
> partitioned index?

In normal cases, sure -- but you can do ALTER INDEX DETACH on a
partition, then drop the index.  This is useful for example if you want
to replace some partition's index because of bloat: create a replacement
index, detach the old one, attach the new one, drop the old one.  Now
you probably don't *want* to take a pg_dump in the middle of this
process ...  but pg_dump's charter is not to produce the output we think
would be most convenient to users most of the time, but rather to
accurately describe what is in the database.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 15 November 2017 at 01:30, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> David Rowley wrote:
>> hmm, but surely the all those indexes must already exist if the
>> partitioned index exists. Won't we be disallowing DROP INDEX of the
>> leaf partition indexes if that index is marked as being part of the
>> partitioned index?
>
> In normal cases, sure -- but you can do ALTER INDEX DETACH on a
> partition, then drop the index.  This is useful for example if you want
> to replace some partition's index because of bloat: create a replacement
> index, detach the old one, attach the new one, drop the old one.

But if you can DETACH a partition from requiring an index to back up
this partitioned index, then the partitioned index is not valid while
the leaf table's index is missing. Maybe I misunderstood what you mean
by DETACH here, but if you remove an index and there's some period of
time where none exists for a leaf partition, then you're not going to
be able to ever do partitioned UNIQUE indexes.

I'd have thought some sort of: ALTER INDEX name_of_partitioned_index
REPLACE INDEX FOR partitioned_tablename WITH
name_of_new_matching_bloat_free_index;

... or something along those lines, and just have an atomic swap out
of the index with some new one that's been created. Something similar
would be quite good for a foreign key constraint. Today we have no
standard way to replace the referenced table's index without dropping
the FK first.

-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Simon Riggs wrote:
> On 13 November 2017 at 12:55, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Somehow I managed to include an unrelated patch as attachment.  Here's
> > another attempt (on which I also lightly touched ddl.sgml with relevant
> > changes).
> 
> Looks good. Some minor comments below.
> 
> 0001- Simplify
> Seems useful as separate step; agree with everything, no further comments

Thanks, pushed.

> Why uint16? Why not just uint?

*Shrug*.  16 bits seem plenty enough.  I changed it to bits16, which we
use in other places for bitmasks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
David Rowley wrote:

> But if you can DETACH a partition from requiring an index to back up
> this partitioned index, then the partitioned index is not valid while
> the leaf table's index is missing.

Yeah.  I don't see this as too much of a problem myself.  At present,
from the planner's POV each partition is its own table and incurs its
own planning, so there's no restriction on having different sets of
indexes.  If in the future we want to see partition hierarchies as a
single relation which must have identical properties such as the same
index everywhere, this is easily changed: just set indisready flag false
for the parent table until every partition can be proven to have the
index.

Example: on DETACH we set indisready=false on the parent; on ATTACH we
scan the whole hierarchy and set it true if all partitions have the
index.  So the planner can rely on the parent's indisready.  Since those
operations are catalog-only, the transaction which attaches/detaches the
index can be short (though AFAICS it requires AEL on the whole
hierarchy, sadly).


> Maybe I misunderstood what you mean by DETACH here, but if you remove
> an index and there's some period of time where none exists for a leaf
> partition, then you're not going to be able to ever do partitioned
> UNIQUE indexes.

Well, UNIQUE indexes have additional restrictions of their own.  I have
no problem decreeing that you cannot DETACH an index from a primary key.
Then a future project can implement index replacement like you propose:

> I'd have thought some sort of: ALTER INDEX name_of_partitioned_index
> REPLACE INDEX FOR partitioned_tablename WITH
> name_of_new_matching_bloat_free_index;
> 
> ... or something along those lines, and just have an atomic swap out
> of the index with some new one that's been created.

(My intention here is to avoid scope creep.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:
> Simon Riggs wrote:
> > On 13 November 2017 at 12:55, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > Somehow I managed to include an unrelated patch as attachment.  Here's
> > > another attempt (on which I also lightly touched ddl.sgml with relevant
> > > changes).
> > 
> > Looks good. Some minor comments below.
> > 
> > 0001- Simplify
> > Seems useful as separate step; agree with everything, no further comments
> 
> Thanks, pushed.

Here's the remaining bits, rebased.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Tue, Nov 14, 2017 at 12:49 PM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
> Here's the remaining bits, rebased.

It's true that Tom and I reviewed patch 0001, as your proposed commit
message states.  But it's also true that we both said that it probably
wasn't a good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Simon Riggs
Date:
On 14 November 2017 at 13:12, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Nov 14, 2017 at 12:49 PM, Alvaro Herrera
> <alvherre@alvh.no-ip.org> wrote:
>> Here's the remaining bits, rebased.
>
> It's true that Tom and I reviewed patch 0001, as your proposed commit
> message states.  But it's also true that we both said that it probably
> wasn't a good idea.

I don't see any comments from you or Tom about patch 0001, which was
simple refactoring and not much to complain about.

Perhaps there is some confusion about the numbering?

I see that Alvaro had taken your comments on memory contexts into
account in his later patch.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Jesper Pedersen
Date:
Hi,

On 11/14/2017 12:49 PM, Alvaro Herrera wrote:
>> Thanks, pushed.
> 
> Here's the remaining bits, rebased.
> 

First of all, thanks for working on this.

I have been looking at the "CREATE INDEX ... ONLY" syntax, and I think 
it could cause some confusion due to the "Note" described in 
create_index.sgml.

An alternative, maybe crazy, could be to treat a partitioned index as 
one; e.g. all operations are on the definition. That way ONLY, ATTACH 
and DETACH could be eliminated. Once a partition is detached any 
partitioned indexes would be marked as a normal index, and a partition 
could only be attached if it had indexes satisfying all partition index 
definitions. Bloat could be handled by swapping the index as suggested 
by David [1]. It would be less flexible, but people always have the 
option to define indexes directly on the partitions.

Some comments.

0004's alter_index.sgml are missing the "PARTITION" keyword for both the 
ATTACH and DETACH commands.

A small thing, for

-- test.sql --
CREATE TABLE test (a integer NOT NULL) PARTITION BY HASH(a);
CREATE TABLE test_p00 PARTITION OF test FOR VALUES WITH (MODULUS 2, 
REMAINDER 0);
CREATE TABLE test_p01 PARTITION OF test FOR VALUES WITH (MODULUS 2, 
REMAINDER 1);
CREATE INDEX idx_test_a ON test (a);
-- test.sql --

I would expect that the index names were 'test_p00_idx_test_a' and 
'test_p01_idx_test_a'.

psql completion for "CREATE INDEX blah ON tes<TAB>" only lists the child 
tables. Completion for the ALTER INDEX commands is missing.

[1] 
https://www.postgresql.org/message-id/CAKJS1f_Wf%3DM06o8iQi72SxN%2BZpLV4c0CwYoN5xYVY4aWWX-jBQ%40mail.gmail.com

Best regards, Jesper


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Tue, Nov 14, 2017 at 1:43 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I don't see any comments from you or Tom about patch 0001, which was
> simple refactoring and not much to complain about.

We both commented that getting rid of copy_partition_data could
introduce memory leaks.

> Perhaps there is some confusion about the numbering?

I don't think so.

> I see that Alvaro had taken your comments on memory contexts into
> account in his later patch.

Which later patch?  It seems like any changes meant to mitigate the
problems with removing copy_partition_data ought to be folded into the
patch that removes copy_partition_data, rather than being in some
other patch later in the series.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Hi Jesper,

Thanks for reviewing.

Jesper Pedersen wrote:

> I have been looking at the "CREATE INDEX ... ONLY" syntax, and I think it
> could cause some confusion due to the "Note" described in create_index.sgml.
> 
> An alternative, maybe crazy, could be to treat a partitioned index as one;
> e.g. all operations are on the definition.

Well, honestly that's the way I wanted partitioning to be defined, right
from the start -- my first proposal was that partitions were not to be
standalone objects.  But I was outvoted once I stepped out of the
implementation, and partitioning is now the way we have it -- each
partition its own separate object.  Until we change that view (if we
ever do), I think the only sensible decision is that the whole feature
works that way.

> Some comments. [...]

Thanks, will fix.  Except for this one:

> A small thing, for
> 
> -- test.sql --
> CREATE TABLE test (a integer NOT NULL) PARTITION BY HASH(a);
> CREATE TABLE test_p00 PARTITION OF test FOR VALUES WITH (MODULUS 2,
> REMAINDER 0);
> CREATE TABLE test_p01 PARTITION OF test FOR VALUES WITH (MODULUS 2,
> REMAINDER 1);
> CREATE INDEX idx_test_a ON test (a);
> -- test.sql --
> 
> I would expect that the index names were 'test_p00_idx_test_a' and
> 'test_p01_idx_test_a'.

Hmm, the code in my patch just uses the standard naming routine for
indexes.  I'm disinclined to change it in the initial commit, but
perhaps we can change that later.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 15 November 2017 at 04:23, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> David Rowley wrote:
>> I'd have thought some sort of: ALTER INDEX name_of_partitioned_index
>> REPLACE INDEX FOR partitioned_tablename WITH
>> name_of_new_matching_bloat_free_index;
>>
>> ... or something along those lines, and just have an atomic swap out
>> of the index with some new one that's been created.
>
> (My intention here is to avoid scope creep.)

OK, I didn't really mean that this would be required for an initial
patch. It's something that could be invented at some later date.


-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 15 November 2017 at 06:49, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Here's the remaining bits, rebased.

Hi,

I've not had time for a thorough look at  this, but on a quick scan I noticed that CompareIndexInfo() missed checking if the Index AM matches the AM of the partitioned index.

Testing with:

create table p (a int not null) partition by range (a);
create table p1 partition of p for values from (1) to (10);
create table p2 partition of p for values from (10) to (20);
create index on p1 using btree (a);
create index on p2 using hash (a);
create index on p (a);

I see it ends up making use of the hash index on p2 to support the index that's stored as a btree on the partitioned table. I think these should match so that the operations we can perform on the index are all aligned.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 15 November 2017 at 06:49, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Here's the remaining bits, rebased.

 I wonder if it should be this patches job to alter the code in get_relation_info() which causes the indexes not to be loaded for partitioned tables:

/*
* Make list of indexes.  Ignore indexes on system catalogs if told to.
* Don't bother with indexes for an inheritance parent, either.
*/
if (inhparent ||
(IgnoreSystemIndexes && IsSystemRelation(relation)))
hasindex = false;
else
hasindex = relation->rd_rel->relhasindex;

A partitioned table will always go into the hasindex = false code path.

I'm kind of thinking this patch should change that, even if the patch is not making use of the indexes, you could argue that something using set_rel_pathlist_hook might want to do something there, although, there's likely a bunch of counter arguments too.

What do you think?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Ashutosh Bapat
Date:
On Fri, Nov 17, 2017 at 6:24 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>
> I'm kind of thinking this patch should change that, even if the patch is not
> making use of the indexes, you could argue that something using
> set_rel_pathlist_hook might want to do something there, although, there's
> likely a bunch of counter arguments too.
>

How do you think optimizer would use this information? A partitioned
table which doesn't have data doesn't need that information.

Partitioned table's behaviour here is similar to that of the
inheritance parent. Even though an inheritance parent may have indexes
on it, that information is not read and stored by optimizer for that
table in its role as an inheritance parent. It stores and uses that
information for that table in its role as the child, which gets
scanned.

May be in future, we will try to create common index paths for all
children for the indexes which are inherited from the parent. Then we
might want to load the index information for partitioned table as
well, but I don't think we are any close to do that.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Michael Paquier
Date:
On Wed, Nov 15, 2017 at 2:49 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Here's the remaining bits, rebased.

At least patch 3 has conflicts with HEAD. I am moving this patch to
next CF per a lack of reviews, switching status to waiting on author.
-- 
Michael


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
David Rowley wrote:

>  I wonder if it should be this patches job to alter the code in
> get_relation_info() which causes the indexes not to be loaded for
> partitioned tables:
> 
> /*
> * Make list of indexes.  Ignore indexes on system catalogs if told to.
> * Don't bother with indexes for an inheritance parent, either.
> */
> if (inhparent ||
> (IgnoreSystemIndexes && IsSystemRelation(relation)))
> hasindex = false;
> else
> hasindex = relation->rd_rel->relhasindex;
> 
> A partitioned table will always go into the hasindex = false code path.
> 
> I'm kind of thinking this patch should change that, even if the patch is
> not making use of the indexes, you could argue that something
> using set_rel_pathlist_hook might want to do something there, although,
> there's likely a bunch of counter arguments too.
> 
> What do you think?

Great question.  So you're thinking that the planner might have an
interest in knowing what indexes are defined at the parent table level
for planning purposes; but for that to actually have any effect we would
need to change the planner and executor also.  And one more point, also
related to something you said before: we currently (I mean after my
patch) don't mark partitioned-table-level indexes as valid or not valid
depending on whether all its children exist, so trying to use that in
the planner without having a flag could cause invalid plans to be
generated (i.e. ones that would cause nonexistent indexes to be
referenced).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Michael Paquier wrote:
> On Wed, Nov 15, 2017 at 2:49 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Here's the remaining bits, rebased.
> 
> At least patch 3 has conflicts with HEAD. I am moving this patch to
> next CF per a lack of reviews, switching status to waiting on author.

Thanks, will submit a new version before the start of the next
commitfest.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Thu, Nov 30, 2017 at 7:02 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Great question.  So you're thinking that the planner might have an
> interest in knowing what indexes are defined at the parent table level
> for planning purposes; but for that to actually have any effect we would
> need to change the planner and executor also.  And one more point, also
> related to something you said before: we currently (I mean after my
> patch) don't mark partitioned-table-level indexes as valid or not valid
> depending on whether all its children exist, so trying to use that in
> the planner without having a flag could cause invalid plans to be
> generated (i.e. ones that would cause nonexistent indexes to be
> referenced).

Did you do it this way due to locking concerns?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Thu, Nov 30, 2017 at 11:39 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> I feel like we could do better here with little extra effort. The
> DETACH index feature does not really seem required for this patch.

Because of the dump/restore considerations mentioned in
http://postgr.es/m/CA+TgmobUhGHg9v8SAswkHbBfyWg5A0QB+jGt0UOvq5YcBDUGig@mail.gmail.com
I believe we need a way to create the index on the parent without
immediately triggering index builds on the children, plus a way to
create an index on a child after-the-fact and attach it to the parent.
Detach isn't strictly required, but why support attach and not detach?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
David Rowley wrote:

> So, then this patch is only really intended as a syntax shortcut for
> mass adding of indexes to each partition?

This patch is intended to serve as a basis on which to construct further
features, just like every other patch we apply.

> I feel like we could do better here with little extra effort. The
> DETACH index feature does not really seem required for this patch. I
> think just ensuring a matching index exists on each leaf partition and
> creating any which don't exist before creating the index on the target
> partitioned table seems like the correct solution. That way we can
> make that index indisvalid flag have a valid meaning all the time.
> Some later patch can invent some way to replace a bloated index.

What you're saying is that I've written code for A+B, and you're
"interested in C (which is incompatible with B), so can we have A+C and
drop B".  But in reality, there exists (unwritten) D that solves the
incompatiblity between B and C.  I'm just saying it's essentially the
same to postpone C+D than to postpone B+D, and I already have B written;
plus that way we don't have to come up with some novel way to handle
pg_dump support.  So can we get A+B committed and discuss C+D later?

A = partitioned indexes
B = pg_dump support based on ATTACH
C = your proposed planner stuff
D = correct indisvalid setting for partitioned indexes (set to false
    when a partition does not contain the index)

The patch in this thread is A+B.

> Perhaps later we can invent some generic way to replace a physical
> leaf index for a given partitioned index perhaps with the same patch
> that might allow us to replace an index which is used by a constraint,
> which to me seems like a feature we should have had years ago.

This is a hypothetical feature E which would be nice (for partitioned
indexes and for ordinary indexes too) but is not strictly necessary.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 2 December 2017 at 11:10, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> What you're saying is that I've written code for A+B, and you're
> "interested in C (which is incompatible with B), so can we have A+C and
> drop B".  But in reality, there exists (unwritten) D that solves the
> incompatiblity between B and C.  I'm just saying it's essentially the
> same to postpone C+D than to postpone B+D, and I already have B written;
> plus that way we don't have to come up with some novel way to handle
> pg_dump support.  So can we get A+B committed and discuss C+D later?
>
> A = partitioned indexes
> B = pg_dump support based on ATTACH
> C = your proposed planner stuff
> D = correct indisvalid setting for partitioned indexes (set to false
>     when a partition does not contain the index)
>
> The patch in this thread is A+B.

Okay,  I wasn't insisting, just asking if you thought this was missing
from the patch.

However, I do still feel that if we're adding an index to an object
then it should be available in RelOptInfo->indexlist, but this patch
is still good progress even if we don't add it there.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 2 December 2017 at 03:39, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Nov 30, 2017 at 11:39 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> I feel like we could do better here with little extra effort. The
>> DETACH index feature does not really seem required for this patch.
>
> Because of the dump/restore considerations mentioned in
> http://postgr.es/m/CA+TgmobUhGHg9v8SAswkHbBfyWg5A0QB+jGt0UOvq5YcBDUGig@mail.gmail.com
> I believe we need a way to create the index on the parent without
> immediately triggering index builds on the children, plus a way to
> create an index on a child after-the-fact and attach it to the parent.
> Detach isn't strictly required, but why support attach and not detach?

I proposed that this worked a different way in [1]. I think the way I
mention is cleaner as it means there's no extra reason for a
partitioned index to be indisvalid=false than there is for any other
normal index.

My primary reason for not liking this way is around leaving indexes
around as indisvalid=false, however, I suppose that an index could be
replaced atomically with a DETACH and ATTACH within a single
transaction. I had previously not really liked the idea of
invalidating an index by DETACHing a leaf table's index from it. Of
course, this patch does nothing special with partitioned indexes, but
I believe one day we will want to do something with these and that the
DETACH/ATTACH is not the best design for replacing part of the index.

[1] https://www.postgresql.org/message-id/CAKJS1f_o6v%2BXtT%2B3gfieUdCiU5Sn84humT-CVW5So_x_f%3DkLxQ%40mail.gmail.com

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Mon, Dec 4, 2017 at 6:57 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 2 December 2017 at 03:39, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Nov 30, 2017 at 11:39 PM, David Rowley
>> <david.rowley@2ndquadrant.com> wrote:
>>> I feel like we could do better here with little extra effort. The
>>> DETACH index feature does not really seem required for this patch.
>>
>> Because of the dump/restore considerations mentioned in
>> http://postgr.es/m/CA+TgmobUhGHg9v8SAswkHbBfyWg5A0QB+jGt0UOvq5YcBDUGig@mail.gmail.com
>> I believe we need a way to create the index on the parent without
>> immediately triggering index builds on the children, plus a way to
>> create an index on a child after-the-fact and attach it to the parent.
>> Detach isn't strictly required, but why support attach and not detach?
>
> I proposed that this worked a different way in [1]. I think the way I
> mention is cleaner as it means there's no extra reason for a
> partitioned index to be indisvalid=false than there is for any other
> normal index.

How does that proposal keep pg_dump from latching onto the wrong
index, if there's more than one index on the same columns?

> My primary reason for not liking this way is around leaving indexes
> around as indisvalid=false, however, I suppose that an index could be
> replaced atomically with a DETACH and ATTACH within a single
> transaction. I had previously not really liked the idea of
> invalidating an index by DETACHing a leaf table's index from it. Of
> course, this patch does nothing special with partitioned indexes, but
> I believe one day we will want to do something with these and that the
> DETACH/ATTACH is not the best design for replacing part of the index.

What do you propose?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 6 December 2017 at 04:29, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Dec 4, 2017 at 6:57 PM, David Rowley
>> I proposed that this worked a different way in [1]. I think the way I
>> mention is cleaner as it means there's no extra reason for a
>> partitioned index to be indisvalid=false than there is for any other
>> normal index.
>
> How does that proposal keep pg_dump from latching onto the wrong
> index, if there's more than one index on the same columns?

I'm not hugely concerned about that. It's not a new problem and it's
not a problem that I recall seeing anyone complain about, at least not
to the extent that we've ever bothered to fix it.

The existing problem is with FOREIGN KEY constraints just choosing the
first matching index in transformFkeyCheckAttrs()

We can see the issue today with:

create table t1 (id int not null);
create unique index t1_idx_b on t1 (id);
create table t2 (id int references t1 (id));
create unique index t1_idx_a on t1 (id);

<pg_dump>
<pg_restore>

# drop index t1_idx_a;
ERROR:  cannot drop index t1_idx_a because other objects depend on it
DETAIL:  constraint t2_id_fkey on table t2 depends on index t1_idx_a
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

>> My primary reason for not liking this way is around leaving indexes
>> around as indisvalid=false, however, I suppose that an index could be
>> replaced atomically with a DETACH and ATTACH within a single
>> transaction. I had previously not really liked the idea of
>> invalidating an index by DETACHing a leaf table's index from it. Of
>> course, this patch does nothing special with partitioned indexes, but
>> I believe one day we will want to do something with these and that the
>> DETACH/ATTACH is not the best design for replacing part of the index.
>
> What do you propose?

For this patch, I propose we do nothing about that, but for the
future, I already mentioned an idea to do this above.

> I'd have thought some sort of: ALTER INDEX name_of_partitioned_index
> REPLACE INDEX FOR partitioned_tablename WITH
> name_of_new_matching_bloat_free_index;

This way there is no period where we have to mark the index as
indisvalid=false. So, for when we invent something that uses this
feature, there's no window of time that concurrently running queries
are in danger of generating a poor plan. The index swap operation is
atomic.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
David Rowley wrote:
> On 6 December 2017 at 04:29, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Mon, Dec 4, 2017 at 6:57 PM, David Rowley
> >> I proposed that this worked a different way in [1]. I think the way I
> >> mention is cleaner as it means there's no extra reason for a
> >> partitioned index to be indisvalid=false than there is for any other
> >> normal index.
> >
> > How does that proposal keep pg_dump from latching onto the wrong
> > index, if there's more than one index on the same columns?
> 
> I'm not hugely concerned about that. It's not a new problem and it's
> not a problem that I recall seeing anyone complain about, at least not
> to the extent that we've ever bothered to fix it.

Another reason to do things the proposed way is to let parallel restore
create the indexes in parallel.  If we just have a single CREATE INDEX
that cascades to the partitions, that can be run by a single backend
only, which is a loser.

(There's also the fact that you'd lose any COMMENTs etc).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Tue, Dec 5, 2017 at 3:53 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> I'm not hugely concerned about that. It's not a new problem and it's
> not a problem that I recall seeing anyone complain about, at least not
> to the extent that we've ever bothered to fix it.
>
> The existing problem is with FOREIGN KEY constraints just choosing the
> first matching index in transformFkeyCheckAttrs()
>
> We can see the issue today with:
>
> create table t1 (id int not null);
> create unique index t1_idx_b on t1 (id);
> create table t2 (id int references t1 (id));
> create unique index t1_idx_a on t1 (id);
>
> <pg_dump>
> <pg_restore>
>
> # drop index t1_idx_a;
> ERROR:  cannot drop index t1_idx_a because other objects depend on it
> DETAIL:  constraint t2_id_fkey on table t2 depends on index t1_idx_a
> HINT:  Use DROP ... CASCADE to drop the dependent objects too.

Ugh, that sucks.  I don't think it's a good argument for making the
problem worse, though.  What are we giving up by explicitly attaching
the correct index?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 6 December 2017 at 09:58, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> David Rowley wrote:
>> On 6 December 2017 at 04:29, Robert Haas <robertmhaas@gmail.com> wrote:
>> > How does that proposal keep pg_dump from latching onto the wrong
>> > index, if there's more than one index on the same columns?
>>
>> I'm not hugely concerned about that. It's not a new problem and it's
>> not a problem that I recall seeing anyone complain about, at least not
>> to the extent that we've ever bothered to fix it.
>
> Another reason to do things the proposed way is to let parallel restore
> create the indexes in parallel.  If we just have a single CREATE INDEX
> that cascades to the partitions, that can be run by a single backend
> only, which is a loser.

I'm not all that sure why parallel restore could not work with this.
The leaf partition indexes would all be created first, then the CREATE
INDEX for the partitioned index would just be a metadata change with
some checks to ensure all the leaf partition indexes exist.

> (There's also the fact that you'd lose any COMMENTs etc).

I can't see why that would be a problem with my proposal.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 6 December 2017 at 11:35, Robert Haas <robertmhaas@gmail.com> wrote:
> What are we giving up by explicitly attaching
> the correct index?

The part I don't like about the ATTACH and DETACH of partitioned index
is that it seems to be trying to just follow the syntax we use to
remove a partition from a partitioned table, however, there's a huge
difference between the two, as DETACHing a partition from a
partitioned table leaves the partitioned table in a valid state, it
simply just no longer contains the detached partition. With the
partitioned index, we leave the index in an invalid state after a
DETACH. It can only be made valid again once another leaf index has
been ATTACHED again and that we've verified that all other indexes on
every leaf partition is also there and are valid. If we're going to
use these indexes to answer queries, then it seems like we should try
to keep them valid so that queries can actually use them for
something.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Ashutosh Bapat
Date:
On Wed, Dec 6, 2017 at 9:42 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 6 December 2017 at 11:35, Robert Haas <robertmhaas@gmail.com> wrote:
>> What are we giving up by explicitly attaching
>> the correct index?
>
> The part I don't like about the ATTACH and DETACH of partitioned index
> is that it seems to be trying to just follow the syntax we use to
> remove a partition from a partitioned table, however, there's a huge
> difference between the two, as DETACHing a partition from a
> partitioned table leaves the partitioned table in a valid state, it
> simply just no longer contains the detached partition. With the
> partitioned index, we leave the index in an invalid state after a
> DETACH. It can only be made valid again once another leaf index has
> been ATTACHED again and that we've verified that all other indexes on
> every leaf partition is also there and are valid. If we're going to
> use these indexes to answer queries, then it seems like we should try
> to keep them valid so that queries can actually use them for
> something.
>

+1 for all that. Exactly my thoughts.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 6 December 2017 at 13:42, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 6 December 2017 at 11:35, Robert Haas <robertmhaas@gmail.com> wrote:
>> What are we giving up by explicitly attaching
>> the correct index?
>
> The part I don't like about the ATTACH and DETACH of partitioned index
> is that it seems to be trying to just follow the syntax we use to
> remove a partition from a partitioned table, however, there's a huge
> difference between the two, as DETACHing a partition from a
> partitioned table leaves the partitioned table in a valid state, it
> simply just no longer contains the detached partition. With the
> partitioned index, we leave the index in an invalid state after a
> DETACH. It can only be made valid again once another leaf index has
> been ATTACHED again and that we've verified that all other indexes on
> every leaf partition is also there and are valid. If we're going to
> use these indexes to answer queries, then it seems like we should try
> to keep them valid so that queries can actually use them for
> something.

Also, ATTACH and DETACH are especially useless when it comes to UNIQUE
indexes. If we simply want to replace out a bloated index using a
DETACH quickly followed by an ATTACH then it leaves a non-zero window
of time that we can't be certain that the uniqueness is enforced. This
would still work on an individual partition level, but if we ever want
to reference a UNIQUE partitioned index in a foreign key constraint
then what happens to the foreign key when the index is in the invalid
state? Should we just disallow DETACH when a foreign key exists? or
just invalidate the foreign key constraint too? Both seem like a
nightmare from a DBA point-of-view.

You might argue that concurrently recreating an index used by a
foreign key is just as difficult today, but there's no reason to make
this as problematic, is there?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Tue, Dec 5, 2017 at 7:42 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 6 December 2017 at 11:35, Robert Haas <robertmhaas@gmail.com> wrote:
>> What are we giving up by explicitly attaching
>> the correct index?
>
> The part I don't like about the ATTACH and DETACH of partitioned index
> is that it seems to be trying to just follow the syntax we use to
> remove a partition from a partitioned table, however, there's a huge
> difference between the two, as DETACHing a partition from a
> partitioned table leaves the partitioned table in a valid state, it
> simply just no longer contains the detached partition. With the
> partitioned index, we leave the index in an invalid state after a
> DETACH. It can only be made valid again once another leaf index has
> been ATTACHED again and that we've verified that all other indexes on
> every leaf partition is also there and are valid. If we're going to
> use these indexes to answer queries, then it seems like we should try
> to keep them valid so that queries can actually use them for
> something.

I think keeping them valid is a great goal, just like I like low
interest rates and a chicken in every pot.  However, I'm pretty
skeptical of our ability to always succeed in meeting that goal with
absolutely zero corner cases.  What about a CREATE INDEX CONCURRENTLY
that fails midway through, or similarly DROP INDEX CONCURRENTLY?
Those operations can leave around artifacts in the unpartitioned table
case, and I bet they will also leave around artifacts for partitioned
tables, and maybe there will be cases where they don't leave the same
artifacts for every table in the hierarchy.  Even if they do, there is
future development to consider.  Maybe REINDEX INDEX CONCURRENTLY will
carry a possibility of creating a mix of states.  Or maybe someone
will implement Simon's idea from a few years ago of allowing an
unlogged index on a permanent table, with the index being somehow
marked not-valid after a restart.  In that situation, each one would
need to be reindexed independently to become valid.

I do agree with you that an index which is currently enforcing a
unique constraint has to remain continuously valid -- or if it
unavoidably doesn't, for example if we allowed an unlogged unique
index on a logged table, then we'd have to do something unpleasant
like disallow inserts and updates to the key column until that gets
fixed.  However, that doesn't seem to preclude gracefully swapping out
indexes for individual partitions; instead of providing a DETACH
operation, we could provide a REPLACE operation that effectively does
DETACH + ATTACH.

It's possible that we are not that far apart here.  I don't like the
ATTACH syntax because it's like what we do for partitions; I like it
because it solves the pg_dump problem.  And it seems to me that your
reservations are more about DETACH than ATTACH.  I have no issue with
punting DETACH to the curb, recasting it as REPLACE, or whatever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 8 December 2017 at 10:07, Robert Haas <robertmhaas@gmail.com> wrote:
> I do agree with you that an index which is currently enforcing a
> unique constraint has to remain continuously valid -- or if it
> unavoidably doesn't, for example if we allowed an unlogged unique
> index on a logged table, then we'd have to do something unpleasant
> like disallow inserts and updates to the key column until that gets
> fixed.  However, that doesn't seem to preclude gracefully swapping out
> indexes for individual partitions; instead of providing a DETACH
> operation, we could provide a REPLACE operation that effectively does
> DETACH + ATTACH.

Yeah, that's what I've been trying to push for. I've mentioned a handy
wavy REPLACE syntax a few times.

> It's possible that we are not that far apart here.  I don't like the
> ATTACH syntax because it's like what we do for partitions; I like it
> because it solves the pg_dump problem.  And it seems to me that your
> reservations are more about DETACH than ATTACH.  I have no issue with
> punting DETACH to the curb, recasting it as REPLACE, or whatever.

I just don't quite see the pg_dump problem with my proposed approach.

Let me pseudo write out what I imagine in a pg_dump file.

CREATE TABLE p (a INT NOT NULL, b INT NOT NULL) PARTITION BY RANGE (a);

CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (1) TO (10);
CREATE TABLE p2 PARTITION OF p FOR VALUES FROM (10) TO (20);

<load data>

-- create indexes on partitions
CREATE INDEX p1_a_idx ON p1 (a); -- normal create index
CREATE INDEX p2_a_idx ON p2 (a); -- normal create index.

COMMENT ON INDEX p1_a_idx IS 'the comment';

-- create partitioned indexes

-- checks indexes exist for each partition of p with matching columns,
-- AM and unique property then perform a metadata only change to
-- say there's an index on this table, also updates each index uses to
-- say its parent index is this index. If someone happens to have edited
-- the dump file to remove one of the indexes on a leaf partition then the
-- following would have to create that index again.
CREATE INDEX p_a_idx ON p (a);

--
-- PostgreSQL database dump complete
--

We'd just need to ensure the indexes of leafs are created in an order
that takes into account the dependency order.

And yeah, this does nothing for making sure we choose the correct
index if more than one matching index exists on the leaf partition,
but perhaps we can dump a series of

ALTER INDEX p_a_idx REPLACE INDEX FOR p1 WITH p1_a_idx;
ALTER INDEX p_a_idx REPLACE INDEX FOR p2 WITH p2_a_idx;

... which would be no-ops most of the time, but at least would ensure
we use the correct index. (Likely we could fix the FOREIGN KEY
constraint choosing the first matching index with some variation of
this syntax)

Also, DROP INDEX should be disallowed on a leaf partition index which
is in use for a partitioned index. I imagine pg_index just needs a new
column indparentid which would be InvalidOid if it's not used. I'm
just not that clear on if that column should be set to the leaf
partition's direct parent, or the parent that the CREATE INDEX was
done on.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Thu, Dec 7, 2017 at 4:43 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> And yeah, this does nothing for making sure we choose the correct
> index if more than one matching index exists on the leaf partition,
> but perhaps we can dump a series of
>
> ALTER INDEX p_a_idx REPLACE INDEX FOR p1 WITH p1_a_idx;
> ALTER INDEX p_a_idx REPLACE INDEX FOR p2 WITH p2_a_idx;
>
> ... which would be no-ops most of the time, but at least would ensure
> we use the correct index. (Likely we could fix the FOREIGN KEY
> constraint choosing the first matching index with some variation of
> this syntax)

Sure, that would fix the problem I'm concerned about, but creating the
parent index first, as a shell, and then creating each child and
attaching it to the parent *also* fixes the problem I'm concerned
about, and without dragging any bystander objects into the fray.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 8 December 2017 at 10:51, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 7, 2017 at 4:43 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> And yeah, this does nothing for making sure we choose the correct
>> index if more than one matching index exists on the leaf partition,
>> but perhaps we can dump a series of
>>
>> ALTER INDEX p_a_idx REPLACE INDEX FOR p1 WITH p1_a_idx;
>> ALTER INDEX p_a_idx REPLACE INDEX FOR p2 WITH p2_a_idx;
>>
>> ... which would be no-ops most of the time, but at least would ensure
>> we use the correct index. (Likely we could fix the FOREIGN KEY
>> constraint choosing the first matching index with some variation of
>> this syntax)
>
> Sure, that would fix the problem I'm concerned about, but creating the
> parent index first, as a shell, and then creating each child and
> attaching it to the parent *also* fixes the problem I'm concerned
> about, and without dragging any bystander objects into the fray.

That's true, but is it worth inventing/maintaining an ATTACH syntax
for that? It's not a very common case that multiple matching indexes
existing. If it is worth it, do we need DETACH at all?


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Thu, Dec 7, 2017 at 4:57 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>> Sure, that would fix the problem I'm concerned about, but creating the
>> parent index first, as a shell, and then creating each child and
>> attaching it to the parent *also* fixes the problem I'm concerned
>> about, and without dragging any bystander objects into the fray.
>
> That's true, but is it worth inventing/maintaining an ATTACH syntax
> for that? It's not a very common case that multiple matching indexes
> existing. If it is worth it, do we need DETACH at all?

I think it is totally worth it.  A little piece of extra DDL syntax
isn't that really costing us anything.  Your proposed approach
probably still has obscure failure cases - e.g. I bet the
deadlock-avoidance logic in parallel restore won't know about the
dependency on a seemingly-unrelated object.  Also, the use of a
seemingly-useless REPLACE syntax in every dump file will probably
confuse at least a few users, and maybe a few developers, too.  I
think there is considerable value, both for the system and for the
humans who use it, in having something that has *exactly* the
semantics we want rather than *almost* the semantics we want.

I suppose if we want to get cute, we could have ONLY the ATTACH
syntax; if you attach an index for a partition that already has an
index attached, that could mean attach to the new one instead of the
old one (i.e. replace).  But I would just add support for both ATTACH
and REPLACE and call it good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 8 December 2017 at 11:23, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 7, 2017 at 4:57 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> That's true, but is it worth inventing/maintaining an ATTACH syntax
>> for that? It's not a very common case that multiple matching indexes
>> existing. If it is worth it, do we need DETACH at all?
>
> I think it is totally worth it.  A little piece of extra DDL syntax
> isn't that really costing us anything.  Your proposed approach
> probably still has obscure failure cases - e.g. I bet the
> deadlock-avoidance logic in parallel restore won't know about the
> dependency on a seemingly-unrelated object.  Also, the use of a
> seemingly-useless REPLACE syntax in every dump file will probably
> confuse at least a few users, and maybe a few developers, too.  I
> think there is considerable value, both for the system and for the
> humans who use it, in having something that has *exactly* the
> semantics we want rather than *almost* the semantics we want.
>
> I suppose if we want to get cute, we could have ONLY the ATTACH
> syntax; if you attach an index for a partition that already has an
> index attached, that could mean attach to the new one instead of the
> old one (i.e. replace).  But I would just add support for both ATTACH
> and REPLACE and call it good.

ATTACH/REPLACE sounds fine. My objection was more about the
DETACH/ATTACH method to replace an index.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
David Rowley wrote:

> ATTACH/REPLACE sounds fine. My objection was more about the
> DETACH/ATTACH method to replace an index.

So what happens if you do ALTER INDEX .. ATTACH and you already have
another index in that partition that is attached to the same parent in
the index?  With my code, what happens is you have two indexes attached
to the same parent, and you can't drop any.  If we don't have DETACH,
how can you get out of that situation?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Mon, Dec 11, 2017 at 10:04 PM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
> David Rowley wrote:
>
>> ATTACH/REPLACE sounds fine. My objection was more about the
>> DETACH/ATTACH method to replace an index.
>
> So what happens if you do ALTER INDEX .. ATTACH and you already have
> another index in that partition that is attached to the same parent in
> the index?

I think that should be an ERROR.  You can use REPLACE if you want to
switch which index is attached, but you shouldn't be able to attach
two indexes from the same partition at the same time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Hmm, so I'm now unsure what the actual proposals for handling pg_dump
are.  We seem to have the following three proposals:

1. Alvaro: use CREATE INDEX ON ONLY <parent> (not recursive ), followed
   by CREATE INDEX ON <partition>, followed by ALTER INDEX <on_parent>
   ATTACH PARTITION <on_partition>.  I provide an ALTER INDEX DETACH
   PARTITION for symmetry and because it can be used to replace the
   index.

   Pros: the database is always restored identically to what was in the
   original.
   Con:  The index hierarchy might be "partial", that is, lack a
   component index on some partition.

2. David's: use CREATE INDEX ON <partition>, followed by CREATE INDEX ON
   <parent>.  This will use the matching mechanism to automatically
   attach the index on partition to index on parent.  If some partition
   lacks a matching index, one is created automatically by the creation
   on parent.

   If you want to replace the index on a partition, use a new (as yet
   unimplemented) ALTER INDEX REPLACE.

   No need to add ONLY to the table name in CREATE INDEX, since the
   command always recurses.  (This seems good to me, because I 

   Pro: the index is never "partial" (missing a partition).
   Con: the matching mechanism might choose a different index on restore
   than what was selected in the database being dumped.

3. Robert's: use CREATE INDEX ON ONLY <parent>, which creates a shell
   index on parent only (no recursion), followed by CREATE INDEX ON
   <partition>.  DETACH is not provided.  If you ATTACH an index for a
   partition that already has one index attached, then (1) the newly
   attached one replaces the original (i.e. effectively REPLACE) or (2)
   you get an error and we implement a separate ALTER INDEX REPLACE
   command.  It's not clear to me how or when the shell index becomes a
   real index.


Robert, can you please clarify the terms of your proposal?  How is it
better than mine?  Is David's concern about a "partial" index (i.e. an
index that doesn't exist in some partition) solved by it?

I have code for proposals 1 and 2.  I don't like proposal 2, and David &
Ashutosh don't like (1).  Maybe if we all understand (3) we can agree on
using that one?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> 3. Robert's: use CREATE INDEX ON ONLY <parent>, which creates a shell
>    index on parent only (no recursion), followed by CREATE INDEX ON
>    <partition>.  DETACH is not provided.  If you ATTACH an index for a
>    partition that already has one index attached, then (1) the newly
>    attached one replaces the original (i.e. effectively REPLACE) or (2)
>    you get an error and we implement a separate ALTER INDEX REPLACE
>    command.  It's not clear to me how or when the shell index becomes a
>    real index.

As I understand the whole purpose of this design is that there is no
point during the restore at which the index lacks indexes on partitions:
it is either complete, or it doesn't exist yet.  If we create the index
on parent first, and later the indexes on partitions, that condition is
not satisfied.  To solve this, we could create the children indexes
first and then the parent; but how do we indicate to the parent creation
which are the indexes that oughta be marked as children?  ALTER INDEX
ATTACH PARTITION doesn't cut it, because it occurs after the index on
parent is created, which is too late.  We would do something like

CREATE INDEX ON parent (columns) [other stuff]
    ATTACH idx_on_child1, idx_on_child2, idx_on_child3;

but this seems mighty ugly.

Anyway if we do that, what is the point of ALTER INDEX ATTACH PARTITION?
We might as leave it out and just keep the door open for a later ALTER
INDEX REPLACE.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Fri, Dec 15, 2017 at 4:02 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> 3. Robert's: use CREATE INDEX ON ONLY <parent>, which creates a shell
>    index on parent only (no recursion), followed by CREATE INDEX ON
>    <partition>.  DETACH is not provided.  If you ATTACH an index for a
>    partition that already has one index attached, then (1) the newly
>    attached one replaces the original (i.e. effectively REPLACE) or (2)
>    you get an error and we implement a separate ALTER INDEX REPLACE
>    command.  It's not clear to me how or when the shell index becomes a
>    real index.

With this proposal, I think the index can be invalid initially, but
once you've attached an index for every child partition, it becomes
irrevocably valid.  After that, the only supported operation is
REPLACE, which preserves validity.

> Robert, can you please clarify the terms of your proposal?  How is it
> better than mine?  Is David's concern about a "partial" index (i.e. an
> index that doesn't exist in some partition) solved by it?

I think the perceived advantage is that, once valid, the index can't
subsequently become not-valid.  That seemed to be David's big concern
(which is not without foundation).

> I have code for proposals 1 and 2.  I don't like proposal 2, and David &
> Ashutosh don't like (1).  Maybe if we all understand (3) we can agree on
> using that one?

Yes, it would be nice to achieve some sort of consensus and I think
(3) gives everyone a little of what they want.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Thu, Nov 30, 2017 at 7:02 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Great question.  So you're thinking that the planner might have an
> > interest in knowing what indexes are defined at the parent table level
> > for planning purposes; but for that to actually have any effect we would
> > need to change the planner and executor also.  And one more point, also
> > related to something you said before: we currently (I mean after my
> > patch) don't mark partitioned-table-level indexes as valid or not valid
> > depending on whether all its children exist, so trying to use that in
> > the planner without having a flag could cause invalid plans to be
> > generated (i.e. ones that would cause nonexistent indexes to be
> > referenced).
> 
> Did you do it this way due to locking concerns?

No -- just because since the index-on-parent is a different kind of
object (RELKIND_PARTITIONED_INDEX) it is not considered for anything in
the planner anyway, so there's no need for indisvalid to be "correct".
Changing the flag in the parent index only needs to examine state on the
children, not modify them, so I don't think there would be any serious
locking problem.


By the way, my implementation of ALTER INDEX ATTACH PARTITION was prone
to deadlocks because it needs locks on parent table, index-on-parent,
partition, and index-on-partition; and there was no consideration to the
ordering in which these were being acquired.  I wrote a callback for
RangeVarGetRelidExtended to be called in ATExecAttachPartitionIdx() that
tries to acquire locks in the right order -- but I recently realized
that even that is not sufficient, because (unless I misread it) ALTER
INDEX itself does not worry about locking the containing table before
the index.  I think some tweaking needs to be done in
RangeVarCallbackForAlterRelation to lock the table if an index is being
altered (conditionally, depending on the precise ALTER TABLE operation).
It surprises me that we don't need to do that yet, but I haven't looked
into it any further.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Fri, Dec 15, 2017 at 4:02 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > 3. Robert's: use CREATE INDEX ON ONLY <parent>, which creates a shell
> >    index on parent only (no recursion), followed by CREATE INDEX ON
> >    <partition>.  DETACH is not provided.  If you ATTACH an index for a
> >    partition that already has one index attached, then (1) the newly
> >    attached one replaces the original (i.e. effectively REPLACE) or (2)
> >    you get an error and we implement a separate ALTER INDEX REPLACE
> >    command.  It's not clear to me how or when the shell index becomes a
> >    real index.
> 
> With this proposal, I think the index can be invalid initially, but
> once you've attached an index for every child partition, it becomes
> irrevocably valid.  After that, the only supported operation is
> REPLACE, which preserves validity.

Sounds okay to me.  (I had already deleted ALTER INDEX DETACH from my
patch earlier today.)  I admit I had also deleted the ONLY clause from
CREATE INDEX, and I don't like having to put it back :-)  But on the
whole, having it sounds better than the alternatives.

We have two options for marking valid:

1. after each ALTER INDEX ATTACH, verify whether the set of partitions
that contain the index is complete; if so, mark it valid, otherwise do
nothing.  This sucks because we have to check that over and over for
every index that we attach

2. We invent yet another command, say
    ALTER INDEX <idx-on-parent> VALIDATE

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Fri, Dec 15, 2017 at 5:18 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> We have two options for marking valid:
>
> 1. after each ALTER INDEX ATTACH, verify whether the set of partitions
> that contain the index is complete; if so, mark it valid, otherwise do
> nothing.  This sucks because we have to check that over and over for
> every index that we attach
>
> 2. We invent yet another command, say
>     ALTER INDEX <idx-on-parent> VALIDATE

If ALTER INDEX .. ATTACH is already taking AEL on the parent, then I
think it might as well try to validate while it's at it.  But if not
then we might want to go with #2.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 17 December 2017 at 16:22, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Dec 15, 2017 at 5:18 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> We have two options for marking valid:
>>
>> 1. after each ALTER INDEX ATTACH, verify whether the set of partitions
>> that contain the index is complete; if so, mark it valid, otherwise do
>> nothing.  This sucks because we have to check that over and over for
>> every index that we attach
>>
>> 2. We invent yet another command, say
>>     ALTER INDEX <idx-on-parent> VALIDATE

It's not perfect that we need to validate each time, but it might not
be that expensive to validate since we only really need to count the
pg_index rows that have indisvalid = true rows which have a parent
index listed as the index we're ATTACHing too, then ensure that
matches the number of leaf partitions.

> If ALTER INDEX .. ATTACH is already taking AEL on the parent, then I
> think it might as well try to validate while it's at it.  But if not
> then we might want to go with #2.

I'm now not that clear on what the behaviour is if the ONLY keyword is
not specified on the CREATE INDEX for the partitioned index. Does that
go and create each leaf partition index regardless of if there is a
suitable candidate to ATTACH?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Fri, Dec 15, 2017 at 5:18 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > We have two options for marking valid:
> >
> > 1. after each ALTER INDEX ATTACH, verify whether the set of partitions
> > that contain the index is complete; if so, mark it valid, otherwise do
> > nothing.  This sucks because we have to check that over and over for
> > every index that we attach
> >
> > 2. We invent yet another command, say
> >     ALTER INDEX <idx-on-parent> VALIDATE
> 
> If ALTER INDEX .. ATTACH is already taking AEL on the parent, then I
> think it might as well try to validate while it's at it.  But if not
> then we might want to go with #2.

The problem I have with it is that restoring a dump containing indexes
on partitions becomes a O(N^2) deal as it has to do the full check once
for every index we attach.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Sun, Dec 17, 2017 at 10:50 AM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
>> If ALTER INDEX .. ATTACH is already taking AEL on the parent, then I
>> think it might as well try to validate while it's at it.  But if not
>> then we might want to go with #2.
>
> The problem I have with it is that restoring a dump containing indexes
> on partitions becomes a O(N^2) deal as it has to do the full check once
> for every index we attach.

Sure.  If the constant factor is high enough to matter, then VALIDATE
makes sense.

IMHO, anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Sun, Dec 17, 2017 at 5:29 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>> If ALTER INDEX .. ATTACH is already taking AEL on the parent, then I
>> think it might as well try to validate while it's at it.  But if not
>> then we might want to go with #2.
>
> I'm now not that clear on what the behaviour is if the ONLY keyword is
> not specified on the CREATE INDEX for the partitioned index. Does that
> go and create each leaf partition index regardless of if there is a
> suitable candidate to ATTACH?

No, the other way around.  ONLY is being proposed as a way to create
an initially-not-valid parent to which we can then ATTACH
subsequently-created child indexes.  But because we will have REPLACE
rather than DETACH, once you get the index valid it never goes back to
not-valid.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 18 December 2017 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Dec 17, 2017 at 5:29 AM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> I'm now not that clear on what the behaviour is if the ONLY keyword is
>> not specified on the CREATE INDEX for the partitioned index. Does that
>> go and create each leaf partition index regardless of if there is a
>> suitable candidate to ATTACH?
>
> No, the other way around.  ONLY is being proposed as a way to create
> an initially-not-valid parent to which we can then ATTACH
> subsequently-created child indexes.  But because we will have REPLACE
> rather than DETACH, once you get the index valid it never goes back to
> not-valid.

I understand what the ONLY is proposed to do. My question was in
regards to the behaviour without ONLY.


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Sun, Dec 17, 2017 at 9:38 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 18 December 2017 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sun, Dec 17, 2017 at 5:29 AM, David Rowley
>> <david.rowley@2ndquadrant.com> wrote:
>>> I'm now not that clear on what the behaviour is if the ONLY keyword is
>>> not specified on the CREATE INDEX for the partitioned index. Does that
>>> go and create each leaf partition index regardless of if there is a
>>> suitable candidate to ATTACH?
>>
>> No, the other way around.  ONLY is being proposed as a way to create
>> an initially-not-valid parent to which we can then ATTACH
>> subsequently-created child indexes.  But because we will have REPLACE
>> rather than DETACH, once you get the index valid it never goes back to
>> not-valid.
>
> I understand what the ONLY is proposed to do. My question was in
> regards to the behaviour without ONLY.

Oh, sorry -- I was confused.  I'm not sure whether that should try to
attach to something if it exists, or just create unconditionally...
what do you think?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 18 December 2017 at 15:59, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Dec 17, 2017 at 9:38 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> On 18 December 2017 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Sun, Dec 17, 2017 at 5:29 AM, David Rowley
>>> <david.rowley@2ndquadrant.com> wrote:
>>>> I'm now not that clear on what the behaviour is if the ONLY keyword is
>>>> not specified on the CREATE INDEX for the partitioned index. Does that
>>>> go and create each leaf partition index regardless of if there is a
>>>> suitable candidate to ATTACH?
>>>
>>> No, the other way around.  ONLY is being proposed as a way to create
>>> an initially-not-valid parent to which we can then ATTACH
>>> subsequently-created child indexes.  But because we will have REPLACE
>>> rather than DETACH, once you get the index valid it never goes back to
>>> not-valid.
>>
>> I understand what the ONLY is proposed to do. My question was in
>> regards to the behaviour without ONLY.
>
> Oh, sorry -- I was confused.  I'm not sure whether that should try to
> attach to something if it exists, or just create unconditionally...
> what do you think?

I think you feel quite strongly about not having the code select a
random matching index, so if we want to stick to that rule, then we'll
need to create a set of new leaf indexes rather than select a random
one.

If we don't do that, then we might as well go with my idea and ditch
the ONLY syntax and have the CREATE INDEX try to find a suitable leaf
index, or create a new leaf index if one cannot be found.

Would it be surprising to users if CREATE INDEX ON partitioned_table
created a bunch of duplicate new indexes on the leaf tables?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Sun, Dec 17, 2017 at 10:05 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> I think you feel quite strongly about not having the code select a
> random matching index, so if we want to stick to that rule, then we'll
> need to create a set of new leaf indexes rather than select a random
> one.

I feel quite strongly about it *in the context of pg_dump*.  Outside
of that scenario, I'm happy to go with whatever behavior you and
others think will satisfy the POLA.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
After this discussion, this is how I see things working:

1. pg_dump
   a) creates indexes on partitions normally
   b) once all existing indexes are done, index on parent is created,
      with ONLY.  No cascading occurs, no indexes are attached.
   c) ATTACH is run for each existing partition index.  After each
      ATTACH, we check that all indexes exist.  If so, the parent is
      marked valid.
   d) if not all indexes existed in partitions, index on parent remains
      invalid.  (It was invalid in the dumped database, so this is
      correct.)

2. other uses
   Normal CREATE INDEX (without ONLY) recurses and attaches the first
   matching index it finds (no duplicate indexes are created);
   partitions without a matching index get one created.

3. ALTER INDEX DETACH is not provided.  Therefore: once index is valid,
   it remains so forever.

I think this satisfies all concerns.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Mon, Dec 18, 2017 at 11:02 AM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
> After this discussion, this is how I see things working:
>
> 1. pg_dump
>    a) creates indexes on partitions normally
>    b) once all existing indexes are done, index on parent is created,
>       with ONLY.  No cascading occurs, no indexes are attached.
>    c) ATTACH is run for each existing partition index.  After each
>       ATTACH, we check that all indexes exist.  If so, the parent is
>       marked valid.
>    d) if not all indexes existed in partitions, index on parent remains
>       invalid.  (It was invalid in the dumped database, so this is
>       correct.)
>
> 2. other uses
>    Normal CREATE INDEX (without ONLY) recurses and attaches the first
>    matching index it finds (no duplicate indexes are created);
>    partitions without a matching index get one created.
>
> 3. ALTER INDEX DETACH is not provided.  Therefore: once index is valid,
>    it remains so forever.
>
> I think this satisfies all concerns.

Sounds great to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 19 December 2017 at 05:08, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Dec 18, 2017 at 11:02 AM, Alvaro Herrera
> <alvherre@alvh.no-ip.org> wrote:
>> After this discussion, this is how I see things working:
>>
>> 1. pg_dump
>>    a) creates indexes on partitions normally
>>    b) once all existing indexes are done, index on parent is created,
>>       with ONLY.  No cascading occurs, no indexes are attached.
>>    c) ATTACH is run for each existing partition index.  After each
>>       ATTACH, we check that all indexes exist.  If so, the parent is
>>       marked valid.
>>    d) if not all indexes existed in partitions, index on parent remains
>>       invalid.  (It was invalid in the dumped database, so this is
>>       correct.)
>>
>> 2. other uses
>>    Normal CREATE INDEX (without ONLY) recurses and attaches the first
>>    matching index it finds (no duplicate indexes are created);
>>    partitions without a matching index get one created.
>>
>> 3. ALTER INDEX DETACH is not provided.  Therefore: once index is valid,
>>    it remains so forever.
>>
>> I think this satisfies all concerns.
>
> Sounds great to me.

and me.


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Great, thanks for the input.

pg_dump behaves as described upthread -- thanks David and Robert for the
input.  I did this by injecting a fake "INDEX ATTACH" object in
pg_dump's model, which depends on the index-on-parent-table; which in
turn depends on the index-on-partition.  Because of the dependencies,
the attach is dumped last, and the index-on-parent is dumped after all
the indexes-on-partitions have been dumped.  I needed to apply a little
bit of surgery so that each table object would contain links to its
indexes, which previously wasn't there.  A followup step to obtaining
index info creates the INDEX ATTACH objects.

In the backend code: There is now a difference in initial setting of
indisvalid and indisready when creating an index.  Previously we always
set them the same (!concurrent) but now indisready depends only on
"concurrent" while indisvalid additionally depends on whether a a
non-inherited index on a partitioned table is being built.  Failing to
set indisready to true initially was causing the index to be invisible
to pg_dump.

There is no ALTER INDEX / DETACH, as discussed.  Also, trying to attach
an index for a partition that already contains an attached index raises
an error -- which means that in order to determine whether attaching an
index as partition makes the parent index valid, is a simple matter of
counting tuples.

All these behaviors are immortalized in the regression tests.

I added tab-complete support too (thanks Jesper for the reminder).  It
is probably not exhaustive, but should be good enough.

I have two patches to rebase on top of this, which I hope to post later
today:

1) let these indexes be unique (i.e. allow unique and PK constraints)
2) allow foreign keys on partitioned tables

I have a further patch to allow FOR EACH ROW triggers on partitioned
tables, but I think it's largely unrelated to this (and, as I was
surprised to discover, it's also unrelated to the FKs one).

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Wed, Dec 20, 2017 at 12:01 PM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
> I have two patches to rebase on top of this, which I hope to post later
> today:
>
> 1) let these indexes be unique (i.e. allow unique and PK constraints)
> 2) allow foreign keys on partitioned tables
>
> I have a further patch to allow FOR EACH ROW triggers on partitioned
> tables, but I think it's largely unrelated to this (and, as I was
> surprised to discover, it's also unrelated to the FKs one).

Sounds great!  I made the comment during my talk at PGCONF.EU that
partitioned tables in PostgreSQL are really just a bunch of tables
glued together, but that over time "we'll make the glue better", and I
think the improvements on which you are working will go a long way in
that direction.

With regard to indexes, presumably we can only have a unique index if
the index definition matches the partition key definition.  Otherwise,
we can only guarantee per-partition uniqueness, unless (1) we
implement "global" indexes, which sounds like a patch that would take
a fair amount of work, or (2) every index insertion goes and checks
for conflicting values in all of the "sibling" indexes.  This latter
approach was suggested to me by Andres, and certainly solves a bunch
of problems with vacuuming and dropping indexes, but seems like it
would probably have lousy insert performance unless the number of
partitions is very small.

With regard to foreign keys, it seems like there are two directions to
worry about.  An "outbound" foreign key needs to cascade, I think,
more or less like an index does, with every child inheriting the
foreign key from the partition root.  An "inbound" foreign key just
needs a unique index to point at and depend on.  Given the work you
are doing on unique indexes, I'm guessing that you are talking about
supporting the latter, not the former, but the way you phrased it
sounds like the opposite.

I haven't thought much about the possibility of supporting FOR EACH
ROW triggers, but it sounds like a good idea if we can do it.  I'm not
sure how that will work, though, if users directly access the leaf
partitions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Robert Haas wrote:

> Sounds great!  I made the comment during my talk at PGCONF.EU that
> partitioned tables in PostgreSQL are really just a bunch of tables
> glued together, but that over time "we'll make the glue better", and I
> think the improvements on which you are working will go a long way in
> that direction.

Thanks -- yes, that's my intention, to improve our partitioning story.
Many other projects related to improving the glue are already underway
by others, so I think we're making great progress on the overall
partitioning front.

> With regard to indexes, presumably we can only have a unique index if
> the index definition matches the partition key definition.  Otherwise,
> we can only guarantee per-partition uniqueness, unless (1) we
> implement "global" indexes, which sounds like a patch that would take
> a fair amount of work, or (2) every index insertion goes and checks
> for conflicting values in all of the "sibling" indexes.  This latter
> approach was suggested to me by Andres, and certainly solves a bunch
> of problems with vacuuming and dropping indexes, but seems like it
> would probably have lousy insert performance unless the number of
> partitions is very small.

Actually, we discussed unique constraints before.  What I proposed is to
implement a useful subset: only allow a unique constraint if the
partition columns are in it, which means that enforcing the constraint
on each partition independently is enough to satisfy the constraint
globally.  This doesn't solve all cases where you want UNIQUE on
partitioned tables, but the reason to implement this subset is precisely
that it doesn't have those drawbacks, so it's quick and efficient.

Another point is that global indexes are not only a lot of work, but
they also come with additional drawbacks such as the necessity to
clean up when partitions are dropped/detached, which is a huge pain.

> With regard to foreign keys, it seems like there are two directions to
> worry about.  An "outbound" foreign key needs to cascade, I think,
> more or less like an index does, with every child inheriting the
> foreign key from the partition root.  An "inbound" foreign key just
> needs a unique index to point at and depend on.  Given the work you
> are doing on unique indexes, I'm guessing that you are talking about
> supporting the latter, not the former, but the way you phrased it
> sounds like the opposite.

Actually, I intend to do both.  I have a prototype patch which seems to
work, though I haven't hammered it heavily yet.  Some of the cascading
options are not implemented -- as I recall I implemented it for the
CREATE TABLE PARTITION OF case but not yet the ALTER TABLE ATTACH
PARTITION one.

> I haven't thought much about the possibility of supporting FOR EACH
> ROW triggers, but it sounds like a good idea if we can do it.  I'm not
> sure how that will work, though, if users directly access the leaf
> partitions.

Yeah, I'm not sure yet either.  Needs more thought than I've given it so
far.  I have a prototype patch for this also, and the basic case seems
to give reasonable results, but I haven't tried to find any weird corner
cases either.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Just to show what I'm talking about, here are a few prototype patches.
Beware, these are all very lightly tested/reviewed.  Input is welcome,
particularly if it comes in the guise of patches to regression tests
showing cases that misbehave.

0001 is a fixup for the v6 patch I posted upthread; it's needed so the
0002 patch doesn't end up with indexes marked invalid after pg_dump.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
I modified the regression test so that a partitioning hierarchy would be
left behind after the test is run, which is useful to test pg_upgrade
and pg_dump -- this caught one small bug.  That and some reading of the
diff resulted in v8, attach.

On my system, make check-world passes.  However, Thomas Munro's
automated patch tester seems to have a problem with the pg_upgrade test,
though I don't know what it is.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Jesper Pedersen
Date:
Hi Alvaro,

On 12/20/2017 04:25 PM, Alvaro Herrera wrote:
> I modified the regression test so that a partitioning hierarchy would be
> left behind after the test is run, which is useful to test pg_upgrade
> and pg_dump -- this caught one small bug.  That and some reading of the
> diff resulted in v8, attach.
> 
> On my system, make check-world passes.  However, Thomas Munro's
> automated patch tester seems to have a problem with the pg_upgrade test,
> though I don't know what it is.
> 

Passes check-world here too w/ TAP + cassert.

index.c:

+       values[Anum_pg_index_indparentidx - 1 ] = 
ObjectIdGetDatum(parentIndexOid);

Extra space.

tab-complete.c:

Contains references to DETACH.

create_index.sgml:

I think the new paragraph should mention the naming convention of the 
generated indexes, as they may differ from the index name on the 
partition table.

reindex.sgml:

Missing a note about REINDEX not being supported on the partition index.

Best regards,
  Jesper


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Hello Jesper,

Jesper Pedersen wrote:

> Passes check-world here too w/ TAP + cassert.

Great, thanks for checking.

> index.c:
>  [and a few other comments]

I believe these are all fixed by the attached delta patch.

If you have wording suggestions for the doc changes, please send them
along.

Thanks for reading,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Amit Langote
Date:
Hi Alvaro,

On 2017/12/23 0:10, Alvaro Herrera wrote:
> I believe these are all fixed by the attached delta patch.

@@ -1676,7 +1694,12 @@ psql_completion(const char *text, int start, int end)
                                    "UNION SELECT 'ALL IN TABLESPACE'");
     /* ALTER INDEX <name> */
     else if (Matches3("ALTER", "INDEX", MatchAny))
-        COMPLETE_WITH_LIST5("ALTER COLUMN", "OWNER TO", "RENAME TO",
"SET", "RESET");
+        COMPLETE_WITH_LIST7("ALTER COLUMN", "OWNER TO", "RENAME TO", "SET",
+                            "RESET", "ATTACH PARTITION");

This should be COMPLETE_WITH_LIST6().

Thanks,
Amit



Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Amit Langote
Date:
On 2017/12/25 13:52, Amit Langote wrote:
> Hi Alvaro,
> 
> On 2017/12/23 0:10, Alvaro Herrera wrote:
>> I believe these are all fixed by the attached delta patch.
> 
> @@ -1676,7 +1694,12 @@ psql_completion(const char *text, int start, int end)
>                                     "UNION SELECT 'ALL IN TABLESPACE'");
>      /* ALTER INDEX <name> */
>      else if (Matches3("ALTER", "INDEX", MatchAny))
> -        COMPLETE_WITH_LIST5("ALTER COLUMN", "OWNER TO", "RENAME TO",
> "SET", "RESET");
> +        COMPLETE_WITH_LIST7("ALTER COLUMN", "OWNER TO", "RENAME TO", "SET",
> +                            "RESET", "ATTACH PARTITION");
> 
> This should be COMPLETE_WITH_LIST6().

I noticed a few things I thought I'd report.  I was using the latest patch
(v8 + the delta patch).

1. The following crashes because of an Assert in ATExecCmd():

ALTER TABLE an_index DETACH PARTITION ...

It seems ATPrepCmd() should not pass ATT_INDEX to ATSimplePermissions()
for the ATDetachPartition sub-command type.

2. Something strange about how dependencies are managed:

create table p (a char) partition by list (a);
create table pa partition of p for values in ('a');;
create table pb partition of p for values in ('b');
create index on p (a);

\d pa
                   Table "public.pa"
 Column |     Type     | Collation | Nullable | Default
|-------+--------------+-----------+----------+---------
 a      | character(1) |           |          |
Partition of: p FOR VALUES IN ('a')
Indexes:
    "pa_a_idx" btree (a)

drop table pa;
ERROR:  cannot drop table pa because other objects depend on it
DETAIL:  index p_a_idx depends on table pa
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

set client_min_messages to debug2;

drop table pa;
DEBUG:  drop auto-cascades to index pa_a_idx
DEBUG:  drop auto-cascades to index pb_a_idx
DEBUG:  drop auto-cascades to type pa
DEBUG:  drop auto-cascades to type pa[]
ERROR:  cannot drop table pa because other objects depend on it
DETAIL:  index p_a_idx depends on table pa
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

Thanks,
Amit



Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Amit Langote wrote:

> 2. Something strange about how dependencies are managed:
> 
> create table p (a char) partition by list (a);
> create table pa partition of p for values in ('a');;
> create table pb partition of p for values in ('b');
> create index on p (a);

> drop table pa;
> ERROR:  cannot drop table pa because other objects depend on it
> DETAIL:  index p_a_idx depends on table pa
> HINT:  Use DROP ... CASCADE to drop the dependent objects too.

Hmm, this is quite nasty and I'm not seeing any reasonable way to fix
it.  The problem is that we have an internal dep from the parent index
to the partition index, which prompts us to recurse the chase of
dependencies inverting the way the dependencies flow -- the "case 3" in
findDependentObjects:
                /*
                 * 3. Not all the owning objects have been visited, so
                 * transform this deletion request into a delete of this
                 * owning object.

The problem is that we don't want the partition index to go away unless
it's together with its parent index, except if the table which contains
it goes away first.  We have no way to specify this condition ...  Maybe
something like sticking a phony additional record in pg_depend that
causes the partition index to get added to the targetObjects list ahead
of time ... but that doesn't work either, because we still want to chase
the internal deps in that case.  Hm.

Maybe we need a new "auto internal" deptype with a mix of semantics of
the other two deptypes.  It seems a bit ugly and I'm not sure it'd work
either ... I'll try to code it tomorrow.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> Maybe we need a new "auto internal" deptype with a mix of semantics of
> the other two deptypes.  It seems a bit ugly and I'm not sure it'd work
> either ... I'll try to code it tomorrow.

This seems to work pretty well, much to my surprise.  I was a bit scared
of adding a new deptype, but actually the only affected code is
findDependentObjects() and the semantics of the new type is a subset of
the existing DEPTYPE_INTERNAL, so I think it's okay.  I need to fill in
its description in docs and comments though -- I left it out because the
real difference between INTERNAL and INTERNAL_AUTO is not something that
is covered by the existing description of INTERNAL, so maybe I'll need
to expand that one.

This version includes the fixes I posted as "delta" to the problems
Jesper reported, as well as fixes to the ones reported by Amit.  It's
looking pretty good to me -- I'm seeing it as a candidate to commit
early in the upcoming commitfest.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
I just realized there's a further problem in the area: when a partition
is detached from its parent, its indexes are not made independent of the
indexes on parent.  So they can't be dropped on their own (booh!); and
dropping the index on the former parent partitioned table drops the
index on the former partition also (hiss!).  I think the fix for this is
to delete all dependencies, and re-register normal ones.  Should be
straightforward, but I'm not doing it this year.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Jesper Pedersen
Date:
Hi Alvaro,

On 12/22/2017 10:10 AM, Alvaro Herrera wrote:
> I believe these are all fixed by the attached delta patch.
>

Thanks.

> If you have wording suggestions for the doc changes, please send them
> along.
> 

Maybe we should make the default index name more explicit under the 
'name' parameter as attached.

Best regards,
  Jesper

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Jesper Pedersen
Date:
Hi Alvaro,

On 12/29/2017 12:59 PM, Alvaro Herrera wrote:
> This seems to work pretty well, much to my surprise.  I was a bit scared
> of adding a new deptype, but actually the only affected code is
> findDependentObjects() and the semantics of the new type is a subset of
> the existing DEPTYPE_INTERNAL, so I think it's okay.  I need to fill in
> its description in docs and comments though -- I left it out because the
> real difference between INTERNAL and INTERNAL_AUTO is not something that
> is covered by the existing description of INTERNAL, so maybe I'll need
> to expand that one.
> 
> This version includes the fixes I posted as "delta" to the problems
> Jesper reported, as well as fixes to the ones reported by Amit.  It's
> looking pretty good to me -- I'm seeing it as a candidate to commit
> early in the upcoming commitfest.
> 

indexing.sql contains a \di command which list the owner of the db, so 
make check-world fails.

Best regards,
  Jesper


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Jesper Pedersen wrote:

> On 12/22/2017 10:10 AM, Alvaro Herrera wrote:

> > If you have wording suggestions for the doc changes, please send them
> > along.
> 
> Maybe we should make the default index name more explicit under the 'name'
> parameter as attached.

I'm -0.2 on documenting this.  In general, I'm hesitant to promise more
than strictly needed, mostly because it's more difficult to later change
our minds.  For example, what if soon after this patch we decide to
invent a new mechanism for creating names on partitions?  Also, in
reality there are additional considerations (how do we truncate very
long names, what do we do if the name is already taken) that I would
rather not ossify in user-facing documentation.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Peter Eisentraut
Date:
On 12/29/17 15:38, Alvaro Herrera wrote:
> I just realized there's a further problem in the area: when a partition
> is detached from its parent, its indexes are not made independent of the
> indexes on parent.  So they can't be dropped on their own (booh!); and
> dropping the index on the former parent partitioned table drops the
> index on the former partition also (hiss!).  I think the fix for this is
> to delete all dependencies, and re-register normal ones.  Should be
> straightforward, but I'm not doing it this year.

We're waiting for an updated patch in this thread, aren't we?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Peter Eisentraut wrote:
> On 12/29/17 15:38, Alvaro Herrera wrote:
> > I just realized there's a further problem in the area: when a partition
> > is detached from its parent, its indexes are not made independent of the
> > indexes on parent.  So they can't be dropped on their own (booh!); and
> > dropping the index on the former parent partitioned table drops the
> > index on the former partition also (hiss!).  I think the fix for this is
> > to delete all dependencies, and re-register normal ones.  Should be
> > straightforward, but I'm not doing it this year.
> 
> We're waiting for an updated patch in this thread, aren't we?

Dunno.  The problems reported so far are rather very minor details that
don't prevent reviewing the rest of it.  I will of course send an
updated version at some point, but there's only one me so I can either
spend the whole commitfest focused on fixing tiny issues my own patches,
or help with other stuff.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Hello

Jesper Pedersen wrote:
> indexing.sql contains a \di command which list the owner of the db, so make
> check-world fails.

Wow, that was a stupid oversight, thanks for pointing it out.  Here's a
version that applies to today's master, and fixes this problem.

No other changes.

(I didn't incorporate your proposal to document the index naming
pattern.  I propose that you post it in its own thread, so that it can
be discussed on its own merits.  Note typo "formular" in your patch; I'd
suggest "pattern" or some such rather than "formula" though.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:
> Wow, that was a stupid oversight, thanks for pointing it out.  Here's a
> version that applies to today's master, and fixes this problem.
> 
> No other changes.

v11 fixes the dependency problem I mentioned in
https://postgr.es/m/20171229203818.pqxf2cyl4g2wre6h@alvherre.pgsql
and adds some test to cover that stuff.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Peter Eisentraut
Date:
On 12/29/17 12:59, Alvaro Herrera wrote:
>> Maybe we need a new "auto internal" deptype with a mix of semantics of
>> the other two deptypes.  It seems a bit ugly and I'm not sure it'd work
>> either ... I'll try to code it tomorrow.
> 
> This seems to work pretty well, much to my surprise.  I was a bit scared
> of adding a new deptype, but actually the only affected code is
> findDependentObjects() and the semantics of the new type is a subset of
> the existing DEPTYPE_INTERNAL, so I think it's okay.  I need to fill in
> its description in docs and comments though -- I left it out because the
> real difference between INTERNAL and INTERNAL_AUTO is not something that
> is covered by the existing description of INTERNAL, so maybe I'll need
> to expand that one.

So this patch appears to implement what was agreed upon earlier in the
thread.   Documentation and tests seem pretty comprehensive.

The new dependency type obviously needs to be explained in detail.

CompareIndexInfo() doesn't compare indexes' operator classes and collations.

I'm not sure why this feature of automatically picking up matching
indexes even exists.  Is it for some specific workflows or upgrade
scenarios?  It's kind of a surprising feature in a way.

The catalog representations of partitioned tables and partitioned
indexes are completely different, which may or may not be desirable.

As mentioned elsewhere already, the tests fail because \di shows the
owner, which can vary between sites.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Thu, Jan 4, 2018 at 11:44 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> I'm not sure why this feature of automatically picking up matching
> indexes even exists.  Is it for some specific workflows or upgrade
> scenarios?  It's kind of a surprising feature in a way.

It allows you to avoid building a new indexes unnecessarily when
attaching a partition.

> The catalog representations of partitioned tables and partitioned
> indexes are completely different, which may or may not be desirable.

How so?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Thu, Jan 4, 2018 at 11:44 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > I'm not sure why this feature of automatically picking up matching
> > indexes even exists.  Is it for some specific workflows or upgrade
> > scenarios?  It's kind of a surprising feature in a way.
> 
> It allows you to avoid building a new indexes unnecessarily when
> attaching a partition.

Yeah -- this is important to reduce the duration of the DDL operation,
since attaching a partition requires an access exclusive lock.  Having
to build the index at that point would make that feature pretty much
useless for many use cases.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Peter Eisentraut wrote:

> CompareIndexInfo() doesn't compare indexes' operator classes and collations.

Hmm ... will look into these.

> As mentioned elsewhere already, the tests fail because \di shows the
> owner, which can vary between sites.

Already fixed in v10 which I posted this morning.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Peter Eisentraut
Date:
On 1/4/18 12:00, Robert Haas wrote:
>> The catalog representations of partitioned tables and partitioned
>> indexes are completely different, which may or may not be desirable.
> 
> How so?

If someone wants to write a query, show me all the partitions of this
table versus show me all the partitions of this index, intuitively,
those could be the same, different only by some relkind references.
Currently, you'd have to write two completely different queries.  It's
not a big deal, but it's a consideration.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Peter Eisentraut wrote:
> On 1/4/18 12:00, Robert Haas wrote:
> >> The catalog representations of partitioned tables and partitioned
> >> indexes are completely different, which may or may not be desirable.
> > 
> > How so?
> 
> If someone wants to write a query, show me all the partitions of this
> table versus show me all the partitions of this index, intuitively,
> those could be the same, different only by some relkind references.

Well, this "indparentidx" stuff I came up with is a little bit strange.
Perhaps we could use pg_inherits instead, but I think the general view
is that pg_inherits is on its way out.

Tangentially: I didn't like very much that I added a new index to
pg_index to support this feature.  I thought maybe it'd be better to
change the index on indrelid to be on (indrelid,indparentidx) instead,
but that doesn't seem great either because it bloats that index which is
used to support common relcache operations ...


(The more I think of this, the more I believe that pg_inherits is a
better answer.  Opinions?)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 5 January 2018 at 11:01, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> (The more I think of this, the more I believe that pg_inherits is a
> better answer.  Opinions?)

I admit to not having had a chance to look at any code with this yet,
but I'm just thinking about a case like the following.

CREATE TABLE part (a INT, b INT) PARTITION BY RANGE (a);
CREATE TABLE part_a1 PARTITION OF part FOR VALUES FROM (0) TO (10)
PARTITION BY RANGE (b);
CREATE TABLE part_a1_b1 PARTITION OF part_a1 FOR VALUES FROM (0) TO (10);

CREATE INDEX ON part_a1 (a); -- sub-partition index (creates index on
part_a1_b1)

CREATE INDEX ON part (a); -- What do we do here?

Should we:

1. Create another identical index on part_a1_b1; or
2. Allow the existing index on part_a1_b1 to have multiple parents; or
3. ERROR... (probably not)

I guess pg_index.indparentidx won't allow #2 to work. Some pg_inherits
arrangement should.

We don't really want to go creating indexes that we don't need to, so
I think we should probably make an effort to allow #2 to work.

Question is, how likely is the above scenario to take place?

Normally, I see customers requiring partitioning only when an existing
table grows too large for maintenance tasks to complete a reasonable
timeframe. Probably there might also come a time when each of the
partitions they've then gone and created also grows too large. So it
does not seem unrealistic to be attaching existing tables/partitioned
tables as partitions multiple times. Wanting to reuse leaf indexes for
some new higher level partition parent does seem reasonable.

This argument might be voided by if we allowed a DROP INDEX on part_a1
without dropping the leaf indexes. I've not checked what the patch
does here, but I'd imagine if the indexes are marked as parents of
that index, then they'll be dropped.

I'll go off and look at the code now...

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Peter Eisentraut
Date:
On 1/4/18 23:08, David Rowley wrote:
> On 5 January 2018 at 11:01, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> (The more I think of this, the more I believe that pg_inherits is a
>> better answer.  Opinions?)
> 
> I admit to not having had a chance to look at any code with this yet,
> but I'm just thinking about a case like the following.
> 
> CREATE TABLE part (a INT, b INT) PARTITION BY RANGE (a);
> CREATE TABLE part_a1 PARTITION OF part FOR VALUES FROM (0) TO (10)
> PARTITION BY RANGE (b);
> CREATE TABLE part_a1_b1 PARTITION OF part_a1 FOR VALUES FROM (0) TO (10);
> 
> CREATE INDEX ON part_a1 (a); -- sub-partition index (creates index on
> part_a1_b1)
> 
> CREATE INDEX ON part (a); -- What do we do here?
> 
> Should we:
> 
> 1. Create another identical index on part_a1_b1; or
> 2. Allow the existing index on part_a1_b1 to have multiple parents; or
> 3. ERROR... (probably not)

4. It should adopt part_a1 and its subindexes into its hierarchy.  That
shouldn't be a problem under the current theory, should it?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Thu, Jan 4, 2018 at 5:01 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Tangentially: I didn't like very much that I added a new index to
> pg_index to support this feature.  I thought maybe it'd be better to
> change the index on indrelid to be on (indrelid,indparentidx) instead,
> but that doesn't seem great either because it bloats that index which is
> used to support common relcache operations ...
>
> (The more I think of this, the more I believe that pg_inherits is a
> better answer.  Opinions?)

I actually haven't looked at the code, but the idea that pg_inherits
is on the way out is news to me.  If that method will work, I don't
quite see why we should invent something new.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Fri, Jan 5, 2018 at 4:57 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 1/4/18 23:08, David Rowley wrote:
>> On 5 January 2018 at 11:01, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>>> (The more I think of this, the more I believe that pg_inherits is a
>>> better answer.  Opinions?)
>>
>> I admit to not having had a chance to look at any code with this yet,
>> but I'm just thinking about a case like the following.
>>
>> CREATE TABLE part (a INT, b INT) PARTITION BY RANGE (a);
>> CREATE TABLE part_a1 PARTITION OF part FOR VALUES FROM (0) TO (10)
>> PARTITION BY RANGE (b);
>> CREATE TABLE part_a1_b1 PARTITION OF part_a1 FOR VALUES FROM (0) TO (10);
>>
>> CREATE INDEX ON part_a1 (a); -- sub-partition index (creates index on
>> part_a1_b1)
>>
>> CREATE INDEX ON part (a); -- What do we do here?
>>
>> Should we:
>>
>> 1. Create another identical index on part_a1_b1; or
>> 2. Allow the existing index on part_a1_b1 to have multiple parents; or
>> 3. ERROR... (probably not)
>
> 4. It should adopt part_a1 and its subindexes into its hierarchy.  That
> shouldn't be a problem under the current theory, should it?

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Fri, Jan 5, 2018 at 4:57 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > On 1/4/18 23:08, David Rowley wrote:

> >> I admit to not having had a chance to look at any code with this yet,
> >> but I'm just thinking about a case like the following.
> >>
> >> CREATE TABLE part (a INT, b INT) PARTITION BY RANGE (a);
> >> CREATE TABLE part_a1 PARTITION OF part FOR VALUES FROM (0) TO (10)
> >> PARTITION BY RANGE (b);
> >> CREATE TABLE part_a1_b1 PARTITION OF part_a1 FOR VALUES FROM (0) TO (10);
> >>
> >> CREATE INDEX ON part_a1 (a); -- sub-partition index (creates index on
> >> part_a1_b1)
> >>
> >> CREATE INDEX ON part (a); -- What do we do here?
> >>
> >> Should we:
> >>
> >> 1. Create another identical index on part_a1_b1; or
> >> 2. Allow the existing index on part_a1_b1 to have multiple parents; or
> >> 3. ERROR... (probably not)
> >
> > 4. It should adopt part_a1 and its subindexes into its hierarchy.  That
> > shouldn't be a problem under the current theory, should it?
> 
> +1.

This is what the code does today.  IIRC there's a test for this exact
scenario.  Now, part_a1_b1 is grandchild of part, not direct parent --
so there exists an intermediate relkind=I index in part_a1, and that is
the one that becomes parent of part_a1_b1, not part's directly.

Now, if you drop the index in part, then it gets dropped in all
descendants too, including part_a1_b1.  If you DETACH the partition
first, then the index remains.  All of that seems good to me, and is as
discussed previously.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Jesper Pedersen
Date:
Hi Alvaro,

On 01/04/2018 09:30 AM, Alvaro Herrera wrote:
> v11 fixes the dependency problem I mentioned in
> https://postgr.es/m/20171229203818.pqxf2cyl4g2wre6h@alvherre.pgsql
> and adds some test to cover that stuff.
> 

Thanks, make check-world passes.

Maybe a warning for existing indexes on the same column(s), but with a 
different type, should be emitted during ATTACH, e.g.

-- test.sql --
CREATE TABLE test (a integer NOT NULL) PARTITION BY HASH(a);
CREATE TABLE test_p00 PARTITION OF test FOR VALUES WITH (MODULUS 2, 
REMAINDER 0);
CREATE TABLE test_p01 PARTITION OF test FOR VALUES WITH (MODULUS 2, 
REMAINDER 1);
CREATE INDEX idx_test_a ON test (a);

INSERT INTO test (SELECT i FROM generate_series(1, 1000000) AS i);

ANALYZE;

ALTER TABLE test DETACH PARTITION test_p00;
DROP INDEX test_p00_a_idx;
CREATE INDEX test_p00_a_idx ON test_p00 USING hash (a);
ALTER TABLE test ATTACH PARTITION test_p00 FOR VALUES WITH (MODULUS 2, 
REMAINDER 0);

-- test.sql --

Of course, this could also fall under index maintenance and no warning 
emitted. Docs have "Each partition is first checked to determine whether 
an equivalent index already exists," so it is covered.

Best regards,
  Jesper


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Jesper Pedersen wrote:

> Maybe a warning for existing indexes on the same column(s), but with a
> different type, should be emitted during ATTACH, e.g.

> [detach one partition, replace index with a different one, attach
> partition]

> Of course, this could also fall under index maintenance and no warning
> emitted. Docs have "Each partition is first checked to determine whether an
> equivalent index already exists," so it is covered.

Yeah, I'm of two minds about this also -- in the initial versions I had
a code comment wondering exactly about having a hash index in a
partition attached to a btree index on parent.

As another example, having a documents table with two partitions (one
"long term archival" and other "documents currently being messed with")
you could have a text search index which is GIN in the former and GiST
in the latter.  There is a performance argument for doing it that way,
so it's not merely academic.

Anyway, while I think attaching an index that doesn't match the
properties of the index on parent can be a useful feature, on the other
hand it could be surprising (you end up losing an index because it was
matched during attach that you didn't think was going to be matched).
One idea would be to have a way to specify at ATTACH time what to do
about indexes when they don't match exactly, but I think the user
interface would be pretty messy: exactly how different do you want to
allow the indexes to be?  Is an index having one more column than the
one in parent good enough?  I think the answer to this is mostly a
judgement call, and I'd rather not spend my time debating those.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Jesper Pedersen
Date:
Hi Alvaro,

On 01/08/2018 03:36 PM, Alvaro Herrera wrote:
> Jesper Pedersen wrote:
> 
>> Maybe a warning for existing indexes on the same column(s), but with a
>> different type, should be emitted during ATTACH, e.g.
> 
>> [detach one partition, replace index with a different one, attach
>> partition]
> 
>> Of course, this could also fall under index maintenance and no warning
>> emitted. Docs have "Each partition is first checked to determine whether an
>> equivalent index already exists," so it is covered.
> 
> Yeah, I'm of two minds about this also -- in the initial versions I had
> a code comment wondering exactly about having a hash index in a
> partition attached to a btree index on parent.
> 
> As another example, having a documents table with two partitions (one
> "long term archival" and other "documents currently being messed with")
> you could have a text search index which is GIN in the former and GiST
> in the latter.  There is a performance argument for doing it that way,
> so it's not merely academic.
> 
> Anyway, while I think attaching an index that doesn't match the
> properties of the index on parent can be a useful feature, on the other
> hand it could be surprising (you end up losing an index because it was
> matched during attach that you didn't think was going to be matched).
> One idea would be to have a way to specify at ATTACH time what to do
> about indexes when they don't match exactly, but I think the user
> interface would be pretty messy: exactly how different do you want to
> allow the indexes to be?  Is an index having one more column than the
> one in parent good enough?  I think the answer to this is mostly a
> judgement call, and I'd rather not spend my time debating those.
> 

Yeah, agreed - lets leave as is.

Migrating an index to another type would mean to drop the top-level 
definition anyway.

Best regards,
  Jesper


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Thu, Jan 4, 2018 at 5:01 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Tangentially: I didn't like very much that I added a new index to
> > pg_index to support this feature.  I thought maybe it'd be better to
> > change the index on indrelid to be on (indrelid,indparentidx) instead,
> > but that doesn't seem great either because it bloats that index which is
> > used to support common relcache operations ...
> >
> > (The more I think of this, the more I believe that pg_inherits is a
> > better answer.  Opinions?)
> 
> I actually haven't looked at the code, but the idea that pg_inherits
> is on the way out is news to me.  If that method will work, I don't
> quite see why we should invent something new.

I removed the pg_index.indparentidx column that previous versions add,
and replaced it with pg_inherits rows.  This makes the code a little bit
bulkier in a couple of places, but nothing terrible.  As a benefit,
there's no extra index in pg_index now.

I fixed some outdated comments, too.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 9 January 2018 at 09:54, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> I removed the pg_index.indparentidx column that previous versions add,
> and replaced it with pg_inherits rows.  This makes the code a little bit
> bulkier in a couple of places, but nothing terrible.  As a benefit,
> there's no extra index in pg_index now.

Hi Álvaro,

I've made a pass over this patch. It's mostly in pretty good shape,
but I did find a few things to note down.

1. "either partition" -> "either the partition"

+       so the partition index is dropped together with either
partition it indexes,
+       or with the parent index it is attached to.

2. In findDependentObjects(), should the following if test be below
the ReleaseDeletionLock call?

+ if (foundDep->deptype == DEPENDENCY_INTERNAL_AUTO)
+ break;
  ReleaseDeletionLock(object);

3. The following existing comment indicates that we'll be creating a
disk file for the index. Should we update this comment to say that
this is the case only for RELKIND_INDEX?

/*
* create the index relation's relcache entry and physical disk file. (If
* we fail further down, it's the smgr's responsibility to remove the disk
* file again.)
*/

Maybe:

/*
* create the index relation's relcache entry and for non-partitioned
* indexes, a physical disk file. (If we fail further down, it's the
* smgr's responsibility to remove the disk file again.)
*/

4. Extra newline

+ recordDependencyOn(&myself, &referenced, DEPENDENCY_INTERNAL_AUTO);
+ }
+
+

5. Do you think it's out of scope to support expression indexes?

/*
* Expression indexes are currently not considered equal.  Not needed for
* current callers.
*/
if (info1->ii_Expressions != NIL || info2->ii_Expressions != NIL)
return false;

6. pg_inherits.inhseqno is int, not smallint. I understand you copied
this from StoreCatalogInheritance1(), so maybe a fix should be put in
before this patch is?

void
StoreSingleInheritance(Oid relationId, Oid parentOid, int16 seqNumber)

and

values[Anum_pg_inherits_inhseqno - 1] = Int16GetDatum(seqNumber);

7. Is the following a bug fix? If so, should it not be fixed and
backpatched, rather than sneaked in here?

@@ -10119,6 +10195,7 @@ ATExecChangeOwner(Oid relationOid, Oid
newOwnerId, bool recursing, LOCKMODE lock
  * relation, as well as its toast table (if it has one).
  */
  if (tuple_class->relkind == RELKIND_RELATION ||
+ tuple_class->relkind == RELKIND_PARTITIONED_TABLE ||

8. Missing if (attachInfos) pfree(attachInfos);

+ if (idxes)
+ pfree(idxes);
+ if (attachRelIdxs)
+ pfree(attachRelIdxs);

9. The first OR branch of the Assert will always be false, so might as
well get rid of it.

+ if (!has_superclass(idxid))
+ continue;
+
+ Assert(!has_superclass(idxid) ||
+    (IndexGetRelation(get_partition_parent(idxid), false) ==
+    RelationGetRelid(rel)));

10. lock -> locks:

/* we already hold lock on both tables, so this is safe: */

11. "already the right" -> "already in the right"

/* Silently do nothing if already the right state */

12. It seems to be RangeVarGetRelidExtended() that locks the partition
index, not the callback.

/* no deadlock risk: our callback above already acquired the lock */

13. We seem to only be holding an AccessShareLock on the partitioned
table. What happens if someone concurrently does ALTER TABLE
partitioned_table DETACH our_partition;?

parentTbl = relation_open(parentIdx->rd_index->indrelid, AccessShareLock);

and;
/* Make sure it indexes a partition of the other index's table */
partDesc = RelationGetPartitionDesc(parentTbl);
found = false;
for (i = 0; i < partDesc->nparts; i++)
{
if (partDesc->oids[i] == state.partitionOid)
{
found = true;
break;
}
}

Wouldn't you also need an AccessExclusiveLock to prevent another
session trying to ALTER INDEX part_index ATTACH PARTITION leaf_index;
at the same time? The concurrent session could be trying to ATTACH it
to a subpartition and this comment could be attaching to the parent.

14. validatePartitionedIndex does not verify that the index it is
checking is valid. Consider the following:

create table part (a int, b int) partition by list(a);
create table part1 partition of part for values in(1) partition by list(b)
create table part1 partition of part for values in(1) partition by list(b);
create table part2 partition of part1 for values in (1);
create index on only part1 (a);
create index on only part (a);
alter index part_a_idx attach partition part1_a_idx;
\d part
                Table "public.part"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
 b      | integer |           |          |
Partition key: LIST (a)
Indexes:
    "part_a_idx" btree (a) <--- should be INVALID
Number of partitions: 1 (Use \d+ to list them.)

\d part1
               Table "public.part1"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
 b      | integer |           |          |
Partition of: part FOR VALUES IN (1)
Partition key: LIST (b)
Indexes:
    "part1_a_idx" btree (a) INVALID
Number of partitions: 1 (Use \d+ to list them.)

But that probably means you also need more code to search up to parent
partitions and try and validate any invalid indexes once a
sub-partition index is made valid.

As, if the part_a_idx had correctly been marked as INVALID, and then we do:

CREATE INDEX part2_a_idx ON part2 (a);
ALTER INDEX part1_a_idx ATTACH PARTITION part2_a_idx;

This would validate part1_a_idx, but we'd also then need to attempt
validation on part_a_idx. (I'm still reading, so perhaps you've
already thought of that.)

15. Can you explain why you do the following for CREATE INDEX, but not
for CREATE INDEX IF NOT EXISTS?

@@ -7338,6 +7363,7 @@ IndexStmt: CREATE opt_unique INDEX
opt_concurrently opt_index_name
  n->concurrent = $4;
  n->idxname = $5;
  n->relation = $7;
+ n->relationId = InvalidOid;

It's maybe not required anyway since makeNode() will memset zero the
memory, but I think it's best to do it unless I've misunderstood what
it's for.

16. It's not exactly a problem, but I had a quick look and I don't see
any other uses of list_free() checking for a non-NIL list first:

+ if (inheritors)
+ list_free(inheritors);

Probably might as well just remove the if test.

17. Just a stylistic thing; I think it would be neater to set isindex
using the existing switch(). It would just need some case shuffling.

@@ -439,6 +440,7 @@ RelationParseRelOptions(Relation relation, HeapTuple tuple)
  case RELKIND_RELATION:
  case RELKIND_TOASTVALUE:
  case RELKIND_INDEX:
+ case RELKIND_PARTITIONED_INDEX:
  case RELKIND_VIEW:
  case RELKIND_MATVIEW:
  case RELKIND_PARTITIONED_TABLE:
@@ -452,10 +454,12 @@ RelationParseRelOptions(Relation relation,
HeapTuple tuple)
  * we might not have any other for pg_class yet (consider executing this
  * code for pg_class itself)
  */
+ isindex = relation->rd_rel->relkind == RELKIND_INDEX ||
+ relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX;

Or maybe it's even better to set relation->rd_amroutine->amoptions
into a local variable and use it unconditionally in the call to
extractRelOptions().

18. At most it can't do any harm, but is the following still needed? I
originally thought this would have been for pg_index changes. What's
changed?

-#define CATALOG_VERSION_NO 201712251
+#define CATALOG_VERSION_NO 201712291

I admit to not quite being as thorough in my review with the pg_dump code.

The tests seem fine, but I'd like to see a test for #14 once that's fixed.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
David Rowley wrote:

> Hi Álvaro,

Hi David,

Thanks for the review.  Attached is a delta patch that fixes most
things, except your item 14 below.

Before getting into the details of the items you list, in this version I
also fixed a couple of issues noted by Jaime Casanova; namely

a) ALTER INDEX SET TABLESPACE threw an error for partitioned indexes.
Handled the same was as for partitioned tables: silently do nothing.

b) Expression indexes not considered at all.  (You also commented on
this point).  Jaime reported it for this case:
  create index on t_part USING gin (string_to_array(btrim((t)::text, '-'::text), '--'::text));

I dealt with this by adding code to CompareIndexInfo that runs equal()
on the expression lists, after applying map_variable_attnos() (to
support the case of attaching a table with dropped or reordered
columns).  As far as I can see, this works correctly for very simple
expressions, though I didn't test Jaime's case specifically; I suppose I
should add a few more tests.

On to your notes:

> 1. "either partition" -> "either the partition"
> 4. Extra newline
> 8. Missing if (attachInfos) pfree(attachInfos);
> 10. lock -> locks:
> 11. "already the right" -> "already in the right"
> 16. It's not exactly a problem, but I had a quick look and I don't see
> any other uses of list_free() checking for a non-NIL list first:

Fixed.

> 2. In findDependentObjects(), should the following if test be below
> the ReleaseDeletionLock call?
> 
> + if (foundDep->deptype == DEPENDENCY_INTERNAL_AUTO)
> + break;
>   ReleaseDeletionLock(object);

No, as far as I understand we should keep that lock if we break out of
the loop -- it's the lock on the object being deleted (the index
partition in this case).

I did break the comment in two, so that now the "First," paragraph
appears below the "if () break" test.  That seems to make more sense.
It looks like this now:

                /*
                 * 3. Not all the owning objects have been visited, so
                 * transform this deletion request into a delete of this
                 * owning object.
                 *
                 * For INTERNAL_AUTO dependencies, we don't enforce this;
                 * in other words, we don't follow the links back to the
                 * owning object.
                 */
                if (foundDep->deptype == DEPENDENCY_INTERNAL_AUTO)
                    break;

                /*
                 * First, release caller's lock on this object and get
                 * deletion lock on the owning object.  (We must release
                 * caller's lock to avoid deadlock against a concurrent
                 * deletion of the owning object.)
                 */
                ReleaseDeletionLock(object);
                AcquireDeletionLock(&otherObject, 0);


> 3. The following existing comment indicates that we'll be creating a
> disk file for the index. Should we update this comment to say that
> this is the case only for RELKIND_INDEX?

> Maybe:
> 
> /*
> * create the index relation's relcache entry and for non-partitioned
> * indexes, a physical disk file. (If we fail further down, it's the
> * smgr's responsibility to remove the disk file again.)
> */

I used this:
    /*
     * create the index relation's relcache entry and, if necessary, the
     * physical disk file. (If we fail further down, it's the smgr's
     * responsibility to remove the disk file again, if any.)
     */

> 5. Do you think it's out of scope to support expression indexes?

Answered above.

> 6. pg_inherits.inhseqno is int, not smallint. I understand you copied
> this from StoreCatalogInheritance1(), so maybe a fix should be put in
> before this patch is?

Hmm, yeah, will fix.

> 7. Is the following a bug fix? If so, should it not be fixed and
> backpatched, rather than sneaked in here?
> 
> @@ -10119,6 +10195,7 @@ ATExecChangeOwner(Oid relationOid, Oid
> newOwnerId, bool recursing, LOCKMODE lock
>   * relation, as well as its toast table (if it has one).
>   */
>   if (tuple_class->relkind == RELKIND_RELATION ||
> + tuple_class->relkind == RELKIND_PARTITIONED_TABLE ||

No, it's not a bug fix -- this block is only concerned with searching
index for this relkind, and prior to this patch PARTITIONED_TABLEs do
not have indexes, so it's ok not to consider the case.

> 9. The first OR branch of the Assert will always be false, so might as
> well get rid of it.

True.  Removed.

> 12. It seems to be RangeVarGetRelidExtended() that locks the partition
> index, not the callback.
> 
> /* no deadlock risk: our callback above already acquired the lock */

Hmm ... yeah, you're right.  Fixed.

> 13. We seem to only be holding an AccessShareLock on the partitioned
> table. What happens if someone concurrently does ALTER TABLE
> partitioned_table DETACH our_partition;?
> 
> parentTbl = relation_open(parentIdx->rd_index->indrelid, AccessShareLock);
> 
> and;
> /* Make sure it indexes a partition of the other index's table */
> partDesc = RelationGetPartitionDesc(parentTbl);
> found = false;
> for (i = 0; i < partDesc->nparts; i++)
> {
> if (partDesc->oids[i] == state.partitionOid)
> {
> found = true;
> break;
> }
> }
> 
> Wouldn't you also need an AccessExclusiveLock to prevent another
> session trying to ALTER INDEX part_index ATTACH PARTITION leaf_index;
> at the same time? The concurrent session could be trying to ATTACH it
> to a subpartition and this comment could be attaching to the parent.

But AccessShare does conflict with AccessExclusive.  So even with just
AccessShare we've already prevented others from detaching/attaching
partitions.

I think I should spend a bit of time going over the locking
considerations again and make sure it is all sound.

> 14. validatePartitionedIndex does not verify that the index it is
> checking is valid. Consider the following:
> 
> create table part (a int, b int) partition by list(a);
> create table part1 partition of part for values in(1) partition by list(b)
> create table part1 partition of part for values in(1) partition by list(b);
> create table part2 partition of part1 for values in (1);
> create index on only part1 (a);
> create index on only part (a);
> alter index part_a_idx attach partition part1_a_idx;
> \d part
>                 Table "public.part"
>  Column |  Type   | Collation | Nullable | Default
> --------+---------+-----------+----------+---------
>  a      | integer |           |          |
>  b      | integer |           |          |
> Partition key: LIST (a)
> Indexes:
>     "part_a_idx" btree (a) <--- should be INVALID
> Number of partitions: 1 (Use \d+ to list them.)
> 
> \d part1
>                Table "public.part1"
>  Column |  Type   | Collation | Nullable | Default
> --------+---------+-----------+----------+---------
>  a      | integer |           |          |
>  b      | integer |           |          |
> Partition of: part FOR VALUES IN (1)
> Partition key: LIST (b)
> Indexes:
>     "part1_a_idx" btree (a) INVALID
> Number of partitions: 1 (Use \d+ to list them.)
> 
> But that probably means you also need more code to search up to parent
> partitions and try and validate any invalid indexes once a
> sub-partition index is made valid.
> 
> As, if the part_a_idx had correctly been marked as INVALID, and then we do:
> 
> CREATE INDEX part2_a_idx ON part2 (a);
> ALTER INDEX part1_a_idx ATTACH PARTITION part2_a_idx;
> 
> This would validate part1_a_idx, but we'd also then need to attempt
> validation on part_a_idx. (I'm still reading, so perhaps you've
> already thought of that.)

In prior incarnations of the patch, I had an if-test to prevent
attaching invalid indexes, but I decided to remove it at some point --
mainly thinking of attaching a partition for which a CREATE INDEX
CONCURRENTLY was running which already had the index as invalid and was
later expected to become valid.  I suppose that doesn't really work
anyway because of locking considerations (you can't attach a partition
in which CIC is concurrently running, can you).  I'll think some more
about this case and post an updated version later.

> 15. Can you explain why you do the following for CREATE INDEX, but not
> for CREATE INDEX IF NOT EXISTS?

I cannot -- just an oversight.  Fixed.  It's true that makeNode() fixes
the hole, but we're supposed to be strict about initializing structures.

> 17. Just a stylistic thing; I think it would be neater to set isindex
> using the existing switch(). It would just need some case shuffling.

> Or maybe it's even better to set relation->rd_amroutine->amoptions
> into a local variable and use it unconditionally in the call to
> extractRelOptions().

Yeah, that last suggestion leads to simpler code, so I did it that way.

> 18. At most it can't do any harm, but is the following still needed? I
> originally thought this would have been for pg_index changes. What's
> changed?
> 
> -#define CATALOG_VERSION_NO 201712251
> +#define CATALOG_VERSION_NO 201712291

Yeah, probably not needed anymore.  Will lose it.

> The tests seem fine, but I'd like to see a test for #14 once that's fixed.

Sure thing.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 11 January 2018 at 04:37, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> In prior incarnations of the patch, I had an if-test to prevent
> attaching invalid indexes, but I decided to remove it at some point --
> mainly thinking of attaching a partition for which a CREATE INDEX
> CONCURRENTLY was running which already had the index as invalid and was
> later expected to become valid.  I suppose that doesn't really work
> anyway because of locking considerations (you can't attach a partition
> in which CIC is concurrently running, can you).  I'll think some more
> about this case and post an updated version later.

I guess CIC will need to check if the index has a parent index when
setting indisvalid = true, and do likewise to the parent index if all
other siblings are valid.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
The delta patch turned out to have at least one stupid bug, and a few
non-stupid bugs.  Here's an updated version that should behave
correctly, and addresses all reported problems.

(One minor issue Jaime found and I hadn't listed is that alter_index and
alter_table SGML pages said "ATTACH" instead of "ATTACH PARTITION".
Fixed those too for this version.)

I'll set this as Ready for Committer now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Peter Eisentraut
Date:
On 1/11/18 13:52, Alvaro Herrera wrote:
> The delta patch turned out to have at least one stupid bug, and a few
> non-stupid bugs.  Here's an updated version that should behave
> correctly, and addresses all reported problems.

It seems that CompareIndexInfo() still doesn't compare indexes' operator
classes and collations.

Also, some new test cases for pg_dump would be nice.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 12 January 2018 at 07:52, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> The delta patch turned out to have at least one stupid bug, and a few
> non-stupid bugs.  Here's an updated version that should behave
> correctly, and addresses all reported problems.

Thanks for updating the patch.

I've just made another pass over the patch and have a few more comments.

1. Expression varattnos don't seem to get translated correctly.
drop table p;
create table p (a int, b int) partition by list (a);
create table p1 (b int, a int);
alter table p attach partition p1 for values in(1);
create index on p (abs(a));
\d p1
                 Table "public.p1"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 b      | integer |           |          |
 a      | integer |           |          |
Partition of: p FOR VALUES IN (1)
Indexes:
    "p1_abs_idx" btree (abs(b))

CompareIndexInfo() seems to validate existing indexes correctly.

Can you also write a test for this?

2. No stats are gathered for the partitioned expression index.

drop table p;
create table p (a int, b int) partition by range (a);
create table p1 (b int, a int);
alter table p attach partition p1 for values from (1) to (1001);
create index on p1 (abs(a));
create index on p (abs(a));
insert into p select x,x from generate_series(1,1000) x;
analyze p, p1;

select * from pg_statistic where starelid = 'p_abs_idx'::regclass;
select * from pg_statistic where starelid = 'p1_abs_idx'::regclass;

3. deadlocking: I've not yet debugged this to see if this can be avoided.

drop table p;
create table p (a int not null) partition by list (a);
create table p1 partition of p for values in(1);
insert into p1 select 1 from generate_Series(1,10000000);

-- Session 1:
create index concurrently on p1 (a);

-- Session 2:
create index on p (a);

-- Session 1:
ERROR:  deadlock detected
DETAIL:  Process 10860 waits for ShareLock on virtual transaction
3/97; blocked by process 7968.
Process 7968 waits for ShareLock on relation 90492 of database 12342;
blocked by process 10860.
HINT:  See server log for query details.

4. nparts initialized twice in DefineIndex()

int nparts = partdesc->nparts;
Oid    *part_oids;
TupleDesc parentDesc;
bool invalidate_parent = false;

nparts = partdesc->nparts;

5. In DefineIndex, should the following have a heap_freetuple(newtup); ?

newtup = heap_copytuple(tup);
((Form_pg_index) GETSTRUCT(newtup))->indisvalid = false;
CatalogTupleUpdate(pg_index, &tup->t_self, newtup);
ReleaseSysCache(tup);
heap_close(pg_index, RowExclusiveLock);

6. Header comment in ReindexPartitionedIndex() claims it "obtains the
list of children and releases the lock on parent before applying
reindex on each child.", but it does not do that. It just returns an
error.

7. I see you changed StoreSingleInheritance to have the seqnumber as
int32 instead of int16, but StoreCatalogInheritance1 still has an
int16 seqNumber parameter. Maybe that should be fixed independently
and backpatched?

8. ATExecCmd: You've added an Assert() in the DETACH PARTITION case to
ensure we're working with a table, but what makes you certain that the
"else" case on the ATTACH PARTITION is always going to get a
partitioned index?

Maybe it would be better to just move the Assert()s into the function
being called?

9. Error details claim that p2_a_idx is not a partition of p.
Shouldn't it say table "p2" is not a partition of "p"?

DROP TABLE p;
CREATE TABLE p (a INT NOT NULL, b INT NOT NULL) PARTITION BY LIST (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES IN(1) PARTITION BY LIST (b);
CREATE TABLE p2 PARTITION OF p1 FOR VALUES IN(1);

CREATE INDEX p_a_idx ON ONLY p (a);

CREATE INDEX p2_a_idx ON p2 (a);
ALTER INDEX p_a_idx ATTACH PARTITION p2_a_idx;
ERROR:  cannot attach index "p2_a_idx" as a partition of index "p_a_idx"
DETAIL:  Index "p2_a_idx" is not on a partition of table "p".

10. You've added code to get_relation_info() to handle partitioned
indexes, but that code is skipped due to:

/*
* Make list of indexes.  Ignore indexes on system catalogs if told to.
* Don't bother with indexes for an inheritance parent, either.
*/
if (inhparent ||

The new code should either be removed, or you should load the index
list for a partitioned table.

11. In ProcessUtilitySlow(), the following if needs to check if we're
dealing with a partitioned table:

/* Also, lock any descendant tables if recursive */
if (stmt->relation->inh)
inheritors = find_all_inheritors(relid, lockmode, NULL);

Otherwise we will (perhaps harmlessly) lock children in the following case:

DROP TABLE p;
CREATE TABLE p (a INT NOT NULL, b INT NOT NULL);
CREATE TABLE p1 (a INT NOT NULL, b INT NOT NULL) INHERITS (p);
CREATE INDEX ON p (a);

> I'll set this as Ready for Committer now.

Will set back to waiting on author until the above things are addressed.


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
David Rowley wrote:

> I've just made another pass over the patch and have a few more comments.

Thanks!  I'm a bit tied up in a few other things ATM, so an updated
version will have to wait a couple of days.

> 1. Expression varattnos don't seem to get translated correctly.

Yesterday while creating a test case for something suggested by Jaime, I
found that the same problem of bogus attno mapping also appears in
predicates of partial indexes.  My overall conclusion is that I need to
examine varattno mapping carefully, because it seems broken in several
places.

> 2. No stats are gathered for the partitioned expression index.
> 
> drop table p;
> create table p (a int, b int) partition by range (a);
> create table p1 (b int, a int);
> alter table p attach partition p1 for values from (1) to (1001);
> create index on p1 (abs(a));
> create index on p (abs(a));
> insert into p select x,x from generate_series(1,1000) x;
> analyze p, p1;
> 
> select * from pg_statistic where starelid = 'p_abs_idx'::regclass;
> select * from pg_statistic where starelid = 'p1_abs_idx'::regclass;

Fun :-(  I think this is down to expand_vacuum_rel() not considering a
table worth vacuuming.  While I agree that this is a problem, I'm
disclaiming responsibility for fixing it for the time being.  Seems well
outside the scope of this patch.

> 3. deadlocking: I've not yet debugged this to see if this can be avoided.

Yeah, we discussed this before.  I think there's not a lot of room for
improvement in this particular case, though it's pretty unfortunate.

> 4. nparts initialized twice in DefineIndex()
> 
> int nparts = partdesc->nparts;
> Oid    *part_oids;
> TupleDesc parentDesc;
> bool invalidate_parent = false;
> 
> nparts = partdesc->nparts;

Hah.  Having second, third and fourth thoughts on restructuring the
block can do that ... seems I just need a fifth thought.

> 5. In DefineIndex, should the following have a heap_freetuple(newtup); ?
> 
> newtup = heap_copytuple(tup);
> ((Form_pg_index) GETSTRUCT(newtup))->indisvalid = false;
> CatalogTupleUpdate(pg_index, &tup->t_self, newtup);
> ReleaseSysCache(tup);
> heap_close(pg_index, RowExclusiveLock);

I can add it, but *shrug*.  We're not too stressed about never leaking
memory in heavy-handed DDL operations.  Consider indexInfo in the same
routine, for example: we just leave it for end-of-transaction to clean
up.

> 6. Header comment in ReindexPartitionedIndex() claims it "obtains the
> list of children and releases the lock on parent before applying
> reindex on each child.", but it does not do that. It just returns an
> error.

Yeah, it's a stub for now.  We can implement it later.  Fixed the
comment.

> 7. I see you changed StoreSingleInheritance to have the seqnumber as
> int32 instead of int16, but StoreCatalogInheritance1 still has an
> int16 seqNumber parameter. Maybe that should be fixed independently
> and backpatched?

Yes.

> 8. ATExecCmd: You've added an Assert() in the DETACH PARTITION case to
> ensure we're working with a table, but what makes you certain that the
> "else" case on the ATTACH PARTITION is always going to get a
> partitioned index?
> 
> Maybe it would be better to just move the Assert()s into the function
> being called?

> 9. Error details claim that p2_a_idx is not a partition of p.
> Shouldn't it say table "p2" is not a partition of "p"?

You missed the "on" in the DETAIL:
  DETAIL:  Index "p2_a_idx" is not on a partition of table "p".
You could argue that this is obscurely worded, but if you look at the
command:
   ALTER INDEX p_a_idx ATTACH PARTITION p2_a_idx;
nowhere is table p2 mentioned, so I'm not sure it's a great idea to
mention the table in the error message.
> 
> 10. You've added code to get_relation_info() to handle partitioned
> indexes, but that code is skipped [...]
> The new code should either be removed, or you should load the index
> list for a partitioned table.

That's a leftover from previous versions too.  YAGNI principle says I
should remove it rather than activate it, I think, since the optimizer
is not going to use the data for anything.

> 11. In ProcessUtilitySlow(), the following if needs to check if we're
> dealing with a partitioned table:

Sure thing; fixed.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
David Rowley
Date:
On 17 January 2018 at 03:58, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> 9. Error details claim that p2_a_idx is not a partition of p.
>> Shouldn't it say table "p2" is not a partition of "p"?
>
> You missed the "on" in the DETAIL:
>   DETAIL:  Index "p2_a_idx" is not on a partition of table "p".
> You could argue that this is obscurely worded, but if you look at the
> command:
>    ALTER INDEX p_a_idx ATTACH PARTITION p2_a_idx;
> nowhere is table p2 mentioned, so I'm not sure it's a great idea to
> mention the table in the error message.

I think I did miss the "on".

>> 10. You've added code to get_relation_info() to handle partitioned
>> indexes, but that code is skipped [...]
>> The new code should either be removed, or you should load the index
>> list for a partitioned table.
>
> That's a leftover from previous versions too.  YAGNI principle says I
> should remove it rather than activate it, I think, since the optimizer
> is not going to use the data for anything.

Agreed.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Robert Haas
Date:
On Tue, Jan 16, 2018 at 6:08 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 17 January 2018 at 03:58, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>>> 9. Error details claim that p2_a_idx is not a partition of p.
>>> Shouldn't it say table "p2" is not a partition of "p"?
>>
>> You missed the "on" in the DETAIL:
>>   DETAIL:  Index "p2_a_idx" is not on a partition of table "p".
>> You could argue that this is obscurely worded, but if you look at the
>> command:
>>    ALTER INDEX p_a_idx ATTACH PARTITION p2_a_idx;
>> nowhere is table p2 mentioned, so I'm not sure it's a great idea to
>> mention the table in the error message.
>
> I think I did miss the "on".

I think you will not be the only person to make that mistake.  I think
it would be better phrased as

DETAIL: "p2_a_idx" is not an index of any partition of table "p"

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Peter Eisentraut wrote:
> On 1/11/18 13:52, Alvaro Herrera wrote:
> > The delta patch turned out to have at least one stupid bug, and a few
> > non-stupid bugs.  Here's an updated version that should behave
> > correctly, and addresses all reported problems.
> 
> It seems that CompareIndexInfo() still doesn't compare indexes' operator
> classes and collations.
> 
> Also, some new test cases for pg_dump would be nice.

Fixed CompareIndexInfo to compare collations and opfamilies; also added
tests about that.

I think I fixed all the items David reported too, including rewording
the error message to Robert's suggestion.  (One thing remaining is the
int16 used in StoreCatalogInheritance1 signature.)

pg_dump tests are still missing here, but I think this version is good
enough step forward.  I'll add a few tests for pg_dump, and see about
getting this pushed early tomorrow.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Here's a patch to add pg_dump tests.  This verifies that the
CREATE INDEX on parent and partition appear, as well as ALTER INDEX ..
ATTACH PARTITION.

While doing this, I noticed a small bug: the ALTER INDEX would not be
dumped when only one of the schemas are specified to pg_dump (schema of
parent and schema of partition), but it would be if both were specified.
This patch fixes that problem too.  AFAICS the test is correct after
that fix (ie. the "like" and "unlike" sets are correct now.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Jesper Pedersen
Date:
Hi Alvaro,

On 01/18/2018 04:43 PM, Alvaro Herrera wrote:
> Here's a patch to add pg_dump tests.  This verifies that the
> CREATE INDEX on parent and partition appear, as well as ALTER INDEX ..
> ATTACH PARTITION.
> 
> While doing this, I noticed a small bug: the ALTER INDEX would not be
> dumped when only one of the schemas are specified to pg_dump (schema of
> parent and schema of partition), but it would be if both were specified.
> This patch fixes that problem too.  AFAICS the test is correct after
> that fix (ie. the "like" and "unlike" sets are correct now.)
> 

I get

pg_restore: creating INDEX "pg_catalog.pg_authid_oid_index"
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 5493; 1259 2677 INDEX 
pg_authid_oid_index jpedersen
pg_restore: [archiver (db)] could not execute query: ERROR:  permission 
denied: "pg_authid" is a system catalog
     Command was:
-- For binary upgrade, must preserve pg_class oids
SELECT 
pg_catalog.binary_upgrade_set_next_index_pg_class_oid('2677'::pg_catalog.oid);

CREATE UNIQUE INDEX "pg_authid_oid_index" ON "pg_authid" USING "btree" 
("oid");

during check-world.

Best regards,
  Jesper


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Hi Jesper

Jesper Pedersen wrote:

> I get
> 
> pg_restore: creating INDEX "pg_catalog.pg_authid_oid_index"
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 5493; 1259 2677 INDEX
> pg_authid_oid_index jpedersen
> pg_restore: [archiver (db)] could not execute query: ERROR:  permission
> denied: "pg_authid" is a system catalog
>     Command was:
> -- For binary upgrade, must preserve pg_class oids
> SELECT pg_catalog.binary_upgrade_set_next_index_pg_class_oid('2677'::pg_catalog.oid);
> 
> CREATE UNIQUE INDEX "pg_authid_oid_index" ON "pg_authid" USING "btree"
> ("oid");
> 
> during check-world.

Wow, thanks for reporting.  That's freaking weird ...  Maybe I need to
use a more restrictive condition.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
David Rowley wrote:

> 10. You've added code to get_relation_info() to handle partitioned
> indexes, but that code is skipped due to:
> 
> /*
> * Make list of indexes.  Ignore indexes on system catalogs if told to.
> * Don't bother with indexes for an inheritance parent, either.
> */
> if (inhparent ||
> 
> The new code should either be removed, or you should load the index
> list for a partitioned table.

I realized that this wasn't an oversight after all.  You can invoke this
code, and it fails in current master:

CREATE TABLE at_partitioned (col1 INT) PARTITION BY RANGE (col1);
CREATE INDEX ON at_partitioned (col1);
CREATE TABLE at_regular2 (col2 INT);
SELECT col2 FROM at_regular2 fk LEFT OUTER JOIN at_partitioned pk ON (col1 = col2);

fails with:
ERROR:  could not open file "base/16384/16395": No such file or directory

where the mentioned path corresponds to the index filenode:

alvherre=# select oid::regclass, relkind from pg_class where relfilenode = '16395';
           oid           │ relkind 
─────────────────────────┼─────────
 at_partitioned_col1_idx │ I
(1 fila)


The error comes from get_relation_info called for at_partitioned, where the
rte->inh flag is false.  So this is the correct behavior:  we consider the
table on its own, not as an inheritance parent, and so the code is not skipped.
Backtrace:

(gdb) bt
#0  errfinish (dummy=0)
    at /pgsql/source/master/src/backend/utils/error/elog.c:414
#1  0x0000564936f79365 in mdopen (reln=<optimized out>, 
    forknum=forknum@entry=MAIN_FORKNUM, behavior=behavior@entry=1)
    at /pgsql/source/master/src/backend/storage/smgr/md.c:606
#2  0x0000564936f7983b in mdopen (behavior=1, forknum=MAIN_FORKNUM, 
    reln=0x5649382f8dd0)
    at /pgsql/source/master/src/backend/storage/smgr/md.c:922
#3  mdnblocks (reln=0x5649382f8dd0, forknum=MAIN_FORKNUM)
    at /pgsql/source/master/src/backend/storage/smgr/md.c:875
#4  0x0000564936eeb6ea in get_relation_info (root=root@entry=0x5649382ef398, 
    relationObjectId=16385, inhparent=<optimized out>, 
    rel=rel@entry=0x5649382f0198)
    at /pgsql/source/master/src/backend/optimizer/util/plancat.c:377
#5  0x0000564936eeee55 in build_simple_rel (root=root@entry=0x5649382ef398, 
    relid=2, parent=parent@entry=0x0)
    at /pgsql/source/master/src/backend/optimizer/util/relnode.c:182
#6  0x0000564936ec6f23 in add_base_rels_to_query (
    root=root@entry=0x5649382ef398, jtnode=<optimized out>)
    at /pgsql/source/master/src/backend/optimizer/plan/initsplan.c:113

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
I think this is the right fix for this problem.  I wonder about
exploring other callers of RelationGetIndexList to see who else could be
confused ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] Proposal: Local indexes for partitioned table

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:
> I think this is the right fix for this problem.  I wonder about
> exploring other callers of RelationGetIndexList to see who else could be
> confused ...

CLUSTER was also affected (and ALTER TABLE .. CLUSTER ON).  Patched both
and added simple tests.  Couldn't detect any other immediate problems.
Maybe UNIQUE indexes will have a similar hazard, with replica identity.

Case closed.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services