Thread: [HACKERS] Proposal: Local indexes for partitioned table
Hi hackers! As I've understood from thread [1] the main issue of creating local indexes for partitions is supporting REINDEX and DROP INDEX operations on parent partitioned tables. Furthermore Robert Haas mentioned the problem of creating index on key that is represented in partitions with single value (or primitive interval) [1] i.e. under the list-partitioning or range-partitioning with unit interval. I would like to propose the following solution: 1. Create index for hierarchy of partitioned tables and partitions recursively. Don't create relfilenode for indexes on parents, only entries in catalog (much like the partitioned table's storage elimination in [2]). Abstract index for partitioned tables is only for the reference on indexes of child tables to perform REINDEX and DROP INDEX operations. 2. Specify created indexes in pg_depend table so that indexes of child tables depend on corresponding indexes of parent tables with type of dependency DEPENDENCY_NORMAL so that index could be removed separately for partitions and recursively/separately for partitioned tables. 3. REINDEX on index of partitioned table would perform this operation on existing indexes of corresponding partitions. In this case it is necessary to consider such operations as REINDEX SCHEMA | DATABASE | SYSTEM so that partitions' indexes wouldn't be re-indexed multiple times in a row. Any thoughts? 1. https://www.postgresql.org/message-id/CA+TgmoZUwj=QYnaK+F7xEf4w_e2g3XxdMnSNZMZjuinHRcOB8A@mail.gmail.com 2. https://www.postgresql.org/message-id/2b0d42f2-3a53-763b-c9c2-47139e4b1c2e%40lab.ntt.co.jp -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On Wed, Mar 1, 2017 at 4:23 PM, Maksim Milyutin <m.milyutin@postgrespro.ru> wrote: > As I've understood from thread [1] the main issue of creating local indexes > for partitions is supporting REINDEX and DROP INDEX operations on parent > partitioned tables. Furthermore Robert Haas mentioned the problem of > creating index on key that is represented in partitions with single value > (or primitive interval) [1] i.e. under the list-partitioning or > range-partitioning with unit interval. > > I would like to propose the following solution: > > 1. Create index for hierarchy of partitioned tables and partitions > recursively. Don't create relfilenode for indexes on parents, only entries > in catalog (much like the partitioned table's storage elimination in [2]). > Abstract index for partitioned tables is only for the reference on indexes > of child tables to perform REINDEX and DROP INDEX operations. > > 2. Specify created indexes in pg_depend table so that indexes of child > tables depend on corresponding indexes of parent tables with type of > dependency DEPENDENCY_NORMAL so that index could be removed separately for > partitions and recursively/separately for partitioned tables. > > 3. REINDEX on index of partitioned table would perform this operation on > existing indexes of corresponding partitions. In this case it is necessary > to consider such operations as REINDEX SCHEMA | DATABASE | SYSTEM so that > partitions' indexes wouldn't be re-indexed multiple times in a row. > > Any thoughts? Sounds generally good. One thing to keep in mind is that - in this system - a UNIQUE index on the parent doesn't actually guarantee uniqueness across the whole partitioning hierarchy unless it so happens that the index columns or expressions are the same as the partitioning columns or expressions. That's a little a counterintuitive, and people have already been complaining that a partitioned table + partitions doesn't look enough like a plain table. However, I'm not sure there's a better alternative, because somebody might want partition-wise unique indexes even though that doesn't guarantee global uniqueness. So I think if someday we have global indexes, then we can plan to use some other syntax for that, like CREATE GLOBAL [ UNIQUE ] INDEX. But, of course, that's just my opinion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 02.03.2017 11:41, Robert Haas wrote: > Sounds generally good. One thing to keep in mind is that - in this > system - a UNIQUE index on the parent doesn't actually guarantee > uniqueness across the whole partitioning hierarchy unless it so > happens that the index columns or expressions are the same as the > partitioning columns or expressions. That's a little a > counterintuitive, and people have already been complaining that a > partitioned table + partitions doesn't look enough like a plain table. > However, I'm not sure there's a better alternative, because somebody > might want partition-wise unique indexes even though that doesn't > guarantee global uniqueness. So I think if someday we have global > indexes, then we can plan to use some other syntax for that, like > CREATE GLOBAL [ UNIQUE ] INDEX. Yes, I absolutely agree with your message that cross-partition uniqueness is guaranteed through global index on partitioned table apart from the case when the index key are the same as partitioning key (or index comprises partitioning key in general). Thanks for your comment. I'll try to propose the first patches as soon as possible. -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On 01.03.2017 13:53, Maksim Milyutin wrote: > Hi hackers! > > As I've understood from thread [1] the main issue of creating local > indexes for partitions is supporting REINDEX and DROP INDEX operations > on parent partitioned tables. Furthermore Robert Haas mentioned the > problem of creating index on key that is represented in partitions with > single value (or primitive interval) [1] i.e. under the > list-partitioning or range-partitioning with unit interval. > > I would like to propose the following solution: > > 1. Create index for hierarchy of partitioned tables and partitions > recursively. Don't create relfilenode for indexes on parents, only > entries in catalog (much like the partitioned table's storage > elimination in [2]). Abstract index for partitioned tables is only for > the reference on indexes of child tables to perform REINDEX and DROP > INDEX operations. > > 2. Specify created indexes in pg_depend table so that indexes of child > tables depend on corresponding indexes of parent tables with type of > dependency DEPENDENCY_NORMAL so that index could be removed separately > for partitions and recursively/separately for partitioned tables. > > 3. REINDEX on index of partitioned table would perform this operation on > existing indexes of corresponding partitions. In this case it is > necessary to consider such operations as REINDEX SCHEMA | DATABASE | > SYSTEM so that partitions' indexes wouldn't be re-indexed multiple times > in a row. > > Any thoughts? > > 1. > https://www.postgresql.org/message-id/CA+TgmoZUwj=QYnaK+F7xEf4w_e2g3XxdMnSNZMZjuinHRcOB8A@mail.gmail.com > > 2. > https://www.postgresql.org/message-id/2b0d42f2-3a53-763b-c9c2-47139e4b1c2e%40lab.ntt.co.jp > > I want to present the first version of patches that implement local indexes for partitioned tables and discuss some technical details of that implementation. 1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX (literal 'l'). This was done because physical storage is created in the 'heap_create' function and we need to revoke the creating storage as with partitioned tables. Since information that this index belongs to partitioned tables is not available in 'heap_create' function (pg_index entry on the index is not created yet) I chose the least painful way - added a specific relkind for index on partitioned table. I suppose that this act will require the integrating new relkind to different places of source code so I'm ready to consider another proposals on this point. 2. My implementation doesn't support the concurrent creating of local index (CREATE INDEX CONCURRENTLY). As I understand, this operation involves nontrivial manipulation with snapshots and I don't know how to implement concurrent creating of multiple indexes. In this point I ask help from community. 3. As I noticed early pg_depend table is used for cascade deleting indexes on partitioned table and its children. I also use pg_depend to determine relationship between parent and child indexes when reindex executes recursively on child indexes. Perhaps, it's not good way to use pg_depend to determine the relationship between parent and child indexes because the kind of this relationship is not defined. I could propose to add into pg_index table specific field of 'oidvector' type that specify oids of dependent indexes for the current local index. On this stage I want to discuss only technical details of local indexes' implementation. The problems related to merging existing indexes of partitions within local index tree, determination uniqueness of field in global sense through local index and syntax notes I want to arise later. CC welcome! -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
On 4 April 2017 at 17:10, Maksim Milyutin <m.milyutin@postgrespro.ru> wrote: > > 3. As I noticed early pg_depend table is used for cascade deleting indexes > on partitioned table and its children. I also use pg_depend to determine > relationship between parent and child indexes when reindex executes > recursively on child indexes. > > Perhaps, it's not good way to use pg_depend to determine the relationship > between parent and child indexes because the kind of this relationship is > not defined. I could propose to add into pg_index table specific field of > 'oidvector' type that specify oids of dependent indexes for the current > local index. Alternately you could have an single oid in pg_index on each of the children that specifies which local index is its parent. That would probably require a new index on that column so you could look up all the children efficiently. I think it would behave more sensibly when you're adding or removing a partition, especially if you want to add many partitions in parallel using multiple transactions. An oidvector of children would effectively mean you could only be doing one partition creation or deletion at a time. -- greg
On Tue, Apr 4, 2017 at 12:10 PM, Maksim Milyutin <m.milyutin@postgrespro.ru> wrote: > 1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX > (literal 'l'). Seems like it should maybe be RELKIND_PARTITIONED_INDEX. There's nothing particularly "local" about it. I suppose what you're going for is that it's not global, but in a way it *is* global to the partitioning hierarchy. That's the point. It's just that it's partitioned. > Perhaps, it's not good way to use pg_depend to determine the relationship > between parent and child indexes because the kind of this relationship is > not defined. I could propose to add into pg_index table specific field of > 'oidvector' type that specify oids of dependent indexes for the current > local index. I agree with Greg's comment on this point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 10.04.2017 13:46, Greg Stark wrote: > On 4 April 2017 at 17:10, Maksim Milyutin <m.milyutin@postgrespro.ru> wrote: >> >> 3. As I noticed early pg_depend table is used for cascade deleting indexes >> on partitioned table and its children. I also use pg_depend to determine >> relationship between parent and child indexes when reindex executes >> recursively on child indexes. >> >> Perhaps, it's not good way to use pg_depend to determine the relationship >> between parent and child indexes because the kind of this relationship is >> not defined. I could propose to add into pg_index table specific field of >> 'oidvector' type that specify oids of dependent indexes for the current >> local index. > > > Alternately you could have an single oid in pg_index on each of the > children that specifies which local index is its parent. That would > probably require a new index on that column so you could look up all > the children efficiently. > > I think it would behave more sensibly when you're adding or removing a > partition, especially if you want to add many partitions in parallel > using multiple transactions. An oidvector of children would > effectively mean you could only be doing one partition creation or > deletion at a time. > Thanks for your comment. Your approach sounds better than mine. I'll try it. -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On 10.04.2017 14:20, Robert Haas wrote: > On Tue, Apr 4, 2017 at 12:10 PM, Maksim Milyutin > <m.milyutin@postgrespro.ru> wrote: >> 1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX >> (literal 'l'). > > Seems like it should maybe be RELKIND_PARTITIONED_INDEX. There's > nothing particularly "local" about it. I suppose what you're going > for is that it's not global, but in a way it *is* global to the > partitioning hierarchy. That's the point. It's just that it's > partitioned. > Ok, thanks for the note. But I want to discuss the relevancy of introduction of a new relkind for partitioned index. I could to change the control flow in partitioned index creation (specify conditional statement in the 'index_create' routine in attached patch) and not enter to the 'heap_create' routine. This case releases us from integrating new relkind into different places of Postgres code. But we have to copy-paste some specific code from 'heap_create' function, e.g., definition of relfilenode and tablespaceid for the new index and perhaps something more when 'heap_create' routine will be extended. What do you think about this way? -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Hi, On 2017/04/17 23:00, Maksim Milyutin wrote: > On 10.04.2017 14:20, Robert Haas wrote: >> On Tue, Apr 4, 2017 at 12:10 PM, Maksim Milyutin >> <m.milyutin@postgrespro.ru> wrote: >>> 1. I have added a new relkind for local indexes named RELKIND_LOCAL_INDEX >>> (literal 'l'). >> >> Seems like it should maybe be RELKIND_PARTITIONED_INDEX. There's >> nothing particularly "local" about it. I suppose what you're going >> for is that it's not global, but in a way it *is* global to the >> partitioning hierarchy. That's the point. It's just that it's >> partitioned. >> > > Ok, thanks for the note. > > But I want to discuss the relevancy of introduction of a new relkind for > partitioned index. I could to change the control flow in partitioned index > creation (specify conditional statement in the 'index_create' routine in > attached patch) and not enter to the 'heap_create' routine. This case > releases us from integrating new relkind into different places of Postgres > code. But we have to copy-paste some specific code from 'heap_create' > function, e.g., definition of relfilenode and tablespaceid for the new > index and perhaps something more when 'heap_create' routine will be extended. I may be missing something, but isn't it that a new relkind will be needed anyway? How does the rest of the code distinguish such index objects once they are created? Is it possible that some other code may try to access the storage for an index whose indrelid is a partitioned table? Thanks, Amit
On 18.04.2017 13:08, Amit Langote wrote: > Hi, > Hi, Amit! > On 2017/04/17 23:00, Maksim Milyutin wrote: >> >> Ok, thanks for the note. >> >> But I want to discuss the relevancy of introduction of a new relkind for >> partitioned index. I could to change the control flow in partitioned index >> creation (specify conditional statement in the 'index_create' routine in >> attached patch) and not enter to the 'heap_create' routine. This case >> releases us from integrating new relkind into different places of Postgres >> code. But we have to copy-paste some specific code from 'heap_create' >> function, e.g., definition of relfilenode and tablespaceid for the new >> index and perhaps something more when 'heap_create' routine will be extended. > > I may be missing something, but isn't it that a new relkind will be needed > anyway? How does the rest of the code distinguish such index objects once > they are created? Local partitioned indexes can be recognized through the check on the relkind of table to which the index refers. Something like this: heap = relation_open(IndexGetRelation(indexid, false), heapLockmode); if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) /* indexid is local index on partitioned table */ > Is it possible that some other code may try to access > the storage for an index whose indrelid is a partitioned table? > Thеsе cases must be caught. But as much as partitioned tables doesn't participate in query plans their indexes are unaccessible by executor. Reindex operation is overloaded with my patch. -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On Tue, Apr 18, 2017 at 4:43 PM, Maksim Milyutin <m.milyutin@postgrespro.ru> wrote: > On 18.04.2017 13:08, Amit Langote wrote: >> >> Hi, >> > > Hi, Amit! > >> On 2017/04/17 23:00, Maksim Milyutin wrote: >>> >>> >>> Ok, thanks for the note. >>> >>> But I want to discuss the relevancy of introduction of a new relkind for >>> partitioned index. I could to change the control flow in partitioned >>> index >>> creation (specify conditional statement in the 'index_create' routine in >>> attached patch) and not enter to the 'heap_create' routine. This case >>> releases us from integrating new relkind into different places of >>> Postgres >>> code. But we have to copy-paste some specific code from 'heap_create' >>> function, e.g., definition of relfilenode and tablespaceid for the new >>> index and perhaps something more when 'heap_create' routine will be >>> extended. >> >> >> I may be missing something, but isn't it that a new relkind will be needed >> anyway? How does the rest of the code distinguish such index objects once >> they are created? > > > Local partitioned indexes can be recognized through the check on the relkind > of table to which the index refers. Something like this: > > heap = relation_open(IndexGetRelation(indexid, false), heapLockmode); > if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) > /* indexid is local index on partitioned table */ An index on partitioned table can be global index (yet to be implemented) or a local index. We can not differentiate between those just by looking at the relation on which they are built. > >> Is it possible that some other code may try to access >> the storage for an index whose indrelid is a partitioned table? >> > > Thеsе cases must be caught. But as much as partitioned tables doesn't > participate in query plans their indexes are unaccessible by executor. > Reindex operation is overloaded with my patch. > A global index would have storage for a partitioned table whereas a local index wouldn't have any storage for a partitioned table. I agree with Amit that we need new relkinds for local as well as global indexes. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On 19.04.2017 11:42, Ashutosh Bapat wrote: > On Tue, Apr 18, 2017 at 4:43 PM, Maksim Milyutin > <m.milyutin@postgrespro.ru> wrote: >> >> Local partitioned indexes can be recognized through the check on the relkind >> of table to which the index refers. Something like this: >> >> heap = relation_open(IndexGetRelation(indexid, false), heapLockmode); >> if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) >> /* indexid is local index on partitioned table */ > > An index on partitioned table can be global index (yet to be > implemented) or a local index. We can not differentiate between those > just by looking at the relation on which they are built. > We could to refine the criteria for the local partitioned index later encapsulating it in a macro, e.g., adding a new flag from pg_index that differentiate the type of index on partitioned table. >> Thеsе cases must be caught. But as much as partitioned tables doesn't >> participate in query plans their indexes are unaccessible by executor. >> Reindex operation is overloaded with my patch. >> > > A global index would have storage for a partitioned table whereas a > local index wouldn't have any storage for a partitioned table. > > I agree with Amit that we need new relkinds for local as well as global indexes. > Ok, thanks for the feedback. Then I'll use a new relkind for local partitioned index in further development. -- Maksim Milyutin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On Wed, Apr 19, 2017 at 5:25 AM, Maksim Milyutin <m.milyutin@postgrespro.ru> wrote: > Ok, thanks for the feedback. Then I'll use a new relkind for local > partitioned index in further development. Any update on this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
10.08.17 23:01, Robert Haas wrote: > On Wed, Apr 19, 2017 at 5:25 AM, Maksim Milyutin > <m.milyutin@postgrespro.ru> wrote: >> Ok, thanks for the feedback. Then I'll use a new relkind for local >> partitioned index in further development. > Any update on this? > I'll continue to work soon. Sorry for so long delay. -- Regards, Maksim Milyutin
Hello I've been thinking about this issue too. I think your patch is not ambitious enough. Here's my brain dump on the issue, to revive discussion. As you propose, IMO this new feature would use the standard index creation syntax: CREATE [UNIQUE] INDEX someindex ON parted_table (a, b); This command currently throws an error. We'd have it do two things: 1. create catalog rows in pg_class and pg_index for the main table, indicating the existance of this partitioned index. These would not point to an actual index (since the main table itself is empty), but instead to an "abstract" index. This abstract index can be used by various operations; see below. 2. create one index for each existing partition. These would be identical to what would happen if you created the indexdirectly on each partition, except that there is an additional dependency to the parent's abstract index. If partitions are themselves partitioned, we would recursively apply the indexes to the sub-partitions by doing (1) above for the partitioned partition. Once the index has been created for all existing partitions, the hierarchy-wide index becomes valid and can be used normally by the planner/executor. I think we could use the pg_index.indisvalid property for the abstract index for this. I propose that an index declared UNIQUE throws an error for now. We can implement uniqueness (for the case where the indexed columns match the partitioning key) later, once we sort out all the issues here first. I think unique indexes would be very useful even with that limitation, but let's have it as a separate project. I think using pg_depend as the mechanism to link partition indexes to parent is a bad idea. How about pg_inherits instead? Seems more appropriate. Creating hierachy-wide indexes for existing partitioned tables is likely to take a long time, so we must include CONCURRENTLY as an option. This will need some transaction machinery support in order to ensure that each partition gets its index created at some point in a long chain of transactions, and that the whole thing is marked valid only at the end. Also, if the creation is interrupted (because of a crash or a regular shutdown), it'll be useful to be able to continue rather than being forced to start from scratch. When a new partition is added, indexes satisfying the partitioned table's abstract indexes are created automatically. During ALTER TABLE ... ATTACH PARTITION, we check that an indexing satisfying the abstract index exist (and we create a pg_inherit link). If not, the command is aborted. REINDEX is easily supported (just reindex each partition's index individually), but that command is blocking, which is not good. For concurrent operation for tables not partitioned, a typical pattern to avoid blocking the table is to suggest creation of an identical index using CREATE INDEX CONCURRENTLY, then drop the original one. That's not going to work with these partitioned indexes, which is going to be a problem. I don't have any great ideas about this part yet. We need to come up with some way to generate names for each partition index. I am going to work on this now. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Hi!
On 06.10.2017 19:37, Alvaro Herrera wrote:
As you propose, IMO this new feature would use the standard index creation syntax: CREATE [UNIQUE] INDEX someindex ON parted_table (a, b); This command currently throws an error. We'd have it do two things: 1. create catalog rows in pg_class and pg_index for the main table, indicating the existance of this partitioned index. These would not point to an actual index (since the main table itself is empty), but instead to an "abstract" index. This abstract index can be used by various operations; see below.
Robert Haas proposed[1] to use the name "partitioned index" (instead of abstract) that have to be reflected in 'relkind' field of pg_class as the RELKIND_PARTITIONED_INDEX value.
I propose that an index declared UNIQUE throws an error for now. We can implement uniqueness (for the case where the indexed columns match the partitioning key) later, once we sort out all the issues here first. I think unique indexes would be very useful even with that limitation, but let's have it as a separate project.
Yes, global uniqueness through local unique indexes causes further work related with foreign keys on partitioned table, full support of INSERT OF CONFLICT, etc. It make sense to implement after the current stage of work.
I think using pg_depend as the mechanism to link partition indexes to parent is a bad idea. How about pg_inherits instead? Seems more appropriate.
Greg Stark proposed[2] to use new pg_index field of oid type that refers to the parent pg_index item. AFAIC pg_inherits also makes sense but semantically it deals with inheriting tables. IMHO the using of this catalog table to define relation between partitioned table and partitions looks like a hack to make use of constraint exclusion logic for partition pruning.
Creating hierachy-wide indexes for existing partitioned tables is likely to take a long time, so we must include CONCURRENTLY as an option. This will need some transaction machinery support in order to ensure that each partition gets its index created at some point in a long chain of transactions, and that the whole thing is marked valid only at the end. Also, if the creation is interrupted (because of a crash or a regular shutdown), it'll be useful to be able to continue rather than being forced to start from scratch.
This option was very difficult for me. I would be interested to see the implementation.
During ALTER TABLE ... ATTACH PARTITION, we check that an indexing satisfying the abstract index exist (and we create a pg_inherit link). If not, the command is aborted.
We could create necessary index for partition, not abort ALTER TABLE ... ATTACH PARTITION statement.
We need to come up with some way to generate names for each partition index.
I think the calling 'ChooseIndexName(RelationGetRelationName(childrel), namespaceId, indexColNames, ...)' resolves this problem.
I am going to work on this now.
It will be great! I think this project is difficult for me on the part of integration described above functionality with the legacy postgres code. Also IMO this project is very important because it opens the way for such feature as global uniqueness of fields of partitioned tables. And any protraction in implementation is bad.
I would like to review and test your intermediate results.
1. https://www.postgresql.org/message-id/CA%2BTgmoY5UOUnW%3DMcwT7xUB_2W5dAkvOg5kD20Spx5gF-Ad47cA%40mail.gmail.com
2. https://www.postgresql.org/message-id/CAM-w4HOVftuv5RVi3a%2BsRV6nBpg204w7%3DL8MwPXVvYBFo1uM1Q%40mail.gmail.com
-- Regards, Maksim Milyutin
On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > 2. create one index for each existing partition. These would be > identical to what would happen if you created the index directly on > each partition, except that there is an additional dependency to the > parent's abstract index. One thing I'm a bit worried about is how to name these subordinate indexes. They have to have names because that's how pg_class works, and those names can't all be the same, again because that's how pg_class works. There's no problem right away when you first create the partitioned index, because you can just pick names out of a hat using whatever name-generating algorithm seems best. However, when you dump-and-restore (including but not limited to the pg_upgrade case) you've got to preserve those names. If you just generate a new name that may or may not be the same as the old one, then it may collide with a user-specified name that only occurs later in the dump. Also, you'll have trouble if the user has applied a COMMENT or a SECURITY LABEL to the index because that command works by name, or if the user has a reference to the index name inside a function or whatever. These are pretty annoying corner-case bugs because they're not likely to come up very often. Most people won't notice or care if the index name changes. But I don't think it's acceptable to just ignore the problem. An idea I had was to treat the abstract index - to use your term - sort of the way we treat an extension. Normally, when you create an index on a partitioned table, it cascades, but for dump and restore purpose, we tag on some syntax that says "well, don't actually create the subordinate indexes, i'll tell you about those later". Then for each subordinate index we issue a separate CREATE INDEX command followed by ALTER INDEX abstract_index ATTACH PARTITION concrete_index or something of that sort. That means you can't absolutely count on the parent index to have all of the children it's supposed to have but maybe that's OK. Another thing that would let you do is CREATE INDEX CONCURRENTLY replacement_concrete_index; ALTER INDEX abstract_index DETACH PARTITION old_concrete_index, ATTACH PARTITION replacement_concrete_index; DROP INDEX CONCURRENTLY old_concrete_index, which seems like a thing someone might want to do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
07.10.17 16:34, Robert Haas wrote: > On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > One thing I'm a bit worried about is how to name these subordinate > indexes. They have to have names because that's how pg_class works, > and those names can't all be the same, again because that's how > pg_class works. There's no problem right away when you first create > the partitioned index, because you can just pick names out of a hat > using whatever name-generating algorithm seems best. However, when > you dump-and-restore (including but not limited to the pg_upgrade > case) you've got to preserve those names. If you just generate a new > name that may or may not be the same as the old one, then it may > collide with a user-specified name that only occurs later in the dump. > Also, you'll have trouble if the user has applied a COMMENT or a > SECURITY LABEL to the index because that command works by name, or if > the user has a reference to the index name inside a function or > whatever. > > These are pretty annoying corner-case bugs because they're not likely > to come up very often. Most people won't notice or care if the index > name changes. But I don't think it's acceptable to just ignore the > problem. An idea I had was to treat the abstract index - to use your > term - sort of the way we treat an extension. Normally, when you > create an index on a partitioned table, it cascades, but for dump and > restore purpose, we tag on some syntax that says "well, don't actually > create the subordinate indexes, i'll tell you about those later". > Then for each subordinate index we issue a separate CREATE INDEX > command followed by ALTER INDEX abstract_index ATTACH PARTITION > concrete_index or something of that sort. That means you can't > absolutely count on the parent index to have all of the children it's > supposed to have but maybe that's OK. AFAICS, the main problem with naming is generating new unique names for subordinate indexes on the stage of migrating data scheme (pg_dump, pg_upgrade, etc). And we cannot specify these names in the 'CREATE INDEX partitioned_index' statement therefore we have to regenerate their. In this case I propose to restore index names' hierarchy *bottom-up*, i.e. first of all create indexes for the leaf partitions and then create ones for parents up to root explicitly specifying names. When creating index on parent table we have to check is there exist any index on child table that could be child index (identical criteria). If so, not generate new index but implicitly attach that index into parent one. If we have incomplete index hierarchy, e.g. we dropped some indexes of partitions previously, then recreating of parent's index would regenerate (not attach) indexes for those partitions. We could drop those odd generated indexes after building of parent's index. This decision is not straightforward but provides to consider 'CREATE INDEX paritioned_table' statement as a cascade operation. As a result, we can specify name for each concrete index while recreating a whole hierarchy. -- Regards, Maksim Milyutin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Oct 7, 2017 at 7:04 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> 2. create one index for each existing partition. These would be >> identical to what would happen if you created the index directly on >> each partition, except that there is an additional dependency to the >> parent's abstract index. > > One thing I'm a bit worried about is how to name these subordinate > indexes. They have to have names because that's how pg_class works, > and those names can't all be the same, again because that's how > pg_class works. There's no problem right away when you first create > the partitioned index, because you can just pick names out of a hat > using whatever name-generating algorithm seems best. However, when > you dump-and-restore (including but not limited to the pg_upgrade > case) you've got to preserve those names. If you just generate a new > name that may or may not be the same as the old one, then it may > collide with a user-specified name that only occurs later in the dump. > Also, you'll have trouble if the user has applied a COMMENT or a > SECURITY LABEL to the index because that command works by name, or if > the user has a reference to the index name inside a function or > whatever. > > These are pretty annoying corner-case bugs because they're not likely > to come up very often. Most people won't notice or care if the index > name changes. But I don't think it's acceptable to just ignore the > problem. An idea I had was to treat the abstract index - to use your > term - sort of the way we treat an extension. Normally, when you > create an index on a partitioned table, it cascades, but for dump and > restore purpose, we tag on some syntax that says "well, don't actually > create the subordinate indexes, i'll tell you about those later". > Then for each subordinate index we issue a separate CREATE INDEX > command followed by ALTER INDEX abstract_index ATTACH PARTITION > concrete_index or something of that sort. That means you can't > absolutely count on the parent index to have all of the children it's > supposed to have but maybe that's OK. +1. How about CREATE INDEX ... PARTITION OF ... FOR TABLE ...? to create the index and attach it? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas wrote: > On Fri, Oct 6, 2017 at 12:37 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > 2. create one index for each existing partition. These would be > > identical to what would happen if you created the index directly on > > each partition, except that there is an additional dependency to the > > parent's abstract index. > > One thing I'm a bit worried about is how to name these subordinate > indexes. They have to have names because that's how pg_class works, > and those names can't all be the same, again because that's how > pg_class works. There's no problem right away when you first create > the partitioned index, because you can just pick names out of a hat > using whatever name-generating algorithm seems best. However, when > you dump-and-restore (including but not limited to the pg_upgrade > case) you've got to preserve those names. If you just generate a new > name that may or may not be the same as the old one, then it may > collide with a user-specified name that only occurs later in the dump. > Also, you'll have trouble if the user has applied a COMMENT or a > SECURITY LABEL to the index because that command works by name, or if > the user has a reference to the index name inside a function or > whatever. > > These are pretty annoying corner-case bugs because they're not likely > to come up very often. Most people won't notice or care if the index > name changes. But I don't think it's acceptable to just ignore the > problem. I agree it's a problem that needs to be addressed directly. > An idea I had was to treat the abstract index - to use your > term - sort of the way we treat an extension. Normally, when you > create an index on a partitioned table, it cascades, but for dump and > restore purpose, we tag on some syntax that says "well, don't actually > create the subordinate indexes, i'll tell you about those later". > Then for each subordinate index we issue a separate CREATE INDEX > command followed by ALTER INDEX abstract_index ATTACH PARTITION > concrete_index or something of that sort. That means you can't > absolutely count on the parent index to have all of the children it's > supposed to have but maybe that's OK. Hmm ... yeah, ATTACH and DETACH sound acceptable to me. On DETACH, the abstract index should be marked indisvalid=false unless a substitute index already exists; and on ATTACH when indisvalid=false we verify that all local indexes exist, and if so we can flip indisvalid. That way, we can continue to rely on the parent index always having all its children when the flag is set. I'm not clear on a syntax that creates the main index and hopes to later have the sub-indexes created. Another approach is to do it the other way around, i.e. create the children first, then once they're all in place create the main one normally, which merely verifies that all the requisite children exist. This is related to what I proposed to occur when a regular table is joined as a partition of the partitioned table: we run a verification that an index matching the parent's abstract indexes exists, and if not we raise an error. (Alternatively we could allow the case, and mark the abstract index as indisvalid=false, but that seems to violate POLA). > Another thing that would let you do is CREATE INDEX CONCURRENTLY > replacement_concrete_index; ALTER INDEX abstract_index DETACH > PARTITION old_concrete_index, ATTACH PARTITION > replacement_concrete_index; DROP INDEX CONCURRENTLY > old_concrete_index, which seems like a thing someone might want to do. Yeah, this is a point I explicitly mentioned, and this proposal seems like a good way. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Oct 10, 2017 at 11:22 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Hmm ... yeah, ATTACH and DETACH sound acceptable to me. On DETACH, the > abstract index should be marked indisvalid=false unless a substitute > index already exists; and on ATTACH when indisvalid=false we verify that > all local indexes exist, and if so we can flip indisvalid. That way, we > can continue to rely on the parent index always having all its children > when the flag is set. True. I don't see an immediate use for that guarantee, actually, because the index can't be "scanned" either way. But it might be useful someday, if for example we wanted to try plan large numbers of children symmetrically rather than (as we do currently) planning each one individually, or if we've got an idea about making foreign keys work. I think you could ignore this for now, though, if you want. > I'm not clear on a syntax that creates the main index and hopes to later > have the sub-indexes created. Another approach is to do it the other > way around, i.e. create the children first, then once they're all in > place create the main one normally, which merely verifies that all the > requisite children exist. This is related to what I proposed to occur > when a regular table is joined as a partition of the partitioned table: > we run a verification that an index matching the parent's abstract > indexes exists, and if not we raise an error. (Alternatively we could > allow the case, and mark the abstract index as indisvalid=false, but > that seems to violate POLA). It's not much different than pg_catalog.binary_upgrade_create_empty_extension but I don't really care which way pg_dump emits it. >> Another thing that would let you do is CREATE INDEX CONCURRENTLY >> replacement_concrete_index; ALTER INDEX abstract_index DETACH >> PARTITION old_concrete_index, ATTACH PARTITION >> replacement_concrete_index; DROP INDEX CONCURRENTLY >> old_concrete_index, which seems like a thing someone might want to do. > > Yeah, this is a point I explicitly mentioned, and this proposal seems > like a good way. Cool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Hello I started with Maksim's submitted code, and developed according to the ideas discussed in this thread. Attached is a very WIP patch series for this feature. Many things remain to be done before this is committable: pg_dump support needs to be written. ALTER INDEX ATTACH/DETACH not yet implemented. No REINDEX support yet. Docs not updated (but see the regression test as a guide for how this is supposed to work; see patch 0005). CREATE INDEX CONCURRENTLY not done yet. I'm now working on the ability to build unique indexes (and unique constraints) on top of this. The docs have not been updated yet, but the new regression test file illustrates how this works. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Mon, Oct 23, 2017 at 11:12 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > I started with Maksim's submitted code, and developed according to the > ideas discussed in this thread. Attached is a very WIP patch series for > this feature. > > Many things remain to be done before this is committable: pg_dump > support needs to be written. ALTER INDEX ATTACH/DETACH not yet > implemented. No REINDEX support yet. Docs not updated (but see the > regression test as a guide for how this is supposed to work; see patch > 0005). CREATE INDEX CONCURRENTLY not done yet. > > I'm now working on the ability to build unique indexes (and unique > constraints) on top of this. Cool. Are you planning to do that by (a) only allowing the special case where the partition key columns/expressions are included in the indexed columns/expressions, (b) trying to make every insert to any index check all of the indexes for uniqueness conflicts, or (c) implementing global indexes? Because (b) sounds complex - think about attach operations, for example - and (c) sounds super-hard. I'd suggest doing (a) first, just on the basis of complexity. I hope that you don't get so involved in making this unique index stuff work that we don't get the cascading index feature, at least, committed to v11. That's already a considerable step forward in terms of ease of use, and I'd really like to have it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas wrote: > On Mon, Oct 23, 2017 at 11:12 AM, Alvaro Herrera > <alvherre@alvh.no-ip.org> wrote: > > I started with Maksim's submitted code, and developed according to the > > ideas discussed in this thread. Attached is a very WIP patch series for > > this feature. > > > > Many things remain to be done before this is committable: pg_dump > > support needs to be written. ALTER INDEX ATTACH/DETACH not yet > > implemented. No REINDEX support yet. Docs not updated (but see the > > regression test as a guide for how this is supposed to work; see patch > > 0005). CREATE INDEX CONCURRENTLY not done yet. > > > > I'm now working on the ability to build unique indexes (and unique > > constraints) on top of this. > > Cool. Are you planning to do that by (a) only allowing the special > case where the partition key columns/expressions are included in the > indexed columns/expressions, (b) trying to make every insert to any > index check all of the indexes for uniqueness conflicts, or (c) > implementing global indexes? Because (b) sounds complex - think about > attach operations, for example - and (c) sounds super-hard. I'd > suggest doing (a) first, just on the basis of complexity. Yes, I think (a) is a valuable thing to have -- not planning on doing (c) at all because I fear it'll be a huge time sink. I'm not sure about (b), but it's not currently on my plan. > I hope that you don't get so involved in making this unique index > stuff work that we don't get the cascading index feature, at least, > committed to v11. That's already a considerable step forward in terms > of ease of use, and I'd really like to have it. Absolutely -- I do plan to get this one finished regardless of unique indexes. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/10/24 1:15, Alvaro Herrera wrote: > Robert Haas wrote: >> On Mon, Oct 23, 2017 at 11:12 AM, Alvaro Herrera >> <alvherre@alvh.no-ip.org> wrote: >>> I started with Maksim's submitted code, and developed according to the >>> ideas discussed in this thread. Attached is a very WIP patch series for >>> this feature. Nice! >>> Many things remain to be done before this is committable: pg_dump >>> support needs to be written. ALTER INDEX ATTACH/DETACH not yet >>> implemented. No REINDEX support yet. Docs not updated (but see the >>> regression test as a guide for how this is supposed to work; see patch >>> 0005). CREATE INDEX CONCURRENTLY not done yet. >>> >>> I'm now working on the ability to build unique indexes (and unique >>> constraints) on top of this. >> >> Cool. Are you planning to do that by (a) only allowing the special >> case where the partition key columns/expressions are included in the >> indexed columns/expressions, (b) trying to make every insert to any >> index check all of the indexes for uniqueness conflicts, or (c) >> implementing global indexes? Because (b) sounds complex - think about >> attach operations, for example - and (c) sounds super-hard. I'd >> suggest doing (a) first, just on the basis of complexity. > > Yes, I think (a) is a valuable thing to have -- not planning on doing > (c) at all because I fear it'll be a huge time sink. I'm not sure about > (b), but it's not currently on my plan. +1 to proceeding with (a) first. >> I hope that you don't get so involved in making this unique index >> stuff work that we don't get the cascading index feature, at least, >> committed to v11. That's already a considerable step forward in terms >> of ease of use, and I'd really like to have it. > > Absolutely -- I do plan to get this one finished regardless of unique > indexes. +1 Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Alvaro Herrera wrote: > I'm now working on the ability to build unique indexes (and unique > constraints) on top of this. So I think there's not a lot of additional code required to support unique indexes with the restrictions mentioned; proof-of-concept (with several holes still) attached. As before, this is not finished, as there a few things that are wrong (such as generateClonedIndexStmt taking a RangeVar), and others that I don't understand (such as why is rd_partkey NULL for partitioned partitions when DefineIndex cascades to them and the columns are checked). I noticed that RelationBuildPartitionKey is generating a partition key in a temp context, then creating a private context and copying the key into that. That seems leftover from some previous iteration of some other patch; I think it's pretty reasonable to create the new context right from the start and allocate the key there directly instead. Then there's no need for copy_partition_key at all. Anyway, here is a preliminary submission before I close shop for the week. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
- v2-0001-Tweak-index_create-index_constraint_create-APIs.patch
- v2-0002-Allow-indexes-on-partitioned-tables.patch
- v2-0003-export-generateClonedIndexStmt.patch
- v2-0004-Match-create-indexes-during-ATTACH-PARTITION.patch
- v2-0005-add-regression-test.patch
- v2-0006-Get-rid-of-copy_partition_key.patch
- v2-0007-allow-indexes-on-partitioned-tables-to-be-unique.patch
On Fri, Oct 27, 2017 at 7:27 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > I noticed that RelationBuildPartitionKey is generating a partition key > in a temp context, then creating a private context and copying the key > into that. That seems leftover from some previous iteration of some > other patch; I think it's pretty reasonable to create the new context > right from the start and allocate the key there directly instead. Then > there's no need for copy_partition_key at all. We could do that, but the motivation for the current system was to avoid leaking memory in a long-lived context. I think the same technique is used elsewhere for similar reasons. I admit I haven't checked whether there would actually be a leak here if we did it as you propose, but I wouldn't find it at all surprising. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Oct 27, 2017 at 7:27 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> I noticed that RelationBuildPartitionKey is generating a partition key >> in a temp context, then creating a private context and copying the key >> into that. That seems leftover from some previous iteration of some >> other patch; I think it's pretty reasonable to create the new context >> right from the start and allocate the key there directly instead. Then >> there's no need for copy_partition_key at all. > We could do that, but the motivation for the current system was to > avoid leaking memory in a long-lived context. I think the same > technique is used elsewhere for similar reasons. I admit I haven't > checked whether there would actually be a leak here if we did it as > you propose, but I wouldn't find it at all surprising. Another key point is to avoid leaving a corrupted relcache entry behind if you fail partway through. I think that the current coding is RelationBuildPartitionKey is safe, although it's far too undercommented for my taste. (The ordering of the last few steps is *critical*.) It might work to build the new key in a context that's initially a child of CurrentMemoryContext, then reparent it to be a child of CacheMemoryContext when done. But you'd have to look at whether that would end up wasting long-lived memory, if the build process generates any cruft in the same context. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
> Robert Haas <robertmhaas@gmail.com> writes: > > We could do that, but the motivation for the current system was to > > avoid leaking memory in a long-lived context. Yeah, my approach here is to use a CATCH block that deletes the memory context just created, thus avoiding a long-lived leak. Tom Lane wrote: > Another key point is to avoid leaving a corrupted relcache entry behind > if you fail partway through. Sure ... in the code as I have it we only assign the local variable to the relcache entry if everything is succesful. So no relcache corruption should result. > It might work to build the new key in a context that's initially a > child of CurrentMemoryContext, then reparent it to be a child of > CacheMemoryContext when done. That's another way (than the PG_TRY block), but I think it's more complicated with no gain. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Tom Lane wrote: >> It might work to build the new key in a context that's initially a >> child of CurrentMemoryContext, then reparent it to be a child of >> CacheMemoryContext when done. > That's another way (than the PG_TRY block), but I think it's more > complicated with no gain. I disagree: PG_TRY blocks are pretty expensive. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Here's a third iteration of this work; I've polished a few rough edges and made sure that the behavior on ALTER TABLE ATTACH is consistent with what you get when creating a new partition via CREATE TABLE .. PARTITION OF, and with what happens when you do CREATE INDEX on an existing partitioning tree, and added some docs. The main big item missing is the CONCURRENTLY option to CREATE INDEX. With that and once I review a couple of minor items marked XXX, I'm about to call committable. I also added ONLY to create index: "CREATE INDEX .. ON ONLY tab". Normally if you create an index on a partitioned table, it recurses; with this option, it doesn't. This is used by pg_dump to ensure that restore doesn't create indexes on partitions that didn't have them on the original database. Then the others are attached using the new command ALTER INDEX ATTACH, as discussed. For symmetry there's also ALTER INDEX DETACH. Can be used to replace an index on a partition with a fresh version, for cases of bloat. The last patch is in WIP state yet, and mostly untouched since I last posted it. It allows creation of unique constraints (and PKs). But any interaction with the rest of the system is untested (such as FKs, ON CONFLICT, etc). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
- v3-0001-Fix-summarization-concurrent-with-relation-extens.patch
- v3-0001-Simplify-index_-constraint_-create-API.patch
- v3-0002-Get-rid-of-copy_partition_key.patch
- v3-0003-export-generateClonedIndexStmt.patch
- v3-0004-Allow-indexes-on-partitioned-tables.patch
- v3-0005-allow-indexes-on-partitioned-tables-to-be-unique.patch
Somehow I managed to include an unrelated patch as attachment. Here's another attempt (on which I also lightly touched ddl.sgml with relevant changes). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 13 November 2017 at 12:55, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Somehow I managed to include an unrelated patch as attachment. Here's > another attempt (on which I also lightly touched ddl.sgml with relevant > changes). Looks good. Some minor comments below. 0001- Simplify Seems useful as separate step; agree with everything, no further comments Why uint16? Why not just uint? 0003-export I don't like Assert((heapRel != NULL) ^ OidIsValid(heapRelid)); Standard boolean logic is clearer I couldn't see why this was a separate patch 0004-Allow If we do ATTACH PARTITION, do we also need to do a separate ATTACH INDEX step to allow an index to join the main index? Hopefully not. The tests seem to indicate to me that it does work that way, just no docs in altertable.sgml to describe that typo "they all have a matching indexes" - no plural needed typo "whether equivalent" - insert "an" unsure what "regardless of whether this option was specified" means, probably just remove 0005-Allow RelationCacheInitializePhase3() asserts that rd_partkey is not NULL for a partitoned table after RelationBuildPartitionKey() runs You removed the test to check whether "create unique index" works, not sure if that was intentional There is no test for attach index to a unique index -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 8 October 2017 at 02:34, Robert Haas <robertmhaas@gmail.com> wrote: > However, when > you dump-and-restore (including but not limited to the pg_upgrade > case) you've got to preserve those names. If you just generate a new > name that may or may not be the same as the old one, then it may > collide with a user-specified name that only occurs later in the dump. > Also, you'll have trouble if the user has applied a COMMENT or a > SECURITY LABEL to the index because that command works by name, or if > the user has a reference to the index name inside a function or > whatever. I might be arriving late for the party here, but I see mention on here of when we ATTACH a partition that we'd check to see if any of that table's indexes match the index definition required for the partitioned table's "partitioned index", and just use those indexes and incorporate them into the partitioned index by setting the new OID column in pg_index (or whatever way was decided to record the dependency)... So, if that works for ATTACH, then can't pg_dump just create each of the partition's indexes first, then create the partitioned table's partition index, and that command would just go through checking each partition similar to how ATTACH would work. That way we get to keep all the names as pg_dump would just dump the indexes belonging to each partition like it were any other index. Those indexes would only become a bit more special once the partitioned index is created on the parent. I've not read the patch yet, but reading the thread it does not seem to have gone in this direction. Maybe I'm overlooking something? -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Thanks for looking. David Rowley wrote: > So, if that works for ATTACH, then can't pg_dump just > create each of the partition's indexes first, then create the > partitioned table's partition index, and that command would just go > through checking each partition similar to how ATTACH would work. That was my first thought too (and I actually implemented it like this before the current approach). But there's a small problem: if a partition exists which *doesn't* have the index, restoring things this way would create the index in that partition too, which is unwanted because the end state is different to what was in the dumped database. This is because the command is either recursive (finds a matching index and attachs it, or create a new one) or not recursive (does not look at children at all). But a command that is partially recursive (attach any matching index, but do nothing if none exists) would be way too confusing, ISTM. Another small issue is that each partition might have two matching indexes, maybe because one was in the process of being removed; how would CREATE INDEX at restore time know which one to attach? It could do so by choosing arbitrarily, but that doesn't seem reliable. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 15 November 2017 at 01:09, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > if a > partition exists which *doesn't* have the index, restoring things this > way would create the index in that partition too, which is unwanted > because the end state is different to what was in the dumped database. hmm, but surely the all those indexes must already exist if the partitioned index exists. Won't we be disallowing DROP INDEX of the leaf partition indexes if that index is marked as being part of the partitioned index? If so, then surely this could only happen if someone manually edited the pg_dump file to remove the CREATE INDEX statement for the leaf partition, and if they do that, then maybe they won't be so surprised that CREATE INDEX has to create some indexes. -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
David Rowley wrote: > On 15 November 2017 at 01:09, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > if a > > partition exists which *doesn't* have the index, restoring things this > > way would create the index in that partition too, which is unwanted > > because the end state is different to what was in the dumped database. > > hmm, but surely the all those indexes must already exist if the > partitioned index exists. Won't we be disallowing DROP INDEX of the > leaf partition indexes if that index is marked as being part of the > partitioned index? In normal cases, sure -- but you can do ALTER INDEX DETACH on a partition, then drop the index. This is useful for example if you want to replace some partition's index because of bloat: create a replacement index, detach the old one, attach the new one, drop the old one. Now you probably don't *want* to take a pg_dump in the middle of this process ... but pg_dump's charter is not to produce the output we think would be most convenient to users most of the time, but rather to accurately describe what is in the database. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 15 November 2017 at 01:30, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > David Rowley wrote: >> hmm, but surely the all those indexes must already exist if the >> partitioned index exists. Won't we be disallowing DROP INDEX of the >> leaf partition indexes if that index is marked as being part of the >> partitioned index? > > In normal cases, sure -- but you can do ALTER INDEX DETACH on a > partition, then drop the index. This is useful for example if you want > to replace some partition's index because of bloat: create a replacement > index, detach the old one, attach the new one, drop the old one. But if you can DETACH a partition from requiring an index to back up this partitioned index, then the partitioned index is not valid while the leaf table's index is missing. Maybe I misunderstood what you mean by DETACH here, but if you remove an index and there's some period of time where none exists for a leaf partition, then you're not going to be able to ever do partitioned UNIQUE indexes. I'd have thought some sort of: ALTER INDEX name_of_partitioned_index REPLACE INDEX FOR partitioned_tablename WITH name_of_new_matching_bloat_free_index; ... or something along those lines, and just have an atomic swap out of the index with some new one that's been created. Something similar would be quite good for a foreign key constraint. Today we have no standard way to replace the referenced table's index without dropping the FK first. -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs wrote: > On 13 November 2017 at 12:55, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > Somehow I managed to include an unrelated patch as attachment. Here's > > another attempt (on which I also lightly touched ddl.sgml with relevant > > changes). > > Looks good. Some minor comments below. > > 0001- Simplify > Seems useful as separate step; agree with everything, no further comments Thanks, pushed. > Why uint16? Why not just uint? *Shrug*. 16 bits seem plenty enough. I changed it to bits16, which we use in other places for bitmasks. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
David Rowley wrote: > But if you can DETACH a partition from requiring an index to back up > this partitioned index, then the partitioned index is not valid while > the leaf table's index is missing. Yeah. I don't see this as too much of a problem myself. At present, from the planner's POV each partition is its own table and incurs its own planning, so there's no restriction on having different sets of indexes. If in the future we want to see partition hierarchies as a single relation which must have identical properties such as the same index everywhere, this is easily changed: just set indisready flag false for the parent table until every partition can be proven to have the index. Example: on DETACH we set indisready=false on the parent; on ATTACH we scan the whole hierarchy and set it true if all partitions have the index. So the planner can rely on the parent's indisready. Since those operations are catalog-only, the transaction which attaches/detaches the index can be short (though AFAICS it requires AEL on the whole hierarchy, sadly). > Maybe I misunderstood what you mean by DETACH here, but if you remove > an index and there's some period of time where none exists for a leaf > partition, then you're not going to be able to ever do partitioned > UNIQUE indexes. Well, UNIQUE indexes have additional restrictions of their own. I have no problem decreeing that you cannot DETACH an index from a primary key. Then a future project can implement index replacement like you propose: > I'd have thought some sort of: ALTER INDEX name_of_partitioned_index > REPLACE INDEX FOR partitioned_tablename WITH > name_of_new_matching_bloat_free_index; > > ... or something along those lines, and just have an atomic swap out > of the index with some new one that's been created. (My intention here is to avoid scope creep.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote: > Simon Riggs wrote: > > On 13 November 2017 at 12:55, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > Somehow I managed to include an unrelated patch as attachment. Here's > > > another attempt (on which I also lightly touched ddl.sgml with relevant > > > changes). > > > > Looks good. Some minor comments below. > > > > 0001- Simplify > > Seems useful as separate step; agree with everything, no further comments > > Thanks, pushed. Here's the remaining bits, rebased. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, Nov 14, 2017 at 12:49 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Here's the remaining bits, rebased. It's true that Tom and I reviewed patch 0001, as your proposed commit message states. But it's also true that we both said that it probably wasn't a good idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 14 November 2017 at 13:12, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Nov 14, 2017 at 12:49 PM, Alvaro Herrera > <alvherre@alvh.no-ip.org> wrote: >> Here's the remaining bits, rebased. > > It's true that Tom and I reviewed patch 0001, as your proposed commit > message states. But it's also true that we both said that it probably > wasn't a good idea. I don't see any comments from you or Tom about patch 0001, which was simple refactoring and not much to complain about. Perhaps there is some confusion about the numbering? I see that Alvaro had taken your comments on memory contexts into account in his later patch. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 11/14/2017 12:49 PM, Alvaro Herrera wrote: >> Thanks, pushed. > > Here's the remaining bits, rebased. > First of all, thanks for working on this. I have been looking at the "CREATE INDEX ... ONLY" syntax, and I think it could cause some confusion due to the "Note" described in create_index.sgml. An alternative, maybe crazy, could be to treat a partitioned index as one; e.g. all operations are on the definition. That way ONLY, ATTACH and DETACH could be eliminated. Once a partition is detached any partitioned indexes would be marked as a normal index, and a partition could only be attached if it had indexes satisfying all partition index definitions. Bloat could be handled by swapping the index as suggested by David [1]. It would be less flexible, but people always have the option to define indexes directly on the partitions. Some comments. 0004's alter_index.sgml are missing the "PARTITION" keyword for both the ATTACH and DETACH commands. A small thing, for -- test.sql -- CREATE TABLE test (a integer NOT NULL) PARTITION BY HASH(a); CREATE TABLE test_p00 PARTITION OF test FOR VALUES WITH (MODULUS 2, REMAINDER 0); CREATE TABLE test_p01 PARTITION OF test FOR VALUES WITH (MODULUS 2, REMAINDER 1); CREATE INDEX idx_test_a ON test (a); -- test.sql -- I would expect that the index names were 'test_p00_idx_test_a' and 'test_p01_idx_test_a'. psql completion for "CREATE INDEX blah ON tes<TAB>" only lists the child tables. Completion for the ALTER INDEX commands is missing. [1] https://www.postgresql.org/message-id/CAKJS1f_Wf%3DM06o8iQi72SxN%2BZpLV4c0CwYoN5xYVY4aWWX-jBQ%40mail.gmail.com Best regards, Jesper
On Tue, Nov 14, 2017 at 1:43 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > I don't see any comments from you or Tom about patch 0001, which was > simple refactoring and not much to complain about. We both commented that getting rid of copy_partition_data could introduce memory leaks. > Perhaps there is some confusion about the numbering? I don't think so. > I see that Alvaro had taken your comments on memory contexts into > account in his later patch. Which later patch? It seems like any changes meant to mitigate the problems with removing copy_partition_data ought to be folded into the patch that removes copy_partition_data, rather than being in some other patch later in the series. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Jesper, Thanks for reviewing. Jesper Pedersen wrote: > I have been looking at the "CREATE INDEX ... ONLY" syntax, and I think it > could cause some confusion due to the "Note" described in create_index.sgml. > > An alternative, maybe crazy, could be to treat a partitioned index as one; > e.g. all operations are on the definition. Well, honestly that's the way I wanted partitioning to be defined, right from the start -- my first proposal was that partitions were not to be standalone objects. But I was outvoted once I stepped out of the implementation, and partitioning is now the way we have it -- each partition its own separate object. Until we change that view (if we ever do), I think the only sensible decision is that the whole feature works that way. > Some comments. [...] Thanks, will fix. Except for this one: > A small thing, for > > -- test.sql -- > CREATE TABLE test (a integer NOT NULL) PARTITION BY HASH(a); > CREATE TABLE test_p00 PARTITION OF test FOR VALUES WITH (MODULUS 2, > REMAINDER 0); > CREATE TABLE test_p01 PARTITION OF test FOR VALUES WITH (MODULUS 2, > REMAINDER 1); > CREATE INDEX idx_test_a ON test (a); > -- test.sql -- > > I would expect that the index names were 'test_p00_idx_test_a' and > 'test_p01_idx_test_a'. Hmm, the code in my patch just uses the standard naming routine for indexes. I'm disinclined to change it in the initial commit, but perhaps we can change that later. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 15 November 2017 at 04:23, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > David Rowley wrote: >> I'd have thought some sort of: ALTER INDEX name_of_partitioned_index >> REPLACE INDEX FOR partitioned_tablename WITH >> name_of_new_matching_bloat_free_index; >> >> ... or something along those lines, and just have an atomic swap out >> of the index with some new one that's been created. > > (My intention here is to avoid scope creep.) OK, I didn't really mean that this would be required for an initial patch. It's something that could be invented at some later date. -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 15 November 2017 at 06:49, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
-- Here's the remaining bits, rebased.
Hi,
I've not had time for a thorough look at this, but on a quick scan I noticed that CompareIndexInfo() missed checking if the Index AM matches the AM of the partitioned index.
Testing with:
create table p (a int not null) partition by range (a);
create table p1 partition of p for values from (1) to (10);
create table p2 partition of p for values from (10) to (20);
create index on p1 using btree (a);
create index on p2 using hash (a);
create index on p (a);
I see it ends up making use of the hash index on p2 to support the index that's stored as a btree on the partitioned table. I think these should match so that the operations we can perform on the index are all aligned.
On 15 November 2017 at 06:49, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
-- Here's the remaining bits, rebased.
I wonder if it should be this patches job to alter the code in get_relation_info() which causes the indexes not to be loaded for partitioned tables:
/*
* Make list of indexes. Ignore indexes on system catalogs if told to.
* Don't bother with indexes for an inheritance parent, either.
*/
if (inhparent ||
(IgnoreSystemIndexes && IsSystemRelation(relation)))
hasindex = false;
else
hasindex = relation->rd_rel->relhasindex;
A partitioned table will always go into the hasindex = false code path.
I'm kind of thinking this patch should change that, even if the patch is not making use of the indexes, you could argue that something using set_rel_pathlist_hook might want to do something there, although, there's likely a bunch of counter arguments too.
What do you think?
On Fri, Nov 17, 2017 at 6:24 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: > > I'm kind of thinking this patch should change that, even if the patch is not > making use of the indexes, you could argue that something using > set_rel_pathlist_hook might want to do something there, although, there's > likely a bunch of counter arguments too. > How do you think optimizer would use this information? A partitioned table which doesn't have data doesn't need that information. Partitioned table's behaviour here is similar to that of the inheritance parent. Even though an inheritance parent may have indexes on it, that information is not read and stored by optimizer for that table in its role as an inheritance parent. It stores and uses that information for that table in its role as the child, which gets scanned. May be in future, we will try to create common index paths for all children for the indexes which are inherited from the parent. Then we might want to load the index information for partitioned table as well, but I don't think we are any close to do that. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On Wed, Nov 15, 2017 at 2:49 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Here's the remaining bits, rebased. At least patch 3 has conflicts with HEAD. I am moving this patch to next CF per a lack of reviews, switching status to waiting on author. -- Michael
David Rowley wrote: > I wonder if it should be this patches job to alter the code in > get_relation_info() which causes the indexes not to be loaded for > partitioned tables: > > /* > * Make list of indexes. Ignore indexes on system catalogs if told to. > * Don't bother with indexes for an inheritance parent, either. > */ > if (inhparent || > (IgnoreSystemIndexes && IsSystemRelation(relation))) > hasindex = false; > else > hasindex = relation->rd_rel->relhasindex; > > A partitioned table will always go into the hasindex = false code path. > > I'm kind of thinking this patch should change that, even if the patch is > not making use of the indexes, you could argue that something > using set_rel_pathlist_hook might want to do something there, although, > there's likely a bunch of counter arguments too. > > What do you think? Great question. So you're thinking that the planner might have an interest in knowing what indexes are defined at the parent table level for planning purposes; but for that to actually have any effect we would need to change the planner and executor also. And one more point, also related to something you said before: we currently (I mean after my patch) don't mark partitioned-table-level indexes as valid or not valid depending on whether all its children exist, so trying to use that in the planner without having a flag could cause invalid plans to be generated (i.e. ones that would cause nonexistent indexes to be referenced). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Michael Paquier wrote: > On Wed, Nov 15, 2017 at 2:49 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > Here's the remaining bits, rebased. > > At least patch 3 has conflicts with HEAD. I am moving this patch to > next CF per a lack of reviews, switching status to waiting on author. Thanks, will submit a new version before the start of the next commitfest. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Nov 30, 2017 at 7:02 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Great question. So you're thinking that the planner might have an > interest in knowing what indexes are defined at the parent table level > for planning purposes; but for that to actually have any effect we would > need to change the planner and executor also. And one more point, also > related to something you said before: we currently (I mean after my > patch) don't mark partitioned-table-level indexes as valid or not valid > depending on whether all its children exist, so trying to use that in > the planner without having a flag could cause invalid plans to be > generated (i.e. ones that would cause nonexistent indexes to be > referenced). Did you do it this way due to locking concerns? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Nov 30, 2017 at 11:39 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: > I feel like we could do better here with little extra effort. The > DETACH index feature does not really seem required for this patch. Because of the dump/restore considerations mentioned in http://postgr.es/m/CA+TgmobUhGHg9v8SAswkHbBfyWg5A0QB+jGt0UOvq5YcBDUGig@mail.gmail.com I believe we need a way to create the index on the parent without immediately triggering index builds on the children, plus a way to create an index on a child after-the-fact and attach it to the parent. Detach isn't strictly required, but why support attach and not detach? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
David Rowley wrote: > So, then this patch is only really intended as a syntax shortcut for > mass adding of indexes to each partition? This patch is intended to serve as a basis on which to construct further features, just like every other patch we apply. > I feel like we could do better here with little extra effort. The > DETACH index feature does not really seem required for this patch. I > think just ensuring a matching index exists on each leaf partition and > creating any which don't exist before creating the index on the target > partitioned table seems like the correct solution. That way we can > make that index indisvalid flag have a valid meaning all the time. > Some later patch can invent some way to replace a bloated index. What you're saying is that I've written code for A+B, and you're "interested in C (which is incompatible with B), so can we have A+C and drop B". But in reality, there exists (unwritten) D that solves the incompatiblity between B and C. I'm just saying it's essentially the same to postpone C+D than to postpone B+D, and I already have B written; plus that way we don't have to come up with some novel way to handle pg_dump support. So can we get A+B committed and discuss C+D later? A = partitioned indexes B = pg_dump support based on ATTACH C = your proposed planner stuff D = correct indisvalid setting for partitioned indexes (set to false when a partition does not contain the index) The patch in this thread is A+B. > Perhaps later we can invent some generic way to replace a physical > leaf index for a given partitioned index perhaps with the same patch > that might allow us to replace an index which is used by a constraint, > which to me seems like a feature we should have had years ago. This is a hypothetical feature E which would be nice (for partitioned indexes and for ordinary indexes too) but is not strictly necessary. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2 December 2017 at 11:10, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > What you're saying is that I've written code for A+B, and you're > "interested in C (which is incompatible with B), so can we have A+C and > drop B". But in reality, there exists (unwritten) D that solves the > incompatiblity between B and C. I'm just saying it's essentially the > same to postpone C+D than to postpone B+D, and I already have B written; > plus that way we don't have to come up with some novel way to handle > pg_dump support. So can we get A+B committed and discuss C+D later? > > A = partitioned indexes > B = pg_dump support based on ATTACH > C = your proposed planner stuff > D = correct indisvalid setting for partitioned indexes (set to false > when a partition does not contain the index) > > The patch in this thread is A+B. Okay, I wasn't insisting, just asking if you thought this was missing from the patch. However, I do still feel that if we're adding an index to an object then it should be available in RelOptInfo->indexlist, but this patch is still good progress even if we don't add it there. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2 December 2017 at 03:39, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Nov 30, 2017 at 11:39 PM, David Rowley > <david.rowley@2ndquadrant.com> wrote: >> I feel like we could do better here with little extra effort. The >> DETACH index feature does not really seem required for this patch. > > Because of the dump/restore considerations mentioned in > http://postgr.es/m/CA+TgmobUhGHg9v8SAswkHbBfyWg5A0QB+jGt0UOvq5YcBDUGig@mail.gmail.com > I believe we need a way to create the index on the parent without > immediately triggering index builds on the children, plus a way to > create an index on a child after-the-fact and attach it to the parent. > Detach isn't strictly required, but why support attach and not detach? I proposed that this worked a different way in [1]. I think the way I mention is cleaner as it means there's no extra reason for a partitioned index to be indisvalid=false than there is for any other normal index. My primary reason for not liking this way is around leaving indexes around as indisvalid=false, however, I suppose that an index could be replaced atomically with a DETACH and ATTACH within a single transaction. I had previously not really liked the idea of invalidating an index by DETACHing a leaf table's index from it. Of course, this patch does nothing special with partitioned indexes, but I believe one day we will want to do something with these and that the DETACH/ATTACH is not the best design for replacing part of the index. [1] https://www.postgresql.org/message-id/CAKJS1f_o6v%2BXtT%2B3gfieUdCiU5Sn84humT-CVW5So_x_f%3DkLxQ%40mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Dec 4, 2017 at 6:57 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 2 December 2017 at 03:39, Robert Haas <robertmhaas@gmail.com> wrote: >> On Thu, Nov 30, 2017 at 11:39 PM, David Rowley >> <david.rowley@2ndquadrant.com> wrote: >>> I feel like we could do better here with little extra effort. The >>> DETACH index feature does not really seem required for this patch. >> >> Because of the dump/restore considerations mentioned in >> http://postgr.es/m/CA+TgmobUhGHg9v8SAswkHbBfyWg5A0QB+jGt0UOvq5YcBDUGig@mail.gmail.com >> I believe we need a way to create the index on the parent without >> immediately triggering index builds on the children, plus a way to >> create an index on a child after-the-fact and attach it to the parent. >> Detach isn't strictly required, but why support attach and not detach? > > I proposed that this worked a different way in [1]. I think the way I > mention is cleaner as it means there's no extra reason for a > partitioned index to be indisvalid=false than there is for any other > normal index. How does that proposal keep pg_dump from latching onto the wrong index, if there's more than one index on the same columns? > My primary reason for not liking this way is around leaving indexes > around as indisvalid=false, however, I suppose that an index could be > replaced atomically with a DETACH and ATTACH within a single > transaction. I had previously not really liked the idea of > invalidating an index by DETACHing a leaf table's index from it. Of > course, this patch does nothing special with partitioned indexes, but > I believe one day we will want to do something with these and that the > DETACH/ATTACH is not the best design for replacing part of the index. What do you propose? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 6 December 2017 at 04:29, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Dec 4, 2017 at 6:57 PM, David Rowley >> I proposed that this worked a different way in [1]. I think the way I >> mention is cleaner as it means there's no extra reason for a >> partitioned index to be indisvalid=false than there is for any other >> normal index. > > How does that proposal keep pg_dump from latching onto the wrong > index, if there's more than one index on the same columns? I'm not hugely concerned about that. It's not a new problem and it's not a problem that I recall seeing anyone complain about, at least not to the extent that we've ever bothered to fix it. The existing problem is with FOREIGN KEY constraints just choosing the first matching index in transformFkeyCheckAttrs() We can see the issue today with: create table t1 (id int not null); create unique index t1_idx_b on t1 (id); create table t2 (id int references t1 (id)); create unique index t1_idx_a on t1 (id); <pg_dump> <pg_restore> # drop index t1_idx_a; ERROR: cannot drop index t1_idx_a because other objects depend on it DETAIL: constraint t2_id_fkey on table t2 depends on index t1_idx_a HINT: Use DROP ... CASCADE to drop the dependent objects too. >> My primary reason for not liking this way is around leaving indexes >> around as indisvalid=false, however, I suppose that an index could be >> replaced atomically with a DETACH and ATTACH within a single >> transaction. I had previously not really liked the idea of >> invalidating an index by DETACHing a leaf table's index from it. Of >> course, this patch does nothing special with partitioned indexes, but >> I believe one day we will want to do something with these and that the >> DETACH/ATTACH is not the best design for replacing part of the index. > > What do you propose? For this patch, I propose we do nothing about that, but for the future, I already mentioned an idea to do this above. > I'd have thought some sort of: ALTER INDEX name_of_partitioned_index > REPLACE INDEX FOR partitioned_tablename WITH > name_of_new_matching_bloat_free_index; This way there is no period where we have to mark the index as indisvalid=false. So, for when we invent something that uses this feature, there's no window of time that concurrently running queries are in danger of generating a poor plan. The index swap operation is atomic. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
David Rowley wrote: > On 6 December 2017 at 04:29, Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Dec 4, 2017 at 6:57 PM, David Rowley > >> I proposed that this worked a different way in [1]. I think the way I > >> mention is cleaner as it means there's no extra reason for a > >> partitioned index to be indisvalid=false than there is for any other > >> normal index. > > > > How does that proposal keep pg_dump from latching onto the wrong > > index, if there's more than one index on the same columns? > > I'm not hugely concerned about that. It's not a new problem and it's > not a problem that I recall seeing anyone complain about, at least not > to the extent that we've ever bothered to fix it. Another reason to do things the proposed way is to let parallel restore create the indexes in parallel. If we just have a single CREATE INDEX that cascades to the partitions, that can be run by a single backend only, which is a loser. (There's also the fact that you'd lose any COMMENTs etc). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Dec 5, 2017 at 3:53 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: > I'm not hugely concerned about that. It's not a new problem and it's > not a problem that I recall seeing anyone complain about, at least not > to the extent that we've ever bothered to fix it. > > The existing problem is with FOREIGN KEY constraints just choosing the > first matching index in transformFkeyCheckAttrs() > > We can see the issue today with: > > create table t1 (id int not null); > create unique index t1_idx_b on t1 (id); > create table t2 (id int references t1 (id)); > create unique index t1_idx_a on t1 (id); > > <pg_dump> > <pg_restore> > > # drop index t1_idx_a; > ERROR: cannot drop index t1_idx_a because other objects depend on it > DETAIL: constraint t2_id_fkey on table t2 depends on index t1_idx_a > HINT: Use DROP ... CASCADE to drop the dependent objects too. Ugh, that sucks. I don't think it's a good argument for making the problem worse, though. What are we giving up by explicitly attaching the correct index? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 6 December 2017 at 09:58, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > David Rowley wrote: >> On 6 December 2017 at 04:29, Robert Haas <robertmhaas@gmail.com> wrote: >> > How does that proposal keep pg_dump from latching onto the wrong >> > index, if there's more than one index on the same columns? >> >> I'm not hugely concerned about that. It's not a new problem and it's >> not a problem that I recall seeing anyone complain about, at least not >> to the extent that we've ever bothered to fix it. > > Another reason to do things the proposed way is to let parallel restore > create the indexes in parallel. If we just have a single CREATE INDEX > that cascades to the partitions, that can be run by a single backend > only, which is a loser. I'm not all that sure why parallel restore could not work with this. The leaf partition indexes would all be created first, then the CREATE INDEX for the partitioned index would just be a metadata change with some checks to ensure all the leaf partition indexes exist. > (There's also the fact that you'd lose any COMMENTs etc). I can't see why that would be a problem with my proposal. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 6 December 2017 at 11:35, Robert Haas <robertmhaas@gmail.com> wrote: > What are we giving up by explicitly attaching > the correct index? The part I don't like about the ATTACH and DETACH of partitioned index is that it seems to be trying to just follow the syntax we use to remove a partition from a partitioned table, however, there's a huge difference between the two, as DETACHing a partition from a partitioned table leaves the partitioned table in a valid state, it simply just no longer contains the detached partition. With the partitioned index, we leave the index in an invalid state after a DETACH. It can only be made valid again once another leaf index has been ATTACHED again and that we've verified that all other indexes on every leaf partition is also there and are valid. If we're going to use these indexes to answer queries, then it seems like we should try to keep them valid so that queries can actually use them for something. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Dec 6, 2017 at 9:42 AM, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 6 December 2017 at 11:35, Robert Haas <robertmhaas@gmail.com> wrote: >> What are we giving up by explicitly attaching >> the correct index? > > The part I don't like about the ATTACH and DETACH of partitioned index > is that it seems to be trying to just follow the syntax we use to > remove a partition from a partitioned table, however, there's a huge > difference between the two, as DETACHing a partition from a > partitioned table leaves the partitioned table in a valid state, it > simply just no longer contains the detached partition. With the > partitioned index, we leave the index in an invalid state after a > DETACH. It can only be made valid again once another leaf index has > been ATTACHED again and that we've verified that all other indexes on > every leaf partition is also there and are valid. If we're going to > use these indexes to answer queries, then it seems like we should try > to keep them valid so that queries can actually use them for > something. > +1 for all that. Exactly my thoughts. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On 6 December 2017 at 13:42, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 6 December 2017 at 11:35, Robert Haas <robertmhaas@gmail.com> wrote: >> What are we giving up by explicitly attaching >> the correct index? > > The part I don't like about the ATTACH and DETACH of partitioned index > is that it seems to be trying to just follow the syntax we use to > remove a partition from a partitioned table, however, there's a huge > difference between the two, as DETACHing a partition from a > partitioned table leaves the partitioned table in a valid state, it > simply just no longer contains the detached partition. With the > partitioned index, we leave the index in an invalid state after a > DETACH. It can only be made valid again once another leaf index has > been ATTACHED again and that we've verified that all other indexes on > every leaf partition is also there and are valid. If we're going to > use these indexes to answer queries, then it seems like we should try > to keep them valid so that queries can actually use them for > something. Also, ATTACH and DETACH are especially useless when it comes to UNIQUE indexes. If we simply want to replace out a bloated index using a DETACH quickly followed by an ATTACH then it leaves a non-zero window of time that we can't be certain that the uniqueness is enforced. This would still work on an individual partition level, but if we ever want to reference a UNIQUE partitioned index in a foreign key constraint then what happens to the foreign key when the index is in the invalid state? Should we just disallow DETACH when a foreign key exists? or just invalidate the foreign key constraint too? Both seem like a nightmare from a DBA point-of-view. You might argue that concurrently recreating an index used by a foreign key is just as difficult today, but there's no reason to make this as problematic, is there? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Dec 5, 2017 at 7:42 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 6 December 2017 at 11:35, Robert Haas <robertmhaas@gmail.com> wrote: >> What are we giving up by explicitly attaching >> the correct index? > > The part I don't like about the ATTACH and DETACH of partitioned index > is that it seems to be trying to just follow the syntax we use to > remove a partition from a partitioned table, however, there's a huge > difference between the two, as DETACHing a partition from a > partitioned table leaves the partitioned table in a valid state, it > simply just no longer contains the detached partition. With the > partitioned index, we leave the index in an invalid state after a > DETACH. It can only be made valid again once another leaf index has > been ATTACHED again and that we've verified that all other indexes on > every leaf partition is also there and are valid. If we're going to > use these indexes to answer queries, then it seems like we should try > to keep them valid so that queries can actually use them for > something. I think keeping them valid is a great goal, just like I like low interest rates and a chicken in every pot. However, I'm pretty skeptical of our ability to always succeed in meeting that goal with absolutely zero corner cases. What about a CREATE INDEX CONCURRENTLY that fails midway through, or similarly DROP INDEX CONCURRENTLY? Those operations can leave around artifacts in the unpartitioned table case, and I bet they will also leave around artifacts for partitioned tables, and maybe there will be cases where they don't leave the same artifacts for every table in the hierarchy. Even if they do, there is future development to consider. Maybe REINDEX INDEX CONCURRENTLY will carry a possibility of creating a mix of states. Or maybe someone will implement Simon's idea from a few years ago of allowing an unlogged index on a permanent table, with the index being somehow marked not-valid after a restart. In that situation, each one would need to be reindexed independently to become valid. I do agree with you that an index which is currently enforcing a unique constraint has to remain continuously valid -- or if it unavoidably doesn't, for example if we allowed an unlogged unique index on a logged table, then we'd have to do something unpleasant like disallow inserts and updates to the key column until that gets fixed. However, that doesn't seem to preclude gracefully swapping out indexes for individual partitions; instead of providing a DETACH operation, we could provide a REPLACE operation that effectively does DETACH + ATTACH. It's possible that we are not that far apart here. I don't like the ATTACH syntax because it's like what we do for partitions; I like it because it solves the pg_dump problem. And it seems to me that your reservations are more about DETACH than ATTACH. I have no issue with punting DETACH to the curb, recasting it as REPLACE, or whatever. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8 December 2017 at 10:07, Robert Haas <robertmhaas@gmail.com> wrote: > I do agree with you that an index which is currently enforcing a > unique constraint has to remain continuously valid -- or if it > unavoidably doesn't, for example if we allowed an unlogged unique > index on a logged table, then we'd have to do something unpleasant > like disallow inserts and updates to the key column until that gets > fixed. However, that doesn't seem to preclude gracefully swapping out > indexes for individual partitions; instead of providing a DETACH > operation, we could provide a REPLACE operation that effectively does > DETACH + ATTACH. Yeah, that's what I've been trying to push for. I've mentioned a handy wavy REPLACE syntax a few times. > It's possible that we are not that far apart here. I don't like the > ATTACH syntax because it's like what we do for partitions; I like it > because it solves the pg_dump problem. And it seems to me that your > reservations are more about DETACH than ATTACH. I have no issue with > punting DETACH to the curb, recasting it as REPLACE, or whatever. I just don't quite see the pg_dump problem with my proposed approach. Let me pseudo write out what I imagine in a pg_dump file. CREATE TABLE p (a INT NOT NULL, b INT NOT NULL) PARTITION BY RANGE (a); CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (1) TO (10); CREATE TABLE p2 PARTITION OF p FOR VALUES FROM (10) TO (20); <load data> -- create indexes on partitions CREATE INDEX p1_a_idx ON p1 (a); -- normal create index CREATE INDEX p2_a_idx ON p2 (a); -- normal create index. COMMENT ON INDEX p1_a_idx IS 'the comment'; -- create partitioned indexes -- checks indexes exist for each partition of p with matching columns, -- AM and unique property then perform a metadata only change to -- say there's an index on this table, also updates each index uses to -- say its parent index is this index. If someone happens to have edited -- the dump file to remove one of the indexes on a leaf partition then the -- following would have to create that index again. CREATE INDEX p_a_idx ON p (a); -- -- PostgreSQL database dump complete -- We'd just need to ensure the indexes of leafs are created in an order that takes into account the dependency order. And yeah, this does nothing for making sure we choose the correct index if more than one matching index exists on the leaf partition, but perhaps we can dump a series of ALTER INDEX p_a_idx REPLACE INDEX FOR p1 WITH p1_a_idx; ALTER INDEX p_a_idx REPLACE INDEX FOR p2 WITH p2_a_idx; ... which would be no-ops most of the time, but at least would ensure we use the correct index. (Likely we could fix the FOREIGN KEY constraint choosing the first matching index with some variation of this syntax) Also, DROP INDEX should be disallowed on a leaf partition index which is in use for a partitioned index. I imagine pg_index just needs a new column indparentid which would be InvalidOid if it's not used. I'm just not that clear on if that column should be set to the leaf partition's direct parent, or the parent that the CREATE INDEX was done on. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Dec 7, 2017 at 4:43 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: > And yeah, this does nothing for making sure we choose the correct > index if more than one matching index exists on the leaf partition, > but perhaps we can dump a series of > > ALTER INDEX p_a_idx REPLACE INDEX FOR p1 WITH p1_a_idx; > ALTER INDEX p_a_idx REPLACE INDEX FOR p2 WITH p2_a_idx; > > ... which would be no-ops most of the time, but at least would ensure > we use the correct index. (Likely we could fix the FOREIGN KEY > constraint choosing the first matching index with some variation of > this syntax) Sure, that would fix the problem I'm concerned about, but creating the parent index first, as a shell, and then creating each child and attaching it to the parent *also* fixes the problem I'm concerned about, and without dragging any bystander objects into the fray. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8 December 2017 at 10:51, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Dec 7, 2017 at 4:43 PM, David Rowley > <david.rowley@2ndquadrant.com> wrote: >> And yeah, this does nothing for making sure we choose the correct >> index if more than one matching index exists on the leaf partition, >> but perhaps we can dump a series of >> >> ALTER INDEX p_a_idx REPLACE INDEX FOR p1 WITH p1_a_idx; >> ALTER INDEX p_a_idx REPLACE INDEX FOR p2 WITH p2_a_idx; >> >> ... which would be no-ops most of the time, but at least would ensure >> we use the correct index. (Likely we could fix the FOREIGN KEY >> constraint choosing the first matching index with some variation of >> this syntax) > > Sure, that would fix the problem I'm concerned about, but creating the > parent index first, as a shell, and then creating each child and > attaching it to the parent *also* fixes the problem I'm concerned > about, and without dragging any bystander objects into the fray. That's true, but is it worth inventing/maintaining an ATTACH syntax for that? It's not a very common case that multiple matching indexes existing. If it is worth it, do we need DETACH at all? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Dec 7, 2017 at 4:57 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: >> Sure, that would fix the problem I'm concerned about, but creating the >> parent index first, as a shell, and then creating each child and >> attaching it to the parent *also* fixes the problem I'm concerned >> about, and without dragging any bystander objects into the fray. > > That's true, but is it worth inventing/maintaining an ATTACH syntax > for that? It's not a very common case that multiple matching indexes > existing. If it is worth it, do we need DETACH at all? I think it is totally worth it. A little piece of extra DDL syntax isn't that really costing us anything. Your proposed approach probably still has obscure failure cases - e.g. I bet the deadlock-avoidance logic in parallel restore won't know about the dependency on a seemingly-unrelated object. Also, the use of a seemingly-useless REPLACE syntax in every dump file will probably confuse at least a few users, and maybe a few developers, too. I think there is considerable value, both for the system and for the humans who use it, in having something that has *exactly* the semantics we want rather than *almost* the semantics we want. I suppose if we want to get cute, we could have ONLY the ATTACH syntax; if you attach an index for a partition that already has an index attached, that could mean attach to the new one instead of the old one (i.e. replace). But I would just add support for both ATTACH and REPLACE and call it good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8 December 2017 at 11:23, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Dec 7, 2017 at 4:57 PM, David Rowley > <david.rowley@2ndquadrant.com> wrote: >> That's true, but is it worth inventing/maintaining an ATTACH syntax >> for that? It's not a very common case that multiple matching indexes >> existing. If it is worth it, do we need DETACH at all? > > I think it is totally worth it. A little piece of extra DDL syntax > isn't that really costing us anything. Your proposed approach > probably still has obscure failure cases - e.g. I bet the > deadlock-avoidance logic in parallel restore won't know about the > dependency on a seemingly-unrelated object. Also, the use of a > seemingly-useless REPLACE syntax in every dump file will probably > confuse at least a few users, and maybe a few developers, too. I > think there is considerable value, both for the system and for the > humans who use it, in having something that has *exactly* the > semantics we want rather than *almost* the semantics we want. > > I suppose if we want to get cute, we could have ONLY the ATTACH > syntax; if you attach an index for a partition that already has an > index attached, that could mean attach to the new one instead of the > old one (i.e. replace). But I would just add support for both ATTACH > and REPLACE and call it good. ATTACH/REPLACE sounds fine. My objection was more about the DETACH/ATTACH method to replace an index. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
David Rowley wrote: > ATTACH/REPLACE sounds fine. My objection was more about the > DETACH/ATTACH method to replace an index. So what happens if you do ALTER INDEX .. ATTACH and you already have another index in that partition that is attached to the same parent in the index? With my code, what happens is you have two indexes attached to the same parent, and you can't drop any. If we don't have DETACH, how can you get out of that situation? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Dec 11, 2017 at 10:04 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > David Rowley wrote: > >> ATTACH/REPLACE sounds fine. My objection was more about the >> DETACH/ATTACH method to replace an index. > > So what happens if you do ALTER INDEX .. ATTACH and you already have > another index in that partition that is attached to the same parent in > the index? I think that should be an ERROR. You can use REPLACE if you want to switch which index is attached, but you shouldn't be able to attach two indexes from the same partition at the same time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hmm, so I'm now unsure what the actual proposals for handling pg_dump are. We seem to have the following three proposals: 1. Alvaro: use CREATE INDEX ON ONLY <parent> (not recursive ), followed by CREATE INDEX ON <partition>, followed by ALTER INDEX <on_parent> ATTACH PARTITION <on_partition>. I provide an ALTER INDEX DETACH PARTITION for symmetry and because it can be used to replace the index. Pros: the database is always restored identically to what was in the original. Con: The index hierarchy might be "partial", that is, lack a component index on some partition. 2. David's: use CREATE INDEX ON <partition>, followed by CREATE INDEX ON <parent>. This will use the matching mechanism to automatically attach the index on partition to index on parent. If some partition lacks a matching index, one is created automatically by the creation on parent. If you want to replace the index on a partition, use a new (as yet unimplemented) ALTER INDEX REPLACE. No need to add ONLY to the table name in CREATE INDEX, since the command always recurses. (This seems good to me, because I Pro: the index is never "partial" (missing a partition). Con: the matching mechanism might choose a different index on restore than what was selected in the database being dumped. 3. Robert's: use CREATE INDEX ON ONLY <parent>, which creates a shell index on parent only (no recursion), followed by CREATE INDEX ON <partition>. DETACH is not provided. If you ATTACH an index for a partition that already has one index attached, then (1) the newly attached one replaces the original (i.e. effectively REPLACE) or (2) you get an error and we implement a separate ALTER INDEX REPLACE command. It's not clear to me how or when the shell index becomes a real index. Robert, can you please clarify the terms of your proposal? How is it better than mine? Is David's concern about a "partial" index (i.e. an index that doesn't exist in some partition) solved by it? I have code for proposals 1 and 2. I don't like proposal 2, and David & Ashutosh don't like (1). Maybe if we all understand (3) we can agree on using that one? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote: > 3. Robert's: use CREATE INDEX ON ONLY <parent>, which creates a shell > index on parent only (no recursion), followed by CREATE INDEX ON > <partition>. DETACH is not provided. If you ATTACH an index for a > partition that already has one index attached, then (1) the newly > attached one replaces the original (i.e. effectively REPLACE) or (2) > you get an error and we implement a separate ALTER INDEX REPLACE > command. It's not clear to me how or when the shell index becomes a > real index. As I understand the whole purpose of this design is that there is no point during the restore at which the index lacks indexes on partitions: it is either complete, or it doesn't exist yet. If we create the index on parent first, and later the indexes on partitions, that condition is not satisfied. To solve this, we could create the children indexes first and then the parent; but how do we indicate to the parent creation which are the indexes that oughta be marked as children? ALTER INDEX ATTACH PARTITION doesn't cut it, because it occurs after the index on parent is created, which is too late. We would do something like CREATE INDEX ON parent (columns) [other stuff] ATTACH idx_on_child1, idx_on_child2, idx_on_child3; but this seems mighty ugly. Anyway if we do that, what is the point of ALTER INDEX ATTACH PARTITION? We might as leave it out and just keep the door open for a later ALTER INDEX REPLACE. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Dec 15, 2017 at 4:02 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > 3. Robert's: use CREATE INDEX ON ONLY <parent>, which creates a shell > index on parent only (no recursion), followed by CREATE INDEX ON > <partition>. DETACH is not provided. If you ATTACH an index for a > partition that already has one index attached, then (1) the newly > attached one replaces the original (i.e. effectively REPLACE) or (2) > you get an error and we implement a separate ALTER INDEX REPLACE > command. It's not clear to me how or when the shell index becomes a > real index. With this proposal, I think the index can be invalid initially, but once you've attached an index for every child partition, it becomes irrevocably valid. After that, the only supported operation is REPLACE, which preserves validity. > Robert, can you please clarify the terms of your proposal? How is it > better than mine? Is David's concern about a "partial" index (i.e. an > index that doesn't exist in some partition) solved by it? I think the perceived advantage is that, once valid, the index can't subsequently become not-valid. That seemed to be David's big concern (which is not without foundation). > I have code for proposals 1 and 2. I don't like proposal 2, and David & > Ashutosh don't like (1). Maybe if we all understand (3) we can agree on > using that one? Yes, it would be nice to achieve some sort of consensus and I think (3) gives everyone a little of what they want. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > On Thu, Nov 30, 2017 at 7:02 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > Great question. So you're thinking that the planner might have an > > interest in knowing what indexes are defined at the parent table level > > for planning purposes; but for that to actually have any effect we would > > need to change the planner and executor also. And one more point, also > > related to something you said before: we currently (I mean after my > > patch) don't mark partitioned-table-level indexes as valid or not valid > > depending on whether all its children exist, so trying to use that in > > the planner without having a flag could cause invalid plans to be > > generated (i.e. ones that would cause nonexistent indexes to be > > referenced). > > Did you do it this way due to locking concerns? No -- just because since the index-on-parent is a different kind of object (RELKIND_PARTITIONED_INDEX) it is not considered for anything in the planner anyway, so there's no need for indisvalid to be "correct". Changing the flag in the parent index only needs to examine state on the children, not modify them, so I don't think there would be any serious locking problem. By the way, my implementation of ALTER INDEX ATTACH PARTITION was prone to deadlocks because it needs locks on parent table, index-on-parent, partition, and index-on-partition; and there was no consideration to the ordering in which these were being acquired. I wrote a callback for RangeVarGetRelidExtended to be called in ATExecAttachPartitionIdx() that tries to acquire locks in the right order -- but I recently realized that even that is not sufficient, because (unless I misread it) ALTER INDEX itself does not worry about locking the containing table before the index. I think some tweaking needs to be done in RangeVarCallbackForAlterRelation to lock the table if an index is being altered (conditionally, depending on the precise ALTER TABLE operation). It surprises me that we don't need to do that yet, but I haven't looked into it any further. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Robert Haas wrote: > On Fri, Dec 15, 2017 at 4:02 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > 3. Robert's: use CREATE INDEX ON ONLY <parent>, which creates a shell > > index on parent only (no recursion), followed by CREATE INDEX ON > > <partition>. DETACH is not provided. If you ATTACH an index for a > > partition that already has one index attached, then (1) the newly > > attached one replaces the original (i.e. effectively REPLACE) or (2) > > you get an error and we implement a separate ALTER INDEX REPLACE > > command. It's not clear to me how or when the shell index becomes a > > real index. > > With this proposal, I think the index can be invalid initially, but > once you've attached an index for every child partition, it becomes > irrevocably valid. After that, the only supported operation is > REPLACE, which preserves validity. Sounds okay to me. (I had already deleted ALTER INDEX DETACH from my patch earlier today.) I admit I had also deleted the ONLY clause from CREATE INDEX, and I don't like having to put it back :-) But on the whole, having it sounds better than the alternatives. We have two options for marking valid: 1. after each ALTER INDEX ATTACH, verify whether the set of partitions that contain the index is complete; if so, mark it valid, otherwise do nothing. This sucks because we have to check that over and over for every index that we attach 2. We invent yet another command, say ALTER INDEX <idx-on-parent> VALIDATE -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Dec 15, 2017 at 5:18 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > We have two options for marking valid: > > 1. after each ALTER INDEX ATTACH, verify whether the set of partitions > that contain the index is complete; if so, mark it valid, otherwise do > nothing. This sucks because we have to check that over and over for > every index that we attach > > 2. We invent yet another command, say > ALTER INDEX <idx-on-parent> VALIDATE If ALTER INDEX .. ATTACH is already taking AEL on the parent, then I think it might as well try to validate while it's at it. But if not then we might want to go with #2. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 17 December 2017 at 16:22, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Dec 15, 2017 at 5:18 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> We have two options for marking valid: >> >> 1. after each ALTER INDEX ATTACH, verify whether the set of partitions >> that contain the index is complete; if so, mark it valid, otherwise do >> nothing. This sucks because we have to check that over and over for >> every index that we attach >> >> 2. We invent yet another command, say >> ALTER INDEX <idx-on-parent> VALIDATE It's not perfect that we need to validate each time, but it might not be that expensive to validate since we only really need to count the pg_index rows that have indisvalid = true rows which have a parent index listed as the index we're ATTACHing too, then ensure that matches the number of leaf partitions. > If ALTER INDEX .. ATTACH is already taking AEL on the parent, then I > think it might as well try to validate while it's at it. But if not > then we might want to go with #2. I'm now not that clear on what the behaviour is if the ONLY keyword is not specified on the CREATE INDEX for the partitioned index. Does that go and create each leaf partition index regardless of if there is a suitable candidate to ATTACH? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas wrote: > On Fri, Dec 15, 2017 at 5:18 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > We have two options for marking valid: > > > > 1. after each ALTER INDEX ATTACH, verify whether the set of partitions > > that contain the index is complete; if so, mark it valid, otherwise do > > nothing. This sucks because we have to check that over and over for > > every index that we attach > > > > 2. We invent yet another command, say > > ALTER INDEX <idx-on-parent> VALIDATE > > If ALTER INDEX .. ATTACH is already taking AEL on the parent, then I > think it might as well try to validate while it's at it. But if not > then we might want to go with #2. The problem I have with it is that restoring a dump containing indexes on partitions becomes a O(N^2) deal as it has to do the full check once for every index we attach. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Dec 17, 2017 at 10:50 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> If ALTER INDEX .. ATTACH is already taking AEL on the parent, then I >> think it might as well try to validate while it's at it. But if not >> then we might want to go with #2. > > The problem I have with it is that restoring a dump containing indexes > on partitions becomes a O(N^2) deal as it has to do the full check once > for every index we attach. Sure. If the constant factor is high enough to matter, then VALIDATE makes sense. IMHO, anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Dec 17, 2017 at 5:29 AM, David Rowley <david.rowley@2ndquadrant.com> wrote: >> If ALTER INDEX .. ATTACH is already taking AEL on the parent, then I >> think it might as well try to validate while it's at it. But if not >> then we might want to go with #2. > > I'm now not that clear on what the behaviour is if the ONLY keyword is > not specified on the CREATE INDEX for the partitioned index. Does that > go and create each leaf partition index regardless of if there is a > suitable candidate to ATTACH? No, the other way around. ONLY is being proposed as a way to create an initially-not-valid parent to which we can then ATTACH subsequently-created child indexes. But because we will have REPLACE rather than DETACH, once you get the index valid it never goes back to not-valid. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 18 December 2017 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Dec 17, 2017 at 5:29 AM, David Rowley > <david.rowley@2ndquadrant.com> wrote: >> I'm now not that clear on what the behaviour is if the ONLY keyword is >> not specified on the CREATE INDEX for the partitioned index. Does that >> go and create each leaf partition index regardless of if there is a >> suitable candidate to ATTACH? > > No, the other way around. ONLY is being proposed as a way to create > an initially-not-valid parent to which we can then ATTACH > subsequently-created child indexes. But because we will have REPLACE > rather than DETACH, once you get the index valid it never goes back to > not-valid. I understand what the ONLY is proposed to do. My question was in regards to the behaviour without ONLY. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Dec 17, 2017 at 9:38 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 18 December 2017 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote: >> On Sun, Dec 17, 2017 at 5:29 AM, David Rowley >> <david.rowley@2ndquadrant.com> wrote: >>> I'm now not that clear on what the behaviour is if the ONLY keyword is >>> not specified on the CREATE INDEX for the partitioned index. Does that >>> go and create each leaf partition index regardless of if there is a >>> suitable candidate to ATTACH? >> >> No, the other way around. ONLY is being proposed as a way to create >> an initially-not-valid parent to which we can then ATTACH >> subsequently-created child indexes. But because we will have REPLACE >> rather than DETACH, once you get the index valid it never goes back to >> not-valid. > > I understand what the ONLY is proposed to do. My question was in > regards to the behaviour without ONLY. Oh, sorry -- I was confused. I'm not sure whether that should try to attach to something if it exists, or just create unconditionally... what do you think? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 18 December 2017 at 15:59, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Dec 17, 2017 at 9:38 PM, David Rowley > <david.rowley@2ndquadrant.com> wrote: >> On 18 December 2017 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Sun, Dec 17, 2017 at 5:29 AM, David Rowley >>> <david.rowley@2ndquadrant.com> wrote: >>>> I'm now not that clear on what the behaviour is if the ONLY keyword is >>>> not specified on the CREATE INDEX for the partitioned index. Does that >>>> go and create each leaf partition index regardless of if there is a >>>> suitable candidate to ATTACH? >>> >>> No, the other way around. ONLY is being proposed as a way to create >>> an initially-not-valid parent to which we can then ATTACH >>> subsequently-created child indexes. But because we will have REPLACE >>> rather than DETACH, once you get the index valid it never goes back to >>> not-valid. >> >> I understand what the ONLY is proposed to do. My question was in >> regards to the behaviour without ONLY. > > Oh, sorry -- I was confused. I'm not sure whether that should try to > attach to something if it exists, or just create unconditionally... > what do you think? I think you feel quite strongly about not having the code select a random matching index, so if we want to stick to that rule, then we'll need to create a set of new leaf indexes rather than select a random one. If we don't do that, then we might as well go with my idea and ditch the ONLY syntax and have the CREATE INDEX try to find a suitable leaf index, or create a new leaf index if one cannot be found. Would it be surprising to users if CREATE INDEX ON partitioned_table created a bunch of duplicate new indexes on the leaf tables? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Dec 17, 2017 at 10:05 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: > I think you feel quite strongly about not having the code select a > random matching index, so if we want to stick to that rule, then we'll > need to create a set of new leaf indexes rather than select a random > one. I feel quite strongly about it *in the context of pg_dump*. Outside of that scenario, I'm happy to go with whatever behavior you and others think will satisfy the POLA. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
After this discussion, this is how I see things working: 1. pg_dump a) creates indexes on partitions normally b) once all existing indexes are done, index on parent is created, with ONLY. No cascading occurs, no indexes are attached. c) ATTACH is run for each existing partition index. After each ATTACH, we check that all indexes exist. If so, the parent is marked valid. d) if not all indexes existed in partitions, index on parent remains invalid. (It was invalid in the dumped database, so this is correct.) 2. other uses Normal CREATE INDEX (without ONLY) recurses and attaches the first matching index it finds (no duplicate indexes are created); partitions without a matching index get one created. 3. ALTER INDEX DETACH is not provided. Therefore: once index is valid, it remains so forever. I think this satisfies all concerns. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Dec 18, 2017 at 11:02 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > After this discussion, this is how I see things working: > > 1. pg_dump > a) creates indexes on partitions normally > b) once all existing indexes are done, index on parent is created, > with ONLY. No cascading occurs, no indexes are attached. > c) ATTACH is run for each existing partition index. After each > ATTACH, we check that all indexes exist. If so, the parent is > marked valid. > d) if not all indexes existed in partitions, index on parent remains > invalid. (It was invalid in the dumped database, so this is > correct.) > > 2. other uses > Normal CREATE INDEX (without ONLY) recurses and attaches the first > matching index it finds (no duplicate indexes are created); > partitions without a matching index get one created. > > 3. ALTER INDEX DETACH is not provided. Therefore: once index is valid, > it remains so forever. > > I think this satisfies all concerns. Sounds great to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 19 December 2017 at 05:08, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Dec 18, 2017 at 11:02 AM, Alvaro Herrera > <alvherre@alvh.no-ip.org> wrote: >> After this discussion, this is how I see things working: >> >> 1. pg_dump >> a) creates indexes on partitions normally >> b) once all existing indexes are done, index on parent is created, >> with ONLY. No cascading occurs, no indexes are attached. >> c) ATTACH is run for each existing partition index. After each >> ATTACH, we check that all indexes exist. If so, the parent is >> marked valid. >> d) if not all indexes existed in partitions, index on parent remains >> invalid. (It was invalid in the dumped database, so this is >> correct.) >> >> 2. other uses >> Normal CREATE INDEX (without ONLY) recurses and attaches the first >> matching index it finds (no duplicate indexes are created); >> partitions without a matching index get one created. >> >> 3. ALTER INDEX DETACH is not provided. Therefore: once index is valid, >> it remains so forever. >> >> I think this satisfies all concerns. > > Sounds great to me. and me. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Great, thanks for the input. pg_dump behaves as described upthread -- thanks David and Robert for the input. I did this by injecting a fake "INDEX ATTACH" object in pg_dump's model, which depends on the index-on-parent-table; which in turn depends on the index-on-partition. Because of the dependencies, the attach is dumped last, and the index-on-parent is dumped after all the indexes-on-partitions have been dumped. I needed to apply a little bit of surgery so that each table object would contain links to its indexes, which previously wasn't there. A followup step to obtaining index info creates the INDEX ATTACH objects. In the backend code: There is now a difference in initial setting of indisvalid and indisready when creating an index. Previously we always set them the same (!concurrent) but now indisready depends only on "concurrent" while indisvalid additionally depends on whether a a non-inherited index on a partitioned table is being built. Failing to set indisready to true initially was causing the index to be invisible to pg_dump. There is no ALTER INDEX / DETACH, as discussed. Also, trying to attach an index for a partition that already contains an attached index raises an error -- which means that in order to determine whether attaching an index as partition makes the parent index valid, is a simple matter of counting tuples. All these behaviors are immortalized in the regression tests. I added tab-complete support too (thanks Jesper for the reminder). It is probably not exhaustive, but should be good enough. I have two patches to rebase on top of this, which I hope to post later today: 1) let these indexes be unique (i.e. allow unique and PK constraints) 2) allow foreign keys on partitioned tables I have a further patch to allow FOR EACH ROW triggers on partitioned tables, but I think it's largely unrelated to this (and, as I was surprised to discover, it's also unrelated to the FKs one). Thanks -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Wed, Dec 20, 2017 at 12:01 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > I have two patches to rebase on top of this, which I hope to post later > today: > > 1) let these indexes be unique (i.e. allow unique and PK constraints) > 2) allow foreign keys on partitioned tables > > I have a further patch to allow FOR EACH ROW triggers on partitioned > tables, but I think it's largely unrelated to this (and, as I was > surprised to discover, it's also unrelated to the FKs one). Sounds great! I made the comment during my talk at PGCONF.EU that partitioned tables in PostgreSQL are really just a bunch of tables glued together, but that over time "we'll make the glue better", and I think the improvements on which you are working will go a long way in that direction. With regard to indexes, presumably we can only have a unique index if the index definition matches the partition key definition. Otherwise, we can only guarantee per-partition uniqueness, unless (1) we implement "global" indexes, which sounds like a patch that would take a fair amount of work, or (2) every index insertion goes and checks for conflicting values in all of the "sibling" indexes. This latter approach was suggested to me by Andres, and certainly solves a bunch of problems with vacuuming and dropping indexes, but seems like it would probably have lousy insert performance unless the number of partitions is very small. With regard to foreign keys, it seems like there are two directions to worry about. An "outbound" foreign key needs to cascade, I think, more or less like an index does, with every child inheriting the foreign key from the partition root. An "inbound" foreign key just needs a unique index to point at and depend on. Given the work you are doing on unique indexes, I'm guessing that you are talking about supporting the latter, not the former, but the way you phrased it sounds like the opposite. I haven't thought much about the possibility of supporting FOR EACH ROW triggers, but it sounds like a good idea if we can do it. I'm not sure how that will work, though, if users directly access the leaf partitions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > Sounds great! I made the comment during my talk at PGCONF.EU that > partitioned tables in PostgreSQL are really just a bunch of tables > glued together, but that over time "we'll make the glue better", and I > think the improvements on which you are working will go a long way in > that direction. Thanks -- yes, that's my intention, to improve our partitioning story. Many other projects related to improving the glue are already underway by others, so I think we're making great progress on the overall partitioning front. > With regard to indexes, presumably we can only have a unique index if > the index definition matches the partition key definition. Otherwise, > we can only guarantee per-partition uniqueness, unless (1) we > implement "global" indexes, which sounds like a patch that would take > a fair amount of work, or (2) every index insertion goes and checks > for conflicting values in all of the "sibling" indexes. This latter > approach was suggested to me by Andres, and certainly solves a bunch > of problems with vacuuming and dropping indexes, but seems like it > would probably have lousy insert performance unless the number of > partitions is very small. Actually, we discussed unique constraints before. What I proposed is to implement a useful subset: only allow a unique constraint if the partition columns are in it, which means that enforcing the constraint on each partition independently is enough to satisfy the constraint globally. This doesn't solve all cases where you want UNIQUE on partitioned tables, but the reason to implement this subset is precisely that it doesn't have those drawbacks, so it's quick and efficient. Another point is that global indexes are not only a lot of work, but they also come with additional drawbacks such as the necessity to clean up when partitions are dropped/detached, which is a huge pain. > With regard to foreign keys, it seems like there are two directions to > worry about. An "outbound" foreign key needs to cascade, I think, > more or less like an index does, with every child inheriting the > foreign key from the partition root. An "inbound" foreign key just > needs a unique index to point at and depend on. Given the work you > are doing on unique indexes, I'm guessing that you are talking about > supporting the latter, not the former, but the way you phrased it > sounds like the opposite. Actually, I intend to do both. I have a prototype patch which seems to work, though I haven't hammered it heavily yet. Some of the cascading options are not implemented -- as I recall I implemented it for the CREATE TABLE PARTITION OF case but not yet the ALTER TABLE ATTACH PARTITION one. > I haven't thought much about the possibility of supporting FOR EACH > ROW triggers, but it sounds like a good idea if we can do it. I'm not > sure how that will work, though, if users directly access the leaf > partitions. Yeah, I'm not sure yet either. Needs more thought than I've given it so far. I have a prototype patch for this also, and the basic case seems to give reasonable results, but I haven't tried to find any weird corner cases either. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Just to show what I'm talking about, here are a few prototype patches. Beware, these are all very lightly tested/reviewed. Input is welcome, particularly if it comes in the guise of patches to regression tests showing cases that misbehave. 0001 is a fixup for the v6 patch I posted upthread; it's needed so the 0002 patch doesn't end up with indexes marked invalid after pg_dump. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
I modified the regression test so that a partitioning hierarchy would be left behind after the test is run, which is useful to test pg_upgrade and pg_dump -- this caught one small bug. That and some reading of the diff resulted in v8, attach. On my system, make check-world passes. However, Thomas Munro's automated patch tester seems to have a problem with the pg_upgrade test, though I don't know what it is. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi Alvaro, On 12/20/2017 04:25 PM, Alvaro Herrera wrote: > I modified the regression test so that a partitioning hierarchy would be > left behind after the test is run, which is useful to test pg_upgrade > and pg_dump -- this caught one small bug. That and some reading of the > diff resulted in v8, attach. > > On my system, make check-world passes. However, Thomas Munro's > automated patch tester seems to have a problem with the pg_upgrade test, > though I don't know what it is. > Passes check-world here too w/ TAP + cassert. index.c: + values[Anum_pg_index_indparentidx - 1 ] = ObjectIdGetDatum(parentIndexOid); Extra space. tab-complete.c: Contains references to DETACH. create_index.sgml: I think the new paragraph should mention the naming convention of the generated indexes, as they may differ from the index name on the partition table. reindex.sgml: Missing a note about REINDEX not being supported on the partition index. Best regards, Jesper
Hello Jesper, Jesper Pedersen wrote: > Passes check-world here too w/ TAP + cassert. Great, thanks for checking. > index.c: > [and a few other comments] I believe these are all fixed by the attached delta patch. If you have wording suggestions for the doc changes, please send them along. Thanks for reading, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi Alvaro, On 2017/12/23 0:10, Alvaro Herrera wrote: > I believe these are all fixed by the attached delta patch. @@ -1676,7 +1694,12 @@ psql_completion(const char *text, int start, int end) "UNION SELECT 'ALL IN TABLESPACE'"); /* ALTER INDEX <name> */ else if (Matches3("ALTER", "INDEX", MatchAny)) - COMPLETE_WITH_LIST5("ALTER COLUMN", "OWNER TO", "RENAME TO", "SET", "RESET"); + COMPLETE_WITH_LIST7("ALTER COLUMN", "OWNER TO", "RENAME TO", "SET", + "RESET", "ATTACH PARTITION"); This should be COMPLETE_WITH_LIST6(). Thanks, Amit
On 2017/12/25 13:52, Amit Langote wrote: > Hi Alvaro, > > On 2017/12/23 0:10, Alvaro Herrera wrote: >> I believe these are all fixed by the attached delta patch. > > @@ -1676,7 +1694,12 @@ psql_completion(const char *text, int start, int end) > "UNION SELECT 'ALL IN TABLESPACE'"); > /* ALTER INDEX <name> */ > else if (Matches3("ALTER", "INDEX", MatchAny)) > - COMPLETE_WITH_LIST5("ALTER COLUMN", "OWNER TO", "RENAME TO", > "SET", "RESET"); > + COMPLETE_WITH_LIST7("ALTER COLUMN", "OWNER TO", "RENAME TO", "SET", > + "RESET", "ATTACH PARTITION"); > > This should be COMPLETE_WITH_LIST6(). I noticed a few things I thought I'd report. I was using the latest patch (v8 + the delta patch). 1. The following crashes because of an Assert in ATExecCmd(): ALTER TABLE an_index DETACH PARTITION ... It seems ATPrepCmd() should not pass ATT_INDEX to ATSimplePermissions() for the ATDetachPartition sub-command type. 2. Something strange about how dependencies are managed: create table p (a char) partition by list (a); create table pa partition of p for values in ('a');; create table pb partition of p for values in ('b'); create index on p (a); \d pa Table "public.pa" Column | Type | Collation | Nullable | Default |-------+--------------+-----------+----------+--------- a | character(1) | | | Partition of: p FOR VALUES IN ('a') Indexes: "pa_a_idx" btree (a) drop table pa; ERROR: cannot drop table pa because other objects depend on it DETAIL: index p_a_idx depends on table pa HINT: Use DROP ... CASCADE to drop the dependent objects too. set client_min_messages to debug2; drop table pa; DEBUG: drop auto-cascades to index pa_a_idx DEBUG: drop auto-cascades to index pb_a_idx DEBUG: drop auto-cascades to type pa DEBUG: drop auto-cascades to type pa[] ERROR: cannot drop table pa because other objects depend on it DETAIL: index p_a_idx depends on table pa HINT: Use DROP ... CASCADE to drop the dependent objects too. Thanks, Amit
Amit Langote wrote: > 2. Something strange about how dependencies are managed: > > create table p (a char) partition by list (a); > create table pa partition of p for values in ('a');; > create table pb partition of p for values in ('b'); > create index on p (a); > drop table pa; > ERROR: cannot drop table pa because other objects depend on it > DETAIL: index p_a_idx depends on table pa > HINT: Use DROP ... CASCADE to drop the dependent objects too. Hmm, this is quite nasty and I'm not seeing any reasonable way to fix it. The problem is that we have an internal dep from the parent index to the partition index, which prompts us to recurse the chase of dependencies inverting the way the dependencies flow -- the "case 3" in findDependentObjects: /* * 3. Not all the owning objects have been visited, so * transform this deletion request into a delete of this * owning object. The problem is that we don't want the partition index to go away unless it's together with its parent index, except if the table which contains it goes away first. We have no way to specify this condition ... Maybe something like sticking a phony additional record in pg_depend that causes the partition index to get added to the targetObjects list ahead of time ... but that doesn't work either, because we still want to chase the internal deps in that case. Hm. Maybe we need a new "auto internal" deptype with a mix of semantics of the other two deptypes. It seems a bit ugly and I'm not sure it'd work either ... I'll try to code it tomorrow. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote: > Maybe we need a new "auto internal" deptype with a mix of semantics of > the other two deptypes. It seems a bit ugly and I'm not sure it'd work > either ... I'll try to code it tomorrow. This seems to work pretty well, much to my surprise. I was a bit scared of adding a new deptype, but actually the only affected code is findDependentObjects() and the semantics of the new type is a subset of the existing DEPTYPE_INTERNAL, so I think it's okay. I need to fill in its description in docs and comments though -- I left it out because the real difference between INTERNAL and INTERNAL_AUTO is not something that is covered by the existing description of INTERNAL, so maybe I'll need to expand that one. This version includes the fixes I posted as "delta" to the problems Jesper reported, as well as fixes to the ones reported by Amit. It's looking pretty good to me -- I'm seeing it as a candidate to commit early in the upcoming commitfest. Thanks -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
I just realized there's a further problem in the area: when a partition is detached from its parent, its indexes are not made independent of the indexes on parent. So they can't be dropped on their own (booh!); and dropping the index on the former parent partitioned table drops the index on the former partition also (hiss!). I think the fix for this is to delete all dependencies, and re-register normal ones. Should be straightforward, but I'm not doing it this year. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Alvaro, On 12/22/2017 10:10 AM, Alvaro Herrera wrote: > I believe these are all fixed by the attached delta patch. > Thanks. > If you have wording suggestions for the doc changes, please send them > along. > Maybe we should make the default index name more explicit under the 'name' parameter as attached. Best regards, Jesper
Attachment
Hi Alvaro, On 12/29/2017 12:59 PM, Alvaro Herrera wrote: > This seems to work pretty well, much to my surprise. I was a bit scared > of adding a new deptype, but actually the only affected code is > findDependentObjects() and the semantics of the new type is a subset of > the existing DEPTYPE_INTERNAL, so I think it's okay. I need to fill in > its description in docs and comments though -- I left it out because the > real difference between INTERNAL and INTERNAL_AUTO is not something that > is covered by the existing description of INTERNAL, so maybe I'll need > to expand that one. > > This version includes the fixes I posted as "delta" to the problems > Jesper reported, as well as fixes to the ones reported by Amit. It's > looking pretty good to me -- I'm seeing it as a candidate to commit > early in the upcoming commitfest. > indexing.sql contains a \di command which list the owner of the db, so make check-world fails. Best regards, Jesper
Jesper Pedersen wrote: > On 12/22/2017 10:10 AM, Alvaro Herrera wrote: > > If you have wording suggestions for the doc changes, please send them > > along. > > Maybe we should make the default index name more explicit under the 'name' > parameter as attached. I'm -0.2 on documenting this. In general, I'm hesitant to promise more than strictly needed, mostly because it's more difficult to later change our minds. For example, what if soon after this patch we decide to invent a new mechanism for creating names on partitions? Also, in reality there are additional considerations (how do we truncate very long names, what do we do if the name is already taken) that I would rather not ossify in user-facing documentation. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/29/17 15:38, Alvaro Herrera wrote: > I just realized there's a further problem in the area: when a partition > is detached from its parent, its indexes are not made independent of the > indexes on parent. So they can't be dropped on their own (booh!); and > dropping the index on the former parent partitioned table drops the > index on the former partition also (hiss!). I think the fix for this is > to delete all dependencies, and re-register normal ones. Should be > straightforward, but I'm not doing it this year. We're waiting for an updated patch in this thread, aren't we? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut wrote: > On 12/29/17 15:38, Alvaro Herrera wrote: > > I just realized there's a further problem in the area: when a partition > > is detached from its parent, its indexes are not made independent of the > > indexes on parent. So they can't be dropped on their own (booh!); and > > dropping the index on the former parent partitioned table drops the > > index on the former partition also (hiss!). I think the fix for this is > > to delete all dependencies, and re-register normal ones. Should be > > straightforward, but I'm not doing it this year. > > We're waiting for an updated patch in this thread, aren't we? Dunno. The problems reported so far are rather very minor details that don't prevent reviewing the rest of it. I will of course send an updated version at some point, but there's only one me so I can either spend the whole commitfest focused on fixing tiny issues my own patches, or help with other stuff. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Jesper Pedersen wrote: > indexing.sql contains a \di command which list the owner of the db, so make > check-world fails. Wow, that was a stupid oversight, thanks for pointing it out. Here's a version that applies to today's master, and fixes this problem. No other changes. (I didn't incorporate your proposal to document the index naming pattern. I propose that you post it in its own thread, so that it can be discussed on its own merits. Note typo "formular" in your patch; I'd suggest "pattern" or some such rather than "formula" though.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Alvaro Herrera wrote: > Wow, that was a stupid oversight, thanks for pointing it out. Here's a > version that applies to today's master, and fixes this problem. > > No other changes. v11 fixes the dependency problem I mentioned in https://postgr.es/m/20171229203818.pqxf2cyl4g2wre6h@alvherre.pgsql and adds some test to cover that stuff. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 12/29/17 12:59, Alvaro Herrera wrote: >> Maybe we need a new "auto internal" deptype with a mix of semantics of >> the other two deptypes. It seems a bit ugly and I'm not sure it'd work >> either ... I'll try to code it tomorrow. > > This seems to work pretty well, much to my surprise. I was a bit scared > of adding a new deptype, but actually the only affected code is > findDependentObjects() and the semantics of the new type is a subset of > the existing DEPTYPE_INTERNAL, so I think it's okay. I need to fill in > its description in docs and comments though -- I left it out because the > real difference between INTERNAL and INTERNAL_AUTO is not something that > is covered by the existing description of INTERNAL, so maybe I'll need > to expand that one. So this patch appears to implement what was agreed upon earlier in the thread. Documentation and tests seem pretty comprehensive. The new dependency type obviously needs to be explained in detail. CompareIndexInfo() doesn't compare indexes' operator classes and collations. I'm not sure why this feature of automatically picking up matching indexes even exists. Is it for some specific workflows or upgrade scenarios? It's kind of a surprising feature in a way. The catalog representations of partitioned tables and partitioned indexes are completely different, which may or may not be desirable. As mentioned elsewhere already, the tests fail because \di shows the owner, which can vary between sites. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jan 4, 2018 at 11:44 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > I'm not sure why this feature of automatically picking up matching > indexes even exists. Is it for some specific workflows or upgrade > scenarios? It's kind of a surprising feature in a way. It allows you to avoid building a new indexes unnecessarily when attaching a partition. > The catalog representations of partitioned tables and partitioned > indexes are completely different, which may or may not be desirable. How so? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > On Thu, Jan 4, 2018 at 11:44 AM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > I'm not sure why this feature of automatically picking up matching > > indexes even exists. Is it for some specific workflows or upgrade > > scenarios? It's kind of a surprising feature in a way. > > It allows you to avoid building a new indexes unnecessarily when > attaching a partition. Yeah -- this is important to reduce the duration of the DDL operation, since attaching a partition requires an access exclusive lock. Having to build the index at that point would make that feature pretty much useless for many use cases. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut wrote: > CompareIndexInfo() doesn't compare indexes' operator classes and collations. Hmm ... will look into these. > As mentioned elsewhere already, the tests fail because \di shows the > owner, which can vary between sites. Already fixed in v10 which I posted this morning. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/4/18 12:00, Robert Haas wrote: >> The catalog representations of partitioned tables and partitioned >> indexes are completely different, which may or may not be desirable. > > How so? If someone wants to write a query, show me all the partitions of this table versus show me all the partitions of this index, intuitively, those could be the same, different only by some relkind references. Currently, you'd have to write two completely different queries. It's not a big deal, but it's a consideration. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut wrote: > On 1/4/18 12:00, Robert Haas wrote: > >> The catalog representations of partitioned tables and partitioned > >> indexes are completely different, which may or may not be desirable. > > > > How so? > > If someone wants to write a query, show me all the partitions of this > table versus show me all the partitions of this index, intuitively, > those could be the same, different only by some relkind references. Well, this "indparentidx" stuff I came up with is a little bit strange. Perhaps we could use pg_inherits instead, but I think the general view is that pg_inherits is on its way out. Tangentially: I didn't like very much that I added a new index to pg_index to support this feature. I thought maybe it'd be better to change the index on indrelid to be on (indrelid,indparentidx) instead, but that doesn't seem great either because it bloats that index which is used to support common relcache operations ... (The more I think of this, the more I believe that pg_inherits is a better answer. Opinions?) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 5 January 2018 at 11:01, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > (The more I think of this, the more I believe that pg_inherits is a > better answer. Opinions?) I admit to not having had a chance to look at any code with this yet, but I'm just thinking about a case like the following. CREATE TABLE part (a INT, b INT) PARTITION BY RANGE (a); CREATE TABLE part_a1 PARTITION OF part FOR VALUES FROM (0) TO (10) PARTITION BY RANGE (b); CREATE TABLE part_a1_b1 PARTITION OF part_a1 FOR VALUES FROM (0) TO (10); CREATE INDEX ON part_a1 (a); -- sub-partition index (creates index on part_a1_b1) CREATE INDEX ON part (a); -- What do we do here? Should we: 1. Create another identical index on part_a1_b1; or 2. Allow the existing index on part_a1_b1 to have multiple parents; or 3. ERROR... (probably not) I guess pg_index.indparentidx won't allow #2 to work. Some pg_inherits arrangement should. We don't really want to go creating indexes that we don't need to, so I think we should probably make an effort to allow #2 to work. Question is, how likely is the above scenario to take place? Normally, I see customers requiring partitioning only when an existing table grows too large for maintenance tasks to complete a reasonable timeframe. Probably there might also come a time when each of the partitions they've then gone and created also grows too large. So it does not seem unrealistic to be attaching existing tables/partitioned tables as partitions multiple times. Wanting to reuse leaf indexes for some new higher level partition parent does seem reasonable. This argument might be voided by if we allowed a DROP INDEX on part_a1 without dropping the leaf indexes. I've not checked what the patch does here, but I'd imagine if the indexes are marked as parents of that index, then they'll be dropped. I'll go off and look at the code now... -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 1/4/18 23:08, David Rowley wrote: > On 5 January 2018 at 11:01, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> (The more I think of this, the more I believe that pg_inherits is a >> better answer. Opinions?) > > I admit to not having had a chance to look at any code with this yet, > but I'm just thinking about a case like the following. > > CREATE TABLE part (a INT, b INT) PARTITION BY RANGE (a); > CREATE TABLE part_a1 PARTITION OF part FOR VALUES FROM (0) TO (10) > PARTITION BY RANGE (b); > CREATE TABLE part_a1_b1 PARTITION OF part_a1 FOR VALUES FROM (0) TO (10); > > CREATE INDEX ON part_a1 (a); -- sub-partition index (creates index on > part_a1_b1) > > CREATE INDEX ON part (a); -- What do we do here? > > Should we: > > 1. Create another identical index on part_a1_b1; or > 2. Allow the existing index on part_a1_b1 to have multiple parents; or > 3. ERROR... (probably not) 4. It should adopt part_a1 and its subindexes into its hierarchy. That shouldn't be a problem under the current theory, should it? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jan 4, 2018 at 5:01 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Tangentially: I didn't like very much that I added a new index to > pg_index to support this feature. I thought maybe it'd be better to > change the index on indrelid to be on (indrelid,indparentidx) instead, > but that doesn't seem great either because it bloats that index which is > used to support common relcache operations ... > > (The more I think of this, the more I believe that pg_inherits is a > better answer. Opinions?) I actually haven't looked at the code, but the idea that pg_inherits is on the way out is news to me. If that method will work, I don't quite see why we should invent something new. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jan 5, 2018 at 4:57 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 1/4/18 23:08, David Rowley wrote: >> On 5 January 2018 at 11:01, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >>> (The more I think of this, the more I believe that pg_inherits is a >>> better answer. Opinions?) >> >> I admit to not having had a chance to look at any code with this yet, >> but I'm just thinking about a case like the following. >> >> CREATE TABLE part (a INT, b INT) PARTITION BY RANGE (a); >> CREATE TABLE part_a1 PARTITION OF part FOR VALUES FROM (0) TO (10) >> PARTITION BY RANGE (b); >> CREATE TABLE part_a1_b1 PARTITION OF part_a1 FOR VALUES FROM (0) TO (10); >> >> CREATE INDEX ON part_a1 (a); -- sub-partition index (creates index on >> part_a1_b1) >> >> CREATE INDEX ON part (a); -- What do we do here? >> >> Should we: >> >> 1. Create another identical index on part_a1_b1; or >> 2. Allow the existing index on part_a1_b1 to have multiple parents; or >> 3. ERROR... (probably not) > > 4. It should adopt part_a1 and its subindexes into its hierarchy. That > shouldn't be a problem under the current theory, should it? +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > On Fri, Jan 5, 2018 at 4:57 PM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > On 1/4/18 23:08, David Rowley wrote: > >> I admit to not having had a chance to look at any code with this yet, > >> but I'm just thinking about a case like the following. > >> > >> CREATE TABLE part (a INT, b INT) PARTITION BY RANGE (a); > >> CREATE TABLE part_a1 PARTITION OF part FOR VALUES FROM (0) TO (10) > >> PARTITION BY RANGE (b); > >> CREATE TABLE part_a1_b1 PARTITION OF part_a1 FOR VALUES FROM (0) TO (10); > >> > >> CREATE INDEX ON part_a1 (a); -- sub-partition index (creates index on > >> part_a1_b1) > >> > >> CREATE INDEX ON part (a); -- What do we do here? > >> > >> Should we: > >> > >> 1. Create another identical index on part_a1_b1; or > >> 2. Allow the existing index on part_a1_b1 to have multiple parents; or > >> 3. ERROR... (probably not) > > > > 4. It should adopt part_a1 and its subindexes into its hierarchy. That > > shouldn't be a problem under the current theory, should it? > > +1. This is what the code does today. IIRC there's a test for this exact scenario. Now, part_a1_b1 is grandchild of part, not direct parent -- so there exists an intermediate relkind=I index in part_a1, and that is the one that becomes parent of part_a1_b1, not part's directly. Now, if you drop the index in part, then it gets dropped in all descendants too, including part_a1_b1. If you DETACH the partition first, then the index remains. All of that seems good to me, and is as discussed previously. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Alvaro, On 01/04/2018 09:30 AM, Alvaro Herrera wrote: > v11 fixes the dependency problem I mentioned in > https://postgr.es/m/20171229203818.pqxf2cyl4g2wre6h@alvherre.pgsql > and adds some test to cover that stuff. > Thanks, make check-world passes. Maybe a warning for existing indexes on the same column(s), but with a different type, should be emitted during ATTACH, e.g. -- test.sql -- CREATE TABLE test (a integer NOT NULL) PARTITION BY HASH(a); CREATE TABLE test_p00 PARTITION OF test FOR VALUES WITH (MODULUS 2, REMAINDER 0); CREATE TABLE test_p01 PARTITION OF test FOR VALUES WITH (MODULUS 2, REMAINDER 1); CREATE INDEX idx_test_a ON test (a); INSERT INTO test (SELECT i FROM generate_series(1, 1000000) AS i); ANALYZE; ALTER TABLE test DETACH PARTITION test_p00; DROP INDEX test_p00_a_idx; CREATE INDEX test_p00_a_idx ON test_p00 USING hash (a); ALTER TABLE test ATTACH PARTITION test_p00 FOR VALUES WITH (MODULUS 2, REMAINDER 0); -- test.sql -- Of course, this could also fall under index maintenance and no warning emitted. Docs have "Each partition is first checked to determine whether an equivalent index already exists," so it is covered. Best regards, Jesper
Jesper Pedersen wrote: > Maybe a warning for existing indexes on the same column(s), but with a > different type, should be emitted during ATTACH, e.g. > [detach one partition, replace index with a different one, attach > partition] > Of course, this could also fall under index maintenance and no warning > emitted. Docs have "Each partition is first checked to determine whether an > equivalent index already exists," so it is covered. Yeah, I'm of two minds about this also -- in the initial versions I had a code comment wondering exactly about having a hash index in a partition attached to a btree index on parent. As another example, having a documents table with two partitions (one "long term archival" and other "documents currently being messed with") you could have a text search index which is GIN in the former and GiST in the latter. There is a performance argument for doing it that way, so it's not merely academic. Anyway, while I think attaching an index that doesn't match the properties of the index on parent can be a useful feature, on the other hand it could be surprising (you end up losing an index because it was matched during attach that you didn't think was going to be matched). One idea would be to have a way to specify at ATTACH time what to do about indexes when they don't match exactly, but I think the user interface would be pretty messy: exactly how different do you want to allow the indexes to be? Is an index having one more column than the one in parent good enough? I think the answer to this is mostly a judgement call, and I'd rather not spend my time debating those. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Alvaro, On 01/08/2018 03:36 PM, Alvaro Herrera wrote: > Jesper Pedersen wrote: > >> Maybe a warning for existing indexes on the same column(s), but with a >> different type, should be emitted during ATTACH, e.g. > >> [detach one partition, replace index with a different one, attach >> partition] > >> Of course, this could also fall under index maintenance and no warning >> emitted. Docs have "Each partition is first checked to determine whether an >> equivalent index already exists," so it is covered. > > Yeah, I'm of two minds about this also -- in the initial versions I had > a code comment wondering exactly about having a hash index in a > partition attached to a btree index on parent. > > As another example, having a documents table with two partitions (one > "long term archival" and other "documents currently being messed with") > you could have a text search index which is GIN in the former and GiST > in the latter. There is a performance argument for doing it that way, > so it's not merely academic. > > Anyway, while I think attaching an index that doesn't match the > properties of the index on parent can be a useful feature, on the other > hand it could be surprising (you end up losing an index because it was > matched during attach that you didn't think was going to be matched). > One idea would be to have a way to specify at ATTACH time what to do > about indexes when they don't match exactly, but I think the user > interface would be pretty messy: exactly how different do you want to > allow the indexes to be? Is an index having one more column than the > one in parent good enough? I think the answer to this is mostly a > judgement call, and I'd rather not spend my time debating those. > Yeah, agreed - lets leave as is. Migrating an index to another type would mean to drop the top-level definition anyway. Best regards, Jesper
Robert Haas wrote: > On Thu, Jan 4, 2018 at 5:01 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > Tangentially: I didn't like very much that I added a new index to > > pg_index to support this feature. I thought maybe it'd be better to > > change the index on indrelid to be on (indrelid,indparentidx) instead, > > but that doesn't seem great either because it bloats that index which is > > used to support common relcache operations ... > > > > (The more I think of this, the more I believe that pg_inherits is a > > better answer. Opinions?) > > I actually haven't looked at the code, but the idea that pg_inherits > is on the way out is news to me. If that method will work, I don't > quite see why we should invent something new. I removed the pg_index.indparentidx column that previous versions add, and replaced it with pg_inherits rows. This makes the code a little bit bulkier in a couple of places, but nothing terrible. As a benefit, there's no extra index in pg_index now. I fixed some outdated comments, too. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 9 January 2018 at 09:54, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > I removed the pg_index.indparentidx column that previous versions add, > and replaced it with pg_inherits rows. This makes the code a little bit > bulkier in a couple of places, but nothing terrible. As a benefit, > there's no extra index in pg_index now. Hi Álvaro, I've made a pass over this patch. It's mostly in pretty good shape, but I did find a few things to note down. 1. "either partition" -> "either the partition" + so the partition index is dropped together with either partition it indexes, + or with the parent index it is attached to. 2. In findDependentObjects(), should the following if test be below the ReleaseDeletionLock call? + if (foundDep->deptype == DEPENDENCY_INTERNAL_AUTO) + break; ReleaseDeletionLock(object); 3. The following existing comment indicates that we'll be creating a disk file for the index. Should we update this comment to say that this is the case only for RELKIND_INDEX? /* * create the index relation's relcache entry and physical disk file. (If * we fail further down, it's the smgr's responsibility to remove the disk * file again.) */ Maybe: /* * create the index relation's relcache entry and for non-partitioned * indexes, a physical disk file. (If we fail further down, it's the * smgr's responsibility to remove the disk file again.) */ 4. Extra newline + recordDependencyOn(&myself, &referenced, DEPENDENCY_INTERNAL_AUTO); + } + + 5. Do you think it's out of scope to support expression indexes? /* * Expression indexes are currently not considered equal. Not needed for * current callers. */ if (info1->ii_Expressions != NIL || info2->ii_Expressions != NIL) return false; 6. pg_inherits.inhseqno is int, not smallint. I understand you copied this from StoreCatalogInheritance1(), so maybe a fix should be put in before this patch is? void StoreSingleInheritance(Oid relationId, Oid parentOid, int16 seqNumber) and values[Anum_pg_inherits_inhseqno - 1] = Int16GetDatum(seqNumber); 7. Is the following a bug fix? If so, should it not be fixed and backpatched, rather than sneaked in here? @@ -10119,6 +10195,7 @@ ATExecChangeOwner(Oid relationOid, Oid newOwnerId, bool recursing, LOCKMODE lock * relation, as well as its toast table (if it has one). */ if (tuple_class->relkind == RELKIND_RELATION || + tuple_class->relkind == RELKIND_PARTITIONED_TABLE || 8. Missing if (attachInfos) pfree(attachInfos); + if (idxes) + pfree(idxes); + if (attachRelIdxs) + pfree(attachRelIdxs); 9. The first OR branch of the Assert will always be false, so might as well get rid of it. + if (!has_superclass(idxid)) + continue; + + Assert(!has_superclass(idxid) || + (IndexGetRelation(get_partition_parent(idxid), false) == + RelationGetRelid(rel))); 10. lock -> locks: /* we already hold lock on both tables, so this is safe: */ 11. "already the right" -> "already in the right" /* Silently do nothing if already the right state */ 12. It seems to be RangeVarGetRelidExtended() that locks the partition index, not the callback. /* no deadlock risk: our callback above already acquired the lock */ 13. We seem to only be holding an AccessShareLock on the partitioned table. What happens if someone concurrently does ALTER TABLE partitioned_table DETACH our_partition;? parentTbl = relation_open(parentIdx->rd_index->indrelid, AccessShareLock); and; /* Make sure it indexes a partition of the other index's table */ partDesc = RelationGetPartitionDesc(parentTbl); found = false; for (i = 0; i < partDesc->nparts; i++) { if (partDesc->oids[i] == state.partitionOid) { found = true; break; } } Wouldn't you also need an AccessExclusiveLock to prevent another session trying to ALTER INDEX part_index ATTACH PARTITION leaf_index; at the same time? The concurrent session could be trying to ATTACH it to a subpartition and this comment could be attaching to the parent. 14. validatePartitionedIndex does not verify that the index it is checking is valid. Consider the following: create table part (a int, b int) partition by list(a); create table part1 partition of part for values in(1) partition by list(b) create table part1 partition of part for values in(1) partition by list(b); create table part2 partition of part1 for values in (1); create index on only part1 (a); create index on only part (a); alter index part_a_idx attach partition part1_a_idx; \d part Table "public.part" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- a | integer | | | b | integer | | | Partition key: LIST (a) Indexes: "part_a_idx" btree (a) <--- should be INVALID Number of partitions: 1 (Use \d+ to list them.) \d part1 Table "public.part1" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- a | integer | | | b | integer | | | Partition of: part FOR VALUES IN (1) Partition key: LIST (b) Indexes: "part1_a_idx" btree (a) INVALID Number of partitions: 1 (Use \d+ to list them.) But that probably means you also need more code to search up to parent partitions and try and validate any invalid indexes once a sub-partition index is made valid. As, if the part_a_idx had correctly been marked as INVALID, and then we do: CREATE INDEX part2_a_idx ON part2 (a); ALTER INDEX part1_a_idx ATTACH PARTITION part2_a_idx; This would validate part1_a_idx, but we'd also then need to attempt validation on part_a_idx. (I'm still reading, so perhaps you've already thought of that.) 15. Can you explain why you do the following for CREATE INDEX, but not for CREATE INDEX IF NOT EXISTS? @@ -7338,6 +7363,7 @@ IndexStmt: CREATE opt_unique INDEX opt_concurrently opt_index_name n->concurrent = $4; n->idxname = $5; n->relation = $7; + n->relationId = InvalidOid; It's maybe not required anyway since makeNode() will memset zero the memory, but I think it's best to do it unless I've misunderstood what it's for. 16. It's not exactly a problem, but I had a quick look and I don't see any other uses of list_free() checking for a non-NIL list first: + if (inheritors) + list_free(inheritors); Probably might as well just remove the if test. 17. Just a stylistic thing; I think it would be neater to set isindex using the existing switch(). It would just need some case shuffling. @@ -439,6 +440,7 @@ RelationParseRelOptions(Relation relation, HeapTuple tuple) case RELKIND_RELATION: case RELKIND_TOASTVALUE: case RELKIND_INDEX: + case RELKIND_PARTITIONED_INDEX: case RELKIND_VIEW: case RELKIND_MATVIEW: case RELKIND_PARTITIONED_TABLE: @@ -452,10 +454,12 @@ RelationParseRelOptions(Relation relation, HeapTuple tuple) * we might not have any other for pg_class yet (consider executing this * code for pg_class itself) */ + isindex = relation->rd_rel->relkind == RELKIND_INDEX || + relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX; Or maybe it's even better to set relation->rd_amroutine->amoptions into a local variable and use it unconditionally in the call to extractRelOptions(). 18. At most it can't do any harm, but is the following still needed? I originally thought this would have been for pg_index changes. What's changed? -#define CATALOG_VERSION_NO 201712251 +#define CATALOG_VERSION_NO 201712291 I admit to not quite being as thorough in my review with the pg_dump code. The tests seem fine, but I'd like to see a test for #14 once that's fixed. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
David Rowley wrote: > Hi Álvaro, Hi David, Thanks for the review. Attached is a delta patch that fixes most things, except your item 14 below. Before getting into the details of the items you list, in this version I also fixed a couple of issues noted by Jaime Casanova; namely a) ALTER INDEX SET TABLESPACE threw an error for partitioned indexes. Handled the same was as for partitioned tables: silently do nothing. b) Expression indexes not considered at all. (You also commented on this point). Jaime reported it for this case: create index on t_part USING gin (string_to_array(btrim((t)::text, '-'::text), '--'::text)); I dealt with this by adding code to CompareIndexInfo that runs equal() on the expression lists, after applying map_variable_attnos() (to support the case of attaching a table with dropped or reordered columns). As far as I can see, this works correctly for very simple expressions, though I didn't test Jaime's case specifically; I suppose I should add a few more tests. On to your notes: > 1. "either partition" -> "either the partition" > 4. Extra newline > 8. Missing if (attachInfos) pfree(attachInfos); > 10. lock -> locks: > 11. "already the right" -> "already in the right" > 16. It's not exactly a problem, but I had a quick look and I don't see > any other uses of list_free() checking for a non-NIL list first: Fixed. > 2. In findDependentObjects(), should the following if test be below > the ReleaseDeletionLock call? > > + if (foundDep->deptype == DEPENDENCY_INTERNAL_AUTO) > + break; > ReleaseDeletionLock(object); No, as far as I understand we should keep that lock if we break out of the loop -- it's the lock on the object being deleted (the index partition in this case). I did break the comment in two, so that now the "First," paragraph appears below the "if () break" test. That seems to make more sense. It looks like this now: /* * 3. Not all the owning objects have been visited, so * transform this deletion request into a delete of this * owning object. * * For INTERNAL_AUTO dependencies, we don't enforce this; * in other words, we don't follow the links back to the * owning object. */ if (foundDep->deptype == DEPENDENCY_INTERNAL_AUTO) break; /* * First, release caller's lock on this object and get * deletion lock on the owning object. (We must release * caller's lock to avoid deadlock against a concurrent * deletion of the owning object.) */ ReleaseDeletionLock(object); AcquireDeletionLock(&otherObject, 0); > 3. The following existing comment indicates that we'll be creating a > disk file for the index. Should we update this comment to say that > this is the case only for RELKIND_INDEX? > Maybe: > > /* > * create the index relation's relcache entry and for non-partitioned > * indexes, a physical disk file. (If we fail further down, it's the > * smgr's responsibility to remove the disk file again.) > */ I used this: /* * create the index relation's relcache entry and, if necessary, the * physical disk file. (If we fail further down, it's the smgr's * responsibility to remove the disk file again, if any.) */ > 5. Do you think it's out of scope to support expression indexes? Answered above. > 6. pg_inherits.inhseqno is int, not smallint. I understand you copied > this from StoreCatalogInheritance1(), so maybe a fix should be put in > before this patch is? Hmm, yeah, will fix. > 7. Is the following a bug fix? If so, should it not be fixed and > backpatched, rather than sneaked in here? > > @@ -10119,6 +10195,7 @@ ATExecChangeOwner(Oid relationOid, Oid > newOwnerId, bool recursing, LOCKMODE lock > * relation, as well as its toast table (if it has one). > */ > if (tuple_class->relkind == RELKIND_RELATION || > + tuple_class->relkind == RELKIND_PARTITIONED_TABLE || No, it's not a bug fix -- this block is only concerned with searching index for this relkind, and prior to this patch PARTITIONED_TABLEs do not have indexes, so it's ok not to consider the case. > 9. The first OR branch of the Assert will always be false, so might as > well get rid of it. True. Removed. > 12. It seems to be RangeVarGetRelidExtended() that locks the partition > index, not the callback. > > /* no deadlock risk: our callback above already acquired the lock */ Hmm ... yeah, you're right. Fixed. > 13. We seem to only be holding an AccessShareLock on the partitioned > table. What happens if someone concurrently does ALTER TABLE > partitioned_table DETACH our_partition;? > > parentTbl = relation_open(parentIdx->rd_index->indrelid, AccessShareLock); > > and; > /* Make sure it indexes a partition of the other index's table */ > partDesc = RelationGetPartitionDesc(parentTbl); > found = false; > for (i = 0; i < partDesc->nparts; i++) > { > if (partDesc->oids[i] == state.partitionOid) > { > found = true; > break; > } > } > > Wouldn't you also need an AccessExclusiveLock to prevent another > session trying to ALTER INDEX part_index ATTACH PARTITION leaf_index; > at the same time? The concurrent session could be trying to ATTACH it > to a subpartition and this comment could be attaching to the parent. But AccessShare does conflict with AccessExclusive. So even with just AccessShare we've already prevented others from detaching/attaching partitions. I think I should spend a bit of time going over the locking considerations again and make sure it is all sound. > 14. validatePartitionedIndex does not verify that the index it is > checking is valid. Consider the following: > > create table part (a int, b int) partition by list(a); > create table part1 partition of part for values in(1) partition by list(b) > create table part1 partition of part for values in(1) partition by list(b); > create table part2 partition of part1 for values in (1); > create index on only part1 (a); > create index on only part (a); > alter index part_a_idx attach partition part1_a_idx; > \d part > Table "public.part" > Column | Type | Collation | Nullable | Default > --------+---------+-----------+----------+--------- > a | integer | | | > b | integer | | | > Partition key: LIST (a) > Indexes: > "part_a_idx" btree (a) <--- should be INVALID > Number of partitions: 1 (Use \d+ to list them.) > > \d part1 > Table "public.part1" > Column | Type | Collation | Nullable | Default > --------+---------+-----------+----------+--------- > a | integer | | | > b | integer | | | > Partition of: part FOR VALUES IN (1) > Partition key: LIST (b) > Indexes: > "part1_a_idx" btree (a) INVALID > Number of partitions: 1 (Use \d+ to list them.) > > But that probably means you also need more code to search up to parent > partitions and try and validate any invalid indexes once a > sub-partition index is made valid. > > As, if the part_a_idx had correctly been marked as INVALID, and then we do: > > CREATE INDEX part2_a_idx ON part2 (a); > ALTER INDEX part1_a_idx ATTACH PARTITION part2_a_idx; > > This would validate part1_a_idx, but we'd also then need to attempt > validation on part_a_idx. (I'm still reading, so perhaps you've > already thought of that.) In prior incarnations of the patch, I had an if-test to prevent attaching invalid indexes, but I decided to remove it at some point -- mainly thinking of attaching a partition for which a CREATE INDEX CONCURRENTLY was running which already had the index as invalid and was later expected to become valid. I suppose that doesn't really work anyway because of locking considerations (you can't attach a partition in which CIC is concurrently running, can you). I'll think some more about this case and post an updated version later. > 15. Can you explain why you do the following for CREATE INDEX, but not > for CREATE INDEX IF NOT EXISTS? I cannot -- just an oversight. Fixed. It's true that makeNode() fixes the hole, but we're supposed to be strict about initializing structures. > 17. Just a stylistic thing; I think it would be neater to set isindex > using the existing switch(). It would just need some case shuffling. > Or maybe it's even better to set relation->rd_amroutine->amoptions > into a local variable and use it unconditionally in the call to > extractRelOptions(). Yeah, that last suggestion leads to simpler code, so I did it that way. > 18. At most it can't do any harm, but is the following still needed? I > originally thought this would have been for pg_index changes. What's > changed? > > -#define CATALOG_VERSION_NO 201712251 > +#define CATALOG_VERSION_NO 201712291 Yeah, probably not needed anymore. Will lose it. > The tests seem fine, but I'd like to see a test for #14 once that's fixed. Sure thing. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 11 January 2018 at 04:37, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > In prior incarnations of the patch, I had an if-test to prevent > attaching invalid indexes, but I decided to remove it at some point -- > mainly thinking of attaching a partition for which a CREATE INDEX > CONCURRENTLY was running which already had the index as invalid and was > later expected to become valid. I suppose that doesn't really work > anyway because of locking considerations (you can't attach a partition > in which CIC is concurrently running, can you). I'll think some more > about this case and post an updated version later. I guess CIC will need to check if the index has a parent index when setting indisvalid = true, and do likewise to the parent index if all other siblings are valid. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
The delta patch turned out to have at least one stupid bug, and a few non-stupid bugs. Here's an updated version that should behave correctly, and addresses all reported problems. (One minor issue Jaime found and I hadn't listed is that alter_index and alter_table SGML pages said "ATTACH" instead of "ATTACH PARTITION". Fixed those too for this version.) I'll set this as Ready for Committer now. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 1/11/18 13:52, Alvaro Herrera wrote: > The delta patch turned out to have at least one stupid bug, and a few > non-stupid bugs. Here's an updated version that should behave > correctly, and addresses all reported problems. It seems that CompareIndexInfo() still doesn't compare indexes' operator classes and collations. Also, some new test cases for pg_dump would be nice. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12 January 2018 at 07:52, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > The delta patch turned out to have at least one stupid bug, and a few > non-stupid bugs. Here's an updated version that should behave > correctly, and addresses all reported problems. Thanks for updating the patch. I've just made another pass over the patch and have a few more comments. 1. Expression varattnos don't seem to get translated correctly. drop table p; create table p (a int, b int) partition by list (a); create table p1 (b int, a int); alter table p attach partition p1 for values in(1); create index on p (abs(a)); \d p1 Table "public.p1" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- b | integer | | | a | integer | | | Partition of: p FOR VALUES IN (1) Indexes: "p1_abs_idx" btree (abs(b)) CompareIndexInfo() seems to validate existing indexes correctly. Can you also write a test for this? 2. No stats are gathered for the partitioned expression index. drop table p; create table p (a int, b int) partition by range (a); create table p1 (b int, a int); alter table p attach partition p1 for values from (1) to (1001); create index on p1 (abs(a)); create index on p (abs(a)); insert into p select x,x from generate_series(1,1000) x; analyze p, p1; select * from pg_statistic where starelid = 'p_abs_idx'::regclass; select * from pg_statistic where starelid = 'p1_abs_idx'::regclass; 3. deadlocking: I've not yet debugged this to see if this can be avoided. drop table p; create table p (a int not null) partition by list (a); create table p1 partition of p for values in(1); insert into p1 select 1 from generate_Series(1,10000000); -- Session 1: create index concurrently on p1 (a); -- Session 2: create index on p (a); -- Session 1: ERROR: deadlock detected DETAIL: Process 10860 waits for ShareLock on virtual transaction 3/97; blocked by process 7968. Process 7968 waits for ShareLock on relation 90492 of database 12342; blocked by process 10860. HINT: See server log for query details. 4. nparts initialized twice in DefineIndex() int nparts = partdesc->nparts; Oid *part_oids; TupleDesc parentDesc; bool invalidate_parent = false; nparts = partdesc->nparts; 5. In DefineIndex, should the following have a heap_freetuple(newtup); ? newtup = heap_copytuple(tup); ((Form_pg_index) GETSTRUCT(newtup))->indisvalid = false; CatalogTupleUpdate(pg_index, &tup->t_self, newtup); ReleaseSysCache(tup); heap_close(pg_index, RowExclusiveLock); 6. Header comment in ReindexPartitionedIndex() claims it "obtains the list of children and releases the lock on parent before applying reindex on each child.", but it does not do that. It just returns an error. 7. I see you changed StoreSingleInheritance to have the seqnumber as int32 instead of int16, but StoreCatalogInheritance1 still has an int16 seqNumber parameter. Maybe that should be fixed independently and backpatched? 8. ATExecCmd: You've added an Assert() in the DETACH PARTITION case to ensure we're working with a table, but what makes you certain that the "else" case on the ATTACH PARTITION is always going to get a partitioned index? Maybe it would be better to just move the Assert()s into the function being called? 9. Error details claim that p2_a_idx is not a partition of p. Shouldn't it say table "p2" is not a partition of "p"? DROP TABLE p; CREATE TABLE p (a INT NOT NULL, b INT NOT NULL) PARTITION BY LIST (a); CREATE TABLE p1 PARTITION OF p FOR VALUES IN(1) PARTITION BY LIST (b); CREATE TABLE p2 PARTITION OF p1 FOR VALUES IN(1); CREATE INDEX p_a_idx ON ONLY p (a); CREATE INDEX p2_a_idx ON p2 (a); ALTER INDEX p_a_idx ATTACH PARTITION p2_a_idx; ERROR: cannot attach index "p2_a_idx" as a partition of index "p_a_idx" DETAIL: Index "p2_a_idx" is not on a partition of table "p". 10. You've added code to get_relation_info() to handle partitioned indexes, but that code is skipped due to: /* * Make list of indexes. Ignore indexes on system catalogs if told to. * Don't bother with indexes for an inheritance parent, either. */ if (inhparent || The new code should either be removed, or you should load the index list for a partitioned table. 11. In ProcessUtilitySlow(), the following if needs to check if we're dealing with a partitioned table: /* Also, lock any descendant tables if recursive */ if (stmt->relation->inh) inheritors = find_all_inheritors(relid, lockmode, NULL); Otherwise we will (perhaps harmlessly) lock children in the following case: DROP TABLE p; CREATE TABLE p (a INT NOT NULL, b INT NOT NULL); CREATE TABLE p1 (a INT NOT NULL, b INT NOT NULL) INHERITS (p); CREATE INDEX ON p (a); > I'll set this as Ready for Committer now. Will set back to waiting on author until the above things are addressed. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
David Rowley wrote: > I've just made another pass over the patch and have a few more comments. Thanks! I'm a bit tied up in a few other things ATM, so an updated version will have to wait a couple of days. > 1. Expression varattnos don't seem to get translated correctly. Yesterday while creating a test case for something suggested by Jaime, I found that the same problem of bogus attno mapping also appears in predicates of partial indexes. My overall conclusion is that I need to examine varattno mapping carefully, because it seems broken in several places. > 2. No stats are gathered for the partitioned expression index. > > drop table p; > create table p (a int, b int) partition by range (a); > create table p1 (b int, a int); > alter table p attach partition p1 for values from (1) to (1001); > create index on p1 (abs(a)); > create index on p (abs(a)); > insert into p select x,x from generate_series(1,1000) x; > analyze p, p1; > > select * from pg_statistic where starelid = 'p_abs_idx'::regclass; > select * from pg_statistic where starelid = 'p1_abs_idx'::regclass; Fun :-( I think this is down to expand_vacuum_rel() not considering a table worth vacuuming. While I agree that this is a problem, I'm disclaiming responsibility for fixing it for the time being. Seems well outside the scope of this patch. > 3. deadlocking: I've not yet debugged this to see if this can be avoided. Yeah, we discussed this before. I think there's not a lot of room for improvement in this particular case, though it's pretty unfortunate. > 4. nparts initialized twice in DefineIndex() > > int nparts = partdesc->nparts; > Oid *part_oids; > TupleDesc parentDesc; > bool invalidate_parent = false; > > nparts = partdesc->nparts; Hah. Having second, third and fourth thoughts on restructuring the block can do that ... seems I just need a fifth thought. > 5. In DefineIndex, should the following have a heap_freetuple(newtup); ? > > newtup = heap_copytuple(tup); > ((Form_pg_index) GETSTRUCT(newtup))->indisvalid = false; > CatalogTupleUpdate(pg_index, &tup->t_self, newtup); > ReleaseSysCache(tup); > heap_close(pg_index, RowExclusiveLock); I can add it, but *shrug*. We're not too stressed about never leaking memory in heavy-handed DDL operations. Consider indexInfo in the same routine, for example: we just leave it for end-of-transaction to clean up. > 6. Header comment in ReindexPartitionedIndex() claims it "obtains the > list of children and releases the lock on parent before applying > reindex on each child.", but it does not do that. It just returns an > error. Yeah, it's a stub for now. We can implement it later. Fixed the comment. > 7. I see you changed StoreSingleInheritance to have the seqnumber as > int32 instead of int16, but StoreCatalogInheritance1 still has an > int16 seqNumber parameter. Maybe that should be fixed independently > and backpatched? Yes. > 8. ATExecCmd: You've added an Assert() in the DETACH PARTITION case to > ensure we're working with a table, but what makes you certain that the > "else" case on the ATTACH PARTITION is always going to get a > partitioned index? > > Maybe it would be better to just move the Assert()s into the function > being called? > 9. Error details claim that p2_a_idx is not a partition of p. > Shouldn't it say table "p2" is not a partition of "p"? You missed the "on" in the DETAIL: DETAIL: Index "p2_a_idx" is not on a partition of table "p". You could argue that this is obscurely worded, but if you look at the command: ALTER INDEX p_a_idx ATTACH PARTITION p2_a_idx; nowhere is table p2 mentioned, so I'm not sure it's a great idea to mention the table in the error message. > > 10. You've added code to get_relation_info() to handle partitioned > indexes, but that code is skipped [...] > The new code should either be removed, or you should load the index > list for a partitioned table. That's a leftover from previous versions too. YAGNI principle says I should remove it rather than activate it, I think, since the optimizer is not going to use the data for anything. > 11. In ProcessUtilitySlow(), the following if needs to check if we're > dealing with a partitioned table: Sure thing; fixed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 17 January 2018 at 03:58, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> 9. Error details claim that p2_a_idx is not a partition of p. >> Shouldn't it say table "p2" is not a partition of "p"? > > You missed the "on" in the DETAIL: > DETAIL: Index "p2_a_idx" is not on a partition of table "p". > You could argue that this is obscurely worded, but if you look at the > command: > ALTER INDEX p_a_idx ATTACH PARTITION p2_a_idx; > nowhere is table p2 mentioned, so I'm not sure it's a great idea to > mention the table in the error message. I think I did miss the "on". >> 10. You've added code to get_relation_info() to handle partitioned >> indexes, but that code is skipped [...] >> The new code should either be removed, or you should load the index >> list for a partitioned table. > > That's a leftover from previous versions too. YAGNI principle says I > should remove it rather than activate it, I think, since the optimizer > is not going to use the data for anything. Agreed. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 16, 2018 at 6:08 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 17 January 2018 at 03:58, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >>> 9. Error details claim that p2_a_idx is not a partition of p. >>> Shouldn't it say table "p2" is not a partition of "p"? >> >> You missed the "on" in the DETAIL: >> DETAIL: Index "p2_a_idx" is not on a partition of table "p". >> You could argue that this is obscurely worded, but if you look at the >> command: >> ALTER INDEX p_a_idx ATTACH PARTITION p2_a_idx; >> nowhere is table p2 mentioned, so I'm not sure it's a great idea to >> mention the table in the error message. > > I think I did miss the "on". I think you will not be the only person to make that mistake. I think it would be better phrased as DETAIL: "p2_a_idx" is not an index of any partition of table "p" -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Peter Eisentraut wrote: > On 1/11/18 13:52, Alvaro Herrera wrote: > > The delta patch turned out to have at least one stupid bug, and a few > > non-stupid bugs. Here's an updated version that should behave > > correctly, and addresses all reported problems. > > It seems that CompareIndexInfo() still doesn't compare indexes' operator > classes and collations. > > Also, some new test cases for pg_dump would be nice. Fixed CompareIndexInfo to compare collations and opfamilies; also added tests about that. I think I fixed all the items David reported too, including rewording the error message to Robert's suggestion. (One thing remaining is the int16 used in StoreCatalogInheritance1 signature.) pg_dump tests are still missing here, but I think this version is good enough step forward. I'll add a few tests for pg_dump, and see about getting this pushed early tomorrow. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Here's a patch to add pg_dump tests. This verifies that the CREATE INDEX on parent and partition appear, as well as ALTER INDEX .. ATTACH PARTITION. While doing this, I noticed a small bug: the ALTER INDEX would not be dumped when only one of the schemas are specified to pg_dump (schema of parent and schema of partition), but it would be if both were specified. This patch fixes that problem too. AFAICS the test is correct after that fix (ie. the "like" and "unlike" sets are correct now.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi Alvaro, On 01/18/2018 04:43 PM, Alvaro Herrera wrote: > Here's a patch to add pg_dump tests. This verifies that the > CREATE INDEX on parent and partition appear, as well as ALTER INDEX .. > ATTACH PARTITION. > > While doing this, I noticed a small bug: the ALTER INDEX would not be > dumped when only one of the schemas are specified to pg_dump (schema of > parent and schema of partition), but it would be if both were specified. > This patch fixes that problem too. AFAICS the test is correct after > that fix (ie. the "like" and "unlike" sets are correct now.) > I get pg_restore: creating INDEX "pg_catalog.pg_authid_oid_index" pg_restore: [archiver (db)] Error while PROCESSING TOC: pg_restore: [archiver (db)] Error from TOC entry 5493; 1259 2677 INDEX pg_authid_oid_index jpedersen pg_restore: [archiver (db)] could not execute query: ERROR: permission denied: "pg_authid" is a system catalog Command was: -- For binary upgrade, must preserve pg_class oids SELECT pg_catalog.binary_upgrade_set_next_index_pg_class_oid('2677'::pg_catalog.oid); CREATE UNIQUE INDEX "pg_authid_oid_index" ON "pg_authid" USING "btree" ("oid"); during check-world. Best regards, Jesper
Hi Jesper Jesper Pedersen wrote: > I get > > pg_restore: creating INDEX "pg_catalog.pg_authid_oid_index" > pg_restore: [archiver (db)] Error while PROCESSING TOC: > pg_restore: [archiver (db)] Error from TOC entry 5493; 1259 2677 INDEX > pg_authid_oid_index jpedersen > pg_restore: [archiver (db)] could not execute query: ERROR: permission > denied: "pg_authid" is a system catalog > Command was: > -- For binary upgrade, must preserve pg_class oids > SELECT pg_catalog.binary_upgrade_set_next_index_pg_class_oid('2677'::pg_catalog.oid); > > CREATE UNIQUE INDEX "pg_authid_oid_index" ON "pg_authid" USING "btree" > ("oid"); > > during check-world. Wow, thanks for reporting. That's freaking weird ... Maybe I need to use a more restrictive condition. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
David Rowley wrote: > 10. You've added code to get_relation_info() to handle partitioned > indexes, but that code is skipped due to: > > /* > * Make list of indexes. Ignore indexes on system catalogs if told to. > * Don't bother with indexes for an inheritance parent, either. > */ > if (inhparent || > > The new code should either be removed, or you should load the index > list for a partitioned table. I realized that this wasn't an oversight after all. You can invoke this code, and it fails in current master: CREATE TABLE at_partitioned (col1 INT) PARTITION BY RANGE (col1); CREATE INDEX ON at_partitioned (col1); CREATE TABLE at_regular2 (col2 INT); SELECT col2 FROM at_regular2 fk LEFT OUTER JOIN at_partitioned pk ON (col1 = col2); fails with: ERROR: could not open file "base/16384/16395": No such file or directory where the mentioned path corresponds to the index filenode: alvherre=# select oid::regclass, relkind from pg_class where relfilenode = '16395'; oid │ relkind ─────────────────────────┼───────── at_partitioned_col1_idx │ I (1 fila) The error comes from get_relation_info called for at_partitioned, where the rte->inh flag is false. So this is the correct behavior: we consider the table on its own, not as an inheritance parent, and so the code is not skipped. Backtrace: (gdb) bt #0 errfinish (dummy=0) at /pgsql/source/master/src/backend/utils/error/elog.c:414 #1 0x0000564936f79365 in mdopen (reln=<optimized out>, forknum=forknum@entry=MAIN_FORKNUM, behavior=behavior@entry=1) at /pgsql/source/master/src/backend/storage/smgr/md.c:606 #2 0x0000564936f7983b in mdopen (behavior=1, forknum=MAIN_FORKNUM, reln=0x5649382f8dd0) at /pgsql/source/master/src/backend/storage/smgr/md.c:922 #3 mdnblocks (reln=0x5649382f8dd0, forknum=MAIN_FORKNUM) at /pgsql/source/master/src/backend/storage/smgr/md.c:875 #4 0x0000564936eeb6ea in get_relation_info (root=root@entry=0x5649382ef398, relationObjectId=16385, inhparent=<optimized out>, rel=rel@entry=0x5649382f0198) at /pgsql/source/master/src/backend/optimizer/util/plancat.c:377 #5 0x0000564936eeee55 in build_simple_rel (root=root@entry=0x5649382ef398, relid=2, parent=parent@entry=0x0) at /pgsql/source/master/src/backend/optimizer/util/relnode.c:182 #6 0x0000564936ec6f23 in add_base_rels_to_query ( root=root@entry=0x5649382ef398, jtnode=<optimized out>) at /pgsql/source/master/src/backend/optimizer/plan/initsplan.c:113 -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I think this is the right fix for this problem. I wonder about exploring other callers of RelationGetIndexList to see who else could be confused ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Alvaro Herrera wrote: > I think this is the right fix for this problem. I wonder about > exploring other callers of RelationGetIndexList to see who else could be > confused ... CLUSTER was also affected (and ALTER TABLE .. CLUSTER ON). Patched both and added simple tests. Couldn't detect any other immediate problems. Maybe UNIQUE indexes will have a similar hazard, with replica identity. Case closed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services