Thread: patch for check constraints using multiple inheritance
Hi, We ran into a problem on 9.0beta3 with check constraints using table inheritance in a multi-level hierarchy with multiple inheritance. A test script is provided below and a proposed patch is attached to this email. Regards, Henk Enting, Yeb Havinga MGRID B.V. http://www.mgrid.net /* First, create a local inheritance structure: level_0_parent level_0_child inherits level_0_parent This structure is the base level. The table definition and also check constraints are defined on this level. Add two levels that inherit this structure: level_1_parent inherits level_0_parent level_1_child inherits level_1_parent, level_0_child level_2_parent inherits level_1_parent level_2_child inherits level_2_parent, level_1_child BTW: there is a reason that we want e.g. level_1_child to inherit from both level_1_parent and level_0_child: we want the data of level_1_child to be visible in both level_0_child and level_1_parent */ DROP SCHEMA IF EXISTS test_inheritance CASCADE; CREATE SCHEMA test_inheritance; SET search_path TO test_inheritance; CREATE TABLE level_0_parent (i int); CREATE TABLE level_0_child (a text) INHERITS (level_0_parent); CREATE TABLE level_1_parent() INHERITS (level_0_parent); CREATE TABLE level_1_child() INHERITS (level_0_child, level_1_parent); CREATE TABLE level_2_parent() INHERITS (level_1_parent); CREATE TABLE level_2_child() INHERITS (level_1_child, level_2_parent); -- Now add a check constraint on the top level table: ALTER TABLE level_0_parent ADD CONSTRAINT a_check_constraint CHECK (i IN (0,1)); /* Check the "coninhcount" attribute of pg_constraint Doxygen says this about the parameter: coninhcount: Number of times inherited from direct parent relation(s) On our machine (running 9.0beta3) the query below returns a coninhcount of 3 for the level_2_child table. This doesn't seem correct because the table only has two direct parents. */ SELECT t.oid, t.relname, c.coninhcount FROM pg_class t JOIN pg_constraint c ON (c.conrelid = t.oid) JOIN pg_namespace n ON (t.relnamespace = n.oid) WHERE n.nspname = 'test_inheritance' ORDER BY t.oid; -- Next, drop the constraint on the top level table ALTER TABLE level_0_parent DROP CONSTRAINT a_check_constraint; /* The constraint should now be dropped from all the tables in the hierarchy, but the constraint hasn't been dropped on the level_2_child table. It is still there and has a coninhcount of 1. */ SELECT t.oid, t.relname, c.conname, c.coninhcount FROM pg_class t JOIN pg_constraint c ON (c.conrelid = t.oid) JOIN pg_namespace n ON (t.relnamespace = n.oid) WHERE n.nspname = 'test_inheritance' ORDER BY t.oid; /* Trying to drop this constraint that shouldn't be there anymore won't work. The "drop constraint" statement below returns: ERROR: cannot drop inherited constraint "a_check_constraint" of relation "level_2_child" NB after fixing this bug, the statement should return "constraint does not exist" */ ALTER TABLE level_2_child DROP CONSTRAINT a_check_constraint;
Attachment
On Thu, Jul 29, 2010 at 6:57 AM, Henk Enting <h.d.enting@mgrid.net> wrote: > We ran into a problem on 9.0beta3 with check constraints using table > inheritance in a multi-level hierarchy with multiple inheritance. > > A test script is provided below and a proposed patch is attached to this > email. Thanks for the report. This bug also appears to exist in 8.4; I'm not sure yet how far back it goes. I'm not so sure your proposed patch is the right fix, though; it seems like it ought to be the job of AddRelationNewConstraints() and MergeWithExistingConstraint() to make sure that the right thing happens, here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jul 29, 2010 at 6:57 AM, Henk Enting <h.d.enting@mgrid.net> wrote: >> We ran into a problem on 9.0beta3 with check constraints using table >> inheritance in a multi-level hierarchy with multiple inheritance. > Thanks for the report. This bug also appears to exist in 8.4; I'm not > sure yet how far back it goes. I'm not so sure your proposed patch is > the right fix, though; it seems like it ought to be the job of > AddRelationNewConstraints() and MergeWithExistingConstraint() to make > sure that the right thing happens, here. The original design idea was that coninhcount/conislocal would act exactly like attinhcount/attislocal do for multiply-inherited columns. Where did we fail to copy that logic? regards, tom lane
On Fri, Jul 30, 2010 at 10:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Jul 29, 2010 at 6:57 AM, Henk Enting <h.d.enting@mgrid.net> wrote: >>> We ran into a problem on 9.0beta3 with check constraints using table >>> inheritance in a multi-level hierarchy with multiple inheritance. > >> Thanks for the report. This bug also appears to exist in 8.4; I'm not >> sure yet how far back it goes. I'm not so sure your proposed patch is >> the right fix, though; it seems like it ought to be the job of >> AddRelationNewConstraints() and MergeWithExistingConstraint() to make >> sure that the right thing happens, here. > > The original design idea was that coninhcount/conislocal would act > exactly like attinhcount/attislocal do for multiply-inherited columns. > Where did we fail to copy that logic? We didn't. That logic is broken, too. Using the OP's test setup: rhaas=# alter table level_0_parent add column a_new_column integer; NOTICE: merging definition of column "a_new_column" for child "level_1_child" NOTICE: merging definition of column "a_new_column" for child "level_2_child" NOTICE: merging definition of column "a_new_column" for child "level_2_child" ALTER TABLE rhaas=# SELECT t.oid, t.relname, a.attinhcount FROM pg_class t JOIN pg_attribute a ON (a.attrelid = t.oid) JOIN pg_namespace n ON (t.relnamespace = n.oid) WHERE n.nspname = 'test_inheritance' AND a.attname = 'a_new_column' ORDER BY t.oid; oid | relname | attinhcount -------+----------------+------------- 16420 | level_0_parent | 0 16423 | level_0_child | 1 16429 | level_1_parent | 1 16432 | level_1_child | 2 16438 | level_2_parent | 1 16441 | level_2_child | 3 (6 rows) The attached patch (please review) appears to fix the coninhcount case. I haven't tracked down the attinhcount case yet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Attachment
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jul 30, 2010 at 10:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The original design idea was that coninhcount/conislocal would act >> exactly like attinhcount/attislocal do for multiply-inherited columns. >> Where did we fail to copy that logic? > We didn't. That logic is broken, too. Uh, full stop there. If you think that the multiply-inherited column logic is wrong, it's you that is mistaken --- or at least you're going to have to do a lot more than just assert that you don't like it. We spent a *lot* of time hashing that behavior out, back around 7.3. regards, tom lane
On Fri, Jul 30, 2010 at 10:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Jul 30, 2010 at 10:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> The original design idea was that coninhcount/conislocal would act >>> exactly like attinhcount/attislocal do for multiply-inherited columns. >>> Where did we fail to copy that logic? > >> We didn't. That logic is broken, too. > > Uh, full stop there. If you think that the multiply-inherited column > logic is wrong, it's you that is mistaken --- or at least you're going > to have to do a lot more than just assert that you don't like it. > We spent a *lot* of time hashing that behavior out, back around 7.3. Well, I'm glad you spent a lot of time hashing it out, but you've got a bug. :-) Since the output in the previous email apparently wasn't sufficient for you to understand what the problem is, here it is in more detail. Initial setup: DROP SCHEMA IF EXISTS test_inheritance CASCADE; CREATE SCHEMA test_inheritance; SET search_path TO test_inheritance; CREATE TABLE level_0_parent (i int); CREATE TABLE level_0_child (a text) INHERITS (level_0_parent); CREATE TABLE level_1_parent() INHERITS (level_0_parent); CREATE TABLE level_1_child() INHERITS (level_0_child, level_1_parent); CREATE TABLE level_2_parent() INHERITS (level_1_parent); CREATE TABLE level_2_child() INHERITS (level_1_child, level_2_parent); Then: rhaas=# \d level_2_child Table "test_inheritance.level_2_child"Column | Type | Modifiers --------+---------+-----------i | integer |a | text | Inherits: level_1_child, level_2_parent rhaas=# ALTER TABLE level_0_parent ADD COLUMN a_new_column integer; NOTICE: merging definition of column "a_new_column" for child "level_1_child" NOTICE: merging definition of column "a_new_column" for child "level_2_child" NOTICE: merging definition of column "a_new_column" for child "level_2_child" ALTER TABLE rhaas=# ALTER TABLE level_0_parent DROP COLUMN a_new_column; ALTER TABLE rhaas=# \d level_2_child Table "test_inheritance.level_2_child" Column | Type | Modifiers --------------+---------+-----------i | integer |a | text |a_new_column | integer | Inherits: level_1_child, level_2_parent Adding a column to the toplevel parent of the inheritance hierarchy and then dropping it again shouldn't leave a leftover copy of the column in the grandchild. rhaas=# ALTER TABLE level_2_child DROP COLUMN a_new_column; ERROR: cannot drop inherited column "a_new_column" -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jul 30, 2010 at 10:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Uh, full stop there. �If you think that the multiply-inherited column >> logic is wrong, it's you that is mistaken --- or at least you're going >> to have to do a lot more than just assert that you don't like it. >> We spent a *lot* of time hashing that behavior out, back around 7.3. > Since the output in the previous email apparently wasn't sufficient > for you to understand what the problem is, here it is in more detail. > ... > Adding a column to the toplevel parent of the inheritance hierarchy > and then dropping it again shouldn't leave a leftover copy of the > column in the grandchild. Actually, it probably should. The inheritance rules were intentionally designed to avoid dropping inherited columns that could conceivably still contain valuable data. There isn't enough information in the inhcount/islocal data structure to recognize that a multiply-inherited column is ultimately derived from only one distant ancestor, but it was agreed that an exact tracking scheme would be more complication than it was worth. Instead, the mechanism is designed to ensure that no column will be dropped if it conceivably is still wanted --- not that columns might not be left behind and require another drop step. *Please* go re-read the old discussions before you propose tampering with this behavior. In particular I really really do not believe that any one-line fix is going to make things better --- almost certainly it will break other cases. Being materially more intelligent would require a more complex data structure. regards, tom lane
Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Since the output in the previous email apparently wasn't sufficient >> for you to understand what the problem is, here it is in more detail. >> ... >> Adding a column to the toplevel parent of the inheritance hierarchy >> and then dropping it again shouldn't leave a leftover copy of the >> column in the grandchild. > > Actually, it probably should. The inheritance rules were intentionally > designed to avoid dropping inherited columns that could conceivably > still contain valuable data. There isn't enough information in the > inhcount/islocal data structure to recognize that a multiply-inherited > column is ultimately derived from only one distant ancestor, but it was > agreed that an exact tracking scheme would be more complication than it > was worth. Instead, the mechanism is designed to ensure that no column > will be dropped if it conceivably is still wanted --- not that columns > might not be left behind and require another drop step. This is not about dropping columns or not, but about proper maintenance of the housekeeping of coninhcount, defined as "The number of direct inheritance ancestors this constraint has. A constraint with a nonzero number of ancestors cannot be dropped nor renamed". Regard the following lattice (direction from top to bottom): 1 |\ 2 3\|\ 4 5 \| 6 When adding a constraint to 1, the proper coninhcount for that constraint on relation 6 is 2. But the code currently counts to 3, since 6 is reached by paths 1-2-4-5, 1-3-4-6, 1-3-5-6. This wrong number is a bug. > *Please* go re-read the old discussions before you propose tampering > with this behavior. In particular I really really do not believe that > any one-line fix is going to make things better --- almost certainly > it will break other cases. Our (more than one line) patch was explicitly designed to walk from a specific parent to a child exactly once. It passes regression tests. regards, Yeb Havinga
On Fri, Jul 30, 2010 at 10:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Jul 30, 2010 at 10:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Uh, full stop there. If you think that the multiply-inherited column >>> logic is wrong, it's you that is mistaken --- or at least you're going >>> to have to do a lot more than just assert that you don't like it. >>> We spent a *lot* of time hashing that behavior out, back around 7.3. > >> Since the output in the previous email apparently wasn't sufficient >> for you to understand what the problem is, here it is in more detail. >> ... >> Adding a column to the toplevel parent of the inheritance hierarchy >> and then dropping it again shouldn't leave a leftover copy of the >> column in the grandchild. > > Actually, it probably should. The inheritance rules were intentionally > designed to avoid dropping inherited columns that could conceivably > still contain valuable data. There isn't enough information in the > inhcount/islocal data structure to recognize that a multiply-inherited > column is ultimately derived from only one distant ancestor, but it was > agreed that an exact tracking scheme would be more complication than it > was worth. Instead, the mechanism is designed to ensure that no column > will be dropped if it conceivably is still wanted --- not that columns > might not be left behind and require another drop step. While I certainly agree that accidentally dropping a column that should be retained is a lot worse than accidentally retaining a column that should be dropped, that doesn't make the current behavior correct. I don't think that's really an arguable point. Another drop step doesn't help: the leftover column is undroppable short of nuking the whole table. So let's focus on what the actual problem is here, what is required to fix it, and whether or not and how far the fix can be back-patched. There are two separate bugs here, so we should consider them individually. In each case, the problem is that with a certain (admittedly rather strange) inheritance hierarchy, adding a column to the top-level parent results in the inheritance count in the bottom-most child ending up as 3 rather than 2. 2 is the correct value because the bottom-most child has 2 immediate parents. The consequence of ending up with a count of 3 rather than 2 is that if the constraint/column added at the top-level is subsequent dropped, necessarily also from the top-level, the bottom-level child ends up with the constraint/column still in existence and with an inheritance count of 1. This isn't the correct behavior, and furthermore the child object can't be dropped, because it still believes that it's inherited (even though its parents no longer have a copy to inherit). In the case of coninhcount, I believe that the fix actually is quite simple, although of course it's possible that I'm missing something. Currently, ATAddConstraint() first calls AddRelationNewConstraints() to add the constraint, or possibly merge it into an existing, inherited constraint; it then adds the constraint to the phase-3 work queue so that it will be checked; and finally it recurses to its own children. It seems to me that it should only recurse to its children if the constraint was really added, rather than merged into an existing constraint, because if it was merged into an existing constraint, then the children already have it. I'm still trying to figure out why this bug doesn't show up in simpler cases, but I think that's the root of the issue. attinhcount exhibits analogous misbehavior, but if you're saying the fix for that case is going to be a lot more complicated, I think I agree with you. In that case, it seems that we traverse the inheritance hierarchy during the prep phase, at which point we don't yet know whether we're going to end up merging anything. It might be that a fix for this part of the problem ends up being too complex to back-patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Yeb Havinga <yebhavinga@gmail.com> writes: > Regard the following lattice (direction from top to bottom): > 1 > |\ > 2 3 > \|\ > 4 5 > \| > 6 > When adding a constraint to 1, the proper coninhcount for that > constraint on relation 6 is 2. But the code currently counts to 3, since > 6 is reached by paths 1-2-4-5, 1-3-4-6, 1-3-5-6. Mph. I'm not sure that 3 is wrong. You have to consider what happens during a DROP CONSTRAINT, which as far as I saw this patch didn't address. regards, tom lane
On Fri, Jul 30, 2010 at 11:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yeb Havinga <yebhavinga@gmail.com> writes: >> Regard the following lattice (direction from top to bottom): > >> 1 >> |\ >> 2 3 >> \|\ >> 4 5 >> \| >> 6 > >> When adding a constraint to 1, the proper coninhcount for that >> constraint on relation 6 is 2. But the code currently counts to 3, since >> 6 is reached by paths 1-2-4-5, 1-3-4-6, 1-3-5-6. > > Mph. I'm not sure that 3 is wrong. You have to consider what happens > during a DROP CONSTRAINT, which as far as I saw this patch didn't > address. The behavior of DROP CONSTRAINT matches the fine documentation. The behavior of ADD CONSTRAINT does not. Perhaps there's a definition of coninhcount that would make 3 the right answer, but (a) I'm not sure what that definition would be and (b) it's certainly not the one in the manual. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Fri, Jul 30, 2010 at 11:35 AM, Robert Haas <robertmhaas@gmail.com> wrote: > In the case of coninhcount, I believe that the fix actually is quite > simple, although of course it's possible that I'm missing something. > Currently, ATAddConstraint() first calls AddRelationNewConstraints() > to add the constraint, or possibly merge it into an existing, > inherited constraint; it then adds the constraint to the phase-3 work > queue so that it will be checked; and finally it recurses to its own > children. It seems to me that it should only recurse to its children > if the constraint was really added, rather than merged into an > existing constraint, because if it was merged into an existing > constraint, then the children already have it. I'm still trying to > figure out why this bug doesn't show up in simpler cases, but I think > that's the root of the issue. OK, it looks like level_2_parent is actually irrelevant to this issue.So here's a slightly simplified test case: DROP SCHEMA IF EXISTS test_inheritance CASCADE; CREATE SCHEMA test_inheritance; SET search_path TO test_inheritance; CREATE TABLE top (i int); CREATE TABLE mid1 () INHERITS (top); CREATE TABLE mid2 () INHERITS (top); CREATE TABLE bottom () INHERITS (mid1, mid2); CREATE TABLE basement () INHERITS (bottom); ALTER TABLE top ADD CONSTRAINT a_check_constraint CHECK (i = 0); You can also trigger it this way: DROP SCHEMA IF EXISTS test_inheritance CASCADE; CREATE SCHEMA test_inheritance; SET search_path TO test_inheritance; CREATE TABLE top1 (i int); CREATE TABLE top2 (i int); CREATE TABLE bottom () INHERITS (top1, top2); CREATE TABLE basement () INHERITS (bottom); ALTER TABLE top1 ADD CONSTRAINT a_check_constraint CHECK (i = 0); ALTER TABLE top2 ADD CONSTRAINT a_check_constraint CHECK (i = 0); In other words, the problem occurs when table A inherits a constraint from multiple parents, and table B inherits from table A. Most of the code (including ALTER TABLE .. DROP CONSTRAINT and ALTER TABLE .. [NO] INHERIT) maintains the invariant that coninhcount is the number of direct parents from which the constraint is inherited, but the ADD CONSTRAINT fails to do so in this particular case. So at this point, I'm fairly confident that this is the correct fix. I think the reason we haven't noticed this before is that it's most likely to occur when you have a diamond inheritance pattern with another child hanging off the bottom, which is not something most people probably do. It appears that the coninhcount mechanism was introduced in 8.4; prior to that, we didn't really make an effort to get this right. Therefore, I believe the patch I posted upthread should be applied to HEAD, REL9_0_STABLE, and REL8_4_STABLE. This still leaves open the question of what to do about the similar case involving attributes: DROP SCHEMA IF EXISTS test_inheritance CASCADE; CREATE SCHEMA test_inheritance; SET search_path TO test_inheritance; CREATE TABLE top (i int); CREATE TABLE mid1 () INHERITS (top); CREATE TABLE mid2 () INHERITS (top); CREATE TABLE bottom () INHERITS (mid1, mid2); CREATE TABLE basement () INHERITS (bottom); ALTER TABLE top ADD COLUMN a_table_column integer; That problem looks significantly more difficult to solve, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Fri, Jul 30, 2010 at 10:22, Robert Haas <robertmhaas@gmail.com> wrote: > OK, it looks like level_2_parent is actually irrelevant to this issue. > So here's a slightly simplified test case: > > DROP SCHEMA IF EXISTS test_inheritance CASCADE; > CREATE SCHEMA test_inheritance; > SET search_path TO test_inheritance; > > CREATE TABLE top (i int); > CREATE TABLE mid1 () INHERITS (top); > CREATE TABLE mid2 () INHERITS (top); > CREATE TABLE bottom () INHERITS (mid1, mid2); > CREATE TABLE basement () INHERITS (bottom); > > ALTER TABLE top ADD CONSTRAINT a_check_constraint CHECK (i = 0); The other problem with the current code is it creates a dump that is not consistent with \d. The dump gets it "right" in the sense that basement does not have the constraint. We could debate what coninhcount should be, but clearly there is a bug here. [ FYI with your patch the dump is, of course, consistent again (no a_check_constraint) ]
Robert Haas wrote: >> It seems to me that it should only recurse to its children >> if the constraint was really added, rather than merged into an >> existing constraint, because if it was merged into an existing >> constraint, then the children already have it. Yes. (then the children already have it -> already have it from the current parent) - now I understand how AddRelationNewConstraints is used, this effectively blocks > 1 times propagation to childs from the same parent, and that is what our patch was written todo too. > OK, it looks like level_2_parent is actually irrelevant to this issue. > So here's a slightly simplified test case: > > CREATE TABLE top (i int); > CREATE TABLE mid1 () INHERITS (top); > CREATE TABLE mid2 () INHERITS (top); > CREATE TABLE bottom () INHERITS (mid1, mid2); > CREATE TABLE basement () INHERITS (bottom); > This is, probably provable, the smallest case where this problem can occur. Removing any table would mean that this bug is not hit anymore. > This still leaves open the question of what to do about the similar > case involving attributes: > > That problem looks significantly more difficult to solve, though. > I'm looking at ATPrepAddColumn right now, where there is actually some comments about getting the right attinhcount in the case of multiple inherited children, but it's the first time I'm looking at this part of PostgreSQL and it needs some time to sink in. It looks a bit like at the place of the comment /* child should see column as singly inherited */, maybe the recursion could be stopped there as well, i.e. not only setting inhcount to 1, but actually adding the child relation one time to the wqueue. regards, Yeb Havinga
Yeb Havinga wrote: > Robert Haas wrote: >> This still leaves open the question of what to do about the similar >> case involving attributes: >> >> That problem looks significantly more difficult to solve, though. >> > I'm looking at ATPrepAddColumn right now, where there is actually some > comments about getting the right attinhcount in the case of multiple > inherited children, but it's the first time I'm looking at this part > of PostgreSQL and it needs some time to sink in. It looks a bit like > at the place of the comment /* child should see column as singly > inherited */, maybe the recursion could be stopped there as well, i.e. > not only setting inhcount to 1, but actually adding the child relation > one time to the wqueue. I believe the crux is to stop double recursion from a parent in ATOneLevelRecursion. I did a quick test by adding a globally defined static List *visitedrels = NIL; together by adding in front of ATOneLevelRecursion this: if (list_member_oid(visitedrels, relid)) return; visitedrels = lappend_oid(visitedrels, relid); Running Roberts example gives the correct attinhcount for the basement. SELECT t.oid, t.relname, a.attinhcount FROM pg_class t JOIN pg_attribute a ON (a.attrelid = t.oid) JOIN pg_namespace n ON (t.relnamespace = n.oid) WHERE n.nspname = 'test_inheritance' AND a.attname = 'a_table_column' ORDER BY oid; oid | relname | attinhcount -------+----------+-------------16566 | top | 016569 | mid1 | 116572 | mid2 | 116575 | bottom | 216578 | basement | 1 regards, Yeb Havinga PS: forgot to say thanks for looking into this problem earlier in the thread. Thanks!
On Fri, Jul 30, 2010 at 4:38 PM, Yeb Havinga <yebhavinga@gmail.com> wrote: >> I'm looking at ATPrepAddColumn right now, where there is actually some >> comments about getting the right attinhcount in the case of multiple >> inherited children, but it's the first time I'm looking at this part of >> PostgreSQL and it needs some time to sink in. It looks a bit like at the >> place of the comment /* child should see column as singly inherited */, >> maybe the recursion could be stopped there as well, i.e. not only setting >> inhcount to 1, but actually adding the child relation one time to the >> wqueue. > > I believe the crux is to stop double recursion from a parent in > ATOneLevelRecursion. I did a quick test by adding a globally defined > > static List *visitedrels = NIL; I agree that's the crux of the problem, but I can't see solving it with a global variable. I realize you were just testing... > PS: forgot to say thanks for looking into this problem earlier in the > thread. Thanks! yw -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas wrote: > I agree that's the crux of the problem, but I can't see solving it > with a global variable. I realize you were just testing... > Yes it was just a test. However, somewhere information must be kept or altered so it can be detected that a relation has already been visited, i.e. it is a multiple inheriting child. The other solutions I could think of are more intrusive (i.e. definitionin ATController and passing as parameter). The attached patch uses the globally defined list. After ATPrepCmd the list pointer is reset to NIL, the list not freed since the allocs are done in a memory context soon to be deleted (PortalHeapMemory). It passes regression as well as the script below. regards, Yeb Havinga DROP SCHEMA IF EXISTS test_inheritance CASCADE; CREATE SCHEMA test_inheritance; SET search_path TO test_inheritance; CREATE TABLE top (i int); CREATE TABLE mid1 () INHERITS (top); CREATE TABLE mid2 () INHERITS (top); CREATE TABLE bottom () INHERITS (mid1, mid2); CREATE TABLE basement () INHERITS (bottom); ALTER TABLE top ADD COLUMN a_table_column integer, ADD COLUMN a_table_column2 integer; ALTER TABLE top ADD COLUMN a_table_column3 integer; ALTER TABLE top ADD CONSTRAINT a_check_constraint CHECK (i IN (0,1)), ADD CONSTRAINT a_check_constraint2 CHECK (i IN (0,1)); ALTER TABLE top ADD CONSTRAINT a_check_constraint3 CHECK (i IN (0,1)); SELECT t.oid, t.relname, a.attinhcount FROM pg_class t JOIN pg_attribute a ON (a.attrelid = t.oid) JOIN pg_namespace n ON (t.relnamespace = n.oid) WHERE n.nspname = 'test_inheritance' AND a.attname LIKE 'a_table_column%' ORDER BY oid; SELECT t.oid, t.relname, c.coninhcount FROM pg_class t JOIN pg_constraint c ON (c.conrelid = t.oid) JOIN pg_namespace n ON (t.relnamespace = n.oid) WHERE n.nspname = 'test_inheritance' ORDER BY oid; diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 49a6f73..08efffc 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -99,6 +99,10 @@ typedef struct OnCommitItem static List *on_commits = NIL; +/* + * Per subcommand history of relids visited in an inheritance hierarchy. + */ +static List *visited_relids = NIL; /* * State information for ALTER TABLE @@ -2584,6 +2588,7 @@ ATController(Relation rel, List *cmds, bool recurse, LOCKMODE lockmode) AlterTableCmd *cmd = (AlterTableCmd *) lfirst(lcmd); ATPrepCmd(&wqueue, rel, cmd, recurse, false, lockmode); + visited_relids = NIL; } /* Close the relation, but keep lock until commit */ @@ -3618,10 +3623,11 @@ ATSimpleRecursion(List **wqueue, Relation rel, /* * ATOneLevelRecursion * - * Here, we visit only direct inheritance children. It is expected that - * the command's prep routine will recurse again to find indirect children. - * When using this technique, a multiply-inheriting child will be visited - * multiple times. + * Here, we visit only direct inheritance children. It is expected that the + * command's prep routine will recurse again to find indirect children. When + * using this technique, a multiple-inheriting child will be visited multiple + * times. Childs of multiple-inheriting childs however are only visited once + * for each parent. */ static void ATOneLevelRecursion(List **wqueue, Relation rel, @@ -3631,6 +3637,14 @@ ATOneLevelRecursion(List **wqueue, Relation rel, ListCell *child; List *children; + /* If we already visited the current multiple-inheriting relation, we + * mustn't recurse to it's child tables, because they've already been + * visited. Visiting them would lead to an incorrect value for + * attinhcount. */ + if (list_member_oid(visited_relids, relid)) + return; + visited_relids = lappend_oid(visited_relids, relid); + children = find_inheritance_children(relid, lockmode); foreach(child, children) @@ -4891,6 +4905,15 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel, CommandCounterIncrement(); /* + * If the constraint got merged with an existing constraint, we're done. + * We mustn't recurse to child tables in this case, because they've already + * got the constraint, and visiting them again would lead to an incorrect + * value for coninhcount. + */ + if (newcons == NIL) + return; + + /* * Propagate to children as appropriate. Unlike most other ALTER * routines, we have to do this one level of recursion at a time; we can't * use find_all_inheritors to do it in one pass.
On Mon, Aug 2, 2010 at 9:20 AM, Yeb Havinga <yebhavinga@gmail.com> wrote: > Robert Haas wrote: >> >> I agree that's the crux of the problem, but I can't see solving it >> with a global variable. I realize you were just testing... > > Yes it was just a test. However, somewhere information must be kept or > altered so it can be detected that a relation has already been visited, i.e. > it is a multiple inheriting child. The other solutions I could think of are > more intrusive (i.e. definitionin ATController and passing as parameter). > > The attached patch uses the globally defined list. After ATPrepCmd the list > pointer is reset to NIL, the list not freed since the allocs are done in a > memory context soon to be deleted (PortalHeapMemory). It passes regression > as well as the script below. I can't speak for any other committer, but personally I'm prepared to reject out of hand any solution involving a global variable. This code is none to easy to follow as it is and adding global variables to the mix is not going to make it better, even if we can convince ourselves that it's safe and correct. This is something of a corner case, so I won't be crushed if a cleaner fix is too invasive to back-patch. Incidentally, we need to shift this discussion to a new subject line; we have a working patch for the original subject of this thread, and are now discussing the related problem I found with attributes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas wrote: > On Mon, Aug 2, 2010 at 9:20 AM, Yeb Havinga <yebhavinga@gmail.com> wrote: >> The attached patch uses the globally defined list. > I can't speak for any other committer, but personally I'm prepared to > reject out of hand any solution involving a global variable. This > code is none to easy to follow as it is and adding global variables to > the mix is not going to make it better, even if we can convince > ourselves that it's safe and correct. This is something of a corner > case, so I won't be crushed if a cleaner fix is too invasive to > back-patch. Hello Robert, Thanks for looking at the patch. I've attached a bit more wordy version, that adds a boolean to AlteredTableInfo and a function to wipe that boolean between processing of subcommands. > Incidentally, we need to shift this discussion to a new > subject line; we have a working patch for the original subject of this > thread, and are now discussing the related problem I found with > attributes. > Both solutions have changes in callers of 'find_inheritance_children'. I was working in the hope a unifiying solution would pop up naturally, but so far it has not. Checking of the new AlteredTableInfo->relVisited boolean could perhaps be pushed down into find_inheritance_children, if multiple-'doing things' for the childs under a multiple-inheriting relation is unwanted for every kind of action. It seems to me that the question whether that is the case must be answered, before the current working patch for coninhcount is 'ready for committer'. regards, Yeb Havinga diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 49a6f73..d43efc3 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -99,7 +99,6 @@ typedef struct OnCommitItem static List *on_commits = NIL; - /* * State information for ALTER TABLE * @@ -132,6 +131,7 @@ typedef struct AlteredTableInfo Oid relid; /* Relation to work on */ char relkind; /* Its relkind */ TupleDesc oldDesc; /* Pre-modification tuple descriptor */ + bool relVisited; /* Was the rel visited before, for the current subcommand */ /* Information saved by Phase 1 for Phase 2: */ List *subcmds[AT_NUM_PASSES]; /* Lists of AlterTableCmd */ /* Information saved by Phases 1/2 for Phase 3: */ @@ -335,6 +335,7 @@ static void ATExecDropInherit(Relation rel, RangeVar *parent, LOCKMODE lockmode) static void copy_relation_data(SMgrRelation rel, SMgrRelation dst, ForkNumber forkNum, bool istemp); static const char *storage_name(char c); +static void ATResetRelVisited(List **wqueue); /* ---------------------------------------------------------------- @@ -2583,6 +2584,7 @@ ATController(Relation rel, List *cmds, bool recurse, LOCKMODE lockmode) { AlterTableCmd *cmd = (AlterTableCmd *) lfirst(lcmd); + ATResetRelVisited(&wqueue); ATPrepCmd(&wqueue, rel, cmd, recurse, false, lockmode); } @@ -3503,6 +3505,23 @@ ATGetQueueEntry(List **wqueue, Relation rel) } /* + * ATResetRelVisited: reset the visitation info for each rel in the working + * queue, so it can be used for the next subcommand. + */ +static void +ATResetRelVisited(List **wqueue) +{ + AlteredTableInfo *tab; + ListCell *ltab; + + foreach(ltab, *wqueue) + { + tab = (AlteredTableInfo *) lfirst(ltab); + tab->relVisited = false; + } +} + +/* * ATSimplePermissions * * - Ensure that it is a relation (or possibly a view) @@ -3618,19 +3637,29 @@ ATSimpleRecursion(List **wqueue, Relation rel, /* * ATOneLevelRecursion * - * Here, we visit only direct inheritance children. It is expected that - * the command's prep routine will recurse again to find indirect children. - * When using this technique, a multiply-inheriting child will be visited - * multiple times. + * Here, we visit only direct inheritance children. It is expected that the + * command's prep routine will recurse again to find indirect children. When + * using this technique, a multiple-inheriting child will be visited multiple + * times. Childs of multiple-inheriting childs however are only visited once + * for each parent. */ static void ATOneLevelRecursion(List **wqueue, Relation rel, AlterTableCmd *cmd, LOCKMODE lockmode) { Oid relid = RelationGetRelid(rel); + AlteredTableInfo *tab = ATGetQueueEntry(wqueue, rel); ListCell *child; List *children; + /* If we already visited the current multiple-inheriting relation, we + * mustn't recurse to it's child tables, because they've already been + * visited. Visiting them would lead to an incorrect value for + * attinhcount. */ + if (tab->relVisited) + return; + tab->relVisited = true; + children = find_inheritance_children(relid, lockmode); foreach(child, children) @@ -4891,6 +4920,15 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel, CommandCounterIncrement(); /* + * If the constraint got merged with an existing constraint, we're done. + * We mustn't recurse to child tables in this case, because they've already + * got the constraint, and visiting them again would lead to an incorrect + * value for coninhcount. + */ + if (newcons == NIL) + return; + + /* * Propagate to children as appropriate. Unlike most other ALTER * routines, we have to do this one level of recursion at a time; we can't * use find_all_inheritors to do it in one pass.
On Mon, Aug 2, 2010 at 10:47 AM, Yeb Havinga <yebhavinga@gmail.com> wrote: >>> The attached patch uses the globally defined list. >> >> I can't speak for any other committer, but personally I'm prepared to >> reject out of hand any solution involving a global variable. This >> code is none to easy to follow as it is and adding global variables to >> the mix is not going to make it better, even if we can convince >> ourselves that it's safe and correct. This is something of a corner >> case, so I won't be crushed if a cleaner fix is too invasive to >> back-patch. > > Thanks for looking at the patch. I've attached a bit more wordy version, > that adds a boolean to AlteredTableInfo and a function to wipe that boolean > between processing of subcommands. I don't think that this is much cleaner than the global variable solution; you haven't really localized that need to know about the new flag in any meaningful way, the hacks in ATOneLevelRecusion() basically destroy any pretense of that code possibly being reusable for some other caller. However, there's a more serious problem, which is that it doesn't in general fix the bug: try it with the top1/top2/bottom/basement example I posted upthread. If you add the same column to both top1 and top2 and then drop it in both top1 and top2, basement ends up with a leftover copy. The problem is that "only visit each child once" is not the right algorithm; what you need to do is "only visit the descendents of each child if no merge happened at the parent". I believe that the only way to do this correct is to merge the prep stage into the execution stage, as the code for adding constraints already does. At the prep stage, you don't have enough information to determine which relations you'll ultimately need to update, since you don't know where the merges will happen. >> Incidentally, we need to shift this discussion to a new >> subject line; we have a working patch for the original subject of this >> thread, and are now discussing the related problem I found with >> attributes. >> > > Both solutions have changes in callers of 'find_inheritance_children'. I was > working in the hope a unifiying solution would pop up naturally, but so far > it has not. Checking of the new AlteredTableInfo->relVisited boolean could > perhaps be pushed down into find_inheritance_children, if multiple-'doing > things' for the childs under a multiple-inheriting relation is unwanted for > every kind of action. It seems to me that the question whether that is the > case must be answered, before the current working patch for coninhcount is > 'ready for committer'. I am of the opinion that the chances of a unifying solution popping up are pretty much zero. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas wrote: > I don't think that this is much cleaner than the global variable > solution; you haven't really localized that need to know about the new > flag in any meaningful way, the hacks in ATOneLevelRecusion() > basically destroy any pretense of that code possibly being reusable > for some other caller. However, there's a more serious problem, which > is that it doesn't in general fix the bug: try it with the > top1/top2/bottom/basement example I posted upthread. If you add the > same column to both top1 and top2 and then drop it in both top1 and > top2, basement ends up with a leftover copy. The problem is that > "only visit each child once" is not the right algorithm; what you need > to do is "only visit the descendents of each child if no merge > happened at the parent". I believe that the only way to do this > correct is to merge the prep stage into the execution stage, as the > code for adding constraints already does. At the prep stage, you > don't have enough information to determine which relations you'll > ultimately need to update, since you don't know where the merges will > happen. > Hello Robert, Again thanks for looking at the patch. Unfortunately I missed the top1/top2 example earlier, but now I've seen it I understand that it is impossible to fix this problem during the prep stage, without looking at actual existing columns, i.e. without the merge code. Running the top1/top2 example I'd also have expected an error while adding the column to the second top, since the columns added to top1 and top2 are from a different origin. It is apparently considered good behaviour, however troubles emerge when e.g. trying to rename a_table_column in the top1/top2 example, where that is no problem in the 'lollipop' structure, that has a single origin. ALTER TABLE top RENAME COLUMN a_table_column TO another_table_column; SELECT t.oid, t.relname, a.attname, a.attinhcount FROM pg_class t JOIN pg_attribute a ON (a.attrelid = t.oid) JOIN pg_namespace n ON (t.relnamespace = n.oid) WHERE n.nspname = 'test_inheritance' AND a.attname LIKE '%table_column%' ORDER BY oid; I do not completely understand what you mean with the destruction of reusability of ATOneLevelRecursion, currently only called by ATPrepAddColumn - in the patch it is documented in the definition of relVisited that is it visit info for each subcommand. The loop over subcommands is in ATController, where the value is properly reset for each all current and future subcommands. Hence the ATOneLevelRecursion routing is usable in its current form for all callers during the prep stage, and not only ATPrepAddColumn. > I am of the opinion that the chances of a unifying solution popping up > are pretty much zero. :-) > Me too, now I understand the fixes must be in the execution stages. regards, Yeb Havinga
On Mon, Aug 2, 2010 at 2:56 PM, Yeb Havinga <yebhavinga@gmail.com> wrote: > I do not completely understand what you mean with the destruction of > reusability of ATOneLevelRecursion, currently only called by ATPrepAddColumn > - in the patch it is documented in the definition of relVisited that is it > visit info for each subcommand. The loop over subcommands is in > ATController, where the value is properly reset for each all current and > future subcommands. Hence the ATOneLevelRecursion routing is usable in its > current form for all callers during the prep stage, and not only > ATPrepAddColumn. Well, only if they happen to want the "visit each table only once" behavior, which might not be true for every command type. >> I am of the opinion that the chances of a unifying solution popping up >> are pretty much zero. :-) > > Me too, now I understand the fixes must be in the execution stages. OK. I'll go ahead and commit the patch upthread, so that the constraint bug is fixed, and then we can keep arguing about what do about the column bug, perhaps on a new thread. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas wrote: > On Mon, Aug 2, 2010 at 2:56 PM, Yeb Havinga <yebhavinga@gmail.com> wrote: >> Hence the ATOneLevelRecursion routing is usable in its >> current form for all callers during the prep stage, and not only >> ATPrepAddColumn. > > Well, only if they happen to want the "visit each table only once" > behavior, which might not be true for every command type. It is actually "visit each table only once for each distinct parent". Looking at the command types for ALTER TABLE, I see none where this behaviour would be incorrect. That put aside, the top1/top2 example is interesting in the sense that it reveals other problems besides the wrong attinhcount at the basement. For an example see the script below. The underlying cause is the failure of the code to recognize that if relation C inherits from both A and B, where A and B both have column x, that A.x 'is the same as' B.x, where the 'is the same as' relation is the same that holds for (A.x, C.x) and (B.x, C.x), which the code does a lot of trouble for to recognize. This means that if some definition is altered on A.x, only C.x is updated and B.x not touched. IMO this is wrong and either a multiple inheritance structure like this should be prohibited, since the user did not explicitly declare that A.x and B.x 'are the same' (by e.g. defining a relation D.x and have A and B inherit from that), or the code should update parents of relations when the childs are updated. The difficulty is in exactly specifying the 'is the same' as relation, i.e. under what conditions are columns A.x and B.x allowed to be merged to C.x. In the regression test there's only a small amount of tests, but one of them shows that the 'is the same' as relation does not mean everything is the same, as it shows that default values may differ. regards, Yeb Havinga DROP SCHEMA IF EXISTS test_inheritance CASCADE; CREATE SCHEMA test_inheritance; SET search_path TO test_inheritance; CREATE TABLE top1 (i int); CREATE TABLE top2 (i int); CREATE TABLE bottom () INHERITS (top1, top2); CREATE TABLE basement () INHERITS (bottom); ALTER TABLE top1 ADD COLUMN a_table_column INTEGER; ALTER TABLE top2 ADD COLUMN a_table_column INTEGER; SELECT t.oid, t.relname, a.attinhcount, a.attname FROM pg_class t JOIN pg_attribute a ON (a.attrelid = t.oid) JOIN pg_namespace n ON (t.relnamespace = n.oid) WHERE n.nspname = 'test_inheritance' AND a.attname LIKE '%table_column%' ORDER BY oid; ALTER TABLE top1 RENAME COLUMN a_table_column TO another_table_column; SELECT t.oid, t.relname, a.attinhcount, a.attname FROM pg_class t JOIN pg_attribute a ON (a.attrelid = t.oid) JOIN pg_namespace n ON (t.relnamespace = n.oid) WHERE n.nspname = 'test_inheritance' AND a.attname LIKE '%table_column%' ORDER BY oid; ALTER TABLE top2 RENAME COLUMN a_table_column TO another_table_column;
Yeb Havinga wrote: > The underlying cause is the failure of the code to recognize that if > relation C inherits from both A and B, where A and B both have column > x, that A.x 'is the same as' B.x, where the 'is the same as' relation > is the same that holds for (A.x, C.x) and (B.x, C.x), which the code > does a lot of trouble for to recognize. This means that if some > definition is altered on A.x, only C.x is updated and B.x not touched. > IMO this is wrong and either a multiple inheritance structure like > this should be prohibited, since the user did not explicitly declare > that A.x and B.x 'are the same' (by e.g. defining a relation D.x and > have A and B inherit from that), or the code should update parents of > relations when the childs are updated. Thinking about this a bit more, the name 'is the same as' is a bit confusing, since that relation might not be commutative. C.x 'inherits properties from' A.x, or C.x 'is defined by' A.x are perhaps better names, that reflect that the converse might not hold. OTOH, what does C.x 'inherits (all) properties from' A.x mean? If it means that for all properties P, P(C.x) iff P(A.x), then C.x = A.x commutatively and by similar reasoning A.x = B.x. > ALTER TABLE top1 RENAME COLUMN a_table_column TO another_table_column; When looking for previous discussions that was referred to upthread, the first thing I found was this recent thread about the exactly the same problem http://archives.postgresql.org/pgsql-hackers/2010-01/msg03117.php Sorry for the double post, however the previous discussion postponed work to .. now, so maybe there is some value in first trying to specify exactly what 'inherits' means, and derive consequences for code behaviour from that. regards, Yeb Havinga