Thread: Re: tracking inherited columns (was: patch for check constraints using multiple inheritance)

On Tue, Aug 3, 2010 at 3:05 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> Yeb Havinga wrote:
>> The underlying cause is the failure of the code to recognize that if
>> relation C inherits from both A and B, where A and B both have column x,
>> that A.x 'is the same as' B.x, where the 'is the same as' relation is the
>> same that holds for (A.x, C.x) and (B.x, C.x), which the code does a lot of
>> trouble for to recognize. This means that if some definition is altered on
>> A.x, only C.x is updated and B.x not touched. IMO this is wrong and either a
>> multiple inheritance structure like this should be prohibited, since the
>> user did not explicitly declare that A.x and B.x 'are the same' (by e.g.
>> defining a relation D.x and have A and B inherit from that), or the code
>> should update parents of relations when the childs are updated.
>
> Thinking about this a bit more, the name 'is the same as' is a bit
> confusing, since that relation might not be commutative. C.x 'inherits
> properties from' A.x, or C.x 'is defined by' A.x are perhaps better names,
> that reflect that the converse might not hold. OTOH, what does C.x 'inherits
> (all) properties from' A.x mean? If it means that for all properties P,
> P(C.x) iff P(A.x), then C.x =  A.x commutatively and by similar reasoning
> A.x = B.x.
>
>> ALTER TABLE top1 RENAME COLUMN a_table_column TO another_table_column;
>
> When looking for previous discussions that was referred to upthread, the
> first thing I found was this recent thread about the exactly the same
> problem  http://archives.postgresql.org/pgsql-hackers/2010-01/msg03117.php
>
> Sorry for the double post, however the previous discussion postponed work to
> .. now, so maybe there is some value in first trying to specify exactly what
> 'inherits' means, and derive consequences for code behaviour from that.

Yeah, I was thinking about that thread, too, on my drive home from
Metuchen.  I wouldn't get too bogged down in formal logic; it seems
there are a couple of distinct cases here:

1. If you're changing properties of a column, you need to verify for
each relation in the inheritance tree that the "expected attinhcount"
and the actual attinhcount match.  If, for any relation in the
inheritance tree rooted at the named table, they don't, then they are
doubly inherited there, from some other table outside the hierarchy
rooted at the named table, and the operation must fail.  We'd need
similar logic for constraints, if we had support for renaming or
otherwise modifying them, but right now we don't.

2. If you're adding a column, you need to propagate the new column to
relations that don't have it yet, but if you find one that already has
it than you adjust attinhcount and don't recurse to its chidlren.

3. If you're dropping a column, you essentially decrement the
attinhcount of all your children; then you recurse into any that reach
attincount = 0 and not attislocal and drop the column there as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Robert Haas wrote:
> On Tue, Aug 3, 2010 at 3:05 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>   
>> Yeb Havinga wrote:
>>     
>>> The underlying cause is the failure of the code to recognize that if
>>> relation C inherits from both A and B, where A and B both have column x,
>>> that A.x 'is the same as' B.x, where the 'is the same as' relation is the
>>> same that holds for (A.x, C.x) and (B.x, C.x), which the code does a lot of
>>> trouble for to recognize. This means that if some definition is altered on
>>> A.x, only C.x is updated and B.x not touched. IMO this is wrong and either a
>>> multiple inheritance structure like this should be prohibited, since the
>>> user did not explicitly declare that A.x and B.x 'are the same' (by e.g.
>>> defining a relation D.x and have A and B inherit from that), or the code
>>> should update parents of relations when the childs are updated.
>>>       
>> Thinking about this a bit more, the name 'is the same as' is a bit
>> confusing, since that relation might not be commutative. C.x 'inherits
>> properties from' A.x, or C.x 'is defined by' A.x are perhaps better names,
>> that reflect that the converse might not hold. OTOH, what does C.x 'inherits
>> (all) properties from' A.x mean? If it means that for all properties P,
>> P(C.x) iff P(A.x), then C.x =  A.x commutatively and by similar reasoning
>> A.x = B.x.
>>
>>     
>>> ALTER TABLE top1 RENAME COLUMN a_table_column TO another_table_column;
>>>       
>> When looking for previous discussions that was referred to upthread, the
>> first thing I found was this recent thread about the exactly the same
>> problem  http://archives.postgresql.org/pgsql-hackers/2010-01/msg03117.php
>>
>> Sorry for the double post, however the previous discussion postponed work to
>> .. now, so maybe there is some value in first trying to specify exactly what
>> 'inherits' means, and derive consequences for code behaviour from that.
>>     
>
> Yeah, I was thinking about that thread, too, on my drive home from
> Metuchen.
I just read that thread. In the beginning there is a short discussion 
what the non-astonishing behaviour of the RENAME in the case of multiple 
origin inheritance should be, which is preventing renames or any 
property change in that case. I think we should explore the possibilty 
of allowing the RENAME more.

What if on a RENAME of a column (maybe with a necessary explicit 
CASCADE) in the multiple origin parent case, the parents with the same 
columns are altered too? I don't think it is ashtonishing for users; 
after all they've created the tree in the first place, but mostly for 
programmers with some experience with inheritance in computer languages: 
inheritance should go down, not up. That's why I tried to make reasoning 
exact, to figure out why it would be ok (or not) to update another 
parent as well. The reasoning can be made more formal/exact, but I 
believe in its current form it makes a strong case to technically allow 
to prograpage property changes to other parents as well (if they have 
the same inherited column).
> 1. If you're changing properties of a column, you need to verify for
> each relation in the inheritance tree that the "expected attinhcount"
> and the actual attinhcount match.  If, for any relation in the
> inheritance tree rooted at the named table, they don't, then they are
> doubly inherited there, from some other table outside the hierarchy
> rooted at the named table, and the operation must fail.
>   
If we want to block these RENAMES, yes. This is essentially KaiGai's 
patch http://archives.postgresql.org/pgsql-hackers/2010-01/msg02878.php
> 2. If you're adding a column, you need to propagate the new column to
> relations that don't have it yet, but if you find one that already has
> it than you adjust attinhcount and don't recurse to its chidlren.
>   
Sound ok.
> 3. If you're dropping a column, you essentially decrement the
> attinhcount of all your children; then you recurse into any that reach
> attincount = 0 and not attislocal and drop the column there as well.
>   
This too.

regards,
Yeb Havinga






On Wed, Aug 4, 2010 at 3:48 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> I just read that thread. In the beginning there is a short discussion what
> the non-astonishing behaviour of the RENAME in the case of multiple origin
> inheritance should be, which is preventing renames or any property change in
> that case. I think we should explore the possibilty of allowing the RENAME
> more.

If child inherits column A from parent1 and parent2, and it is then
renamed to B in parent2, what should the name be in the child after
the rename is completed?

For bonus points, how should pg_dump handle this to make sure the
state after a dump and reload matches the state before the dump and
reload?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Robert Haas wrote:
> On Wed, Aug 4, 2010 at 3:48 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>   
>> I just read that thread. In the beginning there is a short discussion what
>> the non-astonishing behaviour of the RENAME in the case of multiple origin
>> inheritance should be, which is preventing renames or any property change in
>> that case. I think we should explore the possibilty of allowing the RENAME
>> more.
>>     
>
> If child inherits column A from parent1 and parent2, and it is then
> renamed to B in parent2, what should the name be in the child after
> the rename is completed?
>   
The column should be renamed to B in parent2, child and parent1.
> For bonus points, how should pg_dump handle this to make sure the
> state after a dump and reload matches the state before the dump and
> reload?
>   
If the change happens in a single transaction there should be no 
problems here, as opposed to e.g. have the user issue two renames. Did I 
get the bonus points? :-)

regards,
Yeb Havinga



On Wed, Aug 4, 2010 at 6:41 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>> If child inherits column A from parent1 and parent2, and it is then
>> renamed to B in parent2, what should the name be in the child after
>> the rename is completed?
>
> The column should be renamed to B in parent2, child and parent1.

Uh, really?  Wow.  You want to follow the inheritance hierarchy in
both directions, both down and up?  That seems like it could be
confusing.

>> For bonus points, how should pg_dump handle this to make sure the
>> state after a dump and reload matches the state before the dump and
>> reload?
>
> If the change happens in a single transaction there should be no problems
> here, as opposed to e.g. have the user issue two renames. Did I get the
> bonus points? :-)

Sure, though I'm not sure I like the basic idea.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Robert Haas wrote:
> On Wed, Aug 4, 2010 at 6:41 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>   
>>> If child inherits column A from parent1 and parent2, and it is then
>>> renamed to B in parent2, what should the name be in the child after
>>> the rename is completed?
>>>       
>> The column should be renamed to B in parent2, child and parent1.
>>     
>
> Uh, really?  Wow.  You want to follow the inheritance hierarchy in
> both directions, both down and up?  That seems like it could be
> confusing.
>   
Yes, the idea is to follow the up direction in the case of column 
changes, only if the column is already present in another parent. To 
avoid confusion we could block the first attempt of a change with a 
message other parents exists, and the hint to add e.g. CASCADE to update 
the definition in the other parent as well.

-- Yeb




On 08/04/2010 06:41 AM, Robert Haas wrote:
> On Wed, Aug 4, 2010 at 6:41 AM, Yeb Havinga<yebhavinga@gmail.com>  wrote:
>>> If child inherits column A from parent1 and parent2, and it is then
>>> renamed to B in parent2, what should the name be in the child after
>>> the rename is completed?
>> The column should be renamed to B in parent2, child and parent1.
> Uh, really?  Wow.  You want to follow the inheritance hierarchy in
> both directions, both down and up?  That seems like it could be
> confusing.
>
>

It seems more than confusing. It seems fundamentally wrong. It would 
certainly be a violation of POLA.

Unless there's an extremely persuasive case made for it I'm inclined 
just to say no.

cheers

andrew


Andrew Dunstan <andrew@dunslane.net> writes:
> On 08/04/2010 06:41 AM, Robert Haas wrote:
>> Uh, really?  Wow.  You want to follow the inheritance hierarchy in
>> both directions, both down and up?  That seems like it could be
>> confusing.

> It seems more than confusing. It seems fundamentally wrong. It would 
> certainly be a violation of POLA.

I agree, this idea seems completely nuts.  It is *not* reasonable for
an action applied to a child to change the definition of the parent.
        regards, tom lane


Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 08/04/2010 06:41 AM, Robert Haas wrote:
>>> Uh, really?  Wow.  You want to follow the inheritance hierarchy in
>>> both directions, both down and up?  That seems like it could be
>>> confusing.
>
>> It seems more than confusing. It seems fundamentally wrong. It would 
>> certainly be a violation of POLA.
>
> I agree, this idea seems completely nuts.  It is *not* reasonable for
> an action applied to a child to change the definition of the parent.
Also not in the case that we're talking about here?

A.a_column    B.a_column    |       /    v      v   C.a_column

C inherits from A and B.

The user wants to change a_column to better_name.

ALTER TABLE A RENAME COLUMN a_column TO better_name;
ERROR: could not rename column because an inherited child inherits the 
same column from other inheritance parents
HINT: use CASCADE to rename the column in the other parents and their 
childs as well

ALTER TABLE A RENAME COLUMN a_column TO better_name CASCADE;
(succeeds)

This doesn't seem nuts to me. After all, the set of columns with name 
'a_column' is like a domain, in the sense that all names and types of 
all three columns are the same. If the user wants to rename a_column, 
with the current code he gets an inconsistent database. There is a patch 
that prevents renaming in this case, and then the user could work around 
it by adding an artificial relation from which A and B inherit, rename 
a_column there and then remove that relation again. IMHO to allow the 
rename if the user explicitly asks for it is more user friendly, with no 
compromises at all. Since the upward inheritance relation scanning is 
only used to gather the set of a_columns to be updated in the cascade 
case, I do not see why this is nuts, nor why it should violate any 
definition of inheritance. After all: all conditions regarding 
inheritance I can think of are valid *after* the DDL update.

regards,
Yeb Havinga



Yeb Havinga <yebhavinga@gmail.com> writes:
> Tom Lane wrote:
>> I agree, this idea seems completely nuts.  It is *not* reasonable for
>> an action applied to a child to change the definition of the parent.

> Also not in the case that we're talking about here?

> A.a_column    B.a_column
>      |       /
>      v      v
>     C.a_column

> C inherits from A and B.

> The user wants to change a_column to better_name.

Well, if A and B inherited the column from a common ancestor, he can
easily do that.  If not, maybe he should have thought harder before he
started.  I do NOT agree that issuing a rename against C is a sane way
of dealing with this.

> This doesn't seem nuts to me.

You're in the minority.
        regards, tom lane


Tom Lane wrote:
> Yeb Havinga <yebhavinga@gmail.com> writes:
>   
>> A.a_column    B.a_column
>>      |       /
>>      v      v
>>     C.a_column
>>     
>> C inherits from A and B.
>>     
>
> Well, if A and B inherited the column from a common ancestor, he can
> easily do that.  If not, maybe he should have thought harder before he
> started.  I do NOT agree that issuing a rename against C is a sane way
> of dealing with this.
>   

Ok, I understand the intuition behind not wanting this kind of update.

The root cause seems to center around multiple inheritance of the same 
column without a common ancestor. Another way to approach the problem, 
is to prevent the user to create a setup, i.e. when adding a column to B 
that already exists in A, or when adding a inheritance relation A-C or 
B-c, if A and B share column names. He could then get a hint he should 
add a common ancestor with that column. This preemptively prevents 
problems with renames and other changes.

/me ducks

regards,
Yeb Havinga



On Thu, Aug 5, 2010 at 6:15 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> Tom Lane wrote:
>>
>> Yeb Havinga <yebhavinga@gmail.com> writes:
>>
>>>
>>> A.a_column    B.a_column
>>>     |       /
>>>     v      v
>>>    C.a_column
>>>    C inherits from A and B.
>>>
>>
>> Well, if A and B inherited the column from a common ancestor, he can
>> easily do that.  If not, maybe he should have thought harder before he
>> started.  I do NOT agree that issuing a rename against C is a sane way
>> of dealing with this.
>>
>
> Ok, I understand the intuition behind not wanting this kind of update.
>
> The root cause seems to center around multiple inheritance of the same
> column without a common ancestor. Another way to approach the problem, is to
> prevent the user to create a setup, i.e. when adding a column to B that
> already exists in A, or when adding a inheritance relation A-C or B-c, if A
> and B share column names. He could then get a hint he should add a common
> ancestor with that column. This preemptively prevents problems with renames
> and other changes.

It also breaks compatibility with previous releases for no particular
reason.  These cases are all marginal enough that it doesn't really
make sense to change historical behavior; I think we should confine
our efforts to fixing the bugs.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Robert Haas <robertmhaas@gmail.com> writes:
> Yeb Havinga <yebhavinga@gmail.com> writes:
>> The root cause seems to center around multiple inheritance of the same
>> column without a common ancestor. Another way to approach the problem, is to
>> prevent the user to create a setup, i.e. when adding a column to B that
>> already exists in A, or when adding a inheritance relation A-C or B-c, if A
>> and B share column names. He could then get a hint he should add a common
>> ancestor with that column. This preemptively prevents problems with renames
>> and other changes.

> It also breaks compatibility with previous releases for no particular
> reason.

Well, if it were only a hint, and thus didn't actually "prevent"
anything, then it wouldn't be breaking compatibility.  But I don't
like the idea much either.  It would be extremely expensive, if not
impossible, to determine whether all parents having the similarly-named
column got it from the same common ancestor.  (In particular, if the
user had previously ignored the hint, you could have situations where
there isn't a unique ancestor that the column can be traced to; then
what do you do?)

I think we'd be putting huge amounts of effort into a case that no more
than one or two people would ever hit.
        regards, tom lane


On Thu, Aug 5, 2010 at 9:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, if it were only a hint, and thus didn't actually "prevent"
> anything, then it wouldn't be breaking compatibility.  But I don't
> like the idea much either.  It would be extremely expensive, if not
> impossible, to determine whether all parents having the similarly-named
> column got it from the same common ancestor.  (In particular, if the
> user had previously ignored the hint, you could have situations where
> there isn't a unique ancestor that the column can be traced to; then
> what do you do?)
>
> I think we'd be putting huge amounts of effort into a case that no more
> than one or two people would ever hit.

I don't agree that it would be a huge amount of effort, but I do agree
that only a very small number of people will ever hit it, and that it
just doesn't seem worth it.  We have bigger fish to fry.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company