Thread: BUG #17261: FK ON UPDATE CASCADE can break referential integrity with columns of different types
BUG #17261: FK ON UPDATE CASCADE can break referential integrity with columns of different types
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 17261 Logged by: Marcus Gartner Email address: marcus@cockroachlabs.com PostgreSQL version: 14.0 Operating system: macOS Big Sur 11.6 Description: It is possible to break foreign key referential integrity when the FK columns have different types and updates are cascaded from the parent relation to the child relation. As far as I can tell from the documentation on FKs (https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-FK) this behavior is not expected. The example below shows how to reproduce the issue. This behavior is present on 14.0 and 13.3. I did not test any other versions. -- To reproduce: CREATE TABLE p (d DECIMAL(10, 2) PRIMARY KEY); CREATE TABLE c (d DECIMAL(10, 0) REFERENCES p(d) ON UPDATE CASCADE); INSERT INTO p VALUES (1.00); INSERT INTO c VALUES (1); -- Update the parent row value to 1.45. UPDATE p SET d = 1.45 WHERE d = 1.00; SELECT * FROM p; -- d -- ------ -- 1.45 -- The FK constraint integrity is not upheld. -- I would expect the update to have failed, because 1 (the -- value of the assignment cast from 1.45 to DECIMAL(10, 0)) -- does not exist in p. SELECT * FROM c; -- d -- --- -- 1
Re: BUG #17261: FK ON UPDATE CASCADE can break referential integrity with columns of different types
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes: > It is possible to break foreign key referential integrity when the FK > columns have different types and updates are cascaded from the parent > relation to the child relation. I'm not really sure what you'd consider such an FK relationship to mean. I can't get too excited about it when there doesn't seem to be a well defined semantics for it. regards, tom lane
Re: BUG #17261: FK ON UPDATE CASCADE can break referential integrity with columns of different types
From
Marcus Gartner
Date:
I agree with you there. So why is an FK relationship between columns of different types even allowed?
On Mon, Nov 1, 2021 at 5:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> It is possible to break foreign key referential integrity when the FK
> columns have different types and updates are cascaded from the parent
> relation to the child relation.
I'm not really sure what you'd consider such an FK relationship to mean.
I can't get too excited about it when there doesn't seem to be a well
defined semantics for it.
regards, tom lane
Re: BUG #17261: FK ON UPDATE CASCADE can break referential integrity with columns of different types
From
Alvaro Herrera
Date:
On 2021-Nov-01, Tom Lane wrote: > PG Bug reporting form <noreply@postgresql.org> writes: > > It is possible to break foreign key referential integrity when the FK > > columns have different types and updates are cascaded from the parent > > relation to the child relation. > > I'm not really sure what you'd consider such an FK relationship to mean. > I can't get too excited about it when there doesn't seem to be a well > defined semantics for it. Yeah, I was thinking that a possible fix might be to reject the creation of such an FK, but I'm not sure what would be a good test to determine acceptability. It's not as easy as rejecting different typmods, in general: for example, rejecting FKs of varchars because their max lengths are different would be inappropriate. For numeric perhaps we could get away with saying that the referencing column must have a scale that's at least as large as the referenced column. But I wouldn't want to get in the business of having type-specific rules for this, because that seems messy and overcomplicated for little useful gain. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: BUG #17261: FK ON UPDATE CASCADE can break referential integrity with columns of different types
From
Alvaro Herrera
Date:
On 2021-Nov-01, Marcus Gartner wrote: > I agree with you there. So why is an FK relationship between columns of > different types even allowed? It is useful in other cases, such as varchar or int4/int8. Please don't top-post. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ Are you not unsure you want to delete Firefox? [Not unsure] [Not not unsure] [Cancel] http://smylers.hates-software.com/2008/01/03/566e45b2.html
Re: BUG #17261: FK ON UPDATE CASCADE can break referential integrity with columns of different types
From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Yeah, I was thinking that a possible fix might be to reject the creation > of such an FK, but I'm not sure what would be a good test to determine > acceptability. It's not as easy as rejecting different typmods, in > general: for example, rejecting FKs of varchars because their max > lengths are different would be inappropriate. Right. I looked in the spec, and noted that they *used* to require the referencing and referenced types to be identical back in SQL92, but SQL99 and later only require The declared type of each referencing column shall be comparable to the declared type of the corresponding referenced column. And in late-model specs, that statement is followed by this gem: There shall not be corresponding constituents of the declared type of a referencing column and the declared type of the corresponding referenced column such that one constituent is datetime with time zone and the other is datetime without time zone. That exception is pretty weird. The SQL committee have apparently noticed that there can be semantic oddities for non-identical types, but they haven't pursued it very far. > For numeric perhaps we could get away with saying that the referencing > column must have a scale that's at least as large as the referenced > column. But I wouldn't want to get in the business of having > type-specific rules for this, because that seems messy and > overcomplicated for little useful gain. It would be *really* messy. For instance, AFAICS there is no problem with int4 vs int8 in either direction. If you store a too-large-for- int4 value in an int8 referenced column and we try to cascade it down to an int4 referencing column, we throw an overflow error and the constraint is preserved. The problem with this numeric case is that we round off the value while storing it into the referencing column --- but, for the specific values given, that results in no change in the referencing value so ri_KeysEqual() decides that there's no need to re-check the constraint. That's not an optimization I care to give up. As you say, the problem could be eliminated by requiring the referencing column to be able to represent all possible referenced values. But enforcing that sort of thing in an extensible type system seems mighty hard, and the value received for the effort would be mighty small. Moreover, as the datetime case shows, even that would not be quite right. A timestamptz referencing column can surely represent all values of plain timestamp ... but that combination is going to bite you on the rear pretty hard, because whether two values appear equal will vary with the timezone setting. On the whole, I'm satisfied with the "if it breaks you get to keep both pieces" approach to this question. The only case that seems unimpeachably OK is both-types-the-same, but the SQL standard requires us to accept more. If a user wants to use different types, though, it's on their head to figure out if that's semantically sane. AFAICT, the only place where our documentation touches on this is 5.4.5. Foreign Keys, which breezily says "Of course, the number and type of the constrained columns need to match the number and type of the referenced columns." So maybe we need to improve that, but I'm not sure what to say instead. In view of these considerations, we surely shouldn't encourage using different types. regards, tom lane
Re: BUG #17261: FK ON UPDATE CASCADE can break referential integrity with columns of different types
From
"David G. Johnston"
Date:
On Mon, Nov 1, 2021 at 5:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The problem with this numeric case is that
> we round off the value while storing it into the referencing column> --- but, for the specific values given, that results in no change in
> the referencing value so ri_KeysEqual() decides that there's no need
> to re-check the constraint. That's not an optimization I care to
> give up.
AFAICT, the only place where our documentation touches on this is
5.4.5. Foreign Keys, which breezily says "Of course, the number and
type of the constrained columns need to match the number and type of
the referenced columns." So maybe we need to improve that, but I'm
not sure what to say instead. In view of these considerations,
we surely shouldn't encourage using different types.
The following comes mostly from reading this thread. This is mostly just a conversation starter and me trying to make sure I understand the problem accurately, regardless of it's acceptability as documentation.
"""
Warning, be careful, or simply avoid, using on update cascade when the data types involved in the foreign key are not identical.
Why?
When using update cascade there is an optimization in place that compares the old and new values for equality, after casting them to the referencing table's data type, and skips performing the update if the value did not change. This is problematic when multiple values in the referenced table's data type map to a single value in the referencing table's data type. e.g., both 1.00 and 1.45 numeric(10,2) are equal to 1 numeric(10,0). Thus the optimization check will conclude that changing the referenced value from 1.00 to 1.45 is a no-op with respect to the referencing row's value of 1, leaving the system in an inconsistent state.
"""
The fundamental issue here seems to be that normal FK usage results in referencing-to-referenced lookups which are one-to-one. But the on update usage results in a referenced-to-referencing lookup which ends up being many-to-one (values, not rows). If that lookup ends up being one-to-one, even if the data types are different, then that difference in data types won't matter. In this example it is many-to-one.
I feel like the optimization being done, and the comparison between old and new being performed, is amenable to change to detect this case and fail the update command. Though I need to get more into the weeds to actually defend that feeling and even offer an approach. But hopefully this helps in the meantime.
Note: I haven't tried to reason out if on delete cascade has an issue here.
David J.
Re: BUG #17261: FK ON UPDATE CASCADE can break referential integrity with columns of different types
From
Marcus Gartner
Date:
On Mon, Nov 1, 2021 at 9:57 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
> The fundamental issue here seems to be that normal FK usage results in
> referencing-to-referenced lookups which are one-to-one. But the on update usage
> results in a referenced-to-referencing lookup which ends up being many-to-one
> (values, not rows). If that lookup ends up being one-to-one, even if the data
> types are different, then that difference in data types won't matter. In this
> example it is many-to-one.
> The fundamental issue here seems to be that normal FK usage results in
> referencing-to-referenced lookups which are one-to-one. But the on update usage
> results in a referenced-to-referencing lookup which ends up being many-to-one
> (values, not rows). If that lookup ends up being one-to-one, even if the data
> types are different, then that difference in data types won't matter. In this
> example it is many-to-one.
I believe the problem can present whenever the assignment cast from the referenced type
to the referencing type is lossy. As examples, NUMERIC(10,2) to INT and the more
esoteric TEXT to "char" both reproduce the issue.
On Mon, Nov 1, 2021 at 9:57 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Mon, Nov 1, 2021 at 5:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:> The problem with this numeric case is that> we round off the value while storing it into the referencing column
> --- but, for the specific values given, that results in no change in
> the referencing value so ri_KeysEqual() decides that there's no need
> to re-check the constraint. That's not an optimization I care to
> give up.AFAICT, the only place where our documentation touches on this is
5.4.5. Foreign Keys, which breezily says "Of course, the number and
type of the constrained columns need to match the number and type of
the referenced columns." So maybe we need to improve that, but I'm
not sure what to say instead. In view of these considerations,
we surely shouldn't encourage using different types.The following comes mostly from reading this thread. This is mostly just a conversation starter and me trying to make sure I understand the problem accurately, regardless of it's acceptability as documentation."""Warning, be careful, or simply avoid, using on update cascade when the data types involved in the foreign key are not identical.Why?When using update cascade there is an optimization in place that compares the old and new values for equality, after casting them to the referencing table's data type, and skips performing the update if the value did not change. This is problematic when multiple values in the referenced table's data type map to a single value in the referencing table's data type. e.g., both 1.00 and 1.45 numeric(10,2) are equal to 1 numeric(10,0). Thus the optimization check will conclude that changing the referenced value from 1.00 to 1.45 is a no-op with respect to the referencing row's value of 1, leaving the system in an inconsistent state."""The fundamental issue here seems to be that normal FK usage results in referencing-to-referenced lookups which are one-to-one. But the on update usage results in a referenced-to-referencing lookup which ends up being many-to-one (values, not rows). If that lookup ends up being one-to-one, even if the data types are different, then that difference in data types won't matter. In this example it is many-to-one.I feel like the optimization being done, and the comparison between old and new being performed, is amenable to change to detect this case and fail the update command. Though I need to get more into the weeds to actually defend that feeling and even offer an approach. But hopefully this helps in the meantime.Note: I haven't tried to reason out if on delete cascade has an issue here.David J.