Thread: [PATCH] Deferrable unique constraints vs join removal -- bug?
Hi list, PostgreSQL 9.0 introduced the "join removal" feature for cases where an left join can't return more than one output row. This feature relies on checking whether a UNIQUE constraint exists on the joined column in the referenced table. However, there was another new feature in 9.0: deferrable unique constraints. It seems that the join removal code currently doesn't check whether the constraint is deferrable or not. Since deferrable unique constraints may contain duplicate rows in the course of a transaction, join removal changes the results returned from a query. This probably doesn't affect many real-world applications, but it seems wrong that a performance feature can affect results returned by a query. Test case: create table uniq (i int unique deferrable initially deferred); begin; insert into uniq values(1),(1); select count(*) from uniq a left join uniq b using (i); count ------- 2 An inner join performs as expected: marti=# select count(*) from uniq a inner join uniq b using (i); count ------- 4 ---- Attached is a patch with an attempt to fix it. I'm not at all sure this is the right approach, but it seems the cheapest way to make sure is walk through *all* rows in pg_constraint for the given table, since there is no index on pg_constraint.conindid. I'm not sure whether the cost of this check outweighs the usefulness of this patch. Catalog changes are a no-no for backported patches, right? Or should I just go ahead and create this index, and not worry about backporting? I'm also adding lots of includes to this file. Maybe unique_index_is_consistent() should be moved to another file instead? While passing by, I also added an unrelated check to check_functional_grouping() for the validity of the constraint. This isn't necessary for now (unique constraints are always valid), but seems useful just in case this changes in the future. Regards, Marti
Attachment
On Wed, Oct 19, 2011 at 7:35 AM, Marti Raudsepp <marti@juffo.org> wrote: > This probably doesn't affect many real-world applications, but it > seems wrong that a performance feature can affect results returned by > a query. > > Test case: > > create table uniq (i int unique deferrable initially deferred); > begin; > insert into uniq values(1),(1); > select count(*) from uniq a left join uniq b using (i); > count > ------- > 2 Yuck. Well, that's certainly a bug. What's weird is that I thought we had put logic into the join removal code to ignore deferrable constraints. Apparently not. I think maybe what we should do is add an "immediate" field to IndexOptInfo, mirroring the existing unique flag, and have get_relation_info() populate it from indimmediate, and then make relation_has_unique_index() disqualify any non-immediate index. has_unique_index() arguably needs a similar fix, although at present that appears to be used for only statistic purposes, so maybe it's OK. A comment update might be a good idea, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Yuck. Well, that's certainly a bug. What's weird is that I thought > we had put logic into the join removal code to ignore deferrable > constraints. Yeah, I thought we had too. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > Yuck. Well, that's certainly a bug. What's weird is that I thought > we had put logic into the join removal code to ignore deferrable > constraints. Apparently not. I poked around a bit more and could not find any evidence that we'd ever done that. Ah well. > I think maybe what we should do is add > an "immediate" field to IndexOptInfo, mirroring the existing unique > flag, and have get_relation_info() populate it from indimmediate, and > then make relation_has_unique_index() disqualify any non-immediate > index. Yeah, this seems like the right fix. I considered redefining the unique flag to mean indisunique && indimmediate, but that's wrong because of: > has_unique_index() arguably needs a similar fix, although at present > that appears to be used for only statistic purposes, so maybe it's OK. Yes, since this is meant for statistical purposes, I think it's appropriate for it to disregard indimmediate. > A comment update might be a good idea, though. Or we could add a parameter to have the caller indicate which behavior is wanted. But for now I think a comment is enough. regards, tom lane
On Sun, Oct 23, 2011 at 06:44, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I think maybe what we should do is add >> an "immediate" field to IndexOptInfo, mirroring the existing unique >> flag, and have get_relation_info() populate it from indimmediate, and >> then make relation_has_unique_index() disqualify any non-immediate >> index. > > Yeah, this seems like the right fix. Oh, that sounds pretty obvious now that you mention it. :) I will try to come up with a new patch in a few days (haven't had too much time lately). Regards, Marti
Marti Raudsepp <marti@juffo.org> writes: > On Sun, Oct 23, 2011 at 06:44, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah, this seems like the right fix. > Oh, that sounds pretty obvious now that you mention it. :) > I will try to come up with a new patch in a few days (haven't had too > much time lately). Oh, I did it already. regards, tom lane
On Sun, Oct 23, 2011 at 21:39, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I will try to come up with a new patch in a few days (haven't had too >> much time lately). > > Oh, I did it already. Cool. I noticed now that you didn't add a regression test for this fix. Perhaps you could reuse the test from my patch, which also tests for plan invalidation. Regards, Marti