Thread: Unique indexes not unique?
I found that Postgres isn't behaving like I thought when using a unique index in combination with NULL-values... Is this a bug or specified in the SQL-standard? If its a bug, is it fixed in a recent version? We are using 7.2.3 This is the results I got: intranet=# create table foo (a varchar(10), b varchar(10)); CREATE intranet=# create unique index foo_idx on foo using btree(a, b); CREATE intranet=# insert into "foo" (a, b) values ('apa', 'banan'); INSERT 26229704 1 intranet=# insert into "foo" (a, b) values ('apa', 'banan'); ERROR: Cannot insert a duplicate key into unique index foo_idx intranet=# insert into "foo" (a, b) values ('apa', null); INSERT 26229706 1 intranet=# insert into "foo" (a, b) values ('apa', null); INSERT 26229707 1 And another completely unrelated question... I have got a table with a composite index on A andBb and an index on A which I query with something like this: SELECT * FROM "table" WHERE (a = 1 OR a = 2 OR a = 3) AND b > 1232132 AND b < 123123123213123 Postgres then chooses to use the index for A three times, which is really slow on my table... Then I rewrote the query like: SELECT * FROM "table" WHERE a = 1 AND b > 1232132 AND b < 123123123213123 UNION SELECT * FROM "table" WHERE a = 2 AND b > 1232132 AND b < 123123123213123 UNION SELECT * FROM "table" WHERE a = 3 AND b > 1232132 AND b < 123123123213123 Postgres then behaved better and choosed the composite index in all three cases resulting in a very large improvement... Why is this, and has it been improved in more recent versions? Thanks in advance, Jimmy Mäkelä ------------------------------------------------ Jimmy Mäkelä Programmerare Nybrogatan 55, Box 55708 114 83 Stockholm Direkt: 08-527 90 457 Mobil: 073-623 05 51 ------------------------------------------------ Jag tycker att du borde anlita en agent. Gå till: www.agent25.se
Jimmy Mäkelä wrote: > I found that Postgres isn't behaving like I thought when using a > unique index in > combination with NULL-values... > Is this a bug or specified in the SQL-standard? If its a bug, is it > fixed in a > recent version? We are using 7.2.3 > > This is the results I got: > > intranet=# create table foo (a varchar(10), b varchar(10)); > CREATE > intranet=# create unique index foo_idx on foo using btree(a, b); > CREATE > intranet=# insert into "foo" (a, b) values ('apa', 'banan'); > INSERT 26229704 1 > intranet=# insert into "foo" (a, b) values ('apa', 'banan'); > ERROR: Cannot insert a duplicate key into unique index foo_idx > intranet=# insert into "foo" (a, b) values ('apa', null); > INSERT 26229706 1 > intranet=# insert into "foo" (a, b) values ('apa', null); > INSERT 26229707 1 I'm not sure unique index works properly for null values. I can't explain, why. Maybe it comes from SQL standard - null i a special value and can't be compared using default operators to other non null values: 1>null =null 1<null =null 1=null =null > > > And another completely unrelated question... I have got a table with a > composite > index on A andBb and an index on A > which I query with something like this: > > SELECT * FROM "table" > WHERE (a = 1 OR a = 2 OR a = 3) AND b > 1232132 AND b < 123123123213123 > > Postgres then chooses to use the index for A three times, which is > really slow > on my table... > Then I rewrote the query like: > > SELECT * FROM "table" > WHERE a = 1 AND b > 1232132 AND b < 123123123213123 > UNION SELECT * FROM "table" > WHERE a = 2 AND b > 1232132 AND b < 123123123213123 > UNION SELECT * FROM "table" > WHERE a = 3 AND b > 1232132 AND b < 123123123213123 Try to rewrite your query to show postgres how to use index on AB: SELECT * FROM "table" WHERE (a = 1 AND b > 1232132 AND b < 123123123213123) or (a = 2 AND b > 1232132 AND b < 123123123213123) or (a = 3 AND b > 1232132 AND b < 123123123213123); Regards, Tomasz Myrta
From: Tomasz Myrta [mailto:jasiek@klaster.net] > I'm not sure unique index works properly for null values. I can't > explain, why. Maybe it comes from SQL standard - null i a > special value Yeah, I thought about that too, but I think that behaviour is really bad and would consider it a bug. There are good reasons for having a special SQL null, but none of these apply to unique indexes (not that I can think of anyway). > Try to rewrite your query to show postgres how to use index on AB: > SELECT * FROM "table" > WHERE > (a = 1 AND b > 1232132 AND b < 123123123213123) or > (a = 2 AND b > 1232132 AND b < 123123123213123) or > (a = 3 AND b > 1232132 AND b < 123123123213123); Sure, this works, and is an improvement to the UNION-version, but I think postgres should be able do these substitutions by itself in the planner/optimizer... Or is there any method for specifying optimizer hints? Regards, Jimmy
On Mon, 13 Jan 2003, [iso-8859-1] Jimmy M�kel� wrote: > I found that Postgres isn't behaving like I thought when using a unique index in > combination with NULL-values... > Is this a bug or specified in the SQL-standard? If its a bug, is it fixed in a > recent version? We are using 7.2.3 AFAIK this is standard. From the unique predicate (8.9), If there are no two rows in T such that the value of each column in one row is non-null and is equal to the value of the corresponding column in the other row according to Subclause 8.2, "comparison predicate", then the result of the unique predicate is true; otherwise, the result of the unique predicate is false. Unique constraints are defined in terms of the unique predicate. > And another completely unrelated question... I have got a table with a composite > index on A andBb and an index on A > which I query with something like this: > > SELECT * FROM "table" > WHERE (a = 1 OR a = 2 OR a = 3) AND b > 1232132 AND b < 123123123213123 > > Postgres then chooses to use the index for A three times, which is really slow > on my table... On my dev (7.4devel) box I see it using the composite index three times, but you haven't given explain output for the two queries or any statistics information so that doesn't say much.
> Jimmy Mäkelä wrote: > >> I found that Postgres isn't behaving like I thought when using a >> unique index in >> combination with NULL-values... >> Is this a bug or specified in the SQL-standard? If its a bug, is it >> fixed in a >> recent version? We are using 7.2.3 >> intranet=# insert into "foo" (a, b) values ('apa', null); >> INSERT 26229706 1 >> intranet=# insert into "foo" (a, b) values ('apa', null); >> INSERT 26229707 1 > > I'm not sure unique index works properly for null values. I can't > explain, why. Maybe it comes from SQL standard - null i a special value > and can't be compared using default operators to other non null values: > 1>null =null > 1<null =null > 1=null =null Null is not a value or even a "special" value, it is supposed to represent the absence of a value. It means either "not applicable" or "not known". It doesn't make sense to say whether one null is the same as another, a null is an absence, a hole. As a result, you can't really talk about comparing two nulls, only testing whether a value is null. If you are using a null in a situation where it should be unique, you probably want a value instead. Can't say more without an actual example. - Richard Huxton
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > On Mon, 13 Jan 2003, [iso-8859-1] Jimmy M�kel� wrote: >> And another completely unrelated question... I have got a table with a composite >> index on A andBb and an index on A >> which I query with something like this: >> >> SELECT * FROM "table" >> WHERE (a = 1 OR a = 2 OR a = 3) AND b > 1232132 AND b < 123123123213123 >> >> Postgres then chooses to use the index for A three times, which is really slow >> on my table... > On my dev (7.4devel) box I see it using the composite index three times, > but you haven't given explain output for the two queries or any statistics > information so that doesn't say much. [ checks CVS logs... ] I believe 7.2 should behave the same; the relevant change predated 7.2: 2001-06-05 13:13 tgl * src/: backend/optimizer/path/allpaths.c,backend/optimizer/path/indxpath.c, include/optimizer/paths.h,backend/optimizer/path/orindxpath.c:Improve planning of ORindexscan plans: for quals like WHERE(a = 1 or a = 2) and b =42 and an index on (a,b), include the clause b = 42 in theindexquals generated for each armof the OR clause. Essentiallythis is an index- driven conversion from CNF to DNF. Implementation is a bit klugy, butbetter than not exploiting theextra quals at all ... There may be a datatype coercion issue: in the example as quoted, '123123123213123' is a bigint constant. If b is int then that comparison wouldn't be considered indexable (and if it's bigint, then the other comparison against b wouldn't be indexable without adding a cast). regards, tom lane
On Mon, 13 Jan 2003, Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > On Mon, 13 Jan 2003, [iso-8859-1] Jimmy M�kel� wrote: > > On my dev (7.4devel) box I see it using the composite index three times, > > but you haven't given explain output for the two queries or any statistics > > information so that doesn't say much. > > [ checks CVS logs... ] I believe 7.2 should behave the same; the > relevant change predated 7.2: > > 2001-06-05 13:13 tgl > > * src/: backend/optimizer/path/allpaths.c, > backend/optimizer/path/indxpath.c, include/optimizer/paths.h, > backend/optimizer/path/orindxpath.c: Improve planning of OR > indexscan plans: for quals like WHERE (a = 1 or a = 2) and b = > 42 and an index on (a,b), include the clause b = 42 in the > indexquals generated for each arm of the OR clause. Essentially > this is an index- driven conversion from CNF to DNF. > Implementation is a bit klugy, but better than not exploiting the > extra quals at all ... > > There may be a datatype coercion issue: in the example as quoted, > '123123123213123' is a bigint constant. If b is int then that > comparison wouldn't be considered indexable (and if it's bigint, then > the other comparison against b wouldn't be indexable without adding > a cast). In his actual query (he sent me explain results which include the query) he uses ::bigint on both constants. -- Quoting the explain section from his message -- EXPLAIN SELECT * FROM agentresults WHERE (usr = 'svt' OR usr = 'svt1' OR usr = 'svt2') AND modified >= 1042239600::bigint AND modified < 1042498800::bigint AND category != '' AND (flags & 16) > 0 AND title != '<a25uniq>' ORDER BY modified DESC LIMIT 1000; returns Limit (cost=607870.16..607870.16 rows=94 width=372) -> Sort (cost=607870.16..607870.16 rows=95 width=372) -> IndexScan using agentresults2_usr, agentresults2_usr, agentresults2_usr on agentresults (cost=0.00..607867.04 rows=95 width=372) EXPLAIN SELECT * FROM agentresults WHERE (usr = 'svt' AND modified >= 1042239600::bigint AND modified < 1042498800::bigint AND category != '' AND (flags & 16) > 0 AND title != '<a25uniq>') OR (usr = 'svt1' AND modified >= 1042239600::bigint AND modified < 1042498800::bigint AND category != '' AND (flags & 16) > 0 AND title != '<a25uniq>') OR (usr = 'svt2' AND modified >= 1042239600::bigint AND modified < 1042498800::bigint AND category != '' AND (flags & 16) > 0 AND title != '<a25uniq>') ORDER BY modified DESC LIMIT 1000; returns Limit (cost=22669.68..22669.68 rows=95 width=372) -> Sort (cost=22669.68..22669.68 rows=96 width=372) -> IndexScan using agentresults2_modified_user, agentresults2_modified_user, agentresults2_modified_user on agentresults (cost=0.00..22666.52 rows=96 width=372) --
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > In his actual query (he sent me explain results which include the query) > he uses ::bigint on both constants. Okay, scratch that theory. > Limit (cost=22669.68..22669.68 rows=95 width=372) > -> Sort (cost=22669.68..22669.68 rows=96 width=372) > -> Index Scan using agentresults2_modified_user, > agentresults2_modified_user, agentresults2_modified_user on agentresults > (cost=0.00..22666.52 rows=96 width=372) Should I guess from the index name that it is on (modified, usr) and not on (usr, modified)? If so, the problem is that the OR-expansion code only triggers if it has found an OR-clause that's already usable with the index --- ie, matches the index's first column. So this index is the wrong way 'round for ... WHERE (usr = 'svt' OR usr = 'svt1' OR usr = 'svt2') AND modified >= 1042239600::bigint AND modified < 1042498800::bigint ... It would be nice someday for the expansion to work in the other case too, but I haven't thought of a way to do it that would not waste many cycles in typical queries where there is no benefit from searching for OR-clauses. regards, tom lane
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > Limit (cost=22669.68..22669.68 rows=95 width=372) > > -> Sort (cost=22669.68..22669.68 rows=96 width=372) > > -> Index Scan using agentresults2_modified_user, > > agentresults2_modified_user, agentresults2_modified_user on > agentresults > > (cost=0.00..22666.52 rows=96 width=372) > > Should I guess from the index name that it is on (modified, > usr) and not > on (usr, modified)? If so, the problem is that the OR-expansion code Sorry for the late answer, was out of town for the week. Yes this is the case, and in my case I wouldn't want to change the order. Sure it would be nice to support this case too, but not if it implies penalties for more typical queries. Doing the expansion manually isn't that hard (but quite ugly). Thanks for all answers. Regards, Jimmy Mäkelä