Thread: Unique indexes not unique?

Unique indexes not unique?

From
Jimmy Mäkelä
Date:
I found that Postgres isn't behaving like I thought when using a unique index in
combination with NULL-values...
Is this a bug or specified in the SQL-standard? If its a bug, is it fixed in a
recent version? We are using 7.2.3

This is the results I got:

intranet=# create table foo (a varchar(10), b varchar(10));
CREATE
intranet=# create unique index foo_idx on foo using btree(a, b);
CREATE
intranet=# insert into "foo" (a, b) values ('apa', 'banan');
INSERT 26229704 1
intranet=# insert into "foo" (a, b) values ('apa', 'banan');
ERROR:  Cannot insert a duplicate key into unique index foo_idx
intranet=# insert into "foo" (a, b) values ('apa', null);
INSERT 26229706 1
intranet=# insert into "foo" (a, b) values ('apa', null);
INSERT 26229707 1

And another completely unrelated question... I have got a table with a composite
index on A andBb and an index on A
which I query with something like this:

SELECT * FROM "table"
WHERE (a = 1 OR a = 2 OR a = 3) AND b > 1232132 AND b < 123123123213123

Postgres then chooses to use the index for A three times, which is really slow
on my table...
Then I rewrote the query like:

SELECT * FROM "table"
WHERE a = 1 AND b > 1232132 AND b < 123123123213123
UNION SELECT * FROM "table"
WHERE a = 2 AND b > 1232132 AND b < 123123123213123
UNION SELECT * FROM "table"
WHERE a = 3 AND b > 1232132 AND b < 123123123213123

Postgres then behaved better and choosed the composite index in all three cases
resulting in a very large improvement...
Why is this, and has it been improved in more recent versions?

Thanks in advance,
Jimmy Mäkelä

------------------------------------------------
Jimmy Mäkelä
Programmerare
Nybrogatan 55, Box 55708
114 83 Stockholm
Direkt: 08-527 90 457
Mobil: 073-623 05 51
------------------------------------------------
Jag tycker att du borde anlita en agent.
Gå till: www.agent25.se



Re: Unique indexes not unique?

From
Tomasz Myrta
Date:
Jimmy Mäkelä wrote:

> I found that Postgres isn't behaving like I thought when using a 
> unique index in
> combination with NULL-values...
> Is this a bug or specified in the SQL-standard? If its a bug, is it 
> fixed in a
> recent version? We are using 7.2.3
>
> This is the results I got:
>
> intranet=# create table foo (a varchar(10), b varchar(10));
> CREATE
> intranet=# create unique index foo_idx on foo using btree(a, b);
> CREATE
> intranet=# insert into "foo" (a, b) values ('apa', 'banan');
> INSERT 26229704 1
> intranet=# insert into "foo" (a, b) values ('apa', 'banan');
> ERROR:  Cannot insert a duplicate key into unique index foo_idx
> intranet=# insert into "foo" (a, b) values ('apa', null);
> INSERT 26229706 1
> intranet=# insert into "foo" (a, b) values ('apa', null);
> INSERT 26229707 1

I'm not sure unique index works properly for null values. I can't 
explain, why. Maybe it comes from SQL standard - null i a special value 
and can't be compared using default operators to other non null values:
1>null =null
1<null =null
1=null =null

>
>
> And another completely unrelated question... I have got a table with a 
> composite
> index on A andBb and an index on A
> which I query with something like this:
>
> SELECT * FROM "table"
> WHERE (a = 1 OR a = 2 OR a = 3) AND b > 1232132 AND b < 123123123213123
>
> Postgres then chooses to use the index for A three times, which is 
> really slow
> on my table...
> Then I rewrote the query like:
>
> SELECT * FROM "table"
> WHERE a = 1 AND b > 1232132 AND b < 123123123213123
> UNION SELECT * FROM "table"
> WHERE a = 2 AND b > 1232132 AND b < 123123123213123
> UNION SELECT * FROM "table"
> WHERE a = 3 AND b > 1232132 AND b < 123123123213123


Try to rewrite your query to show postgres how to use index on AB:
SELECT * FROM "table"
WHERE
(a = 1 AND b > 1232132 AND b < 123123123213123) or
(a = 2 AND b > 1232132 AND b < 123123123213123) or
(a = 3 AND b > 1232132 AND b < 123123123213123);

Regards,
Tomasz Myrta





Re: Unique indexes not unique?

From
Jimmy Mäkelä
Date:
From: Tomasz Myrta [mailto:jasiek@klaster.net]
> I'm not sure unique index works properly for null values. I can't 
> explain, why. Maybe it comes from SQL standard - null i a 
> special value 

Yeah, I thought about that too, but I think that behaviour is really bad and
would consider it a bug. There are good reasons for having a special SQL null,
but
none of these apply to unique indexes (not that I can think of anyway).

> Try to rewrite your query to show postgres how to use index on AB:
> SELECT * FROM "table"
> WHERE
> (a = 1 AND b > 1232132 AND b < 123123123213123) or
> (a = 2 AND b > 1232132 AND b < 123123123213123) or
> (a = 3 AND b > 1232132 AND b < 123123123213123);

Sure, this works, and is an improvement to the UNION-version, but I think
postgres should be able do these substitutions by itself in the
planner/optimizer...

Or is there any method for specifying optimizer hints?

Regards,
Jimmy


Re: Unique indexes not unique?

From
Stephan Szabo
Date:
On Mon, 13 Jan 2003, [iso-8859-1] Jimmy M�kel� wrote:

> I found that Postgres isn't behaving like I thought when using a unique index in
> combination with NULL-values...
> Is this a bug or specified in the SQL-standard? If its a bug, is it fixed in a
> recent version? We are using 7.2.3

AFAIK this is standard.

From the unique predicate (8.9),

If there are no two rows in T such that the value of each column in one
row is non-null and is equal to the value of the corresponding column in
the other row according to Subclause 8.2, "comparison predicate", then
the result of the unique predicate is true; otherwise, the result of
the unique predicate is false.

Unique constraints are defined in terms of the unique predicate.

> And another completely unrelated question... I have got a table with a composite
> index on A andBb and an index on A
> which I query with something like this:
>
> SELECT * FROM "table"
> WHERE (a = 1 OR a = 2 OR a = 3) AND b > 1232132 AND b < 123123123213123
>
> Postgres then chooses to use the index for A three times, which is really slow
> on my table...

On my dev (7.4devel) box I see it using the composite index three times,
but you haven't given explain output for the two queries or any statistics
information so that doesn't say much.



Re: Unique indexes not unique?

From
dev@archonet.com
Date:
> Jimmy Mäkelä wrote:
>
>> I found that Postgres isn't behaving like I thought when using a
>> unique index in
>> combination with NULL-values...
>> Is this a bug or specified in the SQL-standard? If its a bug, is it
>> fixed in a
>> recent version? We are using 7.2.3

>> intranet=# insert into "foo" (a, b) values ('apa', null);
>> INSERT 26229706 1
>> intranet=# insert into "foo" (a, b) values ('apa', null);
>> INSERT 26229707 1
>
> I'm not sure unique index works properly for null values. I can't
> explain, why. Maybe it comes from SQL standard - null i a special value
> and can't be compared using default operators to other non null values:
> 1>null =null
> 1<null =null
> 1=null =null

Null is not a value or even a "special" value, it is supposed to represent
the absence of a value. It means either "not applicable" or "not known".

It doesn't make sense to say whether one null is the same as another, a
null is an absence, a hole. As a result, you can't really talk about
comparing two nulls, only testing whether a value is null.

If you are using a null in a situation where it should be unique, you
probably want a value instead. Can't say more without an actual example.

- Richard Huxton


Re: Unique indexes not unique?

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> On Mon, 13 Jan 2003, [iso-8859-1] Jimmy M�kel� wrote:
>> And another completely unrelated question... I have got a table with a composite
>> index on A andBb and an index on A
>> which I query with something like this:
>> 
>> SELECT * FROM "table"
>> WHERE (a = 1 OR a = 2 OR a = 3) AND b > 1232132 AND b < 123123123213123
>> 
>> Postgres then chooses to use the index for A three times, which is really slow
>> on my table...

> On my dev (7.4devel) box I see it using the composite index three times,
> but you haven't given explain output for the two queries or any statistics
> information so that doesn't say much.

[ checks CVS logs... ]  I believe 7.2 should behave the same; the
relevant change predated 7.2:

2001-06-05 13:13  tgl
* src/: backend/optimizer/path/allpaths.c,backend/optimizer/path/indxpath.c,
include/optimizer/paths.h,backend/optimizer/path/orindxpath.c:Improve planning of ORindexscan plans: for quals like
WHERE(a = 1 or a = 2) and b =42 and an index on (a,b), include the clause b = 42 in theindexquals generated for each
armof the OR clause.  Essentiallythis is an index- driven conversion from CNF to DNF. Implementation is a bit klugy,
butbetter than not exploiting theextra quals at all ...
 

There may be a datatype coercion issue: in the example as quoted,
'123123123213123' is a bigint constant.  If b is int then that
comparison wouldn't be considered indexable (and if it's bigint, then
the other comparison against b wouldn't be indexable without adding
a cast).
        regards, tom lane


Re: Unique indexes not unique?

From
Stephan Szabo
Date:
On Mon, 13 Jan 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> > On Mon, 13 Jan 2003, [iso-8859-1] Jimmy M�kel� wrote:

> > On my dev (7.4devel) box I see it using the composite index three times,
> > but you haven't given explain output for the two queries or any statistics
> > information so that doesn't say much.
>
> [ checks CVS logs... ]  I believe 7.2 should behave the same; the
> relevant change predated 7.2:
>
> 2001-06-05 13:13  tgl
>
>     * src/: backend/optimizer/path/allpaths.c,
>     backend/optimizer/path/indxpath.c, include/optimizer/paths.h,
>     backend/optimizer/path/orindxpath.c: Improve planning of OR
>     indexscan plans: for quals like    WHERE (a = 1 or a = 2) and b =
>     42 and an index on (a,b), include the clause b = 42 in the
>     indexquals generated for each arm of the OR clause.  Essentially
>     this is an index- driven conversion from CNF to DNF.
>     Implementation is a bit klugy, but better than not exploiting the
>     extra quals at all ...

>
> There may be a datatype coercion issue: in the example as quoted,
> '123123123213123' is a bigint constant.  If b is int then that
> comparison wouldn't be considered indexable (and if it's bigint, then
> the other comparison against b wouldn't be indexable without adding
> a cast).

In his actual query (he sent me explain results which include the query)
he uses ::bigint on both constants.

-- Quoting the explain section from his message --
EXPLAIN SELECT * FROM agentresults WHERE (usr = 'svt' OR usr = 'svt1' OR
usr = 'svt2')
AND modified >= 1042239600::bigint AND modified < 1042498800::bigint AND
category != '' AND (flags & 16) > 0 AND title != '<a25uniq>'
ORDER BY modified DESC LIMIT 1000;


returns

Limit  (cost=607870.16..607870.16 rows=94 width=372) ->  Sort  (cost=607870.16..607870.16 rows=95 width=372)       ->
IndexScan using agentresults2_usr, agentresults2_usr, 
agentresults2_usr on agentresults  (cost=0.00..607867.04 rows=95
width=372)

EXPLAIN SELECT * FROM agentresults WHERE (usr = 'svt'
AND modified >= 1042239600::bigint AND modified < 1042498800::bigint AND
category != '' AND (flags & 16) > 0 AND title != '<a25uniq>')
OR (usr = 'svt1'
AND modified >= 1042239600::bigint AND modified < 1042498800::bigint AND
category != '' AND (flags & 16) > 0 AND title != '<a25uniq>')
OR (usr = 'svt2'
AND modified >= 1042239600::bigint AND modified < 1042498800::bigint AND
category != '' AND (flags & 16) > 0 AND title != '<a25uniq>')
ORDER BY modified DESC LIMIT 1000;

returns

Limit  (cost=22669.68..22669.68 rows=95 width=372) ->  Sort  (cost=22669.68..22669.68 rows=96 width=372)       ->
IndexScan using agentresults2_modified_user, 
agentresults2_modified_user, agentresults2_modified_user on agentresults
(cost=0.00..22666.52 rows=96 width=372)

--




Re: Unique indexes not unique?

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> In his actual query (he sent me explain results which include the query)
> he uses ::bigint on both constants.

Okay, scratch that theory.

> Limit  (cost=22669.68..22669.68 rows=95 width=372)
>   ->  Sort  (cost=22669.68..22669.68 rows=96 width=372)
>         ->  Index Scan using agentresults2_modified_user,
> agentresults2_modified_user, agentresults2_modified_user on agentresults
> (cost=0.00..22666.52 rows=96 width=372)

Should I guess from the index name that it is on (modified, usr) and not
on (usr, modified)?  If so, the problem is that the OR-expansion code
only triggers if it has found an OR-clause that's already usable with
the index --- ie, matches the index's first column.  So this index is
the wrong way 'round for

... WHERE (usr = 'svt' OR usr = 'svt1' OR usr = 'svt2')
AND modified >= 1042239600::bigint AND modified < 1042498800::bigint ...

It would be nice someday for the expansion to work in the other case
too, but I haven't thought of a way to do it that would not waste many
cycles in typical queries where there is no benefit from searching for
OR-clauses.
        regards, tom lane


Re: Unique indexes not unique?

From
Jimmy Mäkelä
Date:
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> > Limit  (cost=22669.68..22669.68 rows=95 width=372)
> >   ->  Sort  (cost=22669.68..22669.68 rows=96 width=372)
> >         ->  Index Scan using agentresults2_modified_user,
> > agentresults2_modified_user, agentresults2_modified_user on
> agentresults
> > (cost=0.00..22666.52 rows=96 width=372)
>
> Should I guess from the index name that it is on (modified,
> usr) and not
> on (usr, modified)?  If so, the problem is that the OR-expansion code

Sorry for the late answer, was out of town for the week.
Yes this is the case, and in my case I wouldn't want to change the order.

Sure it would be nice to support this case too, but not if it implies
penalties for more typical queries. Doing the expansion manually isn't
that hard (but quite ugly).

Thanks for all answers.

Regards,
Jimmy Mäkelä