Thread: uniqueness constraint with NULLs
I have a table with a uniqueness constraint on three columns:

# \d bobtest
                            Table "public.bobtest"
 Column |  Type   |                       Modifiers
--------+---------+------------------------------------------------------
 id     | integer | not null default nextval('bobtest_id_seq'::regclass)
 a      | integer |
 b      | integer |
 c      | integer |
Indexes:
    "bobtest_id_key" UNIQUE, btree (id)
    "bobtest_unique" UNIQUE, btree (a, b, c)

I can insert multiple rows with identical a and b when c is NULL:

...
# insert into bobtest (a, b) values (1, 4);
INSERT 0 1
# insert into bobtest (a, b, c) values (1, 4, NULL);
INSERT 0 1
# select * from bobtest;
 id | a | b | c
----+---+---+---
  1 | 1 | 2 | 1
  2 | 1 | 3 | 1
  3 | 1 | 4 | 1
  4 | 1 | 4 |
  5 | 1 | 4 |
  6 | 1 | 4 |
(6 rows)

Can anyone suggest a way that I can impose uniqueness on a and b when
c is NULL?

In the real app., c is a date field and I require it to be NULL for
some rows. In these cases, I only want at most one row with identical
a and b, but I can have identical a and b when c is a real date, as
long as that date is also unique for a given a and b.

I'm guessing I'm going to need to use a function and that someone will
yell at me for using NULLs to represent real data, but I thought I'd be
brave and ask anyway, in case I am missing some other solution that
doesn't involve the use of triggers etc.

Cheers,

Bob Edwards.
In response to Robert Edwards :
> Can anyone suggest a way that I can impose uniqueness on a and b when
> c is NULL?

Sure, use a functional index:

test=# create table bobtest (a int, b int, c int);
CREATE TABLE
test=*# create unique index idx_bobtest on
        bobtest(a,b,coalesce(c::text,'NULL'));
CREATE INDEX
test=*# insert into bobtest (a, b) values (1, 4);
INSERT 0 1
test=*# insert into bobtest (a, b, c) values (1, 4, NULL);
ERROR:  duplicate key value violates unique constraint "idx_bobtest"
test=!#


Regards, Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net
A. Kretschmer wrote:
> In response to Robert Edwards :
>> Can anyone suggest a way that I can impose uniqueness on a and b when
>> c is NULL?
>
> Sure, use a functional index:
>
> test=# create table bobtest (a int, b int, c int);
> CREATE TABLE
> test=*# create unique index idx_bobtest on
>         bobtest(a,b,coalesce(c::text,'NULL'));
> CREATE INDEX
> test=*# insert into bobtest (a, b) values (1, 4);
> INSERT 0 1
> test=*# insert into bobtest (a, b, c) values (1, 4, NULL);
> ERROR:  duplicate key value violates unique constraint "idx_bobtest"
> test=!#
>
>
> Regards, Andreas

Beautiful!

Many thanks,

Bob Edwards.
On Mon, 2009-06-29 at 17:08 +1000, Robert Edwards wrote:
> A. Kretschmer wrote:
> > In response to Robert Edwards :
> >> Can anyone suggest a way that I can impose uniqueness on a and b when
> >> c is NULL?
> >
> > Sure, use a functional index:
> >
> > test=# create table bobtest (a int, b int, c int);
> > CREATE TABLE
> > test=*# create unique index idx_bobtest on
> >         bobtest(a,b,coalesce(c::text,'NULL'));
> > CREATE INDEX
> > test=*# insert into bobtest (a, b) values (1, 4);
> > INSERT 0 1
> > test=*# insert into bobtest (a, b, c) values (1, 4, NULL);
> > ERROR:  duplicate key value violates unique constraint "idx_bobtest"
> > test=!#

Why change the datatype of column "c"? Won't that possibly prevent the
index from being used? Why not simply:

lem=# create table bobtest (a int, b int, c int);
CREATE TABLE
lem=# create unique index idc_bobtest on bobtest (a, b, coalesce(c, 1));
CREATE INDEX
lem=# insert into bobtest (a, b) values (1, 4);
INSERT 0 1
lem=# insert into bobtest (a, b) values (1, 4);
ERROR:  duplicate key value violates unique constraint "idc_bobtest"
lem=# insert into bobtest (a, b, c) values (1, 4, null);
ERROR:  duplicate key value violates unique constraint "idc_bobtest"
lem=#

> >
> > Regards, Andreas
>
> Beautiful!
>
> Many thanks,
>
> Bob Edwards.
In response to Robert Edwards :
> A. Kretschmer wrote:
> > In response to Robert Edwards :
> >> Can anyone suggest a way that I can impose uniqueness on a and b when
> >> c is NULL?
> >
> > Sure, use a functional index:
> >
> > test=# create table bobtest (a int, b int, c int);
> > CREATE TABLE
> > test=*# create unique index idx_bobtest on
> >         bobtest(a,b,coalesce(c::text,'NULL'));
> > CREATE INDEX
> > test=*# insert into bobtest (a, b) values (1, 4);
> > INSERT 0 1
> > test=*# insert into bobtest (a, b, c) values (1, 4, NULL);
> > ERROR:  duplicate key value violates unique constraint "idx_bobtest"
> > test=!#
> >
> > Regards, Andreas
>
> Beautiful!

Btw: if your data doesn't contain 0 or negative values, you can use
coalesce(c,0) or coalesce(c,-1) instead. That would be better for
searching.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net
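Presumably the "better for searching" point is that an integer coalesce
keeps the index keys as plain integers, and a query that repeats the same
expression can then use the index directly. A minimal sketch, reusing the
toy bobtest table from the examples above (the -1 sentinel is only safe
if -1 can never be a real value of c):

create unique index idx_bobtest on bobtest (a, b, coalesce(c, -1));

-- the query has to repeat the indexed expression for the index to match:
select * from bobtest where a = 1 and b = 4 and coalesce(c, -1) = -1;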
In response to Leo Mannhart :
> On Mon, 2009-06-29 at 17:08 +1000, Robert Edwards wrote:
> > A. Kretschmer wrote:
> > > In response to Robert Edwards :
> > >> Can anyone suggest a way that I can impose uniqueness on a and b when
> > >> c is NULL?
> > >
> > > Sure, use a functional index:
> > >
> > > test=# create table bobtest (a int, b int, c int);
> > > CREATE TABLE
> > > test=*# create unique index idx_bobtest on
> > >         bobtest(a,b,coalesce(c::text,'NULL'));
> > > CREATE INDEX
> > > test=*# insert into bobtest (a, b) values (1, 4);
> > > INSERT 0 1
> > > test=*# insert into bobtest (a, b, c) values (1, 4, NULL);
> > > ERROR:  duplicate key value violates unique constraint "idx_bobtest"
> > > test=!#
>
> Why change the datatype of column "c"? Won't that possibly prevent the
> index from being used? Why not simply:

Maybe, depends on the query.

> lem=# create table bobtest (a int, b int, c int);
> CREATE TABLE
> lem=# create unique index idc_bobtest on bobtest (a, b, coalesce(c, 1));

Bad idea if 1 is a valid value for this column.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net
On Mon, 2009-06-29 at 15:42 +1000, Robert Edwards wrote:

> Can anyone suggest a way that I can impose uniqueness on a and b when
> c is NULL?

One way is to add an additional partial unique index on (a,b):

CREATE UNIQUE INDEX bobtest_ab_unique ON bobtest(a,b) WHERE (c IS NULL);

... however, if you want to do the same sort of thing for all
permutations (a, null, null), (b, null, null), (c, null, null),
(a, b, null), (a, c, null), (b, c, null), (a, b, c), that'll be a LOT
of indexes. In that case you might be better off just using a trigger
function like (untested but should be about right):

CREATE OR REPLACE FUNCTION bobtest_unique_trigger() RETURNS trigger AS
$$
declare
    conflicting_id integer;
begin
    if TG_OP = 'INSERT' or TG_OP = 'UPDATE' then
        -- look for an existing row matching (a, b, c), treating NULLs as equal
        select id into conflicting_id from bobtest
         where (NOT new.a IS DISTINCT FROM a)
           and (NOT new.b IS DISTINCT FROM b)
           and (NOT new.c IS DISTINCT FROM c);
        if found then
            raise exception 'Unique violation in bobtest: inserted row conflicts with row id=%', conflicting_id;
        end if;
    end if;
    return new;
end;
$$ LANGUAGE 'plpgsql';

... which enforces uniqueness considering nulls.

> In the real app., c is a date field and I require it to be NULL for
> some rows.

Oh. Er, in that case, the partial unique index is your best bet (but 'a'
and 'b' should be NOT NULL, right?).

> in case I am missing some other solution that
> doesn't involve the use of triggers etc.

Sometimes a trigger is the right solution.

--
Craig Ringer
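For completeness, a trigger function like the one above still needs to be
attached to the table; a minimal sketch, assuming the bobtest table and
the function name used above (the trigger name itself is made up here):

CREATE TRIGGER bobtest_unique_check
    BEFORE INSERT OR UPDATE ON bobtest
    FOR EACH ROW EXECUTE PROCEDURE bobtest_unique_trigger();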
Craig Ringer <craig@postnewspapers.com.au> writes:
> On Mon, 2009-06-29 at 15:42 +1000, Robert Edwards wrote:
>> Can anyone suggest a way that I can impose uniqueness on a and b when
>> c is NULL?

> One way is to add an additional partial unique index on (a,b):
> CREATE UNIQUE INDEX bobtest_ab_unique ON bobtest(a,b) WHERE (c IS NULL);

That's the way I'd suggest; unlike the other proposal, it doesn't make
any assumptions about datatypes and it doesn't require there to be a
special non-null value that won't be a real data value.

> ... however, if you want to do the same sort of thing for all
> permutations (a, null, null), (b, null, null), (c, null, null),
> (a, b, null), (a, c, null), (b, c, null), (a, b, c), that'll be a LOT
> of indexes.

... yeah.  So one answer that definitely requires consideration is "you
have misdesigned your data representation; do not try to use NULL this
way".

> In that case you might be better off just using a trigger function like
> (untested but should be about right):

This trigger has race conditions: it will fail to prevent concurrent
insertion of rows that you would like to have conflict.  I think it does
the wrong thing for the UPDATE case too, though that is fixable.  The
race condition isn't.

			regards, tom lane
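To make the race condition concrete, here is a rough sketch of how two
concurrent sessions could slip past the trigger under the default READ
COMMITTED isolation (table and trigger as above; the SELECT inside the
trigger cannot see the other session's uncommitted row):

-- session 1                            -- session 2
BEGIN;
INSERT INTO bobtest (a, b)
    VALUES (1, 4);  -- no conflict seen
                                        BEGIN;
                                        INSERT INTO bobtest (a, b)
                                            VALUES (1, 4);  -- also no conflict seen
COMMIT;
                                        COMMIT;
-- both "duplicate" rows are now committed, despite the trigger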
Thanks for all these great ideas!

Craig Ringer wrote:
> On Mon, 2009-06-29 at 15:42 +1000, Robert Edwards wrote:
>
>> Can anyone suggest a way that I can impose uniqueness on a and b when
>> c is NULL?
>
> One way is to add an additional partial unique index on (a,b):
>
> CREATE UNIQUE INDEX bobtest_ab_unique ON bobtest(a,b) WHERE (c IS NULL);

Would this be in addition to a unique constraint on (a, b, c) (for the
cases where c is not null)?

> ... however, if you want to do the same sort of thing for all
> permutations (a, null, null), (b, null, null), (c, null, null),
> (a, b, null), (a, c, null), (b, c, null), (a, b, c), that'll be a LOT
> of indexes.

In the real app. a and b are not null ints and c is a date. The date
indicates if and when a row has expired (there are other columns in the
table). I am trying to avoid having separate columns for the "if" and
the "when" of the expiry.

One alternative would be to use a date way off into the future (such as
the famous 9/9/99 case many COBOL programmers used back in the 60's...)
and to test on expired < now().

Another option is to use a separate shadow table for the expired rows
and to use a trigger function to "move" expired rows to that shadow
table. I'd then need to use UNION etc. when I need to search across both
current and expired rows.

> In that case you might be better off just using a trigger function like
> (untested but should be about right):
>
> CREATE OR REPLACE FUNCTION bobtest_unique_trigger() RETURNS trigger AS
> $$
> declare
>     conflicting_id integer;
> begin
>     if TG_OP = 'INSERT' or TG_OP = 'UPDATE' then
>         -- look for an existing row matching (a, b, c), treating NULLs as equal
>         select id into conflicting_id from bobtest
>          where (NOT new.a IS DISTINCT FROM a)
>            and (NOT new.b IS DISTINCT FROM b)
>            and (NOT new.c IS DISTINCT FROM c);
>         if found then
>             raise exception 'Unique violation in bobtest: inserted row conflicts with row id=%', conflicting_id;
>         end if;
>     end if;
>     return new;
> end;
> $$ LANGUAGE 'plpgsql';
>
> ... which enforces uniqueness considering nulls.

I am "guessing" that the "functional index" that Andreas Kretschmer
proposed would be a lot "lighter-weight" than a full trigger. This table
will get quite a bit of insert activity and some update activity on the
"c" (expired) column, so this uniqueness index will get exercised quite
a lot. I am concerned that this could cause performance issues with a
heavier-weight trigger function (but I have no empirical data to back up
these concerns...).

>> In the real app., c is a date field and I require it to be NULL for
>> some rows.
>
> Oh. Er, in that case, the partial unique index is your best bet (but 'a'
> and 'b' should be NOT NULL, right?).

Right - see above.

>> in case I am missing some other solution that
>> doesn't involve the use of triggers etc.
>
> Sometimes a trigger is the right solution.

Yep - I have many of those in other places as well.

Cheers,

Bob Edwards.
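Just to make the shadow-table idea concrete, a rough sketch (the
bobtest_expired name is hypothetical):

-- archive table with the same structure as bobtest
CREATE TABLE bobtest_expired (LIKE bobtest);

-- searching across both current and expired rows
SELECT id, a, b, c FROM bobtest
UNION ALL
SELECT id, a, b, c FROM bobtest_expired;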
On Tue, 2009-06-30 at 10:22 +1000, Robert Edwards wrote:
> Would this be in addition to a unique constraint on (a, b, c) (for the
> cases where c is not null)?

That depends on your app's needs. Do you need to enforce uniqueness of
(a,b,c)? Or of (a,b) where c is _not_ null? If so, then yes, that's in
addition to the existing unique index over a, b and c.

If you're happy to permit multiple identical (a,b,c) sets so long as c
is null, then there's no need for the additional index.

> In the real app. a and b are not null ints and c is a date. The date
> indicates if and when a row has expired (there are other columns in the
> table). I am trying to avoid having separate columns for the "if" and
> the "when" of the expiry.

Sounds fairly sensible to me, though I have the unsubstantiated feeling
that sometimes storing a boolean + date can help the database optimise
queries better.

If you do store a boolean and a date, add a check constraint to ensure
sanity:

    CHECK ( is_expired = (expired_date IS NOT NULL) )

since you don't want a valid expired_date for an unexpired row, or a row
that expired at an unknown time.

> One alternative would be to use a date way off into the future (such as
> the famous 9/9/99 case many COBOL programmers used back in the 60's...)
> and to test on expired < now().

Ick. Ick. Ick.

> Another option is to use a separate shadow table for the expired rows
> and to use a trigger function to "move" expired rows to that shadow
> table. I'd then need to use UNION etc. when I need to search across both
> current and expired rows.

That can work. It might be worth using inherited tables, though, so that
you have a single parent table holding rows from both the history and
live tables.

Often a partial index on the main table works just as well. That's
another benefit of adding the index on (a,b) where c is null: it's a
smaller index that takes up less memory and can be scanned much faster.

Unlike partitioning the data into separate tables, though, you'll still
have a mix of expired and unexpired rows in the table heap. The data of
interest will be scattered across more blocks, so the index will be
bigger and more data will have to be read in to satisfy a `where c is
not null' constrained query. So a partial index isn't _as_ good as
partitioning the data - but it's quite a bit easier.

--
Craig Ringer
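Pulling the suggestions in this thread together, the table Bob describes
might end up looking something like this (only a sketch: column names are
borrowed from the toy example, with c standing in for the expiry date,
and id is shown as a plain serial primary key):

CREATE TABLE bobtest (
    id serial PRIMARY KEY,
    a  integer NOT NULL,
    b  integer NOT NULL,
    c  date                -- NULL means "not expired"
);

-- each (a, b) may expire on a given date at most once
CREATE UNIQUE INDEX bobtest_abc_unique ON bobtest (a, b, c);

-- at most one live (unexpired) row per (a, b)
CREATE UNIQUE INDEX bobtest_ab_live_unique ON bobtest (a, b) WHERE c IS NULL;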