Thread: uniqueness constraint with NULLs
I have a table with a uniqueness constraint on three columns:

# \d bobtest
                            Table "public.bobtest"
 Column |  Type   |                       Modifiers
--------+---------+------------------------------------------------------
 id     | integer | not null default nextval('bobtest_id_seq'::regclass)
 a      | integer |
 b      | integer |
 c      | integer |
Indexes:
    "bobtest_id_key" UNIQUE, btree (id)
    "bobtest_unique" UNIQUE, btree (a, b, c)

I can insert multiple rows with identical a and b when c is NULL:

...
# insert into bobtest (a, b) values (1, 4);
INSERT 0 1
# insert into bobtest (a, b, c) values (1, 4, NULL);
INSERT 0 1
# select * from bobtest;
 id | a | b | c
----+---+---+---
  1 | 1 | 2 | 1
  2 | 1 | 3 | 1
  3 | 1 | 4 | 1
  4 | 1 | 4 |
  5 | 1 | 4 |
  6 | 1 | 4 |
(6 rows)

Can anyone suggest a way that I can impose uniqueness on a and b when
c is NULL?

In the real app., c is a date field and I require it to be NULL for
some rows. In these cases, I only want at most one row with identical
a and b, but I can have identical a and b when c is a real date, as
long as that date is also unique for a given a and b.

I'm guessing I'm going to need to use a function and that someone will
yell at me for using NULLs to represent real data, but I thought I'd be
brave and ask anyway, in case I am missing some other solution that
doesn't involve the use of triggers etc.

Cheers,

Bob Edwards.
In response to Robert Edwards :
> Can anyone suggest a way that I can impose uniqueness on a and b when
> c is NULL?

Sure, use a functional index:

test=# create table bobtest (a int, b int, c int);
CREATE TABLE
test=*# create unique index idx_bobtest on
        bobtest(a,b,coalesce(c::text,'NULL'));
CREATE INDEX
test=*# insert into bobtest (a, b) values (1, 4);
INSERT 0 1
test=*# insert into bobtest (a, b, c) values (1, 4, NULL);
ERROR:  duplicate key value violates unique constraint "idx_bobtest"
test=!#


Regards, Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net
A. Kretschmer wrote:
> In response to Robert Edwards :
>> Can anyone suggest a way that I can impose uniqueness on a and b when
>> c is NULL?
>
> Sure, use a functional index:
>
> test=# create table bobtest (a int, b int, c int);
> CREATE TABLE
> test=*# create unique index idx_bobtest on
>         bobtest(a,b,coalesce(c::text,'NULL'));
> CREATE INDEX
> test=*# insert into bobtest (a, b) values (1, 4);
> INSERT 0 1
> test=*# insert into bobtest (a, b, c) values (1, 4, NULL);
> ERROR:  duplicate key value violates unique constraint "idx_bobtest"
> test=!#
>
>
> Regards, Andreas

Beautiful!

Many thanks,

Bob Edwards.
On Mon, 2009-06-29 at 17:08 +1000, Robert Edwards wrote:
> A. Kretschmer wrote:
> > In response to Robert Edwards :
> >> Can anyone suggest a way that I can impose uniqueness on a and b when
> >> c is NULL?
> >
> > Sure, use a functional index:
> >
> > test=# create table bobtest (a int, b int, c int);
> > CREATE TABLE
> > test=*# create unique index idx_bobtest on
> >         bobtest(a,b,coalesce(c::text,'NULL'));
> > CREATE INDEX
> > test=*# insert into bobtest (a, b) values (1, 4);
> > INSERT 0 1
> > test=*# insert into bobtest (a, b, c) values (1, 4, NULL);
> > ERROR:  duplicate key value violates unique constraint "idx_bobtest"
> > test=!#

Why change the datatype of column "c"? Won't that possibly prevent the
index from being used? Why not simply:

lem=# create table bobtest (a int, b int, c int);
CREATE TABLE
lem=# create unique index idc_bobtest on bobtest (a, b, coalesce(c, 1));
CREATE INDEX
lem=# insert into bobtest (a, b) values (1, 4);
INSERT 0 1
lem=# insert into bobtest (a, b) values (1, 4);
ERROR:  duplicate key value violates unique constraint "idc_bobtest"
lem=# insert into bobtest (a, b, c) values (1, 4, null);
ERROR:  duplicate key value violates unique constraint "idc_bobtest"
lem=#

> >
> > Regards, Andreas
>
> Beautiful!
>
> Many thanks,
>
> Bob Edwards.
In response to Robert Edwards :
> A. Kretschmer wrote:
> > In response to Robert Edwards :
> >> Can anyone suggest a way that I can impose uniqueness on a and b when
> >> c is NULL?
> >
> > Sure, use a functional index:
> >
> > test=# create table bobtest (a int, b int, c int);
> > CREATE TABLE
> > test=*# create unique index idx_bobtest on
> >         bobtest(a,b,coalesce(c::text,'NULL'));
> > CREATE INDEX
> > test=*# insert into bobtest (a, b) values (1, 4);
> > INSERT 0 1
> > test=*# insert into bobtest (a, b, c) values (1, 4, NULL);
> > ERROR:  duplicate key value violates unique constraint "idx_bobtest"
> > test=!#
> >
> > Regards, Andreas
>
> Beautiful!

Btw: if your data doesn't contain 0 or negative values, you can use
coalesce(c,0) or coalesce(c,-1) instead. That would be better for
searching.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net
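Presumably the "better for searching" point is that an integer coalesce
keeps the index keys as plain integers, and a query that repeats the same
expression can then use the index directly. A minimal sketch, reusing the
toy bobtest table from the examples above (the -1 sentinel is only safe
if -1 can never be a real value of c):

create unique index idx_bobtest on bobtest (a, b, coalesce(c, -1));

-- the query has to repeat the indexed expression for the index to match:
select * from bobtest where a = 1 and b = 4 and coalesce(c, -1) = -1;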
In response to Leo Mannhart :
> On Mon, 2009-06-29 at 17:08 +1000, Robert Edwards wrote:
> > A. Kretschmer wrote:
> > > In response to Robert Edwards :
> > >> Can anyone suggest a way that I can impose uniqueness on a and b when
> > >> c is NULL?
> > >
> > > Sure, use a functional index:
> > >
> > > test=# create table bobtest (a int, b int, c int);
> > > CREATE TABLE
> > > test=*# create unique index idx_bobtest on
> > >         bobtest(a,b,coalesce(c::text,'NULL'));
> > > CREATE INDEX
> > > test=*# insert into bobtest (a, b) values (1, 4);
> > > INSERT 0 1
> > > test=*# insert into bobtest (a, b, c) values (1, 4, NULL);
> > > ERROR:  duplicate key value violates unique constraint "idx_bobtest"
> > > test=!#
>
> Why change the datatype of column "c"? Won't that possibly prevent the
> index from being used? Why not simply:

Maybe, depends on the query.

> lem=# create table bobtest (a int, b int, c int);
> CREATE TABLE
> lem=# create unique index idc_bobtest on bobtest (a, b, coalesce(c, 1));

Bad idea if 1 is a valid value for this column.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net
On Mon, 2009-06-29 at 15:42 +1000, Robert Edwards wrote:

> Can anyone suggest a way that I can impose uniqueness on a and b when
> c is NULL?

One way is to add an additional partial unique index on (a,b):

CREATE UNIQUE INDEX bobtest_ab_unique ON bobtest(a,b) WHERE (c IS NULL);

... however, if you want to do the same sort of thing for all
permutations (a, null, null), (b, null, null), (c, null, null),
(a, b, null), (a, c, null), (b, c, null), (a, b, c), that'll be a LOT
of indexes. In that case you might be better off just using a trigger
function like (untested but should be about right):

CREATE OR REPLACE FUNCTION bobtest_unique_trigger() RETURNS trigger AS
$$
declare
    conflicting_id integer;
begin
    if TG_OP = 'INSERT' or TG_OP = 'UPDATE' then
        -- look for an existing row matching (a, b, c), treating NULLs as equal
        select id into conflicting_id from bobtest
         where (NOT new.a IS DISTINCT FROM a)
           and (NOT new.b IS DISTINCT FROM b)
           and (NOT new.c IS DISTINCT FROM c);
        if found then
            raise exception 'Unique violation in bobtest: inserted row conflicts with row id=%', conflicting_id;
        end if;
    end if;
    return new;
end;
$$ LANGUAGE 'plpgsql';

... which enforces uniqueness considering nulls.

> In the real app., c is a date field and I require it to be NULL for
> some rows.

Oh. Er, in that case, the partial unique index is your best bet (but 'a'
and 'b' should be NOT NULL, right?).

> in case I am missing some other solution that
> doesn't involve the use of triggers etc.

Sometimes a trigger is the right solution.

--
Craig Ringer
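For completeness, a trigger function like the one above still needs to be
attached to the table; a minimal sketch, assuming the bobtest table and
the function name used above (the trigger name itself is made up here):

CREATE TRIGGER bobtest_unique_check
    BEFORE INSERT OR UPDATE ON bobtest
    FOR EACH ROW EXECUTE PROCEDURE bobtest_unique_trigger();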
Craig Ringer <craig@postnewspapers.com.au> writes:
> On Mon, 2009-06-29 at 15:42 +1000, Robert Edwards wrote:
>> Can anyone suggest a way that I can impose uniqueness on a and b when
>> c is NULL?

> One way is to add an additional partial unique index on (a,b):
> CREATE UNIQUE INDEX bobtest_ab_unique ON bobtest(a,b) WHERE (c IS NULL);

That's the way I'd suggest; unlike the other proposal, it doesn't make
any assumptions about datatypes and it doesn't require there to be a
special non-null value that won't be a real data value.

> ... however, if you want to do the same sort of thing for all
> permutations (a, null, null), (b, null, null), (c, null, null),
> (a, b, null), (a, c, null), (b, c, null), (a, b, c), that'll be a LOT
> of indexes.

... yeah.  So one answer that definitely requires consideration is "you
have misdesigned your data representation; do not try to use NULL this
way".

> In that case you might be better off just using a trigger function like
> (untested but should be about right):

This trigger has race conditions: it will fail to prevent concurrent
insertion of rows that you would like to have conflict.  I think it does
the wrong thing for the UPDATE case too, though that is fixable.  The
race condition isn't.

			regards, tom lane
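To make the race condition concrete, here is a rough sketch of how two
concurrent sessions could slip past the trigger under the default READ
COMMITTED isolation (table and trigger as above; the SELECT inside the
trigger cannot see the other session's uncommitted row):

-- session 1                            -- session 2
BEGIN;
INSERT INTO bobtest (a, b)
    VALUES (1, 4);  -- no conflict seen
                                        BEGIN;
                                        INSERT INTO bobtest (a, b)
                                            VALUES (1, 4);  -- also no conflict seen
COMMIT;
                                        COMMIT;
-- both "duplicate" rows are now committed, despite the trigger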
Thanks for all these great ideas!

Craig Ringer wrote:
> On Mon, 2009-06-29 at 15:42 +1000, Robert Edwards wrote:
>
>> Can anyone suggest a way that I can impose uniqueness on a and b when
>> c is NULL?
>
> One way is to add an additional partial unique index on (a,b):
>
> CREATE UNIQUE INDEX bobtest_ab_unique ON bobtest(a,b) WHERE (c IS NULL);

Would this be in addition to a unique constraint on (a, b, c) (for the
cases where c is not null)?

> ... however, if you want to do the same sort of thing for all
> permutations (a, null, null), (b, null, null), (c, null, null),
> (a, b, null), (a, c, null), (b, c, null), (a, b, c), that'll be a LOT
> of indexes.

In the real app. a and b are not null ints and c is a date. The date
indicates if and when a row has expired (there are other columns in the
table). I am trying to avoid having separate columns for the "if" and
the "when" of the expiry.

One alternative would be to use a date way off into the future (such as
the famous 9/9/99 case many COBOL programmers used back in the 60's...)
and to test on expired < now().

Another option is to use a separate shadow table for the expired rows
and to use a trigger function to "move" expired rows to that shadow
table. I'd then need to use UNION etc. when I need to search across both
current and expired rows.

> In that case you might be better off just using a trigger function like
> (untested but should be about right):
>
> CREATE OR REPLACE FUNCTION bobtest_unique_trigger() RETURNS trigger AS
> $$
> declare
>     conflicting_id integer;
> begin
>     if TG_OP = 'INSERT' or TG_OP = 'UPDATE' then
>         -- look for an existing row matching (a, b, c), treating NULLs as equal
>         select id into conflicting_id from bobtest
>          where (NOT new.a IS DISTINCT FROM a)
>            and (NOT new.b IS DISTINCT FROM b)
>            and (NOT new.c IS DISTINCT FROM c);
>         if found then
>             raise exception 'Unique violation in bobtest: inserted row conflicts with row id=%', conflicting_id;
>         end if;
>     end if;
>     return new;
> end;
> $$ LANGUAGE 'plpgsql';
>
> ... which enforces uniqueness considering nulls.

I am "guessing" that the "functional index" that Andreas Kretschmer
proposed would be a lot "lighter-weight" than a full trigger. This table
will get quite a bit of insert activity and some update activity on the
"c" (expired) column, so this uniqueness index will get exercised quite
a lot. I am concerned that this could cause performance issues with a
heavier-weight trigger function (but I have no empirical data to back up
these concerns...).

>> In the real app., c is a date field and I require it to be NULL for
>> some rows.
>
> Oh. Er, in that case, the partial unique index is your best bet (but 'a'
> and 'b' should be NOT NULL, right?).

Right - see above.

>> in case I am missing some other solution that
>> doesn't involve the use of triggers etc.
>
> Sometimes a trigger is the right solution.

Yep - I have many of those in other places as well.

Cheers,

Bob Edwards.
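Just to make the shadow-table idea concrete, a rough sketch (the
bobtest_expired name is hypothetical):

-- archive table with the same structure as bobtest
CREATE TABLE bobtest_expired (LIKE bobtest);

-- searching across both current and expired rows
SELECT id, a, b, c FROM bobtest
UNION ALL
SELECT id, a, b, c FROM bobtest_expired;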
On Tue, 2009-06-30 at 10:22 +1000, Robert Edwards wrote:
> Would this be in addition to a unique constraint on (a, b, c) (for the
> cases where c is not null)?

That depends on your app's needs. Do you need to enforce uniqueness of
(a,b,c)? Or of (a,b) where c is _not_ null? If so, then yes, that's in
addition to the existing unique index over a, b and c.

If you're happy to permit multiple identical (a,b,c) sets so long as c
is null, then there's no need for the additional index.

> In the real app. a and b are not null ints and c is a date. The date
> indicates if and when a row has expired (there are other columns in the
> table). I am trying to avoid having separate columns for the "if" and
> the "when" of the expiry.

Sounds fairly sensible to me, though I have the unsubstantiated feeling
that sometimes storing a boolean + date can help the database optimise
queries better.

If you do store a boolean and a date, add a check constraint to ensure
sanity:

    CHECK ( is_expired = (expired_date IS NOT NULL) )

since you don't want a valid expired_date for an unexpired row, or a row
that expired at an unknown time.

> One alternative would be to use a date way off into the future (such as
> the famous 9/9/99 case many COBOL programmers used back in the 60's...)
> and to test on expired < now().

Ick. Ick. Ick.

> Another option is to use a separate shadow table for the expired rows
> and to use a trigger function to "move" expired rows to that shadow
> table. I'd then need to use UNION etc. when I need to search across both
> current and expired rows.

That can work. It might be worth using inherited tables, though, so that
you have a single parent table holding rows from both the history and
live tables.

Often a partial index on the main table works just as well. That's
another benefit of adding the index on (a,b) where c is null: it's a
smaller index that takes up less memory and can be scanned much faster.

Unlike partitioning the data into separate tables, though, you'll still
have a mix of expired and unexpired rows in the table heap. The data of
interest will be scattered across more blocks, so the index will be
bigger and more data will have to be read in to satisfy a `where c is
not null' constrained query. So a partial index isn't _as_ good as
partitioning the data - but it's quite a bit easier.

--
Craig Ringer
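Pulling the suggestions in this thread together, the table Bob describes
might end up looking something like this (only a sketch: column names are
borrowed from the toy example, with c standing in for the expiry date,
and id is shown as a plain serial primary key):

CREATE TABLE bobtest (
    id serial PRIMARY KEY,
    a  integer NOT NULL,
    b  integer NOT NULL,
    c  date                -- NULL means "not expired"
);

-- each (a, b) may expire on a given date at most once
CREATE UNIQUE INDEX bobtest_abc_unique ON bobtest (a, b, c);

-- at most one live (unexpired) row per (a, b)
CREATE UNIQUE INDEX bobtest_ab_live_unique ON bobtest (a, b) WHERE c IS NULL;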