Thread: unique constraint with several null values

unique constraint with several null values

From: Mark Lybarger
I have a relation such as:

create table order_item ( id uuid not null primary key, order_id numeric not null,
    item_code text, make text, model text, reason text, size text,
    expiration_date timestamp );

where the combination of the columns order_id, item_code, make, model, reason, size must be unique (unless there's an expiration date).  

I'm inclined to use a unique index:

create unique index unique_index_order_item_1 on order_item (order_id, item_code, make, model, reason, size)
where expiration_date is null;

this works as expected and the duplicate row is rejected:

insert into order_item values ( 1, 88, 'CODE', 'MAKE','MODEL','REASON', 'SIZE', null); <- first adds
insert into order_item values ( 1, 88, 'CODE', 'MAKE','MODEL','REASON', 'SIZE', null); <- rejects

however, nulls are allowed for all the columns except order_id, so when I add a null value the index fails to meet my expectations:

insert into order_item values ( 1, 88, 'CODE', 'MAKE',null,'REASON', 'SIZE', null); <- first adds
insert into order_item values ( 1, 88, 'CODE', 'MAKE',null,'REASON', 'SIZE', null); <- adds, but should reject.

This leads me to think I need to create 2^5 = 32 unique indexes to handle the various combinations of null and non-null values I can store.  Until now this integrity has been handled in the application code.  That breaks down when the application is multi-threaded and the rules are not enforced at the database level.

Another solution I can think of is to just use a trigger to prevent the duplicate rows.  

Any thoughts are certainly appreciated.  I can't do much about the data model itself right now, but I need to protect the integrity of the data.

Thanks!
-mark-

Re: unique constraint with several null values

From: David G. Johnston
On Wed, Jul 20, 2016 at 2:14 PM, Mark Lybarger <mlybarger@gmail.com> wrote:

Experimenting using 9.6:

You may or may not find this helpful...

DROP TABLE order_item;
DROP TYPE base_order_item;

CREATE TYPE base_order_item AS (id int, order_id numeric, item_code text, make text, model text, reason text);
create table order_item (base_item base_order_item, expiration_date timestamp,
exclude (base_item with =) where (expiration_date is null)
);

insert into order_item VALUES ( (1, 88, 'CODE', 'MAKE',null,'REASON')::base_order_item, null);
insert into order_item VALUES ( (1, 88, 'CODE', 'MAKE',null,'REASON')::base_order_item, null);
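(Given the comparison results below, the second insert should fail with an exclusion violation: once the rows are cast to the named composite type they compare equal, despite the embedded null.)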

I could not figure out a way to specify an arbitrary record type within the exclusion constraint - which is kinda what you want - though it seems to have its own issues...

SELECT ROW(1, 88, 'CODE', 'MAKE',null,'REASON') 
     = ROW(1, 88, 'CODE', 'MAKE',null,'REASON')
-- Yields NULL

SELECT ROW(1, 88, 'CODE', 'MAKE',null,'REASON')::base_order_item
     = ROW(1, 88, 'CODE', 'MAKE',null,'REASON')::base_order_item
-- Yields TRUE
Though:

SELECT ROW(1, 88, 'CODE', 'MAKE',null,'REASON') 
     IS DISTINCT FROM ROW(1, 88, 'CODE', 'MAKE',null,'REASON')

-- Yields FALSE

So I suppose using IS DISTINCT FROM within a trigger is going to be your least cumbersome option.

Because exclusion constraints and indexes are operator-based, in those contexts you are stuck with two null values being considered unequal.
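
To see the two semantics side by side:

SELECT null = null;                      -- yields NULL, never TRUE
SELECT null IS NOT DISTINCT FROM null;   -- yields TRUE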

David J.

Re: unique constraint with several null values

From: Kevin Grittner
On Wed, Jul 20, 2016 at 1:48 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> On Wed, Jul 20, 2016 at 2:14 PM, Mark Lybarger <mlybarger@gmail.com> wrote:

>> Another solution I can think of is to just use a trigger to
>> prevent the duplicate rows.

If you go that route you will need to use serializable
transactions, explicit locking, or trigger-based update of some
otherwise-unneeded column to avoid race conditions.  See:

https://www.postgresql.org/docs/current/static/mvcc.html

That can be a perfectly valid option as long as you handle race
conditions somehow.
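
If it helps, here is a minimal sketch of the explicit-locking flavor
(names and the advisory-lock key are illustrative; it assumes READ
COMMITTED and that every write to the table goes through this trigger):

create function order_item_dup_check() returns trigger
language plpgsql as $$
begin
    -- serialize competing writers: the advisory lock is held until
    -- commit, so a second session cannot run the duplicate test until
    -- the first session's row is visible
    perform pg_advisory_xact_lock(hashtext('order_item_dup_check'));
    if exists (select 1 from order_item o
               where o.order_id = new.order_id
                 and o.item_code is not distinct from new.item_code
                 and o.make is not distinct from new.make
                 and o.model is not distinct from new.model
                 and o.reason is not distinct from new.reason
                 and o.size is not distinct from new.size
                 and o.expiration_date is null
                 and o.id <> new.id) then
        raise exception 'duplicate order_item for order %', new.order_id;
    end if;
    return new;
end;
$$;

create trigger order_item_dup_check
    before insert or update on order_item
    for each row when (new.expiration_date is null)
    execute procedure order_item_dup_check();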

>> Any thoughts are certainly appreciated.  I can't do much about
>> the data model itself right now, I need to protect the integrity
>> of the data.

Rather than unique constraints, you could add a unique index on the
COALESCE of each nullable column with some otherwise-impossible value.
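
For example, against the original table (a sketch; the sentinel strings
are placeholders and must be values that cannot occur in the real data):

create unique index order_item_null_safe_idx on order_item (
    order_id,
    coalesce(item_code, '~null~'),
    coalesce(make,      '~null~'),
    coalesce(model,     '~null~'),
    coalesce(reason,    '~null~'),
    coalesce(size,      '~null~')
) where expiration_date is null;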

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: unique constraint with several null values

From: Jerry Sievers
Mark Lybarger <mlybarger@gmail.com> writes:

> I have a relation such as
> create table order_item ( id uuid not null primary key, order_id numeric not null, item_code text,
> make text, model text, reason text, size text, expiration_date timestamp );
>
> where the combination of the columns order_id, item_code, make, model, reason, size must be unique
> (unless there's an expiration date).
>
> I'm inclined to use a unique index:
>
> create unique index unique_index_order_item_1 on order_item (order_id, item_code, make, model, reason, size)
> where expiration_date is null;
>
> this works as expected and the duplicate row is rejected:
>
> insert into order_item values ( 1, 88, 'CODE', 'MAKE','MODEL','REASON', 'SIZE', null); <- first adds
> insert into order_item values ( 1, 88, 'CODE', 'MAKE','MODEL','REASON', 'SIZE', null); <- rejects
>
> however, nulls are allowed for all the columns except order_id, so when I add a null value the index
> fails to meet my expectations:


sj$ cat q
begin;

create table foo (a int, b int);
create unique index foox on foo (coalesce(a, 0), coalesce(b, 0));

insert into foo select 1, null;
insert into foo select 1, null;
sj$ psql -f q
SET
BEGIN
CREATE TABLE
CREATE INDEX
INSERT 0 1
psql:q:7: ERROR:  duplicate key value violates unique constraint "foox"
DETAIL:  Key ((COALESCE(a, 0)), (COALESCE(b, 0)))=(1, 0) already exists.
sj$


--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800


Re: unique constraint with several null values

From: Chris Travers


On Wed, Jul 20, 2016 at 8:14 PM, Mark Lybarger <mlybarger@gmail.com> wrote:
I have a relation such as
create table order_item ( id uuid not null primary key, order_id numeric not null, item_code text, make text, model text, reason text, size text, expiration_date timestamp );

where the combination of the columns order_id, item_code, make, model, reason, size must be unique (unless there's an expiration date).  

I'm inclined to use a unique index:

create unique index unique_index_order_item_1 on order_item (order_id, item_code, make, model, reason, size)
where expiration_date is null;

You can also:
 
create unique index unique_index_order_item_1 on order_item (coalesce(order_id, -1),
    coalesce(item_code, '--NULL--'), coalesce(make, '--NULL--'),
    coalesce(model, '--NULL--'), coalesce(reason, '--NULL--'),
    coalesce(size, '--NULL--'), coalesce(expiration_date, '1700-01-01 00:00:00'));
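
With an index along those lines in place, the earlier failing pair now
collides (gen_random_uuid() is just a stand-in for however the ids are
generated; on 9.x it requires the pgcrypto extension):

insert into order_item values (gen_random_uuid(), 88, 'CODE', 'MAKE', null, 'REASON', 'SIZE', null); -- adds
insert into order_item values (gen_random_uuid(), 88, 'CODE', 'MAKE', null, 'REASON', 'SIZE', null); -- rejected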

--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.

Re: unique constraint with several null values

From: Jim Nasby
On 7/20/16 1:14 PM, Mark Lybarger wrote:
> This leads me to think I need to create 2^5 or 32 unique constraints to
> handle the various combinations of data that I can store.

Another option would be to create a unique index on a bit varying field
that has a bit set for each column that is NULL, WHERE <bit varying
field> != 0.

Let me know if you want to go that route; I could probably add that to
http://pgxn.org/dist/count_nulls/ without much difficulty. Though
probably a better way to accomplish that would be to add a function to
count_nulls that spits out an array of the fields that are NULL; you
could then do a unique index on that WHERE the array is not empty.

Maybe a less obtuse option would be to use a boolean array. Storage
would be ~8x larger, but since there should be very few rows I doubt
that matters.
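
Roughly like this (a sketch of the boolean-array idea; coalescing each
text column to '' and indexing a boolean "null map" alongside keeps a
real empty string distinct from a NULL):

create unique index order_item_nullmap_idx on order_item (
    order_id,
    coalesce(item_code, ''), coalesce(make, ''), coalesce(model, ''),
    coalesce(reason, ''), coalesce(size, ''),
    (array[item_code is null, make is null, model is null,
           reason is null, size is null])
) where expiration_date is null;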
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461