Home > mailing lists

Re: Enforcing uniqueness on [real estate/postal] addresses - Mailing list pgsql-general

From	Paul Jungwirth
Subject	Re: Enforcing uniqueness on [real estate/postal] addresses
Date	May 11, 2020 21:33:45
Msg-id	8f2d845c-e621-c650-6891-8503d0da8f49@illuminatedcomputing.com Whole thread Raw
In response to	Enforcing uniqueness on [real estate/postal] addresses (Peter Devoy <peter@3xe.co.uk>)
Responses	Re: Enforcing uniqueness on [real estate/postal] addresses
List	pgsql-general

Tree view

On 5/11/20 9:55 AM, Peter Devoy wrote:
> Of course, if any of the fields are NULL (which they often are) I end
> up with duplicates.
> 
> One solution may be to add NOT NULL constraints and use empty strings
> instead of NULL values but, until asking around today, I thought this was
> generally considered bad practice.

If you don't want to store empty strings (which I agree is a little 
yucky), you could replace NULLs with an empty string *only when checking 
for uniqueness*. To do this, first replace your unique constraint with a 
unique index, which gives you some additional features (e.g. indexing 
expressions, indexing only part of the table with a WHERE clause, 
building it concurrently, etc.). In this case we only care about 
indexing expressions.

So you can say:

     CREATE UNIQUE INDEX is_unique_address ON properties (
       COALESCE(description, ''),
       COALESCE(address_identifier_general, ''),
       COALESCE(street, ''),
       COALESCE(postcode, ''));

Another approach, which I don't think is really a serious suggestion but 
is sort of interesting to think about: you could define an operator, say 
===, that does the same thing as `IS NOT DISTINCT FROM`. Then you could 
create an exclusion constraint using that operator on all four columns. 
I've never tried that before but it seems like it would work.

Maybe that's too much effort for something like this. I just think it's 
interesting because it feels like a use case for exclusion constraints 
that goes in the "opposite direction" of how they are usually used: 
instead of being less restrictive than =, it is more restrictive.

Regards,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

pgsql-general by date:

From: "David G. Johnston"
Date: 11 May 2020, 21:33:11
Subject: Re: Enforcing uniqueness on [real estate/postal] addresses

From: Tom Lane
Date: 11 May 2020, 21:41:29
Subject: Re: PostgreSQL client hangs sometimes on 'EXEC SQL PREPARE sid_sisisinst FROM :select_anw;'

Re: Enforcing uniqueness on [real estate/postal] addresses - Mailing list pgsql-general

Previous

Next