Thread: Enforcing uniqueness on [real estate/postal] addresses

Enforcing uniqueness on [real estate/postal] addresses

From
Peter Devoy
Date:
Hi list

I need to store addresses for properties (as in real estate) so in my
naivety I created a unique constraint like this:

ALTER TABLE properties
    ADD CONSTRAINT is_unique_address
    UNIQUE (
        description, --e.g. Land north of Foo Cottage
        address_identifier_general,
        street,
        postcode
    );

Of course, if any of the fields are NULL (which they often are) I end
up with duplicates.

One solution may be to add NOT NULL constraints and use empty strings
instead of NULL values but, until asking around today, I thought this was
generally considered bad practice.

Please can anyone recommend a way of approaching this? Perhaps empty strings
are pragmatic in this situation?

Kind regards


Peter



Re: Enforcing uniqueness on [real estate/postal] addresses

From
Philip Semanchuk
Date:

> On May 11, 2020, at 12:55 PM, Peter Devoy <peter@3xe.co.uk> wrote:
>
> Hi list
>
> I need to store addresses for properties (as in real estate) so in my
> naivety I created a unique constraint like this:
>
> ALTER TABLE properties
>    ADD CONSTRAINT is_unique_address
>    UNIQUE (
>        description, --e.g. Land north of Foo Cottage
>        address_identifier_general,
>        street,
>        postcode
>    );
>
> Of course, if any of the fields are NULL (which they often are) I end
> up with duplicates.
>
> One solution may be to add NOT NULL constraints and use empty strings
> instead of NULL values but, until asking around today, I thought this was
> generally considered bad practice.
>
> Please can anyone recommend a way of approaching this? Perhaps empty strings
> are pragmatic in this situation?

Hi Peter,
I wouldn’t use empty strings in place of NULL. It’s possible to define a partial unique index that has more or less the
sameeffect as a constraint. Have you looked into them?  

Cheers
Philip








Re: Enforcing uniqueness on [real estate/postal] addresses

From
Adrian Klaver
Date:
On 5/11/20 9:55 AM, Peter Devoy wrote:
> Hi list
> 
> I need to store addresses for properties (as in real estate) so in my
> naivety I created a unique constraint like this:
> 
> ALTER TABLE properties
>      ADD CONSTRAINT is_unique_address
>      UNIQUE (
>          description, --e.g. Land north of Foo Cottage
>          address_identifier_general,
>          street,
>          postcode
>      );
> 
> Of course, if any of the fields are NULL (which they often are) I end
> up with duplicates.
> 
> One solution may be to add NOT NULL constraints and use empty strings
> instead of NULL values but, until asking around today, I thought this was
> generally considered bad practice.
> 
> Please can anyone recommend a way of approaching this? Perhaps empty strings
> are pragmatic in this situation?

How about?:

create table properties (description varchar, address_identifier_general 
varchar, street varchar, postcode varchar);

CREATE UNIQUE INDEX is_unique_address ON properties 
((coalesce(description, '')),
         (coalesce(address_identifier_general, '')),
         (coalesce(street, '')),
         (coalesce(postcode, ''))
     );


insert into properties (description, street, postcode) values ('test', 
'anywhere', '1234');
INSERT 0 1

insert into properties (description, street, postcode) values ('test', 
'anywhere', '1234');
ERROR:  duplicate key value violates unique constraint "is_unique_address"
DETAIL:  Key (COALESCE(description, ''::character varying), 
COALESCE(address_identifier_general, ''::character varying), 
COALESCE(street, ''::character varying), COALESCE(postcode, 
''::character varying))=(test, , anywhere, 1234) already exists.

> 
> Kind regards
> 
> 
> Peter
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Enforcing uniqueness on [real estate/postal] addresses

From
"David G. Johnston"
Date:
On Mon, May 11, 2020 at 9:56 AM Peter Devoy <peter@3xe.co.uk> wrote:
I need to store addresses for properties (as in real estate) so in my
naivety I created a unique constraint like this:

ALTER TABLE properties
    ADD CONSTRAINT is_unique_address
    UNIQUE (
        description, --e.g. Land north of Foo Cottage
        address_identifier_general,
        street,
        postcode
    );

 
Please can anyone recommend a way of approaching this? Perhaps empty strings
are pragmatic in this situation?

Don't add a unique table constraint (aside from an artificial primary key).

Then, if you can define a problem where you feel having a unique table constraint over the field is the correct solution - especially given the fact that you can have missing data in the relevant fields - you should post the problem and take suggestions on ways to solve it.

David J.

Re: Enforcing uniqueness on [real estate/postal] addresses

From
Paul Jungwirth
Date:
On 5/11/20 9:55 AM, Peter Devoy wrote:
> Of course, if any of the fields are NULL (which they often are) I end
> up with duplicates.
> 
> One solution may be to add NOT NULL constraints and use empty strings
> instead of NULL values but, until asking around today, I thought this was
> generally considered bad practice.

If you don't want to store empty strings (which I agree is a little 
yucky), you could replace NULLs with an empty string *only when checking 
for uniqueness*. To do this, first replace your unique constraint with a 
unique index, which gives you some additional features (e.g. indexing 
expressions, indexing only part of the table with a WHERE clause, 
building it concurrently, etc.). In this case we only care about 
indexing expressions.

So you can say:

     CREATE UNIQUE INDEX is_unique_address ON properties (
       COALESCE(description, ''),
       COALESCE(address_identifier_general, ''),
       COALESCE(street, ''),
       COALESCE(postcode, ''));

Another approach, which I don't think is really a serious suggestion but 
is sort of interesting to think about: you could define an operator, say 
===, that does the same thing as `IS NOT DISTINCT FROM`. Then you could 
create an exclusion constraint using that operator on all four columns. 
I've never tried that before but it seems like it would work.

Maybe that's too much effort for something like this. I just think it's 
interesting because it feels like a use case for exclusion constraints 
that goes in the "opposite direction" of how they are usually used: 
instead of being less restrictive than =, it is more restrictive.

Regards,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com



Re: Enforcing uniqueness on [real estate/postal] addresses

From
Peter Devoy
Date:
Hello everyone

Thank you all for your suggestions.  I had neither heard of partial
unique indexes nor
considered using COALESCE so I will explore both of these as options
and update the
thread with how it goes.

>Then, if you can define a problem where you feel having a unique table constraint over
>the field is the correct solution [...] you should post the problem and take suggestions on
>ways to solve it.
I am not sure if I have such a problem :)  I tried to simplify my
example for the sake of
brevity but if I cannot get the above to work I will post the full problem.

Kind regards


Peter



Re: Enforcing uniqueness on [real estate/postal] addresses

From
Tim Cross
Date:
Peter Devoy <peter@3xe.co.uk> writes:

> Hi list
>
> I need to store addresses for properties (as in real estate) so in my
> naivety I created a unique constraint like this:
>
> ALTER TABLE properties
>     ADD CONSTRAINT is_unique_address
>     UNIQUE (
>         description, --e.g. Land north of Foo Cottage
>         address_identifier_general,
>         street,
>         postcode
>     );
>
> Of course, if any of the fields are NULL (which they often are) I end
> up with duplicates.
>
> One solution may be to add NOT NULL constraints and use empty strings
> instead of NULL values but, until asking around today, I thought this was
> generally considered bad practice.
>
> Please can anyone recommend a way of approaching this? Perhaps empty strings
> are pragmatic in this situation?
>
> Kind regards
>
>

Hi Peter,

Personally, I don't like the idea of using empty strings just to avoid
having nulls. This is probably a personal preference, but for me null
and '' are quite different. A null indicates an unknown - we don't know
what the value is. An empty string i.e. '' means there is no value (i.e.
we know it has no value). The difference is quite subtle and may not
seem relevant. It may not be or it may be or it may become relevant in
the future. General rule of thumb for me is that my model should reflect
the known information and should always avoid any data transformation or
mapping which reduces the known information.

I would step back a bit and think about why/what constraint you really
need and what needs to be unique. The first field which jumps out for me
is description. Is this really a unique value? Would it be possible to
have two properties with the same description? Does it matter if two
properties have the same description? Does the description really affect
property uniqueness. If two records have the same street, postcode and
general_property_identifier, but different descriptions, are they really
two different records? Will description change over time? As description
is a fairly subjective value, I would be tempted to not include it in
your unique constraint at all. In fact, I would probably keep
description in a separate table as it may be reasonable to have multiple
descriptions for a property. If you want just a single description, then
you can leave it in this table. I would not put a unique or not null
constraint on it.

This would leave you with address_identifier_general, street and
postcode. None of those will be unique by themselves. You will only get
uniqueness when you combine all 3. Can any of them be null? I would
suspect not, so I would define them with not null constraints. I would
then probably add a composite unique index using all 3 values to enforce
uniqueness. Depending on your application needs, I would probably add a
unique property_id field to the table as well (which would be the value
I would used to link records in other tables, such as a
property_description table).

Of course, there are other constraints you could consider - post code
probably has a set format which you might want to enforce or perhaps you
can access a complete listing of valid postcodes and import that into
your system as a postcode table. In that case, your postcode field might
be better defined as a foreign key constraint into the postcode table. 

When defining your constraints, it is important to consider what
information is known at the point of initial data entry. Is there a need
to enter partial data (for example, you might know the street and
postcode, but not the general_property_identifier. Is it expected or
reasonable to allow entry of this sort of partial data? If so, how will
that work with your uniqueness constraints? (it may be quite reasonable
to require all 3 fields be known). The point is, you need to know how
the system will be used and what the expectations of the users are.
Maybe there is a legitimate business case to allow partial data entry,
in which case, you may need a different approach or a way to identify
partial/incomplete records etc. 

-- 
Tim Cross



Re: Enforcing uniqueness on [real estate/postal] addresses

From
"Peter J. Holzer"
Date:
On 2020-05-12 10:49:22 +1000, Tim Cross wrote:
> Peter Devoy <peter@3xe.co.uk> writes:
> > I need to store addresses for properties (as in real estate) so in my
> > naivety I created a unique constraint like this:
> >
> > ALTER TABLE properties
> >     ADD CONSTRAINT is_unique_address
> >     UNIQUE (
> >         description, --e.g. Land north of Foo Cottage
> >         address_identifier_general,
> >         street,
> >         postcode
> >     );
> >
> > Of course, if any of the fields are NULL (which they often are) I end
> > up with duplicates.
[...]
>
> Personally, I don't like the idea of using empty strings just to avoid
> having nulls. This is probably a personal preference, but for me null
> and '' are quite different. A null indicates an unknown - we don't know
> what the value is. An empty string i.e. '' means there is no value (i.e.
> we know it has no value). The difference is quite subtle and may not
> seem relevant. It may not be or it may be or it may become relevant in
> the future. General rule of thumb for me is that my model should reflect
> the known information and should always avoid any data transformation or
> mapping which reduces the known information.
>
> I would step back a bit and think about why/what constraint you really
> need and what needs to be unique. The first field which jumps out for me
> is description. Is this really a unique value?

As the coonstraint stands, it isn't. Only description,
address_identifier_general, street and postcode together are unique.

I'd therefore ask the question in the other direction:

Is is possible to have two entries which have the same
address_identifier_general, street and postcode, but different
descriptions? What does that mean? To different properties which happen
to be at the same place or two descriptions for the same property?

(What is an address_identifier_general, btw?)

I agree with the rest of posting.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

Re: Enforcing uniqueness on [real estate/postal] addresses

From
Peter Devoy
Date:
>Is is possible to have two entries which have the same
>address_identifier_general, street and postcode, but different
>descriptions?

Unfortunately, yes.  The data comes from gov't systems to
regulate the development/alteration of arbitrary pieces of property and
those pieces do not always have a postal address.  E.g. a farmer may
one year apply to erect a wind turbine in "field north of Foo Cottage"
and the next year apply to demolish "barnhouse west of Foo Cottage".

Now, I know what you are thinking, there is a normalization opportunity and you
may well be right. However, the problem does exist in some of the other fields
too and I am already facing a fair amount of join complexity in my schema so I
am trying to figure out my options :)

>(What is an address_identifier_general, btw?)
Address identifier composed by numbers and/or characters. I'm using the
terminology from the EU's "INSPIRE Data Specification on Addresses" Guidelines.

I haven't yet had the opportunity to try out the above suggestions but I will
post again when I have.



Re: Enforcing uniqueness on [real estate/postal] addresses

From
Karsten Hilbert
Date:
On Tue, May 12, 2020 at 09:55:56PM +0100, Peter Devoy wrote:

> >Is is possible to have two entries which have the same
> >address_identifier_general, street and postcode, but different
> >descriptions?
>
> Unfortunately, yes.  The data comes from gov't systems to
> regulate the development/alteration of arbitrary pieces of property and
> those pieces do not always have a postal address.  E.g. a farmer may
> one year apply to erect a wind turbine in "field north of Foo Cottage"
> and the next year apply to demolish "barnhouse west of Foo Cottage".

LAT/LON ?

https://plus.codes/ ?

Best,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: Enforcing uniqueness on [real estate/postal] addresses

From
Rich Shepard
Date:
On Tue, 12 May 2020, Peter Devoy wrote:

>> Is is possible to have two entries which have the same
>> address_identifier_general, street and postcode, but different
>> descriptions?
>
> Unfortunately, yes. The data comes from gov't systems to regulate the
> development/alteration of arbitrary pieces of property and those pieces do
> not always have a postal address. E.g. a farmer may one year apply to
> erect a wind turbine in "field north of Foo Cottage" and the next year
> apply to demolish "barnhouse west of Foo Cottage".

I'm in the US but recently read (in The Economist, I think) that house
numbers in the UK are sometimes denied by the local government and the owner
told to select a name for the property.

Can you have a primary key for the property descrption, e.g., "Foo Cottage"
and another (related) column for objects associated with it; e.g., "North
field" and "West barnhouse?"

Similar to an apartment (flat) number in a multifamily dwelling.

Rich



Re: Enforcing uniqueness on [real estate/postal] addresses

From
Adrian Klaver
Date:
On 5/12/20 1:55 PM, Peter Devoy wrote:
>> Is is possible to have two entries which have the same
>> address_identifier_general, street and postcode, but different
>> descriptions?
> 
> Unfortunately, yes.  The data comes from gov't systems to
> regulate the development/alteration of arbitrary pieces of property and
> those pieces do not always have a postal address.  E.g. a farmer may
> one year apply to erect a wind turbine in "field north of Foo Cottage"
> and the next year apply to demolish "barnhouse west of Foo Cottage".

Having farmed myself, letting a farmer define a property is a 
questionable practice:)

> 
> Now, I know what you are thinking, there is a normalization opportunity and you
> may well be right. However, the problem does exist in some of the other fields
> too and I am already facing a fair amount of join complexity in my schema so I
> am trying to figure out my options :)
> 
>> (What is an address_identifier_general, btw?)
> Address identifier composed by numbers and/or characters. I'm using the
> terminology from the EU's "INSPIRE Data Specification on Addresses" Guidelines.

Took a quick scan through above, that was mind numbing. I did see that 
there are provisions for geographic reference grids. Is that an option?

> 
> I haven't yet had the opportunity to try out the above suggestions but I will
> post again when I have.
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



RE: Enforcing uniqueness on [real estate/postal] addresses

From
"Basques, Bob (CI-StPaul)"
Date:
I've been following this thread with some interest.

Was wondering if you ever thought about binding the textual address to a USNG location. https://usngcenter.org/

You can easily add individual locations within something like a farm field with as few as eight  unique digits that
wouldidentify each 10 meter square.  I'm oversimplifying my response, but once you look through how the USNG works,
you'llsee the benefit for using it as a address/location uniqueness enforcing tool. 
 

It would easily allow for locating many different locations inside of a larger addressed location, as well as non,
addressedlocations.  The USNG location can be thought of as a unique address unto itself, and works  across the planet.
No two are alike.
 



Bobb



-----Original Message-----
From: Peter Devoy <peter@3xe.co.uk> 
Sent: Tuesday, May 12, 2020 3:56 PM
To: Peter J. Holzer <hjp-pgsql@hjp.at>
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Enforcing uniqueness on [real estate/postal] addresses

Think Before You Click: This email originated outside our organization.


>Is is possible to have two entries which have the same 
>address_identifier_general, street and postcode, but different 
>descriptions?

Unfortunately, yes.  The data comes from gov't systems to regulate the development/alteration of arbitrary pieces of
propertyand those pieces do not always have a postal address.  E.g. a farmer may one year apply to erect a wind turbine
in"field north of Foo Cottage"
 
and the next year apply to demolish "barnhouse west of Foo Cottage".

Now, I know what you are thinking, there is a normalization opportunity and you may well be right. However, the problem
doesexist in some of the other fields too and I am already facing a fair amount of join complexity in my schema so I am
tryingto figure out my options :)
 

>(What is an address_identifier_general, btw?)
Address identifier composed by numbers and/or characters. I'm using the terminology from the EU's "INSPIRE Data
Specificationon Addresses" Guidelines.
 

I haven't yet had the opportunity to try out the above suggestions but I will post again when I have.



Re: Enforcing uniqueness on [real estate/postal] addresses

From
David Goodenough
Date:

There is also What3Words.com, which give a three word name to each 3m square over the world. Longer that USNG but easier to remember/type/say.

 

David

 

On Wednesday, 13 May 2020 14:33:30 BST Basques, Bob (CI-StPaul) wrote:

> I've been following this thread with some interest.

>

> Was wondering if you ever thought about binding the textual address to a

> USNG location. https://usngcenter.org/

> You can easily add individual locations within something like a farm field

> with as few as eight unique digits that would identify each 10 meter

> square. I'm oversimplifying my response, but once you look through how the

> USNG works, you'll see the benefit for using it as a address/location

> uniqueness enforcing tool.

> It would easily allow for locating many different locations inside of a

> larger addressed location, as well as non, addressed locations. The USNG

> location can be thought of as a unique address unto itself, and works

> across the planet. No two are alike.

>

>

> Bobb

>

>

>

> -----Original Message-----

> From: Peter Devoy <peter@3xe.co.uk>

> Sent: Tuesday, May 12, 2020 3:56 PM

> To: Peter J. Holzer <hjp-pgsql@hjp.at>

> Cc: pgsql-general@lists.postgresql.org

> Subject: Re: Enforcing uniqueness on [real estate/postal] addresses

>

> Think Before You Click: This email originated outside our organization.

>

>

>

> >Is is possible to have two entries which have the same

> >address_identifier_general, street and postcode, but different

> >descriptions?

>

>

> Unfortunately, yes. The data comes from gov't systems to regulate the

> development/alteration of arbitrary pieces of property and those pieces do

> not always have a postal address. E.g. a farmer may one year apply to

> erect a wind turbine in "field north of Foo Cottage"

and the next year

> apply to demolish "barnhouse west of Foo Cottage".

> Now, I know what you are thinking, there is a normalization opportunity and

> you may well be right. However, the problem does exist in some of the other

> fields too and I am already facing a fair amount of join complexity in my

> schema so I am trying to figure out my options :)

>

> >(What is an address_identifier_general, btw?)

>

> Address identifier composed by numbers and/or characters. I'm using the

> terminology from the EU's "INSPIRE Data Specification on Addresses"

> Guidelines.

> I haven't yet had the opportunity to try out the above suggestions but I

> will post again when I have.

>

 

 

RE: Enforcing uniqueness on [real estate/postal] addresses

From
Rich Shepard
Date:
On Wed, 13 May 2020, Basques, Bob (CI-StPaul) wrote:

> Was wondering if you ever thought about binding the textual address to a
> USNG location. https://usngcenter.org/

Bobb, et al.:

Why not have a 'parent' table with entries such as 'Foo Farm' and a 'child'
table with rows for sub-parts of the parent; e.g., 'Barn', 'Field'?

Rich



RE: Enforcing uniqueness on [real estate/postal] addresses

From
"Basques, Bob (CI-StPaul)"
Date:
Yup, that's where I was going with the USNG use.

Separate dataset bound to addresses where needed/desired for detailed locating.  And where no address is present, the
USNGbecomes the address. 

Bobb



-----Original Message-----
From: Rich Shepard <rshepard@appl-ecosys.com>
Sent: Wednesday, May 13, 2020 9:01 AM
To: pgsql-general@lists.postgresql.org
Subject: RE: Enforcing uniqueness on [real estate/postal] addresses

Think Before You Click: This email originated outside our organization.


On Wed, 13 May 2020, Basques, Bob (CI-StPaul) wrote:

> Was wondering if you ever thought about binding the textual address to
> a USNG location. https://usngcenter.org/

Bobb, et al.:

Why not have a 'parent' table with entries such as 'Foo Farm' and a 'child'
table with rows for sub-parts of the parent; e.g., 'Barn', 'Field'?

Rich





Re: Enforcing uniqueness on [real estate/postal] addresses

From
"Peter J. Holzer"
Date:
On 2020-05-12 21:55:56 +0100, Peter Devoy wrote:
> >Is is possible to have two entries which have the same
> >address_identifier_general, street and postcode, but different
> >descriptions?
>
> Unfortunately, yes.  The data comes from gov't systems to
> regulate the development/alteration of arbitrary pieces of property and
> those pieces do not always have a postal address.  E.g. a farmer may
> one year apply to erect a wind turbine in "field north of Foo Cottage"
> and the next year apply to demolish "barnhouse west of Foo Cottage".

I see. So postcode, street, address_identifier_general, description
(from least to most specific) together identify an object.

Going back to your original question I think that in this case it is
actually useful to distinguish between NULL (unknown) and '' (empty),
and if unknown values are forbidden, enforce that with a non null
constraint.

Consider the following examples:

postcode | street      | address_identifier_general | description
1234     | main street | 12                         | ''
1234     | main street | 12                         | NULL
1234     | main street | NULL                       | bike shed
2345     | ''          | 12                         | ''

The first one refers to the whole property at main street 12. The
second one maybe only to a part of it but we don't know which one.

In the third example tghe address_identifier_general is unknown. Some
bike shed on main street, There might be more than one, so PostgreSQL is
correct not to enforce the unique constraint.

In the last one there is no street name - it's not unknown, we know that
there is none because this is a small village which doesn't have street
names, just house numbers.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment