Thread: Enforcing uniqueness on [real estate/postal] addresses
Hi list I need to store addresses for properties (as in real estate) so in my naivety I created a unique constraint like this: ALTER TABLE properties ADD CONSTRAINT is_unique_address UNIQUE ( description, --e.g. Land north of Foo Cottage address_identifier_general, street, postcode ); Of course, if any of the fields are NULL (which they often are) I end up with duplicates. One solution may be to add NOT NULL constraints and use empty strings instead of NULL values but, until asking around today, I thought this was generally considered bad practice. Please can anyone recommend a way of approaching this? Perhaps empty strings are pragmatic in this situation? Kind regards Peter
> On May 11, 2020, at 12:55 PM, Peter Devoy <peter@3xe.co.uk> wrote: > > Hi list > > I need to store addresses for properties (as in real estate) so in my > naivety I created a unique constraint like this: > > ALTER TABLE properties > ADD CONSTRAINT is_unique_address > UNIQUE ( > description, --e.g. Land north of Foo Cottage > address_identifier_general, > street, > postcode > ); > > Of course, if any of the fields are NULL (which they often are) I end > up with duplicates. > > One solution may be to add NOT NULL constraints and use empty strings > instead of NULL values but, until asking around today, I thought this was > generally considered bad practice. > > Please can anyone recommend a way of approaching this? Perhaps empty strings > are pragmatic in this situation? Hi Peter, I wouldn’t use empty strings in place of NULL. It’s possible to define a partial unique index that has more or less the sameeffect as a constraint. Have you looked into them? Cheers Philip
On 5/11/20 9:55 AM, Peter Devoy wrote: > Hi list > > I need to store addresses for properties (as in real estate) so in my > naivety I created a unique constraint like this: > > ALTER TABLE properties > ADD CONSTRAINT is_unique_address > UNIQUE ( > description, --e.g. Land north of Foo Cottage > address_identifier_general, > street, > postcode > ); > > Of course, if any of the fields are NULL (which they often are) I end > up with duplicates. > > One solution may be to add NOT NULL constraints and use empty strings > instead of NULL values but, until asking around today, I thought this was > generally considered bad practice. > > Please can anyone recommend a way of approaching this? Perhaps empty strings > are pragmatic in this situation? How about?: create table properties (description varchar, address_identifier_general varchar, street varchar, postcode varchar); CREATE UNIQUE INDEX is_unique_address ON properties ((coalesce(description, '')), (coalesce(address_identifier_general, '')), (coalesce(street, '')), (coalesce(postcode, '')) ); insert into properties (description, street, postcode) values ('test', 'anywhere', '1234'); INSERT 0 1 insert into properties (description, street, postcode) values ('test', 'anywhere', '1234'); ERROR: duplicate key value violates unique constraint "is_unique_address" DETAIL: Key (COALESCE(description, ''::character varying), COALESCE(address_identifier_general, ''::character varying), COALESCE(street, ''::character varying), COALESCE(postcode, ''::character varying))=(test, , anywhere, 1234) already exists. > > Kind regards > > > Peter > > -- Adrian Klaver adrian.klaver@aklaver.com
I need to store addresses for properties (as in real estate) so in my
naivety I created a unique constraint like this:
ALTER TABLE properties
ADD CONSTRAINT is_unique_address
UNIQUE (
description, --e.g. Land north of Foo Cottage
address_identifier_general,
street,
postcode
);
Please can anyone recommend a way of approaching this? Perhaps empty strings
are pragmatic in this situation?
On 5/11/20 9:55 AM, Peter Devoy wrote: > Of course, if any of the fields are NULL (which they often are) I end > up with duplicates. > > One solution may be to add NOT NULL constraints and use empty strings > instead of NULL values but, until asking around today, I thought this was > generally considered bad practice. If you don't want to store empty strings (which I agree is a little yucky), you could replace NULLs with an empty string *only when checking for uniqueness*. To do this, first replace your unique constraint with a unique index, which gives you some additional features (e.g. indexing expressions, indexing only part of the table with a WHERE clause, building it concurrently, etc.). In this case we only care about indexing expressions. So you can say: CREATE UNIQUE INDEX is_unique_address ON properties ( COALESCE(description, ''), COALESCE(address_identifier_general, ''), COALESCE(street, ''), COALESCE(postcode, '')); Another approach, which I don't think is really a serious suggestion but is sort of interesting to think about: you could define an operator, say ===, that does the same thing as `IS NOT DISTINCT FROM`. Then you could create an exclusion constraint using that operator on all four columns. I've never tried that before but it seems like it would work. Maybe that's too much effort for something like this. I just think it's interesting because it feels like a use case for exclusion constraints that goes in the "opposite direction" of how they are usually used: instead of being less restrictive than =, it is more restrictive. Regards, -- Paul ~{:-) pj@illuminatedcomputing.com
Hello everyone Thank you all for your suggestions. I had neither heard of partial unique indexes nor considered using COALESCE so I will explore both of these as options and update the thread with how it goes. >Then, if you can define a problem where you feel having a unique table constraint over >the field is the correct solution [...] you should post the problem and take suggestions on >ways to solve it. I am not sure if I have such a problem :) I tried to simplify my example for the sake of brevity but if I cannot get the above to work I will post the full problem. Kind regards Peter
Peter Devoy <peter@3xe.co.uk> writes: > Hi list > > I need to store addresses for properties (as in real estate) so in my > naivety I created a unique constraint like this: > > ALTER TABLE properties > ADD CONSTRAINT is_unique_address > UNIQUE ( > description, --e.g. Land north of Foo Cottage > address_identifier_general, > street, > postcode > ); > > Of course, if any of the fields are NULL (which they often are) I end > up with duplicates. > > One solution may be to add NOT NULL constraints and use empty strings > instead of NULL values but, until asking around today, I thought this was > generally considered bad practice. > > Please can anyone recommend a way of approaching this? Perhaps empty strings > are pragmatic in this situation? > > Kind regards > > Hi Peter, Personally, I don't like the idea of using empty strings just to avoid having nulls. This is probably a personal preference, but for me null and '' are quite different. A null indicates an unknown - we don't know what the value is. An empty string i.e. '' means there is no value (i.e. we know it has no value). The difference is quite subtle and may not seem relevant. It may not be or it may be or it may become relevant in the future. General rule of thumb for me is that my model should reflect the known information and should always avoid any data transformation or mapping which reduces the known information. I would step back a bit and think about why/what constraint you really need and what needs to be unique. The first field which jumps out for me is description. Is this really a unique value? Would it be possible to have two properties with the same description? Does it matter if two properties have the same description? Does the description really affect property uniqueness. If two records have the same street, postcode and general_property_identifier, but different descriptions, are they really two different records? Will description change over time? As description is a fairly subjective value, I would be tempted to not include it in your unique constraint at all. In fact, I would probably keep description in a separate table as it may be reasonable to have multiple descriptions for a property. If you want just a single description, then you can leave it in this table. I would not put a unique or not null constraint on it. This would leave you with address_identifier_general, street and postcode. None of those will be unique by themselves. You will only get uniqueness when you combine all 3. Can any of them be null? I would suspect not, so I would define them with not null constraints. I would then probably add a composite unique index using all 3 values to enforce uniqueness. Depending on your application needs, I would probably add a unique property_id field to the table as well (which would be the value I would used to link records in other tables, such as a property_description table). Of course, there are other constraints you could consider - post code probably has a set format which you might want to enforce or perhaps you can access a complete listing of valid postcodes and import that into your system as a postcode table. In that case, your postcode field might be better defined as a foreign key constraint into the postcode table. When defining your constraints, it is important to consider what information is known at the point of initial data entry. Is there a need to enter partial data (for example, you might know the street and postcode, but not the general_property_identifier. Is it expected or reasonable to allow entry of this sort of partial data? If so, how will that work with your uniqueness constraints? (it may be quite reasonable to require all 3 fields be known). The point is, you need to know how the system will be used and what the expectations of the users are. Maybe there is a legitimate business case to allow partial data entry, in which case, you may need a different approach or a way to identify partial/incomplete records etc. -- Tim Cross
On 2020-05-12 10:49:22 +1000, Tim Cross wrote: > Peter Devoy <peter@3xe.co.uk> writes: > > I need to store addresses for properties (as in real estate) so in my > > naivety I created a unique constraint like this: > > > > ALTER TABLE properties > > ADD CONSTRAINT is_unique_address > > UNIQUE ( > > description, --e.g. Land north of Foo Cottage > > address_identifier_general, > > street, > > postcode > > ); > > > > Of course, if any of the fields are NULL (which they often are) I end > > up with duplicates. [...] > > Personally, I don't like the idea of using empty strings just to avoid > having nulls. This is probably a personal preference, but for me null > and '' are quite different. A null indicates an unknown - we don't know > what the value is. An empty string i.e. '' means there is no value (i.e. > we know it has no value). The difference is quite subtle and may not > seem relevant. It may not be or it may be or it may become relevant in > the future. General rule of thumb for me is that my model should reflect > the known information and should always avoid any data transformation or > mapping which reduces the known information. > > I would step back a bit and think about why/what constraint you really > need and what needs to be unique. The first field which jumps out for me > is description. Is this really a unique value? As the coonstraint stands, it isn't. Only description, address_identifier_general, street and postcode together are unique. I'd therefore ask the question in the other direction: Is is possible to have two entries which have the same address_identifier_general, street and postcode, but different descriptions? What does that mean? To different properties which happen to be at the same place or two descriptions for the same property? (What is an address_identifier_general, btw?) I agree with the rest of posting. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
>Is is possible to have two entries which have the same >address_identifier_general, street and postcode, but different >descriptions? Unfortunately, yes. The data comes from gov't systems to regulate the development/alteration of arbitrary pieces of property and those pieces do not always have a postal address. E.g. a farmer may one year apply to erect a wind turbine in "field north of Foo Cottage" and the next year apply to demolish "barnhouse west of Foo Cottage". Now, I know what you are thinking, there is a normalization opportunity and you may well be right. However, the problem does exist in some of the other fields too and I am already facing a fair amount of join complexity in my schema so I am trying to figure out my options :) >(What is an address_identifier_general, btw?) Address identifier composed by numbers and/or characters. I'm using the terminology from the EU's "INSPIRE Data Specification on Addresses" Guidelines. I haven't yet had the opportunity to try out the above suggestions but I will post again when I have.
On Tue, May 12, 2020 at 09:55:56PM +0100, Peter Devoy wrote: > >Is is possible to have two entries which have the same > >address_identifier_general, street and postcode, but different > >descriptions? > > Unfortunately, yes. The data comes from gov't systems to > regulate the development/alteration of arbitrary pieces of property and > those pieces do not always have a postal address. E.g. a farmer may > one year apply to erect a wind turbine in "field north of Foo Cottage" > and the next year apply to demolish "barnhouse west of Foo Cottage". LAT/LON ? https://plus.codes/ ? Best, Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
On Tue, 12 May 2020, Peter Devoy wrote: >> Is is possible to have two entries which have the same >> address_identifier_general, street and postcode, but different >> descriptions? > > Unfortunately, yes. The data comes from gov't systems to regulate the > development/alteration of arbitrary pieces of property and those pieces do > not always have a postal address. E.g. a farmer may one year apply to > erect a wind turbine in "field north of Foo Cottage" and the next year > apply to demolish "barnhouse west of Foo Cottage". I'm in the US but recently read (in The Economist, I think) that house numbers in the UK are sometimes denied by the local government and the owner told to select a name for the property. Can you have a primary key for the property descrption, e.g., "Foo Cottage" and another (related) column for objects associated with it; e.g., "North field" and "West barnhouse?" Similar to an apartment (flat) number in a multifamily dwelling. Rich
On 5/12/20 1:55 PM, Peter Devoy wrote: >> Is is possible to have two entries which have the same >> address_identifier_general, street and postcode, but different >> descriptions? > > Unfortunately, yes. The data comes from gov't systems to > regulate the development/alteration of arbitrary pieces of property and > those pieces do not always have a postal address. E.g. a farmer may > one year apply to erect a wind turbine in "field north of Foo Cottage" > and the next year apply to demolish "barnhouse west of Foo Cottage". Having farmed myself, letting a farmer define a property is a questionable practice:) > > Now, I know what you are thinking, there is a normalization opportunity and you > may well be right. However, the problem does exist in some of the other fields > too and I am already facing a fair amount of join complexity in my schema so I > am trying to figure out my options :) > >> (What is an address_identifier_general, btw?) > Address identifier composed by numbers and/or characters. I'm using the > terminology from the EU's "INSPIRE Data Specification on Addresses" Guidelines. Took a quick scan through above, that was mind numbing. I did see that there are provisions for geographic reference grids. Is that an option? > > I haven't yet had the opportunity to try out the above suggestions but I will > post again when I have. > > -- Adrian Klaver adrian.klaver@aklaver.com
I've been following this thread with some interest. Was wondering if you ever thought about binding the textual address to a USNG location. https://usngcenter.org/ You can easily add individual locations within something like a farm field with as few as eight unique digits that wouldidentify each 10 meter square. I'm oversimplifying my response, but once you look through how the USNG works, you'llsee the benefit for using it as a address/location uniqueness enforcing tool. It would easily allow for locating many different locations inside of a larger addressed location, as well as non, addressedlocations. The USNG location can be thought of as a unique address unto itself, and works across the planet. No two are alike. Bobb -----Original Message----- From: Peter Devoy <peter@3xe.co.uk> Sent: Tuesday, May 12, 2020 3:56 PM To: Peter J. Holzer <hjp-pgsql@hjp.at> Cc: pgsql-general@lists.postgresql.org Subject: Re: Enforcing uniqueness on [real estate/postal] addresses Think Before You Click: This email originated outside our organization. >Is is possible to have two entries which have the same >address_identifier_general, street and postcode, but different >descriptions? Unfortunately, yes. The data comes from gov't systems to regulate the development/alteration of arbitrary pieces of propertyand those pieces do not always have a postal address. E.g. a farmer may one year apply to erect a wind turbine in"field north of Foo Cottage" and the next year apply to demolish "barnhouse west of Foo Cottage". Now, I know what you are thinking, there is a normalization opportunity and you may well be right. However, the problem doesexist in some of the other fields too and I am already facing a fair amount of join complexity in my schema so I am tryingto figure out my options :) >(What is an address_identifier_general, btw?) Address identifier composed by numbers and/or characters. I'm using the terminology from the EU's "INSPIRE Data Specificationon Addresses" Guidelines. I haven't yet had the opportunity to try out the above suggestions but I will post again when I have.
There is also What3Words.com, which give a three word name to each 3m square over the world. Longer that USNG but easier to remember/type/say.
David
On Wednesday, 13 May 2020 14:33:30 BST Basques, Bob (CI-StPaul) wrote:
> I've been following this thread with some interest.
>
> Was wondering if you ever thought about binding the textual address to a
> USNG location. https://usngcenter.org/
> You can easily add individual locations within something like a farm field
> with as few as eight unique digits that would identify each 10 meter
> square. I'm oversimplifying my response, but once you look through how the
> USNG works, you'll see the benefit for using it as a address/location
> uniqueness enforcing tool.
> It would easily allow for locating many different locations inside of a
> larger addressed location, as well as non, addressed locations. The USNG
> location can be thought of as a unique address unto itself, and works
> across the planet. No two are alike.
>
>
> Bobb
>
>
>
> -----Original Message-----
> From: Peter Devoy <peter@3xe.co.uk>
> Sent: Tuesday, May 12, 2020 3:56 PM
> To: Peter J. Holzer <hjp-pgsql@hjp.at>
> Cc: pgsql-general@lists.postgresql.org
> Subject: Re: Enforcing uniqueness on [real estate/postal] addresses
>
> Think Before You Click: This email originated outside our organization.
>
>
>
> >Is is possible to have two entries which have the same
> >address_identifier_general, street and postcode, but different
> >descriptions?
>
>
> Unfortunately, yes. The data comes from gov't systems to regulate the
> development/alteration of arbitrary pieces of property and those pieces do
> not always have a postal address. E.g. a farmer may one year apply to
> erect a wind turbine in "field north of Foo Cottage"
and the next year
> apply to demolish "barnhouse west of Foo Cottage".
> Now, I know what you are thinking, there is a normalization opportunity and
> you may well be right. However, the problem does exist in some of the other
> fields too and I am already facing a fair amount of join complexity in my
> schema so I am trying to figure out my options :)
>
> >(What is an address_identifier_general, btw?)
>
> Address identifier composed by numbers and/or characters. I'm using the
> terminology from the EU's "INSPIRE Data Specification on Addresses"
> Guidelines.
> I haven't yet had the opportunity to try out the above suggestions but I
> will post again when I have.
>
On Wed, 13 May 2020, Basques, Bob (CI-StPaul) wrote: > Was wondering if you ever thought about binding the textual address to a > USNG location. https://usngcenter.org/ Bobb, et al.: Why not have a 'parent' table with entries such as 'Foo Farm' and a 'child' table with rows for sub-parts of the parent; e.g., 'Barn', 'Field'? Rich
Yup, that's where I was going with the USNG use. Separate dataset bound to addresses where needed/desired for detailed locating. And where no address is present, the USNGbecomes the address. Bobb -----Original Message----- From: Rich Shepard <rshepard@appl-ecosys.com> Sent: Wednesday, May 13, 2020 9:01 AM To: pgsql-general@lists.postgresql.org Subject: RE: Enforcing uniqueness on [real estate/postal] addresses Think Before You Click: This email originated outside our organization. On Wed, 13 May 2020, Basques, Bob (CI-StPaul) wrote: > Was wondering if you ever thought about binding the textual address to > a USNG location. https://usngcenter.org/ Bobb, et al.: Why not have a 'parent' table with entries such as 'Foo Farm' and a 'child' table with rows for sub-parts of the parent; e.g., 'Barn', 'Field'? Rich
On 2020-05-12 21:55:56 +0100, Peter Devoy wrote: > >Is is possible to have two entries which have the same > >address_identifier_general, street and postcode, but different > >descriptions? > > Unfortunately, yes. The data comes from gov't systems to > regulate the development/alteration of arbitrary pieces of property and > those pieces do not always have a postal address. E.g. a farmer may > one year apply to erect a wind turbine in "field north of Foo Cottage" > and the next year apply to demolish "barnhouse west of Foo Cottage". I see. So postcode, street, address_identifier_general, description (from least to most specific) together identify an object. Going back to your original question I think that in this case it is actually useful to distinguish between NULL (unknown) and '' (empty), and if unknown values are forbidden, enforce that with a non null constraint. Consider the following examples: postcode | street | address_identifier_general | description 1234 | main street | 12 | '' 1234 | main street | 12 | NULL 1234 | main street | NULL | bike shed 2345 | '' | 12 | '' The first one refers to the whole property at main street 12. The second one maybe only to a part of it but we don't know which one. In the third example tghe address_identifier_general is unknown. Some bike shed on main street, There might be more than one, so PostgreSQL is correct not to enforce the unique constraint. In the last one there is no street name - it's not unknown, we know that there is none because this is a small village which doesn't have street names, just house numbers. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"