Thread: Lookup tables
Should lookup tables have a numeric FK column as well as the description column? If so, how should I add an FK to the two lookup tables in my database? TIA, Rich
On Tuesday, February 4, 2025, Rich Shepard <rshepard@appl-ecosys.com> wrote:
Should lookup tables have a numeric FK column as well as the description column?
If so, how should I add an FK to the two lookup tables in my database?
Most do (have a surrogate PK) since it removes cascading updates and is a smaller value.
Lots of alter tables and update queries.
David J.
On Tue, 4 Feb 2025, Rob Sargent wrote: > I would definitely add an Id for each of the looked up values. The code > can use the Id (for the join or lookup) and the string value can safely be > changed (think typo) without ripple effect. It also prevents other tables > from referencing the lookup witrh bogus values (those not in the lookup > table) > > Unless your lookup tables are huge I would create a new table matching > your current table but with an identity column and load from you original > table. > https://www.postgresql.org/docs/current/ddl-identity-columns.html Thanks, Rob. Each lookup table has only a very few rows, I'll make new tables today. Regards, Rich
On Tue, 4 Feb 2025, David G. Johnston wrote: > Most do (have a surrogate PK) since it removes cascading updates and is a > smaller value. > > Lots of alter tables and update queries. David, That's a good point. Thanks, Rich
On Tue, Feb 4, 2025 at 9:41 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tuesday, February 4, 2025, Rich Shepard <rshepard@appl-ecosys.com> wrote:Should lookup tables have a numeric FK column as well as the description column?
If so, how should I add an FK to the two lookup tables in my database?Most do (have a surrogate PK) since it removes cascading updates
How does a synthetic PK "remove cascading updates"? Doesn't the decision on whether or not to cascade update depend on the ON UPDATE CASCADE clause of the FK definition?
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On Tuesday, February 4, 2025, Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Tue, Feb 4, 2025 at 9:41 AM David G. Johnston <david.g.johnston@gmail.com> wrote:On Tuesday, February 4, 2025, Rich Shepard <rshepard@appl-ecosys.com> wrote:Should lookup tables have a numeric FK column as well as the description column?
If so, how should I add an FK to the two lookup tables in my database?Most do (have a surrogate PK) since it removes cascading updatesHow does a synthetic PK "remove cascading updates"? Doesn't the decision on whether or not to cascade update depend on the ON UPDATE CASCADE clause of the FK definition?
People don’t change synthetic PKs so updates never have to happen on the FK side. Labels do get changed, in which case you have to update the FK label side.
David J.
On Tue, Feb 4, 2025 at 9:28 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:
Should lookup tables have a numeric FK column as well as the description column?
Does your lookup table just have one column? (That's what your question seems to imply, but that makes no sense, since the whole point of a lookup table is to store some sort of a code in the "child" table instead of the whole text of the description.)
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On Tuesday, February 4, 2025, Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Tue, Feb 4, 2025 at 9:28 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:Should lookup tables have a numeric FK column as well as the description column?Does your lookup table just have one column? (That's what your question seems to imply, but that makes no sense, since the whole point of a lookup table is to store some sort of a code in the "child" table instead of the whole text of the description.)
The point of a lookup table is to provide a unique list of authoritative values for some purpose. Kinda like an enum. But having the label serve as the unique value is reasonable - we only add surrogates for optimization.
David J.
On Tue, 4 Feb 2025, Ron Johnson wrote: > Does your lookup table just have one column? (That's what your question > seems to imply, but that makes no sense, since the whole point of a lookup > table is to store some sort of a code in the "child" table instead of the > whole text of the description.) Ron, Yes, each has a single column of type names, industrytypes and statustypes. I've always used the descriptive names in queries. Rich
On Tue, Feb 4, 2025 at 9:59 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tuesday, February 4, 2025, Ron Johnson <ronljohnsonjr@gmail.com> wrote:On Tue, Feb 4, 2025 at 9:41 AM David G. Johnston <david.g.johnston@gmail.com> wrote:On Tuesday, February 4, 2025, Rich Shepard <rshepard@appl-ecosys.com> wrote:Should lookup tables have a numeric FK column as well as the description column?
If so, how should I add an FK to the two lookup tables in my database?Most do (have a surrogate PK) since it removes cascading updatesHow does a synthetic PK "remove cascading updates"? Doesn't the decision on whether or not to cascade update depend on the ON UPDATE CASCADE clause of the FK definition?People don’t change synthetic PKs so updates never have to happen on the FK side. Labels do get changed, in which case you have to update the FK label side.
That's the argument between a synthetic PK and a natural PK. (Everywhere I've seen natural PK used -- that was 25 years ago -- you didn't use the whole text of the description of the PK, you used a code, like 'HRLY' for hourly wage employees and 'SLRY' for salaried employees. Then, HRLY or SLRY would go in the t_employee table income_type VARCHAR(4) column, referencing a lookup table.
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On Tue, 4 Feb 2025, David G. Johnston wrote: > The point of a lookup table is to provide a unique list of authoritative > values for some purpose. Kinda like an enum. But having the label serve as > the unique value is reasonable - we only add surrogates for optimization. David, The industrytypes table has 26 rows, the statustypes table has 8 rows. Your explanation suggests that for this database adding a PK to each table adds little, if anything. Regards, Rich
On Tue, Feb 4, 2025 at 10:05 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:
On Tue, 4 Feb 2025, Ron Johnson wrote:
> Does your lookup table just have one column? (That's what your question
> seems to imply, but that makes no sense, since the whole point of a lookup
> table is to store some sort of a code in the "child" table instead of the
> whole text of the description.)
Ron,
Yes, each has a single column of type names, industrytypes and statustypes.
I've always used the descriptive names in queries.
Yeah, that's definitely Bad Design, for the reason David enumerated.
In the lookup table, the PK can be either synthetic (an integer that means nothing) or natural (a short text code, typically four characters) that is an abbreviation of the description.
Synthetic keys have been the norm for the past 25+ years, but I have a soft spot for natural keys.
Natural keys become unwieldy, though, when more than two columns are required to define primality.
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On Tuesday, February 4, 2025, Rich Shepard <rshepard@appl-ecosys.com> wrote:
On Tue, 4 Feb 2025, David G. Johnston wrote:The point of a lookup table is to provide a unique list of authoritative
values for some purpose. Kinda like an enum. But having the label serve as
the unique value is reasonable - we only add surrogates for optimization.
David,
The industrytypes table has 26 rows, the statustypes table has 8 rows. Your
explanation suggests that for this database adding a PK to each table adds
little, if anything.
It’s the FK side where the cost savings are experienced.
David J.
On Tue, Feb 4, 2025 at 10:08 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:
On Tue, 4 Feb 2025, David G. Johnston wrote:
> The point of a lookup table is to provide a unique list of authoritative
> values for some purpose. Kinda like an enum. But having the label serve as
> the unique value is reasonable - we only add surrogates for optimization.
David,
The industrytypes table has 26 rows, the statustypes table has 8 rows. Your
explanation suggests that for this database adding a PK to each table adds
little, if anything.
How big is the database? A tiny 500MB db just for you can get by with poor design. (But then, why are you using PG instead of SQLite?)
More importantly, will you ever update the descriptions? Of course not! Famous last words. Having a separate PK means that you update one row in one column, while what you've done means that tens/hundreds of thousands of rows in possibly dozens of tables need to be updated.
It also means that you can easily change things in your ad hoc database without forgetting to update a table.
This is called an "update anomaly" in relational design theory.
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On Tue, 4 Feb 2025, David G. Johnston wrote: > It’s the FK side where the cost savings are experienced. David, Okay. Thanks, Rich
On 2/4/25 07:19, Ron Johnson wrote: > How big is the database? A tiny 500MB db just for you can get by with > poor design. (But then, why are you using PG instead of SQLite?) For the reasons listed here: https://sqlite.org/quirks.html In particular: [...] 2. SQLite Is Embedded, Not Client-Server 3. Flexible Typing SQLite is flexible with regard to datatypes. Datatypes are advisory rather than mandatory. 4. Foreign Key Enforcement Is Off By Default [...] > > More importantly, will you ever update the descriptions? Of course not! > Famous last words. Having a separate PK means that you update one row > in one column, while what you've done means that tens/hundreds of > thousands of rows in possibly dozens of tables need to be updated. > > It also means that *you* can easily change things in your ad hoc > database *without forgetting* to update a table. > > This is called an "update anomaly" in relational design theory. > > -- > Death to <Redacted>, and butter sauce. > Don't boil me, I'm still alive. > <Redacted> lobster! -- Adrian Klaver adrian.klaver@aklaver.com
> On 4 Feb 2025, at 15:27, Rich Shepard <rshepard@appl-ecosys.com> wrote: > > Should lookup tables have a numeric FK column as well as the description column? > > If so, how should I add an FK to the two lookup tables in my database? I’ve read the whole thread and the reasoning for having (numeric) autogenerated surrogate key is: a) performance b) no cascading updates I would like to add another dimension to this discussion: logical consistency. Imagine the following simplified schema with surrogate PK: create table restaurant ( restaurant_id int not null primary key generated always as identity, name text not null unique ); create table restaurant_visit ( visit_id int not null primary key generated always as identity, guest_username text not null, when date not null, restaurant_id not null foreign key restaurant(restaurant_id), rating smallint not null ); Let’s say users on your website register restaurant visits and rate them. The query to register a visit is: insert into restaurant_visit select $user, current_date, restaurant_id, $rating from restaurant where name = $restaurant_name It is now completely unclear what it means to change the name of the restaurant for already registered visits. Is it still the same restaurant with a different name or a different restaurant? Or let say someone swaps names of two restaurants. That means a user that goes to the same restaurant every day would register visits to two different restaurants! Using the name of a restaurant as primary key gets rid of these logical anomalies because the database model now reflects facts from reality. Having surrogate keys makes your relational database more like a network/object oriented database where rows don’t representfacts but rather some entities that have identity independent from their attributes. Thanks, Michal
Am Tue, Feb 04, 2025 at 05:31:13PM +0100 schrieb Michał Kłeczek: > It is now completely unclear what it means to change the name of the restaurant for already registered visits. > Is it still the same restaurant with a different name or a different restaurant? > > Or let say someone swaps names of two restaurants. > That means a user that goes to the same restaurant every day would register visits to two different restaurants! > > Using the name of a restaurant as primary key gets rid of these logical anomalies because > the database model now reflects facts from reality. Reality tends to become so ambiguous as to not be reflectable (two entirely different restaurants eventually, within the flow of time, carry the very same name). A primary key is very likely not the proper place to reflect arbitrary business logic (is it the same restaurant or not ? what if two restaurants have the same name at the same time ?). Primary keys are tools at the technical level. Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
On 2/4/25 09:51, Karsten Hilbert wrote: > Am Tue, Feb 04, 2025 at 05:31:13PM +0100 schrieb Michał Kłeczek: > >> It is now completely unclear what it means to change the name of the restaurant for already registered visits. >> Is it still the same restaurant with a different name or a different restaurant? >> >> Or let say someone swaps names of two restaurants. >> That means a user that goes to the same restaurant every day would register visits to two different restaurants! >> >> Using the name of a restaurant as primary key gets rid of these logical anomalies because >> the database model now reflects facts from reality. > Reality tends to become so ambiguous as to not be > reflectable (two entirely different restaurants eventually, > within the flow of time, carry the very same name). > > A primary key is very likely not the proper place to reflect > arbitrary business logic (is it the same restaurant or not ? > what if two restaurants have the same name at the same time > ?). Primary keys are tools at the technical level. > > Karsten > -- > GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B > > That OP is using a single column is interesting: Taking this notion to the restaurant/visit theme, the list of names or restaurants becomes a proper table 'restaurant' (name, addresss, phone, website, etc). The name is as useless as a key as first/last is for person.
On Tue, Feb 4, 2025 at 11:31 AM Michał Kłeczek <michal@kleczek.org> wrote:
[snip]
The query to register a visit is:
insert into restaurant_visit
select $user, current_date, restaurant_id, $rating
from restaurant where name = $restaurant_name
It is now completely unclear what it means to change the name of the restaurant for already registered visits.
Is it still the same restaurant with a different name or a different restaurant?
Or let say someone swaps names of two restaurants.
That means a user that goes to the same restaurant every day would register visits to two different restaurants!
Valid concerns, which means that you add a new restaurant record when the name changes.
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On 2/4/25 10:03, Ron Johnson wrote:
On Tue, Feb 4, 2025 at 11:31 AM Michał Kłeczek <michal@kleczek.org> wrote:[snip]
The query to register a visit is:
insert into restaurant_visit
select $user, current_date, restaurant_id, $rating
from restaurant where name = $restaurant_name
It is now completely unclear what it means to change the name of the restaurant for already registered visits.
Is it still the same restaurant with a different name or a different restaurant?
Or let say someone swaps names of two restaurants.
That means a user that goes to the same restaurant every day would register visits to two different restaurants!Valid concerns, which means that you add a new restaurant record when the name changes.
And there goes your unique index on phone number :)
> On 4 Feb 2025, at 17:51, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote: > > Am Tue, Feb 04, 2025 at 05:31:13PM +0100 schrieb Michał Kłeczek: > >> It is now completely unclear what it means to change the name of the restaurant for already registered visits. >> Is it still the same restaurant with a different name or a different restaurant? >> >> Or let say someone swaps names of two restaurants. >> That means a user that goes to the same restaurant every day would register visits to two different restaurants! >> >> Using the name of a restaurant as primary key gets rid of these logical anomalies because >> the database model now reflects facts from reality. > > Reality tends to become so ambiguous as to not be > reflectable (two entirely different restaurants eventually, > within the flow of time, carry the very same name). > > A primary key is very likely not the proper place to reflect > arbitrary business logic (is it the same restaurant or not ? > what if two restaurants have the same name at the same time These are of course problems ( and beyond the scope of my contrived example ). The point is though, that having surrogate PK not only does not solve these issues but makes them worse by kicking the candown the road and allowing for inconsistencies. — Michał
On Tue, Feb 4, 2025 at 12:05 PM Rob Sargent <robjsargent@gmail.com> wrote:
On 2/4/25 10:03, Ron Johnson wrote:And there goes your unique index on phone number :)On Tue, Feb 4, 2025 at 11:31 AM Michał Kłeczek <michal@kleczek.org> wrote:[snip]
The query to register a visit is:
insert into restaurant_visit
select $user, current_date, restaurant_id, $rating
from restaurant where name = $restaurant_name
It is now completely unclear what it means to change the name of the restaurant for already registered visits.
Is it still the same restaurant with a different name or a different restaurant?
Or let say someone swaps names of two restaurants.
That means a user that goes to the same restaurant every day would register visits to two different restaurants!Valid concerns, which means that you add a new restaurant record when the name changes.
I don't think I'd ever do that, since phone numbers can get reassigned.
Capital-L Large and old businesses probably don't even have unique indices on SSN.
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On Tue, Feb 4, 2025 at 9:31 AM Michał Kłeczek <michal@kleczek.org> wrote:
> On 4 Feb 2025, at 15:27, Rich Shepard <rshepard@appl-ecosys.com> wrote:
>
> Should lookup tables have a numeric FK column as well as the description column?
>
> If so, how should I add an FK to the two lookup tables in my database?
I’ve read the whole thread and the reasoning for having (numeric) autogenerated surrogate key is:
a) performance
b) no cascading updates
I would like to add another dimension to this discussion: logical consistency.
Using the name of a restaurant as primary key gets rid of these logical anomalies because
the database model now reflects facts from reality.
Well, we were talking about lookup tables and not entity modelling...
Having surrogate keys makes your relational database more like a network/object oriented database where rows don’t represent facts but rather some entities that have identity independent from their attributes.
Exactly, this is why surrogate keys are not just inventions for performance (when it comes to entities, not attribute lookups) but rather are necessary because of the mutability of real-world objects.
My identity is separate from any single value of my attributes. Basically any single thing about me can be changed but the coherent existence of my "self" remains the same.
Frankly, the restaurant example the "Owner" of the business should probably be considered part of its primary key - they don't announce "under new ownership/management" just for fun - the new owner wants to keep the brand recognition but discard historical opinions that are likely no longer true.
David J.
Unless the lookup table is actually a check constraint one can use to populate dropdown boxes in an interface.
Cheers
Thiemo
Cheers
Thiemo
> On 4 Feb 2025, at 18:27, Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote: > > Unless the lookup table is actually a check constraint one can use to populate dropdown boxes in an interface. That is even worse because it ceases being transactional and users might select something different than what they see onthe screen. — Michał
Am Tue, Feb 04, 2025 at 06:30:53PM +0100 schrieb Michał Kłeczek: > > On 4 Feb 2025, at 18:27, Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote: > > > > Unless the lookup table is actually a check constraint one can use to populate dropdown boxes in an interface. > > That is even worse because it ceases being transactional and users might select something different than what they seeon the screen. I never thought of that, thanks. Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
On Tuesday, February 4, 2025, Michał Kłeczek <michal@kleczek.org> wrote:
> On 4 Feb 2025, at 18:27, Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote:
>
> Unless the lookup table is actually a check constraint one can use to populate dropdown boxes in an interface.
That is even worse because it ceases being transactional and users might select something different than what they see on the screen.
What is the process flow for this happening? How big a bug needs to exist in the system for something bad to happen as opposed to some kind of “value not found” error and a refresh with the correct matching values showing up in the UI?
David J.
04.02.2025 18:31:09 Michał Kłeczek <michal@kleczek.org>: >> Unless the lookup table is actually a check constraint one can use to populate dropdown boxes in an interface. > > That is even worse because it ceases being transactional and users might select something different than what they seeon the screen. In how far is a real check constraint less transactional? And in how far is it more advisable to have a real check constraintand fill your dropdown boxes from another source and having to keep that source on sync with the real check constraint?
On Tue, 4 Feb 2025, Rob Sargent wrote: > Unless your lookup tables are huge I would create a new table matching > your current table but with an identity column and load from you original > table. I created a new table: create table ind_types_lu ( ind_nbr serial primary key, ind_name varchar(32) not null ); Now the database has the new table and a new sequence: public | ind_types_lu | table | rshepard public | ind_types_lu_ind_nbr_seq | sequence | rshepard I want to replace the old lookup table (with no FK) with this one. Can I use: alter table rename ind_types_lu to industrytypes; and have the sequence name changed, too? TIA, Rich
On Tuesday, February 4, 2025, Rich Shepard <rshepard@appl-ecosys.com> wrote:
I want to replace the old lookup table (with no FK) with this one.
Is the use of FK here intentional or a typo? Because everything written so far leads me to believe it should be PK. Pri,are key is the unique side, Foreign key is the usage side.
David J.
On Tue, 4 Feb 2025, David G. Johnston wrote: > Is the use of FK here intentional or a typo? Sigh, typo. Should be PK. Rich
On Tue, 4 Feb 2025, Rich Shepard wrote: > I'll make new tables today. That did not work as well as I expected. Instead, I added a new column (type `serial') to each of the two lookup tables. That adds a PK to each while retaining the table and column names referenced by other tables. Thanks for all your comments, Rich
04.02.2025 18:31:09 Michał Kłeczek <michal@kleczek.org>: > >> On 4 Feb 2025, at 18:27, Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote: >> >> Unless the lookup table is actually a check constraint one can use to populate dropdown boxes in an interface. > > That is even worse because it ceases being transactional and users might select something different than what they seeon the screen. I might see what you want to point out. E.g. the table is COLOURS. The rec with id 1 is RED, the one with id 2 is BLUE, 3is GREE and so on. Now you load these values into the dropdown box that sports RED, BLUE, GREE and so on. While someoneselects GREE, there is a maintenance release changing GREE to YELLOW. So when that someone sends the selection byid to the backend, not GREE is selected but YELLOW. A) Your release changed the sementics of the record 3. It's meaning changed. I cannot recommend doing that. B) If you absolutely must change the semantic, put your application into maintenance mode in which noone can select anythingbeforehand. If the maintenance would just correct the typo from GREE to GREEN, nothing would happen. Yor customer still ordered the lavishlygreen E-Bike her hear ever desired.
Am Tue, Feb 04, 2025 at 10:41:38PM +0100 schrieb Thiemo Kellner: > >> On 4 Feb 2025, at 18:27, Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote: > >> > >> Unless the lookup table is actually a check constraint one can use to populate dropdown boxes in an interface. > > > > That is even worse because it ceases being transactional and users might select something different than what they seeon the screen. > > I might see what you want to point out. E.g. the table is COLOURS. The rec with id 1 is RED, the one with id 2 is BLUE,3 is GREE and so on. Now you load these values into the dropdown box that sports RED, BLUE, GREE and so on. While someoneselects GREE, there is a maintenance release changing GREE to YELLOW. So when that someone sends the selection byid to the backend, not GREE is selected but YELLOW. Yep, that's what I meant and which I never thought of before. Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
On 4 Feb 2025, at 18:11, David G. Johnston <david.g.johnston@gmail.com> wrote:On Tue, Feb 4, 2025 at 9:31 AM Michał Kłeczek <michal@kleczek.org> wrote:
> On 4 Feb 2025, at 15:27, Rich Shepard <rshepard@appl-ecosys.com> wrote:
>
> Should lookup tables have a numeric FK column as well as the description column?
>
> If so, how should I add an FK to the two lookup tables in my database?
I’ve read the whole thread and the reasoning for having (numeric) autogenerated surrogate key is:
a) performance
b) no cascading updates
I would like to add another dimension to this discussion: logical consistency.
Using the name of a restaurant as primary key gets rid of these logical anomalies because
the database model now reflects facts from reality.Well, we were talking about lookup tables and not entity modelling...
True. Maybe this is not really the right place to explore the subject so this is my last message in this thread.
Having surrogate keys makes your relational database more like a network/object oriented database where rows don’t represent facts but rather some entities that have identity independent from their attributes.Exactly, this is why surrogate keys are not just inventions for performance (when it comes to entities, not attribute lookups) but rather are necessary because of the mutability of real-world objects.My identity is separate from any single value of my attributes. Basically any single thing about me can be changed but the coherent existence of my "self" remains the same.
This is really the difference in philosophy of database design:
In my view a relational database does not contain “entities” and their “attributes" but rather “relevant facts about real world”.
“Facts” are statements constructed from templates (ie. relation predicates) filled with tuple attribute values.
Based on facts recorded in the database I can infer logical conclusions using query language.
The queries are written based on some assumptions about facts recorded in the database and these assumptions are modelled as constraints.
What “facts” are relevant is of course requirements dependent (and sometimes it is necessary to also record time to account for mutability).
In this view surrogate keys are strictly internal and do not have any business meaning
They are no more than “row identifiers” and cannot be sensibly used for constraints (because constraints are modelled based on business requirements).
Their utility is more for technicalities like replication.
Frankly, the restaurant example the "Owner" of the business should probably be considered part of its primary key - they don't announce "under new ownership/management" just for fun - the new owner wants to keep the brand recognition but discard historical opinions that are likely no longer true.
Indeed - the choice of identifier is a very important one and cannot be hand-waved by blind introduction of a surrogate key :)
—
Michal
> On 4 Feb 2025, at 22:41, Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote: > > 04.02.2025 18:31:09 Michał Kłeczek <michal@kleczek.org>: > >> >>> On 4 Feb 2025, at 18:27, Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote: >>> >>> Unless the lookup table is actually a check constraint one can use to populate dropdown boxes in an interface. >> >> That is even worse because it ceases being transactional and users might select something different than what they seeon the screen. > > I might see what you want to point out. E.g. the table is COLOURS. The rec with id 1 is RED, the one with id 2 is BLUE,3 is GREE and so on. Now you load these values into the dropdown box that sports RED, BLUE, GREE and so on. While someoneselects GREE, there is a maintenance release changing GREE to YELLOW. So when that someone sends the selection byid to the backend, not GREE is selected but YELLOW. > > A) Your release changed the sementics of the record 3. It's meaning changed. I cannot recommend doing that. That’s what using natural keys and FK’s restricting their changes guarantee: no (accidental) changes to meaning of data. Even with cascading updates you still have transactional semantics (ie. the user selects what’s on the screen or gets anerror). > B) If you absolutely must change the semantic, put your application into maintenance mode in which noone can select anythingbeforehand. All this error prone hassle and downtime can be avoided with natural keys and guarantees that DBMS gives you. > > If the maintenance would just correct the typo from GREE to GREEN, nothing would happen. Yor customer still ordered thelavishly green E-Bike her hear ever desired. The question is: how do you _ensure_ that? — Michal
04.02.2025 18:12:02 David G. Johnston <david.g.johnston@gmail.com>:
I am under the impression that a lookup table IS an entity. You find them in star and snowflake models alike.
The presence or implementation of surrogate keys do not define in the least the type of database. It sole purpose is to surrogate the (speaking) business key such that updates on that key (think of typo) does not end up in an update orgy. Ok, maybe to simplify matters if you business key is made of more than one attribute/column. IMHO it is very good practice to still build a unique key on the business key and place a not-null-constraint on all its attributes.
I would not go that transcendently far, but my attributes change but it is still me, even though my age increases over time as do my good looks. ;-)
I'd prefer the term business key instead of primary key here, as, if you choose to use a surrogate key, that one becomes the PK while the BK is a UQ. ;-) Sorry, I got carried away. Having said that, I would leave the decision of taking the owner into the BK to the project. E.g. if you want to have the information of unbroken existence of a restaurant at a certain place, I dare say, it cannot be part of the BK. One could even argue that not even the name is part of the BK but only the geolocation (addresses can change too).
On Tue, Feb 4, 2025 at 9:31 AM Michał Kłeczek <michal@kleczek.org> wrote:
Well, we were talking about lookup tables and not entity modelling...
I am under the impression that a lookup table IS an entity. You find them in star and snowflake models alike.
Having surrogate keys makes your relational database more like a network/object oriented database where rows don’t represent facts but rather some entities that have identity independent from their attributes.
The presence or implementation of surrogate keys do not define in the least the type of database. It sole purpose is to surrogate the (speaking) business key such that updates on that key (think of typo) does not end up in an update orgy. Ok, maybe to simplify matters if you business key is made of more than one attribute/column. IMHO it is very good practice to still build a unique key on the business key and place a not-null-constraint on all its attributes.
My identity is separate from any single value of my attributes. Basically any single thing about me can be changed but the coherent existence of my "self" remains the same.
I would not go that transcendently far, but my attributes change but it is still me, even though my age increases over time as do my good looks. ;-)
Frankly, the restaurant example the "Owner" of the business should probably be considered part of its primary key - they don't announce "under new ownership/management" just for fun - the new owner wants to keep the brand recognition but discard historical opinions that are likely no longer true.
I'd prefer the term business key instead of primary key here, as, if you choose to use a surrogate key, that one becomes the PK while the BK is a UQ. ;-) Sorry, I got carried away. Having said that, I would leave the decision of taking the owner into the BK to the project. E.g. if you want to have the information of unbroken existence of a restaurant at a certain place, I dare say, it cannot be part of the BK. One could even argue that not even the name is part of the BK but only the geolocation (addresses can change too).
El 04-02-25 a las 18:08, Michał Kłeczek escribió: >> Reality tends to become so ambiguous as to not be >> reflectable (two entirely different restaurants eventually, >> within the flow of time, carry the very same name). >> >> A primary key is very likely not the proper place to reflect >> arbitrary business logic (is it the same restaurant or not ? >> what if two restaurants have the same name at the same time > These are of course problems ( and beyond the scope of my contrived example ). > > The point is though, that having surrogate PK not only does not solve these issues but makes them worse by kicking thecan down the road and allowing for inconsistencies. Only if you do not see the primary key as the main immutable value identifying an object, entity, you name it. Having said that, it is very questionable that a natural key (names to name one) can be a suitable primary key (think of typo).
> On 5 Feb 2025, at 19:07, Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote: > > El 04-02-25 a las 18:08, Michał Kłeczek escribió: >>> Reality tends to become so ambiguous as to not be >>> reflectable (two entirely different restaurants eventually, >>> within the flow of time, carry the very same name). >>> >>> A primary key is very likely not the proper place to reflect >>> arbitrary business logic (is it the same restaurant or not ? >>> what if two restaurants have the same name at the same time >> These are of course problems ( and beyond the scope of my contrived example ). >> >> The point is though, that having surrogate PK not only does not solve these issues but makes them worse by kicking thecan down the road and allowing for inconsistencies. > Only if you do not see the primary key as the main immutable value identifying an object, entity, you name it. Surrogate key cannot identify any (real) object by definition :) What object is identified by PK value 42 in “restaurants” table? > Having said that, it is very questionable that a natural key (names to name one) can be a suitable primary key (think oftypo). Typos are indeed a problem but adding surrogate key does not solve it, I’m afraid. — Michal
El 05-02-25 a las 19:13, Michał Kłeczek escribió: >> Only if you do not see the primary key as the main immutable value identifying an object, entity, you name it. > Surrogate key cannot identify any (real) object by definition :) > What object is identified by PK value 42 in “restaurants” table? What object is identified by a PK value "löasidfhaösliw" in a restaurant table? It is the context only giving it sense and not less sense then 42 or "Pizza Hut". In fact on disk, you won't even find 42 or "Pizza Hut". On hard disks, e.g., it is the direction of a magnetic field of several locations on the disk. Btw, 42 seems to me a quite geeky name for a nerd restaurant. >> Having said that, it is very questionable that a natural key (names to name one) can be a suitable primary key (thinkof typo). > Typos are indeed a problem but adding surrogate key does not solve it, I’m afraid. In how far does it not solve it? Or maybe better asked. Is your problem that typos occur or is the problem the amount of hassle to fix it?
El 05-02-25 a las 13:55, Michał Kłeczek escribió: >> A) Your release changed the sementics of the record 3. It's meaning changed. I cannot recommend doing that. > That’s what using natural keys and FK’s restricting their changes guarantee: no (accidental) changes to meaning of data. > Even with cascading updates you still have transactional semantics (ie. the user selects what’s on the screen or gets anerror). Sorry, that is utter nonsense. You cannot ever guarantee an update does not mess up the semantics on the updated field, change the meaning. You would need a check constraint which in it turn needs to get set up where one can mess up things. >> B) If you absolutely must change the semantic, put your application into maintenance mode in which noone can select anythingbeforehand. > All this error prone hassle and downtime can be avoided with natural keys and guarantees that DBMS gives you. And I thought you would have denied the need of changing semantics above. And no, changing your natural keys semantically ALWAYS requires downtime to make sure you do not run into the race condition described above. >> If the maintenance would just correct the typo from GREE to GREEN, nothing would happen. Yor customer still ordered thelavishly green E-Bike her hear ever desired. > The question is: how do you _ensure_ that? Ensure, the update goes from GREE to GREEN? You cannot, simple as that. You just can minimize the risk by testing, testing, testing. But that holds equally true for the business key of a surrogate key table as natural key table. That's why the surrogate key is such an elegant construct. You can change business key of the record with id 3 from GREE to GREEN, VERT, GRÜN, VERDE or ASéLDHK()*NSLDFHP)(*Z . It keeps its meaning of the perception of the human eye of electromagnetic waves of the wavelength roughly between 495-570 nm (according to Wikipedia).
> On 5 Feb 2025, at 21:33, Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote: > > > El 05-02-25 a las 13:55, Michał Kłeczek escribió: >>> A) Your release changed the sementics of the record 3. It's meaning changed. I cannot recommend doing that. >> That’s what using natural keys and FK’s restricting their changes guarantee: no (accidental) changes to meaning of data. >> Even with cascading updates you still have transactional semantics (ie. the user selects what’s on the screen or getsan error). > Sorry, that is utter nonsense. You cannot ever guarantee an update does not mess up the semantics on the updated field,change the meaning. Y But you can guarantee that if you change the value of the key after the user displays it - the user will get an error onsubmission (whereas with the surrogate key it would happily proceed without user noticing). > ou would need a check constraint which in it turn needs to get set up where one can mess up things. >>> B) If you absolutely must change the semantic, put your application into maintenance mode in which noone can select anythingbeforehand. >> All this error prone hassle and downtime can be avoided with natural keys and guarantees that DBMS gives you. > And I thought you would have denied the need of changing semantics above. And no, changing your natural keys semanticallyALWAYS requires downtime to make sure you do not run into the race condition described above. How so? The user is going to get FK violation - you do not need any downtime to make sure users don’t submit wrong values. >>> If the maintenance would just correct the typo from GREE to GREEN, nothing would happen. Yor customer still ordered thelavishly green E-Bike her hear ever desired. >> The question is: how do you _ensure_ that? > Ensure, the update goes from GREE to GREEN? You cannot, simple as that. You just can minimize the risk by testing, testing,testing. You can also simply disallow updates with FK constraint eliminating risk. > But that holds equally true for the business key of a surrogate key table as natural key table. That's why the surrogatekey is such an elegant construct. You can change business key of the record with id 3 from GREE to GREEN, VERT,GRÜN, VERDE or ASéLDHK()*NSLDFHP)(*Z . It keeps its meaning of the perception of the human eye of electromagnetic wavesof the wavelength roughly between 495-570 nm (according to Wikipedia). And why do you think unconstrained updating of business key is a good thing? You must implement rules governing what can and what cannot be changed *somewhere* - not doing it in the database means youhave to do it in applications. Anyway - let’s agree to disagree :) — Michal
El 05-02-25 a las 22:19, Michał Kłeczek escribió: > But you can guarantee that if you change the value of the key after the user displays it - the user will get an error onsubmission (whereas with the surrogate key it would happily proceed without user noticing). As you very rightly say happily proceed, because that is, what you actually want. The user chose GREE meaning green, which is updated in the meantime to GREEN. All good. Your solution throws an error for nothing. > How so? The user is going to get FK violation - you do not need any downtime to make sure users don’t submit wrong values. Sorry, I was not aware, your aim is to bother the customer with FK violation messages, if you can avoid it. >> Ensure, the update goes from GREE to GREEN? You cannot, simple as that. You just can minimize the risk by testing, testing,testing. > You can also simply disallow updates with FK constraint eliminating risk. This is nothing that is specific to surrogate or natural keys. If one disallow updates, one has to live with typos and everything. >> But that holds equally true for the business key of a surrogate key table as natural key table. That's why the surrogatekey is such an elegant construct. You can change business key of the record with id 3 from GREE to GREEN, VERT,GRÜN, VERDE or ASéLDHK()*NSLDFHP)(*Z . It keeps its meaning of the perception of the human eye of electromagnetic wavesof the wavelength roughly between 495-570 nm (according to Wikipedia). > And why do you think unconstrained updating of business key is a good thing? I am not sure what is your take on "unconstrained". As already mentioned, testing is the only constraint you can set up. And I am neither sure where your problem is with updating unless you have an update orgy because your ref-constraint goes on natural keys. > You must implement rules governing what can and what cannot be changed *somewhere* - not doing it in the database meansyou have to do it in applications. That sounds adventurous. Maybe I am not seeing what you mean, but is sound to me that you build a shadow database where you map your GREE to GREEN in the application layer so the application can display the correct value in the GUI, but no one has to update the core database. And if there is a typo in the shadow data base you build another mapping database on top of it? > Anyway - let’s agree to disagree :) Agreed :-)
On 6 Feb 2025, at 22:03, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:On 2025-02-04 22:41:38 +0100, Thiemo Kellner wrote:
I might see what you want to point out. E.g. the table is COLOURS. The
rec with id 1 is RED, the one with id 2 is BLUE, 3 is GREE and so on.
Now you load these values into the dropdown box that sports RED, BLUE,
GREE and so on. While someone selects GREE, there is a maintenance
release changing GREE to YELLOW. So when that someone sends the
selection by id to the backend, not GREE is selected but YELLOW.
I fail to see why use of a surrogate key is the problem here.
Either changing the color from GREE to YELLOW makes sense or it doesn't.
If it doesn't make sense, then it's release which is faulty, not the
model.
if it does make sense (I'm a bit at a loss when that might be the case,
maybe the "color" is just a code word, or maybe they are colors in a
design which are arbitrary but must be consistent), then the experience
that the user has is exactly the same as if the maintenance release was
applied just after they selected the color. Which might be a bit
confusing but is almost certainly what is wanted.
This is simply saying “what is implemented is certainly wanted, so what’s the point”.
The discussion is about *defining* what is wanted and using DBMS to *enforce* that.
A) Your release changed the sementics of the record 3. It's meaning
changed. I cannot recommend doing that.
If the release changed the semantics of an existing record the release
was almost certainly wrong.
Is it possible to minimize the risk of “wrong releases” using mechanisms that DBMS provides?
B) If you absolutely must change the semantic, put your application
into maintenance mode in which noone can select anything beforehand.
If the maintenance would just correct the typo from GREE to GREEN,
nothing would happen. Yor customer still ordered the lavishly green
E-Bike her hear ever desired.
Yeah, that's a good example where changing the color from GREE to YELLOW
doesn't make sense. Presumably that ID 3 is used as a foreign key in
lots of places, e,g. in an inventory table. Your bikes in stock won't
just magically change color just because you changed some text in the
database. So that change simply doesn't make sense and shouldn't be done
as part of a maintenance release. Confusing a few people who just happen
to open the dropdown in the wrong second is the least of your problems.
We are in agreement here. What I am also saying is:
having color name as PK and all FKs referencing it *prevents* these unwanted updates.
—
Michal
06.02.2025 22:04:34 Peter J. Holzer <hjp-pgsql@hjp.at>: >> I might see what you want to point out. E.g. the table is COLOURS. The >> rec with id 1 is RED, the one with id 2 is BLUE, 3 is GREE and so on. >> Now you load these values into the dropdown box that sports RED, BLUE, >> GREE and so on. While someone selects GREE, there is a maintenance >> release changing GREE to YELLOW. So when that someone sends the >> selection by id to the backend, not GREE is selected but YELLOW. > > I fail to see why use of a surrogate key is the problem here. It seems I did not make mys of clear. I don't see either and don' remember an occasion in over 20 years of DWH business theuse of surrogates was a problem. > Either changing the color from GREE to YELLOW makes sense or it doesn't. Hear, hear. > If it doesn't make sense, then it's release which is faulty, not the > model. ACK Thiemo