Re: pervasiveness of surrogate (also called synthetic) keys - Mailing list pgsql-general

From Misa Simic
Subject Re: pervasiveness of surrogate (also called synthetic) keys
Date
Msg-id BANLkTinmu-qN_hnWKhMtmWD5KDeDWc_XOQ@mail.gmail.com
Whole thread Raw
In response to Re: pervasiveness of surrogate (also called synthetic) keys  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-general

Being the “first line” or the “second line” of a physical invoice is a property for that line.  Identifying its position on the invoice is only natural. 
 
Specifically, the position of the line on the invoice; you can't have to invoice lines at the second line of aninvoice for example.   This information is visible and important to users because presumably the invoice lines as entered into an application are in the order you would want them printed out on aphysical invoice.

I agree that Line No could say, that is order in which Item is added to Invoice - but definately it will not mean that it should be printed in that order... Items could be grouped by Category or whatever on Invoice... in which order Items will be printed I would say it is more Business Rule... and as all Business Rules it is changeable by time... for example 1 rule known to me: SLA item (charge for Service Licence Agreement) is always on the end, and its price is 20% of other items SubTotal for what SLA should be paid (so for some Items on Invoice, SLA should be paid, for some not - we don't know in which order user will enter Items)...


 

By your reasoning all identifiers are synthetically generated if you consider there is never truly only a single instance of anything in the multi-verse.  The only truly unique identifier would be the time+place of an objects creation.

 

“Hello - person born in Liverpool London, St. Whatever hospital, Room 101 @ 13:14:57AM on the 5th of March 2001 – how may I direct your call?” (I guess you could use the conception date as well although twins+ might be tough to distinguish in that case).

 

 
Specifically, the position of the line on the invoice; you can't have to invoice lines at the second line of aninvoice for example.   This information is visible and important to users because presumably the invoice lines as entered into an application are in the order you would want them printed out on aphysical invoice.


Yes, I agree, I really think that "Natural" keys do not exist... but definatelly there are Entity properties, like e-mail, InvoiceNo (in whatever format), what must be Unique (if Business Rule say: that property must be unique).... That is even hard for humans to think and identify someone on that way - so humans invented SSN - to make them life easier... but it is known that SSN is not best canditate for PK... 

How humans identify someone/something - is totally different way then how Relational Database identifies some record...

Human better Identify some person by picture (if it is known to them, if have it in their memory) - Relational Database, well, today could by 1 digitial picture and just that one...  Tomorrow maybe...


If you key on email and the query coming from another table doesn't
need any other email properties, you just saved yourself a join
without having to de-normailze in the classic sense.  You also get to
cut out many sorts on similar principles.


Performance and Reliability - that are main things I worry about... and then made some tests...

JOIN - that is natural thing for Relational Database... Especially JOIN on Keys - I agree that JOIN on non key columns are expensive...

I can't imagine, that we can get some info from ERP system without join on every day basis, like just e-mail, Amount as salary in that period... without First Name, Last Name... but lets do the test

So we have two tables, in two models...

Employees table: 50 000 records

Salaries table:  100 000 records

(For a first test, I didn't have time to wait to fill table with 100 000 000 records... but will run it over night...) 

Model one:
Employee
-e-mail (PK)
-FirstName
-LastName


Salary (PK: e-mail, StartDate, EndDate)
-e-mail (FK to employee)
-StartDate
-EndDate
-Amount

(I recommend StartDate/EndDate to replace with Period datatype from many reasons - but out of this test topic)


Model two:
Employee
-EmpID (UUID -PK)
-e-mail
-FirstName
-LastName


Salary 
-SalID (UUID PK)
-EmpID (UUID FK to Employee)
-StartDate
-EndDate
-Amount


Test for Model 1:

SELECT e-mail, startdate, enddate, Amount FROM Salary

Returns 100 000k records in 1000 ms

SELECT Salary.e-mail, FirstName, LastName, startdate, enddate, Amount FROM Salary INNER JOIN Employees ON Salary.e-mail = Employees.e-mail

Returns 100 000k records in 240 ms

WOW - query with JOIN returned same result 4 times faster!!!

just tried then:

SELECT Salary.e-mail, FirstName, LastName, startdate, enddate, Amount FROM Salary INNER JOIN Employees ON Salary.e-mail = Employees.e-mail

all info from Salary table but just joined to emploees without columns from Employees table...

Returns 100 000k records in 240 ms


test for Model2:

SELECT emp_id, startdate, enddate, amount FROM Salary

Returned 100 000k rows in 1314 ms (worst result - though in practice that query would never been run because of emp_id as GUID means nothing to human...)

SELECT email, fname, lname, startdate, enddate, amount FROM Salary  INNER JOIN Employees ON Salary.emp_id = Employees.emp_id

Returns 100 000k records in 240 ms (the same as JOIN on e-mail column in Model1)



 
 Like many systems of
the day, it was written in COBOL over an isam data store which didn't
have the technical capability to do what sequences or uuids do today


I agree. Seq or uuids solved many problems today...and perfectly fits in Relational Database design as PK...

With Kind Regards,

Misa 

pgsql-general by date:

Previous
From: "David Johnston"
Date:
Subject: GROUP BY Wildcard Syntax Thought
Next
From: Merlin Moncure
Date:
Subject: Re: ZEOS or PGDAC - How to lock a resource?