Re: Enforce primary key on every table during dev? - Mailing list pgsql-general

From Steven Lembark
Subject Re: Enforce primary key on every table during dev?
Date
Msg-id 20180301102258.485d09aa@wrkhors.com
Whole thread Raw
In response to Re: Enforce primary key on every table during dev?  (Ron Johnson <ron.l.johnson@cox.net>)
Responses Re: Enforce primary key on every table during dev?
Re: Enforce primary key on every table during dev?
List pgsql-general
> On 03/01/2018 02:20 AM, Alban Hertroys wrote:
> [snip]
> > Not to mention that not all types of tables necessarily have
> > suitable candidates for a primary key. You could add a surrogate
> > key based on a serial type, but in such cases that may not serve
> > any purpose other than to have some arbitrary primary key.
> >
> > An example of such tables is a monetary transaction table that
> > contains records for deposits and withdrawals to accounts. It will
> > have lots of foreign key references to other tables, but rows
> > containing the same values are probably not duplicates. Adding a
> > surrogate key to such a table just adds overhead, although that
> > could be useful in case specific rows need updating or deleting
> > without also modifying the other rows with that same data -
> > normally, only insertions and selections happen on such tables
> > though, and updates or deletes are absolutely forbidden -
> > corrections happen by inserting rows with an opposite transaction.  
> 
> Wouldn't the natural pk of such a table be timestamp+seqno, just as
> the natural pk of a transaction_detail table be transaction_no+seqno?

Start with Date's notion that a database exists to correclty represent
data about the real world. Storing un-identified data breaks this 
since we have no idea what the data means or have any good way of 
getting it back out. Net result is that any workable relational 
database will have at least one candidate key for any table in it.

If you can say that "rows containing the same values are not
duplicates" then you have a database that cannot be queried, audited,
or managed effectively. The problem is that you cannot identify the 
rows, and thus cannot select related ones, update them (e.g., to 
expire outdated records), or validate the content against any external
values (e.g., audit POS tapes using the database).

In the case of a monitary transaction you need a transaction
table, which will have most of the FK's, and a ledger for the 
transaction amounts.

A minimum candidate key for the transaction table would be account, 
timestamp, authorizing customer id, and channel. This allows two 
people to, say, make deposits at the same time or the same authorizing 
account (e.g., a credit card number) to be processed at the same time 
in two places.

The data for a transaction would include things like the final status, 
in-house authorizing agent, completion time.

The ledger entries would include the transaction SK, sequence within
the transaction, amount, and account. The ledger's candidate key is 
a transaction SK + sequence number -- the amount and account don't 
work because you may end up, say, making multiple deposits of $10
to your checking account on the same transaction.

The ledger's sequence value can be problematic, requiring external
code or moderately messy triggers to manage. Timestamps don't always
work and are subject to clock-skew. One way to avoid this is require
that a single transaction contain only unique amounts and accounts.
At that point the ledger becomes a degenerate table of  transaction id, 
amount, account (i.e., the entire table is nothing but a unique index).

This might require generating multiple database transactions for a
single external process (e.g., a customer walking up to the teller)
but does simplify processing quite a bit.

In both cases, having an SK on the ledger is useful for audit queries,
which might have to process a large number of ledger entries in code.
Extracting the ledger SK's in one query and walking down them using
a unique index can be more effecient than having to extract the values.

Either way, you can identify all of the transactions as unique and 
all of the ledger entries for that transaction. At that point the 
database can be queried for data, updated as necessary, audited 
against external data.

If you have a design with un-identified data it means that you havn't
normalized it properly: something is missing from the table with 
un-identifiable rows.

-- 
Steven Lembark                                       1505 National Ave
Workhorse Computing                                 Rockford, IL 61103
lembark@wrkhors.com                                    +1 888 359 3508


pgsql-general by date:

Previous
From: Ron Johnson
Date:
Subject: Re: Version upgrade: is restoring the postgres database needed?
Next
From: Vick Khera
Date:
Subject: Re: Version upgrade: is restoring the postgres database needed?