Thread: [off-topic] Database normalization, can't be done!

[off-topic] Database normalization, can't be done!

From
Matthew Hagerty
Date:
Greetings,

Sorry for the off-topic, I am probably responsible for more off-topic than
anyone else, but I don't know a better bunch of database people anywhere :)

So tell me, how do I set up a database where I have to rely on certain
records to not change, but they have to change?  Let me explain.  Invoices
table has a unique ID.  Line Items table stores the Invoice ID (foreign
key).  Now, the Line Item table also stores a product ID, used to lookup
the price in the Products table.  So, you create an invoice.  1 month later
there is a price change on some products.  Now all invoices are inaccurate,
they do not reflect what was charged at the time of the invoice, only what
that invoice would be today.

Okay, so lookup the price and store it in the Line Item table, problem
fixed.  Nope, that brakes a 3rd or 4th normal, I think.  You can have no
redundant data.  I learned this little problem the hard way, I created a
system based on looking everything up, very normalized.  To fix it, any
price changes resulted in a new product being created with the new price
and the old product being voided and only used to build history invoices.
This wound up costing more space that simply storing the price in the Line
Item table.

So, someone please tell me how to deal with this kind of problem.  I am
doing a similar system in terms of looking up data and storing data in
separate tables.  But now instead of just needing to store a price, I need
to store almost every field from one table into another to keep the history
records accurate.  Please help.  Any insight, suggestions, insults, etc.
would be greatly appreciated.


Thank you,
Matthew


Re: [INTERFACES] [off-topic] Database normalization, can't be done!

From
Tom Lane
Date:
Matthew Hagerty <matthew@venux.net> writes:
> Okay, so lookup the price and store it in the Line Item table, problem
> fixed.  Nope, that brakes a 3rd or 4th normal, I think.  You can have no
> redundant data.

If you want to apply normalization theory to a database that includes
historical data, then you have to recognize time as an explicit
variable.  In other words, the scenario you gave with a new record in
the Products table every time a product's price changes is the
academically "correct" approach.  Actually, since most of the other
info about a Product probably doesn't change when the price changes,
you'd really want a separate table Prices, keyed by product ID and
timespan of validity.  Then you'd use the product ID and date from
an invoice record to look up the applicable price.

If you want to be able to answer the question "What has the list price
for this product been in the past, as a function of time?" then you
might actually want to keep such a table.  If you only need to be able
to answer the question "What did we charge on invoice NNN?" then I think
storing the price in the invoice record is the correct approach.
(As D'Arcy points out, there are other factors such as customer
discounts that make it even more reasonable to treat the price charged
in a particular transaction as a fact unique to that transaction.)

The thing to recognize is that a historical price charged in some
past transaction is *not* the "same data" as the current price for
the same product, and so storing the value separately is not
denormalization.  It would be denormalization if you wanted the past
invoices to change when you updated the current price --- but obviously
you do not.

Normalization theory is just a tool to ensure that when you want to
change "one fact", you only have to change it in one place in your
database.  What is "one fact" is for you to define.  In this
application, past invoice prices are clearly not the same fact as
current prices.

            regards, tom lane

Re: [INTERFACES] [off-topic] Database normalization, can't be done!

From
Tom Lane
Date:
Forgot to make my other point ...

Matthew Hagerty <matthew@venux.net> writes:
> ... But now instead of just needing to store a price, I need
> to store almost every field from one table into another to keep the history
> records accurate.

Well, is that wrong?  If the history records represent past values
of the "master" record, that may be exactly what you need to do.

Perhaps you are concerned that there will be a lot of duplication
because history records are generated much more frequently than the
master data actually changes.  If so, you might steal an idea from
the old "time travel" feature in Postgres: generate a new master record
whenever the master data changes, and mark each such record with its
beginning and ending times of validity.  (A record is created with
begin time = 'now' and end time = '+infinity'; when it is superseded
the end time is set to 'now'; *no* other change is ever made to an
existing record.)  Then the history records can link to the current-
at-the-time master record, and they only need to directly store whatever
detail data is not in the master record.  You can also readily find
out what the master values were at any past time, whether there is a
history record for that particular time or not.

Whether this is worth the trouble depends on how much space you can save
in redundant history records and on the desired update properties ---
that is, do you want to be able to change the data of many history
records by modifying one copy of the master record?  (Of course,
modifying a past master record violates the "time travel" abstraction,
but if you really need to be able to rewrite history you can.)

The time travel feature was really kinda cool.  I think it is still
described in the Postgres documentation, even though it was removed a
while ago for performance reasons.  (The space overhead was intolerable
for applications that didn't need time travel.)

            regards, tom lane

Re: [INTERFACES] [off-topic] Database normalization, can't be done!

From
"Billy G. Allie"
Date:
Matthew Hagerty wrote:
> So tell me, how do I set up a database where I have to rely on certain
> records to not change, but they have to change?  Let me explain.  Invoices
> table has a unique ID.  Line Items table stores the Invoice ID (foreign
> key).  Now, the Line Item table also stores a product ID, used to lookup
> the price in the Products table.  So, you create an invoice.  1 month later
> there is a price change on some products.  Now all invoices are inaccurate,
> they do not reflect what was charged at the time of the invoice, only what
> that invoice would be today.
>
> Okay, so lookup the price and store it in the Line Item table, problem
> fixed.  Nope, that brakes a 3rd or 4th normal, I think.  You can have no
> redundant data.  I learned this little problem the hard way, I created a
> system based on looking everything up, very normalized.  To fix it, any
> price changes resulted in a new product being created with the new price
> and the old product being voided and only used to build history invoices.
> This wound up costing more space that simply storing the price in the Line
> Item table.

The data is not redundant if you define the price in the line item table
to be the invoiced price.  The invoiced price can be the product price at
the time the invoice was created, or it could be the product price less a
discount, or it could be the product price plus a markup, or it could be
a price a salesman said the buyer could purchase the product at.

The point I trying to make is that the invoiced price may have the same
value as the product price, but it is not the same as the product price.
Therefore it is not redundant data and does not break any normalization
rules.
___________________________________________________________________________
____       | Billy G. Allie    | Domain....: Bill.Allie@mug.org
|  /|      | 7436 Hartwell     | Compuserve: 76337,2061
|-/-|----- | Dearborn, MI 48126| MSN.......: B_G_Allie@email.msn.com
|/  |LLIE  | (313) 582-1540    |