Thread: [off-topic] Database normalization, can't be done!
Greetings, Sorry for the off-topic, I am probably responsible for more off-topic than anyone else, but I don't know a better bunch of database people anywhere :) So tell me, how do I set up a database where I have to rely on certain records to not change, but they have to change? Let me explain. Invoices table has a unique ID. Line Items table stores the Invoice ID (foreign key). Now, the Line Item table also stores a product ID, used to lookup the price in the Products table. So, you create an invoice. 1 month later there is a price change on some products. Now all invoices are inaccurate, they do not reflect what was charged at the time of the invoice, only what that invoice would be today. Okay, so lookup the price and store it in the Line Item table, problem fixed. Nope, that brakes a 3rd or 4th normal, I think. You can have no redundant data. I learned this little problem the hard way, I created a system based on looking everything up, very normalized. To fix it, any price changes resulted in a new product being created with the new price and the old product being voided and only used to build history invoices. This wound up costing more space that simply storing the price in the Line Item table. So, someone please tell me how to deal with this kind of problem. I am doing a similar system in terms of looking up data and storing data in separate tables. But now instead of just needing to store a price, I need to store almost every field from one table into another to keep the history records accurate. Please help. Any insight, suggestions, insults, etc. would be greatly appreciated. Thank you, Matthew
Matthew Hagerty <matthew@venux.net> writes: > Okay, so lookup the price and store it in the Line Item table, problem > fixed. Nope, that brakes a 3rd or 4th normal, I think. You can have no > redundant data. If you want to apply normalization theory to a database that includes historical data, then you have to recognize time as an explicit variable. In other words, the scenario you gave with a new record in the Products table every time a product's price changes is the academically "correct" approach. Actually, since most of the other info about a Product probably doesn't change when the price changes, you'd really want a separate table Prices, keyed by product ID and timespan of validity. Then you'd use the product ID and date from an invoice record to look up the applicable price. If you want to be able to answer the question "What has the list price for this product been in the past, as a function of time?" then you might actually want to keep such a table. If you only need to be able to answer the question "What did we charge on invoice NNN?" then I think storing the price in the invoice record is the correct approach. (As D'Arcy points out, there are other factors such as customer discounts that make it even more reasonable to treat the price charged in a particular transaction as a fact unique to that transaction.) The thing to recognize is that a historical price charged in some past transaction is *not* the "same data" as the current price for the same product, and so storing the value separately is not denormalization. It would be denormalization if you wanted the past invoices to change when you updated the current price --- but obviously you do not. Normalization theory is just a tool to ensure that when you want to change "one fact", you only have to change it in one place in your database. What is "one fact" is for you to define. In this application, past invoice prices are clearly not the same fact as current prices. regards, tom lane
Forgot to make my other point ... Matthew Hagerty <matthew@venux.net> writes: > ... But now instead of just needing to store a price, I need > to store almost every field from one table into another to keep the history > records accurate. Well, is that wrong? If the history records represent past values of the "master" record, that may be exactly what you need to do. Perhaps you are concerned that there will be a lot of duplication because history records are generated much more frequently than the master data actually changes. If so, you might steal an idea from the old "time travel" feature in Postgres: generate a new master record whenever the master data changes, and mark each such record with its beginning and ending times of validity. (A record is created with begin time = 'now' and end time = '+infinity'; when it is superseded the end time is set to 'now'; *no* other change is ever made to an existing record.) Then the history records can link to the current- at-the-time master record, and they only need to directly store whatever detail data is not in the master record. You can also readily find out what the master values were at any past time, whether there is a history record for that particular time or not. Whether this is worth the trouble depends on how much space you can save in redundant history records and on the desired update properties --- that is, do you want to be able to change the data of many history records by modifying one copy of the master record? (Of course, modifying a past master record violates the "time travel" abstraction, but if you really need to be able to rewrite history you can.) The time travel feature was really kinda cool. I think it is still described in the Postgres documentation, even though it was removed a while ago for performance reasons. (The space overhead was intolerable for applications that didn't need time travel.) regards, tom lane
Matthew Hagerty wrote: > So tell me, how do I set up a database where I have to rely on certain > records to not change, but they have to change? Let me explain. Invoices > table has a unique ID. Line Items table stores the Invoice ID (foreign > key). Now, the Line Item table also stores a product ID, used to lookup > the price in the Products table. So, you create an invoice. 1 month later > there is a price change on some products. Now all invoices are inaccurate, > they do not reflect what was charged at the time of the invoice, only what > that invoice would be today. > > Okay, so lookup the price and store it in the Line Item table, problem > fixed. Nope, that brakes a 3rd or 4th normal, I think. You can have no > redundant data. I learned this little problem the hard way, I created a > system based on looking everything up, very normalized. To fix it, any > price changes resulted in a new product being created with the new price > and the old product being voided and only used to build history invoices. > This wound up costing more space that simply storing the price in the Line > Item table. The data is not redundant if you define the price in the line item table to be the invoiced price. The invoiced price can be the product price at the time the invoice was created, or it could be the product price less a discount, or it could be the product price plus a markup, or it could be a price a salesman said the buyer could purchase the product at. The point I trying to make is that the invoiced price may have the same value as the product price, but it is not the same as the product price. Therefore it is not redundant data and does not break any normalization rules. ___________________________________________________________________________ ____ | Billy G. Allie | Domain....: Bill.Allie@mug.org | /| | 7436 Hartwell | Compuserve: 76337,2061 |-/-|----- | Dearborn, MI 48126| MSN.......: B_G_Allie@email.msn.com |/ |LLIE | (313) 582-1540 |