Thread: Request for advice: Table design
Good morning, list! I am in the middle of a project and i am just about to decide how to lay out the table(s) for work orders. I had been planningto use one table, but now i'm not so sure that it would be optimal. The order history will grow with something like 150K-200K rows/year and will have to be saved for ten years. The total number of active (new+validated+planned+ready_for_invoicing) orders would be something like 5K rows. I am thinking about two solutions: 1. One table, "orders" with a column named "status". New orders, validated orders, planned orders, ready for invoicing, andold orders, all in one big table. The status column would be updated a number of times for each order. 2. A number of tables, "new_orders", "validated_orders", "order_history" etc... No status column. Order rows would be movedfrom one table to another. Perhaps i should have only two tables: "orders" and "order_history"? More tables would mean more indexes (but smaller in size), harder to use/maintain a lot of tables and indexes and triggersand stuff... One table would mean that a lot of inactive orders would slow down access to active rows in the table... Any suggestions? Real world examples? /M
Mattias Kregert schrieb: > The order history will grow with something like 150K-200K rows/year > and will have to be saved for ten years. The total number of active > (new+validated+planned+ready_for_invoicing) orders would be something > like 5K rows. > > I am thinking about two solutions: > > 1. One table, "orders" with a column named "status". New orders, > validated orders, planned orders, ready for invoicing, and old > orders, all in one big table. The status column would be updated a > number of times for each order. > > 2. A number of tables, "new_orders", "validated_orders", > "order_history" etc... No status column. Order rows would be moved > from one table to another. Perhaps i should have only two tables: > "orders" and "order_history"? > > > More tables would mean more indexes (but smaller in size), harder to > use/maintain a lot of tables and indexes and triggers and stuff... > One table would mean that a lot of inactive orders would slow down > access to active rows in the table... > From my experience, I'd go for two tables orders and order_history. Then move orders which are no longer used from orders to order_history. E.g. you could keep only the last two years in orders, and move everything else over to order_history. Thus you'll never have more than 300-400k rows in the orders table, which isn't much at all. The only drawback is, that your application needs to be aware of that, in order to be able to offer reporting on historic orders. Using different tables for different order statuses is not a good idea, you will eventually run into problems with that. Thomas
Use one table - it's easier to maintain, more relationally correct and a growth of 150K-200K rows a year should be nothing to PostgreSQL. 2 million rows isn't very many or difficult for PostgreSQL to manage so long as it is correctly indexed and vacuumed following any really large bulk data changes. The speed of PostgreSQL won't be an issue either providing you have allocated enough memory in your pg_hba.conf and you are running on a reasonable server (ie - at leats 512mb of ram) You could actually consider three tables though: CREATE TABLE orders ( orderid SERIAL8 NOT NULL PRIMARY KEY, ... <misc order record stuff> ) WITHOUT OIDS; CREATE TABLE orderstatuses ( orderstatus VARCHAR(20) NOT NULL PRIMARY KEY, statusdesc VARCHAR(200) NOT NULL DEFAULT 'New Status', ) WITHOUT OIDS; CREATE TABLE orderhistory ( orderid INT8 NOT NULL REFERENCES orders (orderid) ON UPDATE CASCADE ON DELETE CASCADE, historyuser VARCHAR(20) NOT NULL DEFAULT current_user, historywhen TIMESTAMP NOT NULL DEFAULT current_timestamp, historystatus VARCHAR(20) NOT NULL REFERENCES orderstatuses (orderstatus) ON UPDATE CASCADE ON DELETE RESTRICT, CONSTRAINT orderhistory_pkey PRIMARY KEY (orderid,historywhen) ) WITHOUT OIDS; ... and maybe create a non-unique index on orderid in orderhistory. This way you can always get the status of an order by selecting the most recent entry in the orderhistory table pertaining to a particular record and use a single plpgsql function add/modify data. You could also use a rewrite rule on a view displaying the most recent record. And now you have an order history defining an order's state at a point in time. Your style may differ to mine in terms of id's etc.. I know storing a whole bunch of varchar's in the orderhistory table and referencing orderstatuses may be a bit funny but I've always been of the opinion that it's best to use a key that actually describes the data. Rgds, Jason On Tue, 17 Jun 2003 06:35 pm, Mattias Kregert wrote: > Good morning, list! > I am in the middle of a project and i am just about to decide how to lay > out the table(s) for work orders. I had been planning to use one table, but > now i'm not so sure that it would be optimal. > > The order history will grow with something like 150K-200K rows/year and > will have to be saved for ten years. The total number of active > (new+validated+planned+ready_for_invoicing) orders would be something like > 5K rows. > > I am thinking about two solutions: > > 1. One table, "orders" with a column named "status". New orders, validated > orders, planned orders, ready for invoicing, and old orders, all in one big > table. The status column would be updated a number of times for each order. > > 2. A number of tables, "new_orders", "validated_orders", "order_history" > etc... No status column. Order rows would be moved from one table to > another. Perhaps i should have only two tables: "orders" and > "order_history"? > > > More tables would mean more indexes (but smaller in size), harder to > use/maintain a lot of tables and indexes and triggers and stuff... One > table would mean that a lot of inactive orders would slow down access to > active rows in the table... > > Any suggestions? Real world examples? > > /M > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
I agree with Jason. There's no reason to use several tables for data that are essentially the same in character; that's the point of a relational database to begin with! I also agree about the orderhistory table: use it to store cumulative updates, linked one-to-many with the orders table, so you can get the complete history of an order as well as the latest status very easily. Best, Andy Perrin ---------------------------------------------------------------------- Andrew J Perrin - http://www.unc.edu/~aperrin Assistant Professor of Sociology, U of North Carolina, Chapel Hill clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu On Tue, 17 Jun 2003, Jason Godden wrote: > Use one table - it's easier to maintain, more relationally correct and a > growth of 150K-200K rows a year should be nothing to PostgreSQL. > > 2 million rows isn't very many or difficult for PostgreSQL to manage so long > as it is correctly indexed and vacuumed following any really large bulk data > changes. The speed of PostgreSQL won't be an issue either providing you have > allocated enough memory in your pg_hba.conf and you are running on a > reasonable server (ie - at leats 512mb of ram) > > You could actually consider three tables though: > > CREATE TABLE orders ( > orderid SERIAL8 NOT NULL PRIMARY KEY, > ... > <misc order record stuff> > ) WITHOUT OIDS; > > CREATE TABLE orderstatuses ( > orderstatus VARCHAR(20) NOT NULL PRIMARY KEY, > statusdesc VARCHAR(200) NOT NULL DEFAULT 'New Status', > ) WITHOUT OIDS; > > CREATE TABLE orderhistory ( > orderid INT8 NOT NULL REFERENCES orders (orderid) ON UPDATE CASCADE ON > DELETE CASCADE, > historyuser VARCHAR(20) NOT NULL DEFAULT current_user, > historywhen TIMESTAMP NOT NULL DEFAULT current_timestamp, > historystatus VARCHAR(20) NOT NULL REFERENCES orderstatuses (orderstatus) ON > UPDATE CASCADE ON DELETE RESTRICT, > CONSTRAINT orderhistory_pkey PRIMARY KEY (orderid,historywhen) > ) WITHOUT OIDS; > > ... and maybe create a non-unique index on orderid in orderhistory. > > This way you can always get the status of an order by selecting the most > recent entry in the orderhistory table pertaining to a particular record and > use a single plpgsql function add/modify data. You could also use a rewrite > rule on a view displaying the most recent record. And now you have an order > history defining an order's state at a point in time. > > Your style may differ to mine in terms of id's etc.. I know storing a whole > bunch of varchar's in the orderhistory table and referencing orderstatuses > may be a bit funny but I've always been of the opinion that it's best to use > a key that actually describes the data. > > Rgds, > > Jason > > On Tue, 17 Jun 2003 06:35 pm, Mattias Kregert wrote: > > Good morning, list! > > I am in the middle of a project and i am just about to decide how to lay > > out the table(s) for work orders. I had been planning to use one table, but > > now i'm not so sure that it would be optimal. > > > > The order history will grow with something like 150K-200K rows/year and > > will have to be saved for ten years. The total number of active > > (new+validated+planned+ready_for_invoicing) orders would be something like > > 5K rows. > > > > I am thinking about two solutions: > > > > 1. One table, "orders" with a column named "status". New orders, validated > > orders, planned orders, ready for invoicing, and old orders, all in one big > > table. The status column would be updated a number of times for each order. > > > > 2. A number of tables, "new_orders", "validated_orders", "order_history" > > etc... No status column. Order rows would be moved from one table to > > another. Perhaps i should have only two tables: "orders" and > > "order_history"? > > > > > > More tables would mean more indexes (but smaller in size), harder to > > use/maintain a lot of tables and indexes and triggers and stuff... One > > table would mean that a lot of inactive orders would slow down access to > > active rows in the table... > > > > Any suggestions? Real world examples? > > > > /M > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings >
Mattias Kregert wrote: > Good morning, list! > I am in the middle of a project and i am just about to decide > how to lay out the table(s) for work orders. I had been > planning to use one table, but now i'm not so sure > that it would be optimal. By "optimal", do you mean "logically consistent"? > The order history will grow with something like 150K-200K > rows/year and will have to be saved for ten years. > The total number of active > (new+validated+planned+ready_for_invoicing) orders > would be something like 5K rows. That shouldn't affect logical design, unless you know you will be querying the database in a manner which cannot use indexes. > I am thinking about two solutions: > > 1. One table, "orders" with a column named "status". > New orders, validated orders, planned orders, ready > for invoicing, and old orders, all in one big table. > The status column would be updated a number of > times for each order. > > 2. A number of tables, "new_orders", "validated_orders", > "order_history" etc... No status column. Order rows > would be moved from one table to another. Perhaps i > should have only two tables: "orders" and "order_history"? > > More tables would mean more indexes (but smaller in size), > harder to use/maintain a lot of tables and indexes and > triggers and stuff... One table would mean that a lot > of inactive orders would slow down access to active > rows in the table... The question cannot be answered without a fully defined requirements document from the customer. Do you need to know only what the current status of a workorder is? If yes, one table might do. Otherwise, you need a history table. For each change in the the status, does the customer require the same attributes of the transaction to be recorded? If yes, then a two-table workorder/orderhistory table might suffice. If no, then a separate relation for each transaction type will be required. etc. etc. For example, Does the transition between "planned orders" and "ready for invoicing" involve the same sort of attributes as the transition from non-existence to "new orders"? Or, if the data involved is different, does the customer not care? And has stated so explicitly? Without such information, a design would be pure speculation. HTH, Mike Mascari mascarm@mascari.com
Mattias, Here goes how we do it: order_status (id, name): valid order statuses order (partners, dates, status, etc): main data of order refs to order_status order_statuschange (order, timestamp, oldstatus, newstatus, user): history of status changes on each order refs to order (cascade), order_status order_mod (order, mod, user, closed, etc): separate modifications for an order (only the newest is able to receive items, see below) refs to order (cascade) order_item (order, itemno, archive, archive_from_mod): items in each order. refs to order (cascade), order_mod order_notify (order, timestamp, user, operation): log and notify table does NOT ref to order so delete from order leaves this table alone. The same structure goes for shipments and invoices. A note: order_mod is a historical thing we'd like to get rid of, but our hands are always full to do that, so we let it stay. G. ------------------------------- cut here ------------------------------- ----- Original Message ----- From: "Mattias Kregert" <mattias@kregert.se> Sent: Tuesday, June 17, 2003 10:35 AM More tables would mean more indexes (but smaller in size), harder to use/maintain a lot of tables and indexes and triggers and stuff... One table would mean that a lot of inactive orders would slow down access to active rows in the table... Any suggestions? Real world examples?
Hi As an additional note; Older data is moved into a seperate table to reduce the number of records that require regular vacuuming. Since the tables would contain similar data it is simple to use union selections in a view with an additional column to indicate which table the data comes from. Using a view that combines the data from the two tables using a union, the data will appear to be comming from a single table. This method make archival access transparent. I have a realtime data collection system that I built. The data is put into tables on a yearly and monthly basis on the fly and new tables are created as needed. I use a union to join tables to access the data over several months. I just thought of a new idea, I am going to write a function to join the tables required over a timespan - but that's another story. Two tables are easy to join with a union : {if the column types are exactly matched} create view all_data select *,'current_data'::text as data_table from current_data ... union select *,'archive_data'::text from archive_data ... ; The last column will indicate the data's origin. Now to see all the data : select * from all_data ; Thats about it, using this method allows the "dynamic" table to small for quick maintenace and operation, while the "static" table needs less maintenace so it can be large with out the penalties incurred by frequent maintenace. Guy Andrew Perrin wrote: >I agree with Jason. There's no reason to use several tables for data that >are essentially the same in character; that's the point of a relational >database to begin with! I also agree about the orderhistory table: use it >to store cumulative updates, linked one-to-many with the orders table, so >you can get the complete history of an order as well as the latest status >very easily. > >Best, >Andy Perrin > >---------------------------------------------------------------------- >Andrew J Perrin - http://www.unc.edu/~aperrin >Assistant Professor of Sociology, U of North Carolina, Chapel Hill >clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu > > >On Tue, 17 Jun 2003, Jason Godden wrote: > > >>Use one table - it's easier to maintain, more relationally correct and a >>growth of 150K-200K rows a year should be nothing to PostgreSQL. >> >>2 million rows isn't very many or difficult for PostgreSQL to manage so long >>as it is correctly indexed and vacuumed following any really large bulk data >>changes. The speed of PostgreSQL won't be an issue either providing you have >>allocated enough memory in your pg_hba.conf and you are running on a >>reasonable server (ie - at leats 512mb of ram) >> >>You could actually consider three tables though: >> >>CREATE TABLE orders ( >> orderid SERIAL8 NOT NULL PRIMARY KEY, >> ... >> <misc order record stuff> >>) WITHOUT OIDS; >> >>CREATE TABLE orderstatuses ( >> orderstatus VARCHAR(20) NOT NULL PRIMARY KEY, >> statusdesc VARCHAR(200) NOT NULL DEFAULT 'New Status', >>) WITHOUT OIDS; >> >>CREATE TABLE orderhistory ( >> orderid INT8 NOT NULL REFERENCES orders (orderid) ON UPDATE CASCADE ON >>DELETE CASCADE, >> historyuser VARCHAR(20) NOT NULL DEFAULT current_user, >> historywhen TIMESTAMP NOT NULL DEFAULT current_timestamp, >> historystatus VARCHAR(20) NOT NULL REFERENCES orderstatuses (orderstatus) ON >>UPDATE CASCADE ON DELETE RESTRICT, >> CONSTRAINT orderhistory_pkey PRIMARY KEY (orderid,historywhen) >>) WITHOUT OIDS; >> >>... and maybe create a non-unique index on orderid in orderhistory. >> >>This way you can always get the status of an order by selecting the most >>recent entry in the orderhistory table pertaining to a particular record and >>use a single plpgsql function add/modify data. You could also use a rewrite >>rule on a view displaying the most recent record. And now you have an order >>history defining an order's state at a point in time. >> >>Your style may differ to mine in terms of id's etc.. I know storing a whole >>bunch of varchar's in the orderhistory table and referencing orderstatuses >>may be a bit funny but I've always been of the opinion that it's best to use >>a key that actually describes the data. >> >>Rgds, >> >>Jason >> >>On Tue, 17 Jun 2003 06:35 pm, Mattias Kregert wrote: >> >>>Good morning, list! >>>I am in the middle of a project and i am just about to decide how to lay >>>out the table(s) for work orders. I had been planning to use one table, but >>>now i'm not so sure that it would be optimal. >>> >>>The order history will grow with something like 150K-200K rows/year and >>>will have to be saved for ten years. The total number of active >>>(new+validated+planned+ready_for_invoicing) orders would be something like >>>5K rows. >>> >>>I am thinking about two solutions: >>> >>>1. One table, "orders" with a column named "status". New orders, validated >>>orders, planned orders, ready for invoicing, and old orders, all in one big >>>table. The status column would be updated a number of times for each order. >>> >>>2. A number of tables, "new_orders", "validated_orders", "order_history" >>>etc... No status column. Order rows would be moved from one table to >>>another. Perhaps i should have only two tables: "orders" and >>>"order_history"? >>> >>> >>>More tables would mean more indexes (but smaller in size), harder to >>>use/maintain a lot of tables and indexes and triggers and stuff... One >>>table would mean that a lot of inactive orders would slow down access to >>>active rows in the table... >>> >>>Any suggestions? Real world examples? >>> >>>/M >>> >>> >>>---------------------------(end of broadcast)--------------------------- >>>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >>> >> >>---------------------------(end of broadcast)--------------------------- >>TIP 7: don't forget to increase your free space map settings >> >> > >---------------------------(end of broadcast)--------------------------- >TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html > > >