Thread: Request for advice: Table design

Request for advice: Table design

From
"Mattias Kregert"
Date:
Good morning, list!
I am in the middle of a project and i am just about to decide how to lay out the table(s) for work orders. I had been
planningto use one table, but now i'm not so sure that it would be optimal. 

The order history will grow with something like 150K-200K rows/year and will have to be saved for ten years.
The total number of active (new+validated+planned+ready_for_invoicing) orders would be something like 5K rows.

I am thinking about two solutions:

1. One table, "orders" with a column named "status". New orders, validated orders, planned orders, ready for invoicing,
andold orders, all in one big table. The status column would be updated a number of times for each order. 

2. A number of tables, "new_orders", "validated_orders", "order_history" etc... No status column. Order rows would be
movedfrom one table to another. Perhaps i should have only two tables: "orders" and "order_history"? 


More tables would mean more indexes (but smaller in size), harder to use/maintain a lot of tables and indexes and
triggersand stuff... One table would mean that a lot of inactive orders would slow down access to active rows in the
table...

Any suggestions? Real world examples?

/M


Re: Request for advice: Table design

From
Thomas Kellerer
Date:
Mattias Kregert schrieb:
> The order history will grow with something like 150K-200K rows/year
> and will have to be saved for ten years. The total number of active
> (new+validated+planned+ready_for_invoicing) orders would be something
> like 5K rows.
>
> I am thinking about two solutions:
>
> 1. One table, "orders" with a column named "status". New orders,
> validated orders, planned orders, ready for invoicing, and old
> orders, all in one big table. The status column would be updated a
> number of times for each order.
>
> 2. A number of tables, "new_orders", "validated_orders",
> "order_history" etc... No status column. Order rows would be moved
> from one table to another. Perhaps i should have only two tables:
> "orders" and "order_history"?
>
>
> More tables would mean more indexes (but smaller in size), harder to
> use/maintain a lot of tables and indexes and triggers and stuff...
> One table would mean that a lot of inactive orders would slow down
> access to active rows in the table...
>

 From my experience, I'd go for two tables orders and order_history.
Then move orders which are no longer used from orders to order_history.
E.g. you could keep only the last two years in orders, and move
everything else over to order_history. Thus you'll never have more than
300-400k rows in the orders table, which isn't much at all. The only
drawback is, that your application needs to be aware of that, in order
to be able to offer reporting on historic orders.

Using different tables for different order statuses is not a good idea,
you will eventually run into problems with that.

Thomas


Re: Request for advice: Table design

From
Jason Godden
Date:
Use one table - it's easier to maintain, more relationally correct and a
growth of 150K-200K rows a year should be nothing to PostgreSQL.

2 million rows isn't very many or difficult for PostgreSQL to manage so long
as it is correctly indexed and vacuumed following any really large bulk data
changes.  The speed of PostgreSQL won't be an issue either providing you have
allocated enough memory in your pg_hba.conf and you are running on a
reasonable server (ie - at leats 512mb of ram)

You could actually consider three tables though:

CREATE TABLE orders (
  orderid SERIAL8 NOT NULL PRIMARY KEY,
  ...
  <misc order record stuff>
) WITHOUT OIDS;

CREATE TABLE orderstatuses (
  orderstatus VARCHAR(20) NOT NULL PRIMARY KEY,
  statusdesc VARCHAR(200) NOT NULL DEFAULT 'New Status',
) WITHOUT OIDS;

CREATE TABLE orderhistory (
  orderid INT8 NOT NULL REFERENCES orders (orderid) ON UPDATE CASCADE ON
DELETE CASCADE,
  historyuser VARCHAR(20) NOT NULL DEFAULT current_user,
  historywhen TIMESTAMP NOT NULL DEFAULT current_timestamp,
  historystatus VARCHAR(20) NOT NULL REFERENCES orderstatuses (orderstatus) ON
UPDATE CASCADE ON DELETE RESTRICT,
  CONSTRAINT orderhistory_pkey PRIMARY KEY (orderid,historywhen)
) WITHOUT OIDS;

... and maybe create a non-unique index on orderid in orderhistory.

This way you can always get the status of an order by selecting the most
recent entry in the orderhistory table pertaining to a particular record and
use a single plpgsql function add/modify data.  You could also use a rewrite
rule on a view displaying the most recent record.  And now you have an order
history defining an order's state at a point in time.

Your style may differ to mine in terms of id's etc.. I know storing a whole
bunch of varchar's in the orderhistory table and referencing orderstatuses
may be a bit funny but I've always been of the opinion that it's best to use
a key that actually describes the data.

Rgds,

Jason

On Tue, 17 Jun 2003 06:35 pm, Mattias Kregert wrote:
> Good morning, list!
> I am in the middle of a project and i am just about to decide how to lay
> out the table(s) for work orders. I had been planning to use one table, but
> now i'm not so sure that it would be optimal.
>
> The order history will grow with something like 150K-200K rows/year and
> will have to be saved for ten years. The total number of active
> (new+validated+planned+ready_for_invoicing) orders would be something like
> 5K rows.
>
> I am thinking about two solutions:
>
> 1. One table, "orders" with a column named "status". New orders, validated
> orders, planned orders, ready for invoicing, and old orders, all in one big
> table. The status column would be updated a number of times for each order.
>
> 2. A number of tables, "new_orders", "validated_orders", "order_history"
> etc... No status column. Order rows would be moved from one table to
> another. Perhaps i should have only two tables: "orders" and
> "order_history"?
>
>
> More tables would mean more indexes (but smaller in size), harder to
> use/maintain a lot of tables and indexes and triggers and stuff... One
> table would mean that a lot of inactive orders would slow down access to
> active rows in the table...
>
> Any suggestions? Real world examples?
>
> /M
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org


Re: Request for advice: Table design

From
Andrew Perrin
Date:
I agree with Jason. There's no reason to use several tables for data that
are essentially the same in character; that's the point of a relational
database to begin with!  I also agree about the orderhistory table: use it
to store cumulative updates, linked one-to-many with the orders table, so
you can get the complete history of an order as well as the latest status
very easily.

Best,
Andy Perrin

----------------------------------------------------------------------
Andrew J Perrin - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu


On Tue, 17 Jun 2003, Jason Godden wrote:

> Use one table - it's easier to maintain, more relationally correct and a
> growth of 150K-200K rows a year should be nothing to PostgreSQL.
>
> 2 million rows isn't very many or difficult for PostgreSQL to manage so long
> as it is correctly indexed and vacuumed following any really large bulk data
> changes.  The speed of PostgreSQL won't be an issue either providing you have
> allocated enough memory in your pg_hba.conf and you are running on a
> reasonable server (ie - at leats 512mb of ram)
>
> You could actually consider three tables though:
>
> CREATE TABLE orders (
>   orderid SERIAL8 NOT NULL PRIMARY KEY,
>   ...
>   <misc order record stuff>
> ) WITHOUT OIDS;
>
> CREATE TABLE orderstatuses (
>   orderstatus VARCHAR(20) NOT NULL PRIMARY KEY,
>   statusdesc VARCHAR(200) NOT NULL DEFAULT 'New Status',
> ) WITHOUT OIDS;
>
> CREATE TABLE orderhistory (
>   orderid INT8 NOT NULL REFERENCES orders (orderid) ON UPDATE CASCADE ON
> DELETE CASCADE,
>   historyuser VARCHAR(20) NOT NULL DEFAULT current_user,
>   historywhen TIMESTAMP NOT NULL DEFAULT current_timestamp,
>   historystatus VARCHAR(20) NOT NULL REFERENCES orderstatuses (orderstatus) ON
> UPDATE CASCADE ON DELETE RESTRICT,
>   CONSTRAINT orderhistory_pkey PRIMARY KEY (orderid,historywhen)
> ) WITHOUT OIDS;
>
> ... and maybe create a non-unique index on orderid in orderhistory.
>
> This way you can always get the status of an order by selecting the most
> recent entry in the orderhistory table pertaining to a particular record and
> use a single plpgsql function add/modify data.  You could also use a rewrite
> rule on a view displaying the most recent record.  And now you have an order
> history defining an order's state at a point in time.
>
> Your style may differ to mine in terms of id's etc.. I know storing a whole
> bunch of varchar's in the orderhistory table and referencing orderstatuses
> may be a bit funny but I've always been of the opinion that it's best to use
> a key that actually describes the data.
>
> Rgds,
>
> Jason
>
> On Tue, 17 Jun 2003 06:35 pm, Mattias Kregert wrote:
> > Good morning, list!
> > I am in the middle of a project and i am just about to decide how to lay
> > out the table(s) for work orders. I had been planning to use one table, but
> > now i'm not so sure that it would be optimal.
> >
> > The order history will grow with something like 150K-200K rows/year and
> > will have to be saved for ten years. The total number of active
> > (new+validated+planned+ready_for_invoicing) orders would be something like
> > 5K rows.
> >
> > I am thinking about two solutions:
> >
> > 1. One table, "orders" with a column named "status". New orders, validated
> > orders, planned orders, ready for invoicing, and old orders, all in one big
> > table. The status column would be updated a number of times for each order.
> >
> > 2. A number of tables, "new_orders", "validated_orders", "order_history"
> > etc... No status column. Order rows would be moved from one table to
> > another. Perhaps i should have only two tables: "orders" and
> > "order_history"?
> >
> >
> > More tables would mean more indexes (but smaller in size), harder to
> > use/maintain a lot of tables and indexes and triggers and stuff... One
> > table would mean that a lot of inactive orders would slow down access to
> > active rows in the table...
> >
> > Any suggestions? Real world examples?
> >
> > /M
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

Re: Request for advice: Table design

From
Mike Mascari
Date:
Mattias Kregert wrote:

> Good morning, list!

> I am in the middle of a project and i am just about to decide
> how to lay out the table(s) for work orders. I had been
> planning to use one table, but now i'm not so sure
> that it would be optimal.

By "optimal", do you mean "logically consistent"?

> The order history will grow with something like 150K-200K
> rows/year and will have to be saved for ten years.
> The total number of active
> (new+validated+planned+ready_for_invoicing) orders
> would be something like 5K rows.

That shouldn't affect logical design, unless you know you will be
querying the database in a manner which cannot use indexes.

> I am thinking about two solutions:
>
> 1. One table, "orders" with a column named "status".
> New orders, validated orders, planned orders, ready
> for invoicing, and old orders, all in one big table.
> The status column would be updated a number of
> times for each order.
>
> 2. A number of tables, "new_orders", "validated_orders",
> "order_history" etc... No status column. Order rows
> would be moved from one table to another. Perhaps i
> should have only two tables: "orders" and "order_history"?
>
> More tables would mean more indexes (but smaller in size),
> harder to use/maintain a lot of tables and indexes and
> triggers and stuff... One table would mean that a lot
> of inactive orders would slow down access to active
> rows in the table...

The question cannot be answered without a fully defined requirements
document from the customer.

Do you need to know only what the current status of a workorder is? If
yes, one table might do. Otherwise, you need a history table. For each
change in the the status, does the customer require the same
attributes of the transaction to be recorded? If yes, then a two-table
workorder/orderhistory table might suffice. If no, then a separate
relation for each transaction type will be required. etc. etc.

For example,

Does the transition between "planned orders" and "ready for invoicing"
involve the same sort of attributes as the transition from
non-existence to "new orders"? Or, if the data involved is different,
does the customer not care? And has stated so explicitly?

Without such information, a design would be pure speculation.

HTH,

Mike Mascari
mascarm@mascari.com


Re: Request for advice: Table design

From
SZUCS Gábor
Date:
Mattias,

Here goes how we do it:

order_status (id, name):
    valid order statuses
order (partners, dates, status, etc):
    main data of order
    refs to order_status
order_statuschange (order, timestamp, oldstatus, newstatus, user):
    history of status changes on each order
    refs to order (cascade), order_status
order_mod (order, mod, user, closed, etc):
    separate modifications for an order (only the newest is able to receive
items,
    see below)
    refs to order (cascade)
order_item (order, itemno, archive, archive_from_mod):
    items in each order.
    refs to order (cascade), order_mod
order_notify (order, timestamp, user, operation):
    log and notify table
    does NOT ref to order so delete from order leaves this table alone.

The same structure goes for shipments and invoices. A note: order_mod is a
historical thing we'd like to get rid of, but our hands are always full to
do that, so we let it stay.

G.
------------------------------- cut here -------------------------------
----- Original Message -----
From: "Mattias Kregert" <mattias@kregert.se>
Sent: Tuesday, June 17, 2003 10:35 AM


More tables would mean more indexes (but smaller in size), harder to
use/maintain a lot of tables and indexes and triggers and stuff... One table
would mean that a lot of inactive orders would slow down access to active
rows in the table...

Any suggestions? Real world examples?


Re: Request for advice: Table design

From
Guy Fraser
Date:
Hi

As an additional note;

Older data is moved into a seperate table to reduce the number of
records that require regular vacuuming. Since the tables would contain
similar data it is simple to use union selections in a view with an
additional column to indicate which table the data comes from. Using a
view that combines the data from the two tables using a union, the data
will appear to be comming from a single table. This method make archival
access transparent.

I have a realtime data collection system that I built. The data is put
into tables on a yearly and monthly basis on the fly and new tables are
created as needed. I use a union  to join tables to access the data over
several months. I just thought of a new idea, I am going to write a
function to join the tables required over a timespan - but that's
another story.

Two tables are easy to join with a union :
{if the column types are exactly matched}

create view all_data
 select *,'current_data'::text as data_table from current_data ...
 union
 select *,'archive_data'::text from archive_data ...
;

The last column will indicate the data's origin.

Now to see all the data :

select * from all_data ;

Thats about it, using this method allows the "dynamic" table to small
for quick maintenace and operation, while the "static" table needs less
maintenace so it can be large with out the penalties incurred by
frequent maintenace.

Guy

Andrew Perrin wrote:

>I agree with Jason. There's no reason to use several tables for data that
>are essentially the same in character; that's the point of a relational
>database to begin with!  I also agree about the orderhistory table: use it
>to store cumulative updates, linked one-to-many with the orders table, so
>you can get the complete history of an order as well as the latest status
>very easily.
>
>Best,
>Andy Perrin
>
>----------------------------------------------------------------------
>Andrew J Perrin - http://www.unc.edu/~aperrin
>Assistant Professor of Sociology, U of North Carolina, Chapel Hill
>clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu
>
>
>On Tue, 17 Jun 2003, Jason Godden wrote:
>
>
>>Use one table - it's easier to maintain, more relationally correct and a
>>growth of 150K-200K rows a year should be nothing to PostgreSQL.
>>
>>2 million rows isn't very many or difficult for PostgreSQL to manage so long
>>as it is correctly indexed and vacuumed following any really large bulk data
>>changes.  The speed of PostgreSQL won't be an issue either providing you have
>>allocated enough memory in your pg_hba.conf and you are running on a
>>reasonable server (ie - at leats 512mb of ram)
>>
>>You could actually consider three tables though:
>>
>>CREATE TABLE orders (
>>  orderid SERIAL8 NOT NULL PRIMARY KEY,
>>  ...
>>  <misc order record stuff>
>>) WITHOUT OIDS;
>>
>>CREATE TABLE orderstatuses (
>>  orderstatus VARCHAR(20) NOT NULL PRIMARY KEY,
>>  statusdesc VARCHAR(200) NOT NULL DEFAULT 'New Status',
>>) WITHOUT OIDS;
>>
>>CREATE TABLE orderhistory (
>>  orderid INT8 NOT NULL REFERENCES orders (orderid) ON UPDATE CASCADE ON
>>DELETE CASCADE,
>>  historyuser VARCHAR(20) NOT NULL DEFAULT current_user,
>>  historywhen TIMESTAMP NOT NULL DEFAULT current_timestamp,
>>  historystatus VARCHAR(20) NOT NULL REFERENCES orderstatuses (orderstatus) ON
>>UPDATE CASCADE ON DELETE RESTRICT,
>>  CONSTRAINT orderhistory_pkey PRIMARY KEY (orderid,historywhen)
>>) WITHOUT OIDS;
>>
>>... and maybe create a non-unique index on orderid in orderhistory.
>>
>>This way you can always get the status of an order by selecting the most
>>recent entry in the orderhistory table pertaining to a particular record and
>>use a single plpgsql function add/modify data.  You could also use a rewrite
>>rule on a view displaying the most recent record.  And now you have an order
>>history defining an order's state at a point in time.
>>
>>Your style may differ to mine in terms of id's etc.. I know storing a whole
>>bunch of varchar's in the orderhistory table and referencing orderstatuses
>>may be a bit funny but I've always been of the opinion that it's best to use
>>a key that actually describes the data.
>>
>>Rgds,
>>
>>Jason
>>
>>On Tue, 17 Jun 2003 06:35 pm, Mattias Kregert wrote:
>>
>>>Good morning, list!
>>>I am in the middle of a project and i am just about to decide how to lay
>>>out the table(s) for work orders. I had been planning to use one table, but
>>>now i'm not so sure that it would be optimal.
>>>
>>>The order history will grow with something like 150K-200K rows/year and
>>>will have to be saved for ten years. The total number of active
>>>(new+validated+planned+ready_for_invoicing) orders would be something like
>>>5K rows.
>>>
>>>I am thinking about two solutions:
>>>
>>>1. One table, "orders" with a column named "status". New orders, validated
>>>orders, planned orders, ready for invoicing, and old orders, all in one big
>>>table. The status column would be updated a number of times for each order.
>>>
>>>2. A number of tables, "new_orders", "validated_orders", "order_history"
>>>etc... No status column. Order rows would be moved from one table to
>>>another. Perhaps i should have only two tables: "orders" and
>>>"order_history"?
>>>
>>>
>>>More tables would mean more indexes (but smaller in size), harder to
>>>use/maintain a lot of tables and indexes and triggers and stuff... One
>>>table would mean that a lot of inactive orders would slow down access to
>>>active rows in the table...
>>>
>>>Any suggestions? Real world examples?
>>>
>>>/M
>>>
>>>
>>>---------------------------(end of broadcast)---------------------------
>>>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 7: don't forget to increase your free space map settings
>>
>>
>
>---------------------------(end of broadcast)---------------------------
>TIP 5: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faqs/FAQ.html
>
>
>