Re: Table design - Mailing list pgsql-general

From Guy Rouillier
Subject Re: Table design
Date
Msg-id CC1CF380F4D70844B01D45982E671B239E8CF7@mtxexch01.add0.masergy.com
Whole thread Raw
In response to Table design  (Sean Davis <sdavis2@mail.nih.gov>)
Responses Re: Table design
List pgsql-general
Sean Davis wrote:
> This might be a bit off-topic, but I'm curious what folks would do
> with this
> situation:
>
> I have about 6 different tab-delimited file types, all of which store
> similar information (microarray gene expression).  However, the files
> come from different manufacturers, so each has slightly different
> fields with different meanings.  However, there are a few columns
> that are shared.  I may need to add table formats in the future (as
> we get more manufacturers). I can think of at least three ways to go
> about storing these data:
>
> 1) Create a single table that has as many columns as needed for ALL
> formats and make manufacturer-specific views, naming columns in the
> view as appropriate.  Then put rules on the view for inserts,
> updates, etc.  This is my first choice, I think, but adding a new
> manufacturer's format means creating a new view and possibly adding
> columns; some columns may NULL for large portions of the table.
>
> 2) Use postgres inheritance, but even shared columns in our data may
> have different names depending on the manufacturer, so there may be
> views involved anyway.

I'm unclear if this is just a naming issue, or if the fields in the
files have different meaning.  If it's just a case that supplier A names
a field "foo" while supplier B names a field with the same meaning
"bar", I would think you'd want to coalesce all these incoming files
into a single table containing columns that have meaning to your
organization.  The effort then just becomes one of mapping incoming
fields into the proper columns, but the end result would be something
much more consistent and meaningful to your organization.

If on the other hand all these incoming fields have different meaning
and you need to keep them stored separately, I would look into option
(4): just keep a separate table for each supplier, since you said that
even shared fields may have different meaning; then use a view over all
the tables to answer any queries across suppliers.

But I definitely wouldn't let the way your suppliers name their fields
in the files they send you drive how you design your database.  That's
just a data mapping issue which is easily solved during data import.

>
> 3) Use a fully-normalized strategy that stacks each column into one
> very long table--this would be my last choice.
>
> Thanks for any insight.
>
> (For replies, please try to reply to me directly as well as the list
> as I just get digests right now).
>
> Thanks,
> Sean
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org



--
Guy Rouillier


pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: 'AS' column-alias beign ignored on outer select
Next
From: Sean Davis
Date:
Subject: Re: Table design