Chuck,
> Just a couple more questions, and I think I'll call it quits.
Ha!
> 1) gb_accessions :: clones
>
> I think the values in TABLE gb_accessions should be brought into
> TABLE clones and not maintained as a separate TABLE.
>
> relationship:
> Some clones will not have values now in gb_accessions (but
> these columns(accn_no, gi_no) would be defined NULL).
> Accession numbers will exist for 1 or more clones.
I disagree with this for two reasons:
1. The GB information will not be filled in at the same time as the rest
of the data.
2. The GB information consists of two columns (accn_no, gi_no) and not
one. If it were one column, the easier approach (adding a Nullable gb
column) might make more sense.
Therefore, I recommend that you maintain gb as a relational table, with
a nullable foriegn key to the GB table in Clones (and no ON DELETE
CASCADE!) This has the added benefit that if GB requires more data
(i.e. additional columns) in the future, it doesn't mess things up for
your clone table.
>
> 2) library :: clone
>
> relationship:
> a clone will come from 1 and only 1 library
> a library will contain many clones
>
> regarding column 'project' in both TABLES. (project will be UNIQUE)
> Currently, I have clone.project as a FK refering to TABLE library. I
> think this is ok?
Sure. Depending on how things work, you will want this column to be NOT
NULL (every clone has a library) or NULL (some clones do not have
libraries). However, see my note below regarding NULLability.
> 3) Use of NULL
>
> It would facilitate entering data into TABLE clone if seq, qual...
> were defined as NULL even though values for these columns will/does
> exist. If I defined these columns as NULL I would not have to have
> ALL of the data together at one time, but could enter it in steps.
> Is this an ill conceived notion?
Well, this purpose is what NULL is for. Cases where data is unknown or
not yet available. That being said, allowing NULLs takes data integrity
out of the table design and moves it elsewhere in the software. If you
allow NULLs for seq, then you will need to create a data integrity
report that searches for old entries with NULL in the seq column.
Otherwise, you risk having some required data never filled in.
Also, remember that you can't JOIN on a NULL value. For example, if you
allow NULLs in, say clones.gb_id, then if you do a report on clones JOIN
gb_accessions, the clones who are missing GB will not show up with a
blank GB, instead they will not show up at all! You can get around this
with OUTER JOINS, but OUTER JOINs are not dependable for multi-column
joins.
Instead, I recommend that everywhere it is possible, you have an actual
value that indicates why the data has not been filled in. FOr example,
you could create a gb_accession with the ID of zero (0) (and accn_no,
gi_no of 0 as well) which would indicate (to you) "gb not run yet".
This gives you more information than NULL (which could indicate a number
of things: GB not run, GB lost, data error, program error, etc.), as
well as supporting JOINs cleanly. You could even have more than one
such value to indicate different reasons for missing info.
-Josh Berkus
______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh@agliodbs.com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco