Thread: schema design

schema design

From
Jodi Kanter
Date:
I have an interesting issue that I am working through and was hoping to get some insight from the group. I will do my best to explain the scenario and hope I do not confuse anyone.
 
We are storing data regarding some analyses that are being run here. We also have a set of tables in which we store information "about" the types of analyses being run. I am using these tables to do a validity check on what file types (or input files) are acceptable for a certain type of analyses.
 
I have the following tables:
 
An ANALYSIS table which merely holds the name and some specifics about the type of analysis.
A FILETYPES table which stores information about the types of input and output files for each analysis. We have chosen to use file extensions as a way to identify a file type.
Some analyses can have multiple input and/or output files so I have linking tables between the ANALYSIS and FILETYPES table.
 
My question is due to a recent development. Apparently when an analysis is run the individual can specify what type of file the output can be. Since these tables are holding the data regarding the "form" of the analysis I must somehow show that each analysis can result in various filetypes. I thought I had done this and had an "extension" field in the FILETYPES table. Now I am being told that each of these input or output files can have more than one file extension.
My solution was to add another table to hold the extensions. I think this is better than allowing extensions to concatenate within a field in the filetypes table which was suggested by a programmer here. Her intention was to parse the field.  I prefer a more "appropriate" solution, but I am not convinced mine is the best solution since there will be some repeating data in the extensions table. Various analyses can be associated with the same file types so the extension field in the extensions table will have repeating values.
I have attached a small schema with these few tables to assist with your understanding our issue.
I would appreciate suggestions.
Thank you,
Jodi Kanter
 

_______________________________
Jodi L Kanter
BioInformatics Database Administrator
University of Virginia
(434) 924-2846
jkanter@virginia.edu


 

 

 

Attachment

Re: schema design

From
"Michael Ansley (UK)"
Date:
According to your schema, different analyses linked to different file types will not cause redundancies in the list of extensions.  However, if you have multiple file types for an extension, then you will get duplicates.  I'm not sure how likely this is, but it's possible in a generic situation, e.g.: .doc could mean a Word doc, or a WP doc.  In this situation, do you want one extension record or two.  If you only want one, then create a many-to-many resolution table between file type and extension.  If you are happy with two, then your schema is fine. 
 
Of course, you could always use a field array to hold the extensions, rather than parsing a string field, but you will still get duplication.  And that's not very portable.
-----Original Message-----
From: Jodi Kanter [mailto:jkanter@virginia.edu]
Sent: 06 November 2002 17:04
To: Postgres Admin List
Subject: [ADMIN] schema design

I have an interesting issue that I am working through and was hoping to get some insight from the group. I will do my best to explain the scenario and hope I do not confuse anyone.
 
We are storing data regarding some analyses that are being run here. We also have a set of tables in which we store information "about" the types of analyses being run. I am using these tables to do a validity check on what file types (or input files) are acceptable for a certain type of analyses.
 
I have the following tables:
 
An ANALYSIS table which merely holds the name and some specifics about the type of analysis.
A FILETYPES table which stores information about the types of input and output files for each analysis. We have chosen to use file extensions as a way to identify a file type.
Some analyses can have multiple input and/or output files so I have linking tables between the ANALYSIS and FILETYPES table.
 
My question is due to a recent development. Apparently when an analysis is run the individual can specify what type of file the output can be. Since these tables are holding the data regarding the "form" of the analysis I must somehow show that each analysis can result in various filetypes. I thought I had done this and had an "extension" field in the FILETYPES table. Now I am being told that each of these input or output files can have more than one file extension.
My solution was to add another table to hold the extensions. I think this is better than allowing extensions to concatenate within a field in the filetypes table which was suggested by a programmer here. Her intention was to parse the field.  I prefer a more "appropriate" solution, but I am not convinced mine is the best solution since there will be some repeating data in the extensions table. Various analyses can be associated with the same file types so the extension field in the extensions table will have repeating values.
I have attached a small schema with these few tables to assist with your understanding our issue.
I would appreciate suggestions.
Thank you,
Jodi Kanter
 

_______________________________
Jodi L Kanter
BioInformatics Database Administrator
University of Virginia
(434) 924-2846
jkanter@virginia.edu


 

 

 



This e-mail and any attachments are confidential and may also be privileged and/or copyright
material of Intec Telecom Systems PLC (or its affiliated companies). If you are not an
intended or authorised recipient of this e-mail or have received it in error, please delete
it immediately and notify the sender by e-mail. In such a case, reading, reproducing,
printing or further dissemination of this e-mail is strictly prohibited and may be unlawful.
Intec Telecom Systems PLC. does not represent or warrant that an attachment hereto is free
from computer viruses or other defects. The opinions expressed in this e-mail and any
attachments may be those of the author and are not necessarily those of Intec Telecom
Systems PLC.

This footnote also confirms that this email message has been swept by
MIMEsweeper for the presence of computer viruses.