Re: Storing number '001' ? - Mailing list pgsql-novice
From | Josh Berkus |
---|---|
Subject | Re: Storing number '001' ? |
Date | |
Msg-id | web-529002@davinci.ethosmedia.com |
In response to | Storing number '001' ? (Charles Hauser <chauser@acpub.duke.edu>) |
List | pgsql-novice |
Chuck,

> >1. Are "Fasta" and "Sequence" the same thing? (further questions assume
> >this to be the case).
>
> For our purposes, yes. Strictly speaking however, fasta denotes a
> particular format the sequence is in:
>
> >894001A01.x1
> GATCGATCGCTACGTCAGAC
>
> is fasta formatted sequence, whereas:
>
> GATCGATCGCTACGTCAGAC
>
> is sequence.
>
> In TABLE clone_fasta.seq I store the latter, i.e.
> 'GATCGATCGCTACGTCAGAC'. If it helps, I can name TABLE clone_fasta
> TABLE clone_sequence?

No, don't rename your tables for my convenience. I was just confused because you were using the word "fasta" in some places and "sequence" in others.

> >2. We established in the last e-mail that there is potentially more than
> >one clone related to each clone_fasta and to each clone_qual record, and
> >that no clone has more than one fasta or qual record. Is that still
> >true?
>
> I believe not. Let me answer this with an example, and go from
> there.
> I will simplify the clone id and not break it down into 6 fields.
>
>           TABLE clone           TABLE clone_fasta          TABLE clone_qual
>           clone                 seq                        qual
> record 1: 894001A01.x1 <------> GATCGATATATA..... <------> {9 9 9 23 34 45 ...}
>
> record 2: 894001A01.y1 <------> TTTTTTGATGAT..... <------> {3 4 6 9 14 34 21 ...}
>
> record 3: 894001A02.x1 <------> GTTTCACTAGCT..... <------> {8 5 15 31 24 7 ...}
>
> From the above example, which is universally true, I would state
> that:
>
> 1. one and only one clone relates to each clone_fasta and to each
> clone_qual.
> 2. no clone has more than one fasta or qual record.
>
> Stated another way, each clone has one and only one fasta (sequence),
> and one and only one qual.

So, three more questions:

1. Does more than one clone potentially relate to each fasta? Do we care? (i.e., will we ever query the database for "Which clones relate to sequence x?" Or do we receive data like "Clones 1999, 2001, and 2173 have sequence x"?)

2. Does clone_qual properly relate to clones, or to clone_fasta? From your description, I can see things going either way.

3. The contigs: is a contig assembled out of clones, sequences (fasta) or quals? I'm not clear on this. From your description, it seems like a contig might actually represent 1-2 sequences (fastas or quals?), as opposed to 1-2 clones.

> >3. In what order does the data arrive for your tables? I.e., is this an
> >accurate order of events:
> >(1) Clone data
> >(2) Sequence (Fasta?) data
> >(3) Qual data
> >(4) Contig data
> >(5) Library and GenBank data.
> >Is this accurate?
>
> More accurate to order them as:
>
> (5a) Library: is updated with each new project
> (1),(2),(3): arrive simultaneously, but would be entered in the
> order you listed.
> (5b) GenBank: data submitted to GenBank after (1,2,3) are in hand
> (4)
> (6) blast

We'll hash this out eventually! I think we're still struggling with you speaking biologist and me speaking DBA ...

-Josh

______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology     josh@agliodbs.com
   and data management solutions      (415) 565-7293
  for law firms, small businesses       fax 621-2533
    and non-profit organizations.     San Francisco
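Chuck's two statements (each clone_fasta and clone_qual record belongs to exactly one clone, and no clone has more than one of either) describe one-to-one relationships, and his clone_fasta.seq column holds the bare sequence rather than the FASTA-formatted record with its `>` header line. The following is a minimal sketch under stated assumptions, not a schema proposed in this thread: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the helpers `parse_fasta`, `make_schema` and `load_clone`, the `clone_id` column name, and storing quality scores as space-separated text are all hypothetical. The NOT NULL + REFERENCES + UNIQUE combination on each child table is what encodes "one and only one".

```python
import sqlite3


def parse_fasta(text):
    """Split one FASTA record into (clone id, bare sequence).

    A FASTA record is a '>' header line followed by one or more sequence
    lines. Only the joined sequence lines are kept as the sequence, which is
    what clone_fasta.seq is described as holding.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record: missing '>' header line")
    clone_id = lines[0][1:].split()[0]  # assumption: header's first word is the clone id
    return clone_id, "".join(lines[1:])


def make_schema(conn):
    # SQLite only enforces REFERENCES when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Hypothetical schema: NOT NULL + REFERENCES means every fasta/qual row
    # belongs to exactly one clone; UNIQUE means no clone has more than one.
    conn.executescript("""
        CREATE TABLE clone (
            clone_id TEXT NOT NULL PRIMARY KEY
        );
        CREATE TABLE clone_fasta (
            clone_id TEXT NOT NULL UNIQUE REFERENCES clone (clone_id),
            seq      TEXT NOT NULL
        );
        CREATE TABLE clone_qual (
            clone_id TEXT NOT NULL UNIQUE REFERENCES clone (clone_id),
            qual     TEXT NOT NULL  -- assumption: space-separated scores
        );
    """)


def load_clone(conn, fasta_text, qual_scores):
    """Insert one clone with its sequence and quality scores atomically."""
    clone_id, seq = parse_fasta(fasta_text)
    with conn:  # commits on success, rolls back if any insert fails
        conn.execute("INSERT INTO clone (clone_id) VALUES (?)", (clone_id,))
        conn.execute("INSERT INTO clone_fasta (clone_id, seq) VALUES (?, ?)",
                     (clone_id, seq))
        conn.execute("INSERT INTO clone_qual (clone_id, qual) VALUES (?, ?)",
                     (clone_id, " ".join(str(q) for q in qual_scores)))
    return clone_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_schema(conn)
    load_clone(conn, ">894001A01.x1\nGATCGATCGCTACGTCAGAC\n", [9, 9, 9, 23, 34, 45])
    print(conn.execute("SELECT seq FROM clone_fasta").fetchone()[0])
    # prints GATCGATCGCTACGTCAGAC
```

PostgreSQL supports the same UNIQUE and REFERENCES constraints. If the answer to Josh's question 1 turns out to be yes (several clones can share one sequence), the UNIQUE constraint on clone_fasta.clone_id would be the wrong model, and the reference would more likely run from clone to a sequence key instead.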