Re: Storing number '001' ? - Mailing list pgsql-novice

From Josh Berkus
Subject Re: Storing number '001' ?
Msg-id web-529002@davinci.ethosmedia.com
In response to Storing number '001' ?  (Charles Hauser <chauser@acpub.duke.edu>)
List pgsql-novice
Chuck,

> >1. Are "Fasta" and "Sequence" the same thing? (further questions assume
> >this to be the case).
>
> For our purposes, yes.  Strictly speaking however, fasta denotes a
> particular format the sequence is in:
>
> >894001A01.x1
> GATCGATCGCTACGTCAGAC
>
> is fasta formatted sequence, whereas:
>
> GATCGATCGCTACGTCAGAC
>
> is sequence.
>
> In TABLE clone_fasta.seq I store the latter, ie
> 'GATCGATCGCTACGTCAGAC'.  If it helps, I can name TABLE clone_fasta
> TABLE clone_sequence?

No, don't rename your tables for my convenience.  I was just confused
because you were using the word "fasta" in some places and "sequence" in
others.
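
To pin down the distinction Chuck describes, here is a rough sketch of how a FASTA-formatted record could be split into its header and the bare sequence that ends up in clone_fasta.seq. The function name parse_fasta is made up for illustration, and the sketch assumes the header line carries only the clone id after the ">".

```python
def parse_fasta(text):
    """Split FASTA-formatted text into (header, sequence) pairs.

    Hypothetical helper: assumes each record starts with a '>' header
    line and that the sequence may span one or more following lines.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The example record from the message above.
fasta_text = ">894001A01.x1\nGATCGATCGCTACGTCAGAC\n"
records = parse_fasta(fasta_text)
```

With this split, only the second element of each pair (the bare sequence) would be stored in the seq column; the header would identify the clone.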

> >2. We established in the last e-mail that there is potentially more than
> >one clone related to each clone_fasta and to each clone_qual record, and
> >that no clone has more than one fasta or qual record.  Is that still
> >true?
>
> I believe not. Let me answer this with an example, and go from
> there.
> I will simplify the clone id and not break it down into 6 fields.
>
>
>            TABLE clone            TABLE clone_fasta           TABLE clone_qual
>            clone                  seq                         qual
> record 1:  894001A01.x1 <------>  GATCGATATATA..... <------>  {9 9 9 23 34 45 ...}
> record 2:  894001A01.y1 <------>  TTTTTTGATGAT..... <------>  {3 4 6 9 14 34 21 ...}
> record 3:  894001A02.x1 <------>  GTTTCACTAGCT..... <------>  {8 5 15 31 24 7 ...}
>
>
>  From the above example, which is universally true, I would state
> that:
>
> 1. one and only one clone relates to each clone_fasta and to each
> clone_qual.
> 2. no clone has more than one fasta or qual record.
>
> Stated another way, each clone has one and only one fasta(sequence),
> and one and only one qual.
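
The "one and only one" relationship stated above can be sketched as a schema in which clone_fasta and clone_qual each use the clone id as their own primary key. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, the column name clone_id is invented, and qual is stored as plain text rather than as an array.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real PostgreSQL one.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce REFERENCES

# Assumed layout: the clone id is the primary key of all three tables,
# so a clone can have at most one fasta row and at most one qual row.
conn.executescript("""
CREATE TABLE clone (
    clone_id TEXT PRIMARY KEY
);
CREATE TABLE clone_fasta (
    clone_id TEXT PRIMARY KEY REFERENCES clone(clone_id),
    seq      TEXT NOT NULL
);
CREATE TABLE clone_qual (
    clone_id TEXT PRIMARY KEY REFERENCES clone(clone_id),
    qual     TEXT NOT NULL
);
""")

conn.execute("INSERT INTO clone VALUES ('894001A01.x1')")
conn.execute("INSERT INTO clone_fasta VALUES ('894001A01.x1', 'GATCGATATATA')")
conn.execute("INSERT INTO clone_qual VALUES ('894001A01.x1', '9 9 9 23 34 45')")

# A second sequence for the same clone violates the primary key.
try:
    conn.execute("INSERT INTO clone_fasta VALUES ('894001A01.x1', 'TTTTTTGATGAT')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

fasta_rows = conn.execute("SELECT COUNT(*) FROM clone_fasta").fetchone()[0]
```

Note that this layout only guarantees "at most one" fasta and qual per clone. Guaranteeing that every clone actually has both would need an extra check at load time, and it does not prevent two different clones from carrying identical sequence text.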

So, three more questions:
1. Does more than one clone potentially relate to each fasta?  Do we
care? (i.e. will we ever query the database for "Which clones relate to
sequence x?"  Or do we receive data like "Clones 1999, 2001, and 2173
have sequence x"?)
2. Does clone_qual properly relate to clones, or to clone_fasta?  From
your description, I can see things going either way.
3. The contigs:  is a contig assembled out of clones, sequences (fasta)
or quals?  I'm not clear on this.  From your description, it seems like
a contig might actually represent 1-2 sequences(fastas or qual?), as
opposed to 1-2 clones.

> >3. In what order does the data arrive for your tables?  I.e., is this an
> >accurate order of events:
> >(1) Clone data
> >(2) Sequence (Fasta?) data
> >(3) Qual data
> >(4) Contig data
> >(5) Library and Genebank data.
> >Is this accurate?
>
>
> More accurate to order them as:
>
> (5a) Library : is updated with each new project
> (1),(2),(3) : arrive simultaneously, but would be entered in the
> order you listed.
> (5b) Genbank : data submitted to Genbank after (1,2,3) are in hand
> (4)
> (6) blast

We'll hash this out eventually!  I think we're still struggling with you
speaking biologist and me speaking DBA ...

-Josh


______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology      josh@agliodbs.com
   and data management solutions       (415) 565-7293
  for law firms, small businesses        fax 621-2533
    and non-profit organizations.      San Francisco
