Re: Help with queries. - Mailing list pgsql-sql

From Adrian Johnson
Subject Re: Help with queries.
Date
Msg-id AANLkTikkstpxYtKukzU_gDWTwE7qmjShkGZpRBLqPs4y@mail.gmail.com
Whole thread Raw
In response to Re: Help with queries.  ("Oliveiros d'Azevedo Cristina" <oliveiros.cristina@marktest.pt>)
List pgsql-sql
Hello Oliver.
thanks for your reply. Here are my answers. Sorry I shot e-mail in
morning and I was not clear about it.
I am afraid, I am not jamming you with all information.

Thank you very much for your help.


====================================== your questions
====================================


First, answers to questions you asked:



> Please advice me,
> What do you call coordinates to, exactly? Sorry, my genetics knowledge is
> very poor.
== I call coordinates cto - crom

> You call the from and to fields coordinates?
==yes

> Do they have the same meaning as the from and to fields from table snps ?
==  cfrom - cto fields have yes.


> What does it mean to map all coordinates in table snps?
== Means I want to take  snps.cfrom and snps.cto and gene.gene by
checking if they are in range of gene.cfrom and gene.cto from gene
table.

> Do you wanna know which  entries from snps map in the SRC gene?
== yes and more than that.

> Coordinates are the segment where the gene/sample fit in the chromosome? Is
> my understanding correct ?
==  yes I am calling   on  "chr19"  (cfrom - cto)  (38162736 -
38163007) are coordinates.



=================================== illustration of example
==========================
I know basic queries, but I am finding it difficult to get what I am
describing below.
I am illustrating an example here .



Here I query for a gene RHPN2 from gene table:
 gid  | chromosome |  cfrom   |   cto    | frame | gene
-------+------------+----------+----------+-------+-------93538 | chr19      | 38162736 | 38163007 | -     | RHPN293540
|chr19      | 38173254 | 38173420 | -     | RHPN293541 | chr19      | 38174563 | 38174720 | -     | RHPN293542 | chr19
   | 38176717 | 38176804 | -     | RHPN293543 | chr19      | 38178766 | 38178971 | -     | RHPN293544 | chr19      |
38182326| 38182456 | -     | RHPN293545 | chr19      | 38184987 | 38185154 | -     | RHPN293546 | chr19      | 38185553
|38185751 | -     | RHPN293547 | chr19      | 38190754 | 38190931 | -     | RHPN293548 | chr19      | 38194419 |
38194554| -     | RHPN293549 | chr19      | 38195387 | 38195475 | -     | RHPN293551 | chr19      | 38204311 | 38204397
|-     | RHPN293553 | chr19      | 38209244 | 38209383 | -     | RHPN293555 | chr19      | 38226989 | 38227115 | -
|RHPN293556 | chr19      | 38247524 | 38247603 | -     | RHPN2
 







Now taking:cto = 38162736 first row in above tablecfrom = 38247603 last row in above table
I want to get all those entries from snp table


# select sample_id, chromosome,cfrom, cto, refbase, consbase from snps
where cto > 38162736 and cfrom < 38247603 and chromosome = 'chr19' ;
sample_id | chromosome |  cfrom   |   cto    | refbase | consbase
-----------+------------+----------+----------+---------+----------        2 | chr19      | 38178828 | 38178829 | C
 | Y        5 | chr19      | 38182405 | 38182406 | T        | Y        5 | chr19      | 38182424 | 38182425 | G       |
R       5 | chr19      | 38185101 | 38185102 | C       | Y        3 | chr19      | 38182424 | 38182425 | G       | R
   1 | chr19      | 38178828 | 38178829 | C       | Y        1 | chr19      | 38182424 | 38182425 | G       | R
7| chr19      | 38185101 | 38185102 | C       | Y        6 | chr19      | 38185101 | 38185102 | C       | Y        4 |
chr19     | 38178828 | 38178829 | C       | Y        4 | chr19      | 38182424 | 38182425 | G       | R
 

From this result, we get that all samples range from 1 to 7 (sample_id).




Now, from above result (snps table) I want to filter:
1. Those that have any letter in consbase other than ATGC (although in
this case there are no A or T or G or C they are there)
2. Those entries that have sample_id 1, 2 and 3.
3. Unique entries that have cfrom and cto common to 1 and 2  and 1 and
3 but not 2 and 3.

For example see below:  (say this is result X)gene  | sample_id | chromosome |  cfrom   |   cto
-------+-----------+------------+----------+----------+---------+----------RHPN2 |         2 | chr19      | 38178828 |
38178829RHPN2|         1 | chr19      | 38178828 | 38178829RHPN2 |         3 | chr19      | 38182424 | 38182425RHPN2 |
      1 | chr19      | 38182424 | 38182425
 

Here for gene RHPN2:
cfrom - cto (38178828 | 38178829) is common to samples 1 and 2
cfrom - cto ( 38182424| 38182425) is common to samples 1 and 3.

But both these samples 1 and 2   and  1 and 3 belong same gene (RHPN2 ).


Now I  have another table coverage, where  for sample positions above,
I have the following data:
(say this is result y)sample_id | chromosome |  cfrom   |   cto    | abase | tbase | gbase | cbase
-----------+------------+----------+----------+-------+-------+-------+-------        2 | chr19      | 38178828 |
38178829|     0 |    29 |     2 |    44        1 | chr19      | 38178828 | 38178829 |     0 |    52 |     0 |    32
  3 | chr19      | 38178828 | 38178829 |     0 |     0 |     0
 
|    28        1 | chr19      | 38182424 | 38182425 |    14 |     0 |    54 |     0        2 | chr19      | 38182424 |
38182425|    17 |     0 |    55 |     0        3 | chr19      | 38182424 | 38182425 |    13 |     0 |    26 |     0
 


Now from table snps, I want to get results that satisfy conditions
that gave result X  and combine this with results Y.

How is it possible to do this?

Thank you for your help.

Adrian.


pgsql-sql by date:

Previous
From: Devrim GÜNDÜZ
Date:
Subject: Re: available RPM package for 8.4.4?
Next
From: Nicholas I
Date:
Subject: insert into help