Re: Help with queries. - Mailing list pgsql-sql
From | Adrian Johnson |
---|---|
Subject | Re: Help with queries. |
Date | |
Msg-id | AANLkTikkstpxYtKukzU_gDWTwE7qmjShkGZpRBLqPs4y@mail.gmail.com Whole thread Raw |
In response to | Re: Help with queries. ("Oliveiros d'Azevedo Cristina" <oliveiros.cristina@marktest.pt>) |
List | pgsql-sql |
Hello Oliver. thanks for your reply. Here are my answers. Sorry I shot e-mail in morning and I was not clear about it. I am afraid, I am not jamming you with all information. Thank you very much for your help. ====================================== your questions ==================================== First, answers to questions you asked: > Please advice me, > What do you call coordinates to, exactly? Sorry, my genetics knowledge is > very poor. == I call coordinates cto - crom > You call the from and to fields coordinates? ==yes > Do they have the same meaning as the from and to fields from table snps ? == cfrom - cto fields have yes. > What does it mean to map all coordinates in table snps? == Means I want to take snps.cfrom and snps.cto and gene.gene by checking if they are in range of gene.cfrom and gene.cto from gene table. > Do you wanna know which entries from snps map in the SRC gene? == yes and more than that. > Coordinates are the segment where the gene/sample fit in the chromosome? Is > my understanding correct ? == yes I am calling on "chr19" (cfrom - cto) (38162736 - 38163007) are coordinates. =================================== illustration of example ========================== I know basic queries, but I am finding it difficult to get what I am describing below. I am illustrating an example here . Here I query for a gene RHPN2 from gene table: gid | chromosome | cfrom | cto | frame | gene -------+------------+----------+----------+-------+-------93538 | chr19 | 38162736 | 38163007 | - | RHPN293540 |chr19 | 38173254 | 38173420 | - | RHPN293541 | chr19 | 38174563 | 38174720 | - | RHPN293542 | chr19 | 38176717 | 38176804 | - | RHPN293543 | chr19 | 38178766 | 38178971 | - | RHPN293544 | chr19 | 38182326| 38182456 | - | RHPN293545 | chr19 | 38184987 | 38185154 | - | RHPN293546 | chr19 | 38185553 |38185751 | - | RHPN293547 | chr19 | 38190754 | 38190931 | - | RHPN293548 | chr19 | 38194419 | 38194554| - | RHPN293549 | chr19 | 38195387 | 38195475 | - | RHPN293551 | chr19 | 38204311 | 38204397 |- | RHPN293553 | chr19 | 38209244 | 38209383 | - | RHPN293555 | chr19 | 38226989 | 38227115 | - |RHPN293556 | chr19 | 38247524 | 38247603 | - | RHPN2 Now taking:cto = 38162736 first row in above tablecfrom = 38247603 last row in above table I want to get all those entries from snp table # select sample_id, chromosome,cfrom, cto, refbase, consbase from snps where cto > 38162736 and cfrom < 38247603 and chromosome = 'chr19' ; sample_id | chromosome | cfrom | cto | refbase | consbase -----------+------------+----------+----------+---------+---------- 2 | chr19 | 38178828 | 38178829 | C | Y 5 | chr19 | 38182405 | 38182406 | T | Y 5 | chr19 | 38182424 | 38182425 | G | R 5 | chr19 | 38185101 | 38185102 | C | Y 3 | chr19 | 38182424 | 38182425 | G | R 1 | chr19 | 38178828 | 38178829 | C | Y 1 | chr19 | 38182424 | 38182425 | G | R 7| chr19 | 38185101 | 38185102 | C | Y 6 | chr19 | 38185101 | 38185102 | C | Y 4 | chr19 | 38178828 | 38178829 | C | Y 4 | chr19 | 38182424 | 38182425 | G | R From this result, we get that all samples range from 1 to 7 (sample_id). Now, from above result (snps table) I want to filter: 1. Those that have any letter in consbase other than ATGC (although in this case there are no A or T or G or C they are there) 2. Those entries that have sample_id 1, 2 and 3. 3. Unique entries that have cfrom and cto common to 1 and 2 and 1 and 3 but not 2 and 3. For example see below: (say this is result X)gene | sample_id | chromosome | cfrom | cto -------+-----------+------------+----------+----------+---------+----------RHPN2 | 2 | chr19 | 38178828 | 38178829RHPN2| 1 | chr19 | 38178828 | 38178829RHPN2 | 3 | chr19 | 38182424 | 38182425RHPN2 | 1 | chr19 | 38182424 | 38182425 Here for gene RHPN2: cfrom - cto (38178828 | 38178829) is common to samples 1 and 2 cfrom - cto ( 38182424| 38182425) is common to samples 1 and 3. But both these samples 1 and 2 and 1 and 3 belong same gene (RHPN2 ). Now I have another table coverage, where for sample positions above, I have the following data: (say this is result y)sample_id | chromosome | cfrom | cto | abase | tbase | gbase | cbase -----------+------------+----------+----------+-------+-------+-------+------- 2 | chr19 | 38178828 | 38178829| 0 | 29 | 2 | 44 1 | chr19 | 38178828 | 38178829 | 0 | 52 | 0 | 32 3 | chr19 | 38178828 | 38178829 | 0 | 0 | 0 | 28 1 | chr19 | 38182424 | 38182425 | 14 | 0 | 54 | 0 2 | chr19 | 38182424 | 38182425| 17 | 0 | 55 | 0 3 | chr19 | 38182424 | 38182425 | 13 | 0 | 26 | 0 Now from table snps, I want to get results that satisfy conditions that gave result X and combine this with results Y. How is it possible to do this? Thank you for your help. Adrian.