OK
If your interested this is my solution to the problem. I'm sure there
are more efficient ways but this way works and that's what's important
at the moment.
While looking up Jeff's tips I discovered the CASE clause and used it to
generate two columns, a count of the dbs and a total across the dbs
avoiding the null values. I could then order these to get the required
row first.
SELECT distinct on (clus_id) clus_id,contig,db0,db1,db4,
CASE
WHEN db0 is not null and
db1 is not null and
db4 is not null
THEN 3
WHEN db0 is not null and
db1 is not null
THEN 2
WHEN db0 is not null and
db4 is not null
THEN 2
WHEN db1 is not null and
db4 is not null
THEN 2
END as db_count,
CASE
WHEN db0 is not null and
db1 is not null and
db4 is not null
THEN (db0+db1+db4)
WHEN db0 is not null and
db1 is not null
THEN (db0+db1)
WHEN db0 is not null and
db4 is not null
THEN (db0+db4)
WHEN db1 is not null and
db4 is not null
THEN (db0+db1)
WHEN db0 is not null
THEN db0
WHEN db1 is not null
THEN db1
WHEN db4 is not null
THEN db4
END as total
from venn order by clus_id,db_count DESC,total DESC;
clus_id | contig | db0 | db1 | db4 | db_count | total
----------+--------+---------+------+------+----------+-------
LRC00001 | 2 | 77.4 | | 82.7 | 2 | 160.1
LRC00002 | 1 | 325 | 343 | 313 | 3 | 981
LRC00003 | 1 | | | | |
LRC00004 | 1 | | | | |
LRC00005 | 1 | 294 | 294 | 116 | 3 | 704
LRC00006 | 3 | 120 | 122 | 63.9 | 3 | 305.9
LRC00007 | 1 | 178 | | | | 178
LRC00008 | 1 | | | | |
LRC00009 | 1 | 416 | | | | 416
LRC00010 | 4 | 146 | 168 | 172 | 3 | 486
LRC00011 | 1 | 179 | 100 | 95.1 | 3 | 374.1
LRC00012 | 1 | 639 | 639 | 633 | 3 | 1911
Jeff Eckermann wrote:
>--- ann hedley <ann.hedley@ed.ac.uk> wrote:
>
>
>>The problem...
>> From the table below I want to select distinct
>>clus_id and the data in
>>3 db columns, even if that data is null.
>>
>>When there are multiple contigs for a clus_id I want
>>to select the row
>>where 1st, the most db columns have a value, 2nd the
>>total value of the
>>db columns is highest.
>>
>>So for
>>LRC00006 I want contig 3,
>>LRC00010 I want contig 4, because it has 3 db values
>>(even though contig
>>2 would total more)
>>LRC00001 I want contig 2, because it would total
>>more if the null value
>>was treated as zero in the calculation
>>and for the rest I want contig 1.
>>
>>Any hints on writing such a query would be much
>>
>>
>
>If you don't mind using a PostgreSQL-only extension,
>"SELECT DISTINCT ON" may do what you want: check the
>"SELECT" page in the "SQL Commands" section of the
>manual.
>
>You will also want to use "coalesce" to replace null
>values, e.g. "select coalesce(db0,0)"
>
>
>
>>appreciated.
>>
>>lumbribase=# select
>>clus_id,contig,db0,db1,db4,(db0+db1+db4) as total
>>from venn order by clus_id,total DESC;
>> clus_id | contig | db0 | db1 | db4 | total
>>----------+--------+------+------+------+-------
>> LRC00001 | 1 | 77.4 | | 63.9 |
>> LRC00001 | 2 | 77.4 | | 82.7 |
>> LRC00002 | 1 | 325 | 343 | 313 | 981
>> LRC00003 | 1 | | | |
>> LRC00004 | 1 | | | |
>> LRC00005 | 1 | 294 | 294 | 116 | 704
>> LRC00006 | 1 | 100 | 72.8 | |
>> LRC00006 | 3 | 120 | 122 | 63.9 | 305.9
>> LRC00006 | 2 | 117 | 112 | 58.5 | 287.5
>> LRC00007 | 1 | 178 | | |
>> LRC00008 | 1 | | | |
>> LRC00009 | 1 | 416 | | |
>> LRC00010 | 2 | 324 | 167 | |
>> LRC00010 | 4 | 146 | 168 | 172 | 486
>> LRC00010 | 1 | 146 | 166 | 171 | 483
>> LRC00010 | 3 | 145 | 160 | 159 | 464
>> LRC00011 | 1 | 179 | 100 | 95.1 | 374.1
>> LRC00012 | 1 | 639 | 639 | 633 | 1911
>> LRC00012 | 3 | 505 | 509 | 508 | 1522
>> LRC00012 | 2 | 390 | 391 | 392 | 1173
>>
>>
>>
>>--
>>Ann
>>
>>"In a world without walls and fences - who needs
>>Windows and Gates ?"
>>
>> (unknown)
>>
>>
>>
>>---------------------------(end of
>>broadcast)---------------------------
>>TIP 1: subscribe and unsubscribe commands go to
>>majordomo@postgresql.org
>>
>>
>>
>
>
>
>__________________________________
>Yahoo! Mail Mobile
>Take Yahoo! Mail with you! Check email on your mobile phone.
>http://mobile.yahoo.com/learn/mail
>
>---------------------------(end of broadcast)---------------------------
>TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
>
>
--
Ann
"In a world without walls and fences - who needs Windows and Gates ?"
(unknown)