Here's one for all you SQL jockeys out there (hope this is a good group
to post this query)
I'm currently (happily) using PostgreSQL for doing DNA expression analysis,
and I often end up with joins where columns on the left side get repeated
to matching differing
For example, a select output ends up looking like this:
(from psql. i've removed the - and | lines)
tag gb_id pos descrip
ACTATTTTTAGAGACCC NM_032685.1 307 hypothetical protein MGC13005 (MGC13005),
AGAAAAAAAAAAAAAAA NM_019110.2 27 hypothetical protein P1 p373c6 (P1P373C6)
AGAAAAAAAAAAAAAAA NM_019110.2 27 hypothetical protein P1 p373c6 (P1P373C6)
AGCCACCACGCCTGGTC NM_003693.1 260 acetyl LDL receptor; SREC=scavenger
AGCCACCGCGCCCGGCC NM_007081.1 486 RAB, member of RAS oncogene family-like 2B
AGCCACCGCGCCCGGCC NM_013412.1 486 RAB, member of RAS oncogene family-like 2A
AGCCACCGCGCCTGGCC NM_000651.2 229 complement component (3b/4b) receptor 1,
AGCCACCGCGCCTGGCC NM_000573.2 229 complement component (3b/4b) receptor 1,
ATCAAAAAAAAAAAAAA NM_079421.1 25 cyclin-dependent kinase inhibitor 2D
but I (and my colleagues) would much rather see this, which draws
attention to the duplicates (or multiples) in the left column.
tag gb_id pos descrip
ACTATTTTTAGAGACCC NM_032685.1 307 hypothetical protein MGC13005 (MGC13005),
AGAAAAAAAAAAAAAAA NM_019110.2 27 hypothetical protein P1 p373c6 (P1P373C6) NM_019110.2 27
hypotheticalprotein P1 p373c6 (P1P373C6)
AGCCACCACGCCTGGTC NM_003693.1 260 acetyl LDL receptor; SREC=scavenger
AGCCACCGCGCCCGGCC NM_007081.1 486 RAB, member of RAS oncogene family-like 2B NM_013412.1 486 RAB,
memberof RAS oncogene family-like 2A
AGCCACCGCGCCTGGCC NM_000651.2 229 complement component (3b/4b) receptor 1, NM_000573.2 229 complement
component(3b/4b) receptor 1,
ATCAAAAAAAAAAAAAA NM_079421.1 25 cyclin-dependent kinase inhibitor 2D
Is there any slick way to accomplish this in SQL? Or is there a switch
or something I've missed in postgresql? Currently I'm postprocessing
the output with a perl script, but I'd prefer the SQL to generate
it directly, if possible.
Thanks, folks!
=sean
-------------------------------------------------------------------------------
Sean R. McCorkle mccorkle@bnl.gov | Haven't you heard, that aside
Genome Group (631) 344-4270 ph | from the details, there's
Biology Department (631) 344-3407 fax | nothing further to be found?
Brookhaven National Laboratory | --Master Dogen's 300 Koan
Upton, New York 11973 | Shobogenzo (commentary)