Thread: SELECT syntax question - combining COUNT and DISTINCT
I'm having some trouble working with aggregates. I think I'm missing the concept of how aggregates are treated. Is there a tutorial page? Here's my problem, no doubt very simple. It's one table only, nice and easy... SELECT DISTINCT pdb_id FROM chain WHERE chain_id = 'A'; But now I want to count how many records are returned and I can't work it out. I'm sure I'm missing something obvious. For instance "SELECT COUNT(pdb_id) FROM chain WHERE chain_id = 'A';" works fine but of course is no longer distinct. A secondary question (relating to my quality control problem but not the subject heading) is why I can't do this: "SELECT DISTINCT chain_id FROM chain WHERE pdb_id IN (SELECT pdb_code FROM pdb_entry WHERE chain_count=1);" This looks OK but takes apparently forever (or until I lose patience) on my Mac G4. There are about 10,000 values in that subquery, so I didn't expect it to be lightning fast, but it seems ridiculous. Relevant snippets of table: pdb_entry - 19,587 entries --------- pdb_code is KEY, CHAR(4), unique, indexed chain_count is INTEGER chain - 40,844 entries ----- pdb_id is CHAR(4) Foreign key, indexed chain_id is CHAR(1) Cath Lawrence, Cath.Lawrence@anu.edu.au Senior Scientific Programmer, Centre for Bioinformation Science, John Curtin School of Medical Research (room 4088) Australian National University, Canberra ACT 0200 ph: (02) 61257959 mobile: 0421-902694 fax: (02) 61252595
On 1 Oct 2003 at 17:26, Cath Lawrence wrote: Does this help? select count(distinct pdb_id),chainid from chain group by chain_id; > I'm having some trouble working with aggregates. I think I'm missing > the concept of how aggregates are treated. Is there a tutorial page? > > Here's my problem, no doubt very simple. It's one table only, nice and > easy... > SELECT DISTINCT pdb_id FROM chain WHERE chain_id = 'A'; > > But now I want to count how many records are returned and I can't work > it out. I'm sure I'm missing something obvious. > For instance "SELECT COUNT(pdb_id) FROM chain WHERE chain_id = 'A';" > works fine but of course is no longer distinct. > > > A secondary question (relating to my quality control problem but not > the subject heading) is why I can't do this: > "SELECT DISTINCT chain_id FROM chain WHERE pdb_id IN (SELECT pdb_code > FROM pdb_entry WHERE chain_count=1);" > This looks OK but takes apparently forever (or until I lose patience) > on my Mac G4. There are about 10,000 values in that subquery, so I > didn't expect it to be lightning fast, but it seems ridiculous. > > > Relevant snippets of table: > pdb_entry - 19,587 entries > --------- > pdb_code is KEY, CHAR(4), unique, indexed > chain_count is INTEGER > > chain - 40,844 entries > ----- > pdb_id is CHAR(4) Foreign key, indexed > chain_id is CHAR(1) > > > > Cath Lawrence, Cath.Lawrence@anu.edu.au > Senior Scientific Programmer, Centre for Bioinformation Science, > John Curtin School of Medical Research (room 4088) > Australian National University, Canberra ACT 0200 > ph: (02) 61257959 mobile: 0421-902694 fax: (02) 61252595 > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
On 1 Oct 2003 at 9:55, Cath Lawrence <Cath.Lawrence@ wrote: > > > > > A secondary question (relating to my quality control problem but not > > the subject heading) is why I can't do this: > > "SELECT DISTINCT chain_id FROM chain WHERE pdb_id IN (SELECT pdb_code > > FROM pdb_entry WHERE chain_count=1);" > > This looks OK but takes apparently forever (or until I lose patience) > > on my Mac G4. There are about 10,000 values in that subquery, so I > > didn't expect it to be lightning fast, but it seems ridiculous. Try indexing the column in the where clause > > > > > > Relevant snippets of table: > > pdb_entry - 19,587 entries > > --------- > > pdb_code is KEY, CHAR(4), unique, indexed > > chain_count is INTEGER > > > > chain - 40,844 entries > > ----- > > pdb_id is CHAR(4) Foreign key, indexed > > chain_id is CHAR(1) > > > > > > > > Cath Lawrence, Cath.Lawrence@anu.edu.au > > Senior Scientific Programmer, Centre for Bioinformation Science, > > John Curtin School of Medical Research (room 4088) > > Australian National University, Canberra ACT 0200 > > ph: (02) 61257959 mobile: 0421-902694 fax: (02) 61252595 > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > >
http://www.postgresql.org/docs/7.3/static/tutorial-agg.html :) Louise -----Original Message----- From: pgsql-novice-owner@postgresql.org [mailto:pgsql-novice-owner@postgresql.org] On Behalf Of Cath Lawrence Sent: Wednesday, October 01, 2003 1:27 AM To: pgsql-novice@postgresql.org Subject: [NOVICE] SELECT syntax question - combining COUNT and DISTINCT I'm having some trouble working with aggregates. I think I'm missing the concept of how aggregates are treated. Is there a tutorial page? Here's my problem, no doubt very simple. It's one table only, nice and easy... SELECT DISTINCT pdb_id FROM chain WHERE chain_id = 'A'; But now I want to count how many records are returned and I can't work it out. I'm sure I'm missing something obvious. For instance "SELECT COUNT(pdb_id) FROM chain WHERE chain_id = 'A';" works fine but of course is no longer distinct. A secondary question (relating to my quality control problem but not the subject heading) is why I can't do this: "SELECT DISTINCT chain_id FROM chain WHERE pdb_id IN (SELECT pdb_code FROM pdb_entry WHERE chain_count=1);" This looks OK but takes apparently forever (or until I lose patience) on my Mac G4. There are about 10,000 values in that subquery, so I didn't expect it to be lightning fast, but it seems ridiculous. Relevant snippets of table: pdb_entry - 19,587 entries --------- pdb_code is KEY, CHAR(4), unique, indexed chain_count is INTEGER chain - 40,844 entries ----- pdb_id is CHAR(4) Foreign key, indexed chain_id is CHAR(1) Cath Lawrence, Cath.Lawrence@anu.edu.au Senior Scientific Programmer, Centre for Bioinformation Science, John Curtin School of Medical Research (room 4088) Australian National University, Canberra ACT 0200 ph: (02) 61257959 mobile: 0421-902694 fax: (02) 61252595 ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
On Wed, 1 Oct 2003 17:26:55 +1000, Cath Lawrence <Cath.Lawrence@anu.edu.au> wrote: >"SELECT DISTINCT chain_id FROM chain WHERE pdb_id IN (SELECT pdb_code >FROM pdb_entry WHERE chain_count=1);" >This looks OK but takes apparently forever SELECT DISTINCT c.chain_id FROM chain AS c, pdb_entry AS p WHERE c.pdb_id = p.pdb_code AND p.chain_count = 1; Servus Manfred
This should give you distinct values counts: select pdb_id,count(*) from chain where chain_id = 'A' group by pdb_id; zhix.
This should do: select pdb_id,count(*) from chain where chain_id = 'A' group by pdb_id; rgds, zhix