Thread: SELECT syntax question - combining COUNT and DISTINCT

SELECT syntax question - combining COUNT and DISTINCT

From

Cath Lawrence

Date:

01 October 2003, 04:26:34

I'm having some trouble working with aggregates. I think I'm missing
the concept of how aggregates are treated. Is there a tutorial page?

Here's my problem, no doubt very simple. It's one table only, nice and
easy...
SELECT DISTINCT pdb_id FROM chain WHERE chain_id = 'A';

But now I want to count how many records are returned and I can't work
it out. I'm sure I'm missing something obvious.
For instance "SELECT COUNT(pdb_id) FROM chain WHERE chain_id = 'A';"
works fine but of course is no longer distinct.


A secondary question (relating to my quality control problem but not
the subject heading) is why I can't do this:
"SELECT DISTINCT chain_id FROM chain WHERE pdb_id IN (SELECT pdb_code
FROM pdb_entry WHERE chain_count=1);"
This looks OK but takes apparently forever (or until I lose patience)
on my Mac G4. There are about 10,000 values in that subquery, so I
didn't expect it to be lightning fast, but it seems ridiculous.


Relevant snippets of table:
pdb_entry - 19,587 entries
---------
pdb_code is KEY, CHAR(4), unique, indexed
chain_count is INTEGER

chain - 40,844 entries
-----
pdb_id is CHAR(4) Foreign key, indexed
chain_id is CHAR(1)



Cath Lawrence,                       Cath.Lawrence@anu.edu.au
Senior Scientific Programmer,  Centre for Bioinformation Science,
John Curtin School of Medical Research (room 4088)
Australian National University,  Canberra ACT 0200
ph: (02) 61257959   mobile: 0421-902694   fax: (02) 61252595

Re: SELECT syntax question - combining COUNT and DISTINCT

From

paul@entropia.co.uk

Date:

01 October 2003, 06:10:24

On 1 Oct 2003 at 17:26, Cath Lawrence wrote:

Does this help?

 select count(distinct pdb_id),chainid from chain group by chain_id;

> I'm having some trouble working with aggregates. I think I'm missing
> the concept of how aggregates are treated. Is there a tutorial page?
>
> Here's my problem, no doubt very simple. It's one table only, nice and
> easy...
> SELECT DISTINCT pdb_id FROM chain WHERE chain_id = 'A';
>
> But now I want to count how many records are returned and I can't work
> it out. I'm sure I'm missing something obvious.
> For instance "SELECT COUNT(pdb_id) FROM chain WHERE chain_id = 'A';"
> works fine but of course is no longer distinct.
>
>
> A secondary question (relating to my quality control problem but not
> the subject heading) is why I can't do this:
> "SELECT DISTINCT chain_id FROM chain WHERE pdb_id IN (SELECT pdb_code
> FROM pdb_entry WHERE chain_count=1);"
> This looks OK but takes apparently forever (or until I lose patience)
> on my Mac G4. There are about 10,000 values in that subquery, so I
> didn't expect it to be lightning fast, but it seems ridiculous.
>
>
> Relevant snippets of table:
> pdb_entry - 19,587 entries
> ---------
> pdb_code is KEY, CHAR(4), unique, indexed
> chain_count is INTEGER
>
> chain - 40,844 entries
> -----
> pdb_id is CHAR(4) Foreign key, indexed
> chain_id is CHAR(1)
>
>
>
> Cath Lawrence,                       Cath.Lawrence@anu.edu.au
> Senior Scientific Programmer,  Centre for Bioinformation Science,
> John Curtin School of Medical Research (room 4088)
> Australian National University,  Canberra ACT 0200
> ph: (02) 61257959   mobile: 0421-902694   fax: (02) 61252595
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: SELECT syntax question - combining COUNT and DISTINCT

From

paul@entropia.co.uk

Date:

01 October 2003, 06:28:07

On 1 Oct 2003 at 9:55, Cath Lawrence <Cath.Lawrence@ wrote:

 >
> >
> > A secondary question (relating to my quality control problem but not
> > the subject heading) is why I can't do this:
> > "SELECT DISTINCT chain_id FROM chain WHERE pdb_id IN (SELECT pdb_code
> > FROM pdb_entry WHERE chain_count=1);"
> > This looks OK but takes apparently forever (or until I lose patience)
> > on my Mac G4. There are about 10,000 values in that subquery, so I
> > didn't expect it to be lightning fast, but it seems ridiculous.
 Try indexing the column in the where clause

> >
> >
> > Relevant snippets of table:
> > pdb_entry - 19,587 entries
> > ---------
> > pdb_code is KEY, CHAR(4), unique, indexed
> > chain_count is INTEGER
> >
> > chain - 40,844 entries
> > -----
> > pdb_id is CHAR(4) Foreign key, indexed
> > chain_id is CHAR(1)
> >
> >
> >
> > Cath Lawrence,                       Cath.Lawrence@anu.edu.au
> > Senior Scientific Programmer,  Centre for Bioinformation Science,
> > John Curtin School of Medical Research (room 4088)
> > Australian National University,  Canberra ACT 0200
> > ph: (02) 61257959   mobile: 0421-902694   fax: (02) 61252595
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
>

Re: SELECT syntax question - combining COUNT and DISTINCT

From

"Louise Cofield"

Date:

01 October 2003, 11:54:45


http://www.postgresql.org/docs/7.3/static/tutorial-agg.html

:)

Louise

-----Original Message-----
From: pgsql-novice-owner@postgresql.org
[mailto:pgsql-novice-owner@postgresql.org] On Behalf Of Cath Lawrence
Sent: Wednesday, October 01, 2003 1:27 AM
To: pgsql-novice@postgresql.org
Subject: [NOVICE] SELECT syntax question - combining COUNT and DISTINCT

I'm having some trouble working with aggregates. I think I'm missing
the concept of how aggregates are treated. Is there a tutorial page?

Here's my problem, no doubt very simple. It's one table only, nice and
easy...
SELECT DISTINCT pdb_id FROM chain WHERE chain_id = 'A';

But now I want to count how many records are returned and I can't work
it out. I'm sure I'm missing something obvious.
For instance "SELECT COUNT(pdb_id) FROM chain WHERE chain_id = 'A';"
works fine but of course is no longer distinct.


A secondary question (relating to my quality control problem but not
the subject heading) is why I can't do this:
"SELECT DISTINCT chain_id FROM chain WHERE pdb_id IN (SELECT pdb_code
FROM pdb_entry WHERE chain_count=1);"
This looks OK but takes apparently forever (or until I lose patience)
on my Mac G4. There are about 10,000 values in that subquery, so I
didn't expect it to be lightning fast, but it seems ridiculous.


Relevant snippets of table:
pdb_entry - 19,587 entries
---------
pdb_code is KEY, CHAR(4), unique, indexed
chain_count is INTEGER

chain - 40,844 entries
-----
pdb_id is CHAR(4) Foreign key, indexed
chain_id is CHAR(1)



Cath Lawrence,                       Cath.Lawrence@anu.edu.au
Senior Scientific Programmer,  Centre for Bioinformation Science,
John Curtin School of Medical Research (room 4088)
Australian National University,  Canberra ACT 0200
ph: (02) 61257959   mobile: 0421-902694   fax: (02) 61252595


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: SELECT syntax question - combining COUNT and DISTINCT

From

Manfred Koizar

Date:

01 October 2003, 15:06:48

On Wed, 1 Oct 2003 17:26:55 +1000, Cath Lawrence
<Cath.Lawrence@anu.edu.au> wrote:
>"SELECT DISTINCT chain_id FROM chain WHERE pdb_id IN (SELECT pdb_code
>FROM pdb_entry WHERE chain_count=1);"
>This looks OK but takes apparently forever

SELECT DISTINCT c.chain_id
  FROM chain AS c, pdb_entry AS p
 WHERE c.pdb_id = p.pdb_code
   AND p.chain_count = 1;

Servus
 Manfred

Re: SELECT syntax question - combining COUNT and DISTINCT

From

zmilos@sympatico.ca (zhix)

Date:

07 October 2003, 15:55:25

This should give you distinct values counts:

select pdb_id,count(*)
from   chain
where  chain_id = 'A'
group by pdb_id;

zhix.

Re: SELECT syntax question - combining COUNT and DISTINCT

From

zmilos@sympatico.ca (zhix)

Date:

07 October 2003, 15:58:41

This should do:

select pdb_id,count(*)
from   chain
where  chain_id = 'A'
group by pdb_id;

rgds,
zhix