Thread: count (DISTINCT expression [ , ... ] ) and documentation

count (DISTINCT expression [ , ... ] ) and documentation

From
Ivan Sergio Borgonovo
Date:
I noticed that starting from 8.2 the documentation at
http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html
say that multiple distinct expressions are supported

aggregate_name (DISTINCT expression [, expression] )

While previous docs just listed one:

aggregate_name (DISTINCT expression)

Still I'm using 8.3 and

select count(distinct c1, c2) from table1;

report:

No function matches the given name and argument types. You might
need to add explicit type casts.

What should I write in spite of?

select count(distinct c1, c2) from table1;

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: count (DISTINCT expression [ , ... ] ) and documentation

From
"Pavel Stehule"
Date:
Hello

count has only one argument,

try:

postgres=# select * from fooa;
 a  | b
----+----
 10 | 20
(1 row)

postgres=# select count(distinct a,b) from fooa;
ERROR:  function count(integer, integer) does not exist
LINE 1: select count(distinct a,b) from fooa;
               ^
HINT:  No function matches the given name and argument types. You
might need to add explicit type casts.
postgres=# select count(distinct (a,b)) from fooa;
 count
-------
     1
(1 row)

regards
Pavel Stehule

2008/12/26 Ivan Sergio Borgonovo <mail@webthatworks.it>:
> I noticed that starting from 8.2 the documentation at
> http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html
> say that multiple distinct expressions are supported
>
> aggregate_name (DISTINCT expression [, expression] )
>
> While previous docs just listed one:
>
> aggregate_name (DISTINCT expression)
>
> Still I'm using 8.3 and
>
> select count(distinct c1, c2) from table1;
>
> report:
>
> No function matches the given name and argument types. You might
> need to add explicit type casts.
>
> What should I write in spite of?
>
> select count(distinct c1, c2) from table1;
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: count (DISTINCT expression [ , ... ] ) and documentation

From
Ivan Sergio Borgonovo
Date:
On Fri, 26 Dec 2008 15:46:48 +0100
"Pavel Stehule" <pavel.stehule@gmail.com> wrote:

> count has only one argument,

then what was changed between 8.1 and 8.2 to change the docs?
None of the functions listed in:
http://www.postgresql.org/docs/8.2/static/functions-aggregate.html
seems to support
aggregate(distinct exp [,exp])

Does the change reflect the change in the possibility to write user
defined aggregates that support more then one distinct expression?

The first thing that comes to my mind to emulate
count(distinct a,b)
would be to

create table test.dist (a int, b int);
insert into test.dist values(1,0);
insert into test.dist values(1,0);
insert into test.dist values(1,1);
insert into test.dist values(0,0);
select count(*) from (select distinct a,b from test.dist ) a;

but still I can't think of anything that would work with
aggregate(distinct a,b)
not just count.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: count (DISTINCT expression [ , ... ] ) and documentation

From
"Pavel Stehule"
Date:
2008/12/26 Ivan Sergio Borgonovo <mail@webthatworks.it>:
> On Fri, 26 Dec 2008 15:46:48 +0100
> "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
>
>> count has only one argument,
>
> then what was changed between 8.1 and 8.2 to change the docs?
> None of the functions listed in:
> http://www.postgresql.org/docs/8.2/static/functions-aggregate.html
> seems to support
> aggregate(distinct exp [,exp])

http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html

http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE


>
> Does the change reflect the change in the possibility to write user
> defined aggregates that support more then one distinct expression?

CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
    SFUNC = sfunc,
    STYPE = state_data_type
    [ , FINALFUNC = ffunc ]
    [ , INITCOND = initial_condition ]
    [ , SORTOP = sort_operator ]
)

you are able to write multi param aggregates.

regards
Pavel Stehule


>
> The first thing that comes to my mind to emulate
> count(distinct a,b)
> would be to
>
> create table test.dist (a int, b int);
> insert into test.dist values(1,0);
> insert into test.dist values(1,0);
> insert into test.dist values(1,1);
> insert into test.dist values(0,0);
> select count(*) from (select distinct a,b from test.dist ) a;
>
> but still I can't think of anything that would work with
> aggregate(distinct a,b)
> not just count.
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: count (DISTINCT expression [ , ... ] ) and documentation

From
Ivan Sergio Borgonovo
Date:
On Fri, 26 Dec 2008 16:23:52 +0100
"Pavel Stehule" <pavel.stehule@gmail.com> wrote:

> 2008/12/26 Ivan Sergio Borgonovo <mail@webthatworks.it>:
> > On Fri, 26 Dec 2008 15:46:48 +0100
> > "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
> >
> >> count has only one argument,
> >
> > then what was changed between 8.1 and 8.2 to change the docs?
> > None of the functions listed in:
> > http://www.postgresql.org/docs/8.2/static/functions-aggregate.html
> > seems to support
> > aggregate(distinct exp [,exp])
>
> http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html
>
> http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE


yeah but no function seems to support

aggregate(distinct x, y)

> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
>     SFUNC = sfunc,
>     STYPE = state_data_type
>     [ , FINALFUNC = ffunc ]
>     [ , INITCOND = initial_condition ]
>     [ , SORTOP = sort_operator ]
> )


OK... but how am I going to implement an user defined aggregate that
support without resorting to C?

myaggfunc(distinct x, y)?

Otherwise to what is it referring

http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html

aggregate_name (DISTINCT expression [ , ... ] )

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: count (DISTINCT expression [ , ... ] ) and documentation

From
David Fetter
Date:
On Fri, Dec 26, 2008 at 03:34:33PM +0100, Ivan Sergio Borgonovo wrote:
> I noticed that starting from 8.2 the documentation at
> http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html
> say that multiple distinct expressions are supported
>
> aggregate_name (DISTINCT expression [, expression] )

In 8.4, you'll be able to do:

WITH d AS (
    SELECT DISTINCT c1, c2 FROM table1
)
SELECT count(*) FROM d;

and very likely an OLAP version. :)

Cheers,
David.
>
> While previous docs just listed one:
>
> aggregate_name (DISTINCT expression)
>
> Still I'm using 8.3 and
>
> select count(distinct c1, c2) from table1;
>
> report:
>
> No function matches the given name and argument types. You might
> need to add explicit type casts.
>
> What should I write in spite of?
>
> select count(distinct c1, c2) from table1;
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: count (DISTINCT expression [ , ... ] ) and documentation

From
"Pavel Stehule"
Date:
2008/12/26 Ivan Sergio Borgonovo <mail@webthatworks.it>:
> On Fri, 26 Dec 2008 16:23:52 +0100
> "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
>
>> 2008/12/26 Ivan Sergio Borgonovo <mail@webthatworks.it>:
>> > On Fri, 26 Dec 2008 15:46:48 +0100
>> > "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
>> >
>> >> count has only one argument,
>> >
>> > then what was changed between 8.1 and 8.2 to change the docs?
>> > None of the functions listed in:
>> > http://www.postgresql.org/docs/8.2/static/functions-aggregate.html
>> > seems to support
>> > aggregate(distinct exp [,exp])
>>
>> http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html
>>
>> http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE
>
>
> yeah but no function seems to support
>
> aggregate(distinct x, y)
>
>> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
>>     SFUNC = sfunc,
>>     STYPE = state_data_type
>>     [ , FINALFUNC = ffunc ]
>>     [ , INITCOND = initial_condition ]
>>     [ , SORTOP = sort_operator ]
>> )
>
>
> OK... but how am I going to implement an user defined aggregate that
> support without resorting to C?
>
> myaggfunc(distinct x, y)?
>
> Otherwise to what is it referring
>
> http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html
>
> aggregate_name (DISTINCT expression [ , ... ] )
>

ok, I tested and it isn't supported yet. This is documentation bug.
DISTINCT is allowed only for single argument aggregate.

Regards
Pavel Stehule

> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: count (DISTINCT expression [ , ... ] ) and documentation

From
Ivan Sergio Borgonovo
Date:
On Fri, 26 Dec 2008 10:43:25 -0800
David Fetter <david@fetter.org> wrote:

> On Fri, Dec 26, 2008 at 03:34:33PM +0100, Ivan Sergio Borgonovo
> wrote:
> > I noticed that starting from 8.2 the documentation at
> > http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html
> > say that multiple distinct expressions are supported
> >
> > aggregate_name (DISTINCT expression [, expression] )

> In 8.4, you'll be able to do:

> WITH d AS (
>     SELECT DISTINCT c1, c2 FROM table1
> )
> SELECT count(*) FROM d;

Nice, but what will be the difference from
select count(*) from (select distinct c1, c2 from t);
?
Optimisation?

Furthermore... I was actually looking at docs because I needed to
find a way supported by both postgresql and mysql and I've heard
that mysql is not that good at subselect and I doubt it supports
WITH AS. (OK not really a postgresql problem...).

Meanwhile what would you suggest as a general approach to stuff like

select count(distinct c1, c2) from t;

regardless of mysql support?
and considering mysql support?

I was thinking to find some way to exploit group by, but I didn't
come to anything useful yet.

> and very likely an OLAP version. :)

What's "an OLAP version" of WITH d AS...

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: count (DISTINCT expression [ , ... ] ) and documentation

From
David Fetter
Date:
On Fri, Dec 26, 2008 at 08:03:30PM +0100, Ivan Sergio Borgonovo wrote:
> On Fri, 26 Dec 2008 10:43:25 -0800
> David Fetter <david@fetter.org> wrote:
>
> > On Fri, Dec 26, 2008 at 03:34:33PM +0100, Ivan Sergio Borgonovo
> > wrote:
> > > I noticed that starting from 8.2 the documentation at
> > > http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html
> > > say that multiple distinct expressions are supported
> > >
> > > aggregate_name (DISTINCT expression [, expression] )
>
> > In 8.4, you'll be able to do:
>
> > WITH d AS (
> >     SELECT DISTINCT c1, c2 FROM table1
> > )
> > SELECT count(*) FROM d;
>
> Nice, but what will be the difference from
> select count(*) from (select distinct c1, c2 from t);
> ?
> Optimisation?

None especially.

> Furthermore... I was actually looking at docs because I needed to
> find a way supported by both postgresql and mysql

Generally, it's *not* a good idea to try to support more than one
back-end.  You wind up maintaining several disparate code bases, all
of which must do exactly the same thing, or you create your own RDBMS
in your client code, or worst of all, some of each.

Unless the most important attribute of the software, i.e. you can
jettison any other feature to support it, is to support more than one
RDBMS back-end, don't even try.  Examples of software which needs to
support multiple RDBMS back-ends include, and are pretty much limited
to, ERD generators and migration tools.

> > and very likely an OLAP version. :)
>
> What's "an OLAP version" of WITH d AS...

OLAP includes clauses like WINDOW() and OVER(), but since it's not
committed yet, I don't want to get too far into it :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

WITH AS vs subselect was: count (DISTINCT expression [ , ... ] ) and documentation

From
Ivan Sergio Borgonovo
Date:
On Fri, 26 Dec 2008 12:04:48 -0800
David Fetter <david@fetter.org> wrote:

> On Fri, Dec 26, 2008 at 08:03:30PM +0100, Ivan Sergio Borgonovo
> wrote:

> > > > aggregate_name (DISTINCT expression [, expression] )

> > > In 8.4, you'll be able to do:

> > > WITH d AS (
> > >     SELECT DISTINCT c1, c2 FROM table1
> > > )
> > > SELECT count(*) FROM d;

> > Nice, but what will be the difference from
> > select count(*) from (select distinct c1, c2 from t);
> > ?
> > Optimisation?

> None especially.

So what would be the advantage compared to subselect?

> > Furthermore... I was actually looking at docs because I needed to
> > find a way supported by both postgresql and mysql
>
> Generally, it's *not* a good idea to try to support more than one
> back-end.  You wind up maintaining several disparate code bases,

Not really my main target... I was just investigating if it could
come for free ;)

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
>>> David Fetter <david@fetter.org> wrote:
>>>> In 8.4, you'll be able to do:

>>>> WITH d AS (
>>>> SELECT DISTINCT c1, c2 FROM table1
>>>> )
>>>> SELECT count(*) FROM d;

>>> Nice, but what will be the difference from
>>> select count(*) from (select distinct c1, c2 from t);
>>> ?
>>> Optimisation?

>> None especially.

> So what would be the advantage compared to subselect?

None, David just has WITH on the brain ;-)

The subselect syntax certainly seems like the one most likely to work
across different SQL implementations.  WITH is a pretty recent addition
to the SQL spec, and DISTINCT with multiple aggregate arguments isn't
in the spec at all.  The COUNT(DISTINCT ROW(x,y)) hack is a cute idea
but I'm dubious that that's portable either (it certainly doesn't work
in pre-8.4 PG).

            regards, tom lane

On Fri, Dec 26, 2008 at 07:13:48PM -0500, Tom Lane wrote:
> Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
> >>> David Fetter <david@fetter.org> wrote:
> >>>> In 8.4, you'll be able to do:
>
> >>>> WITH d AS (
> >>>> SELECT DISTINCT c1, c2 FROM table1
> >>>> )
> >>>> SELECT count(*) FROM d;
>
> >>> Nice, but what will be the difference from
> >>> select count(*) from (select distinct c1, c2 from t);
> >>> ?
> >>> Optimisation?
>
> >> None especially.
>
> > So what would be the advantage compared to subselect?
>
> None, David just has WITH on the brain ;-)

LOL!

You're only saying that because it's true ;)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

subselect and count (DISTINCT expression [ , ... ] ) performances

From
Ivan Sergio Borgonovo
Date:
On Fri, 26 Dec 2008 19:13:48 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The subselect syntax certainly seems like the one most likely to
> work across different SQL implementations.  WITH is a pretty

subselects actually works on mysql too but on a 1M table with about
300K unique columns it performs more than 4 times slower than
select (distinct a,b) from table

18sec vs. 4sec

Times were similar for innodb and myisam.

Postgresql needs 17sec with subselect.

I didn't try to see how both db could perform with indexes.

mysql performance is impressive. I thought that most of the time
would be spent on "distinct" where postgresql shouldn't suffer from
its "count" implementation. But well still 300K rows to count on 1M
aren't few.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it