Thread: count (DISTINCT expression [ , ... ] ) and documentation
I noticed that starting from 8.2 the documentation at http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html say that multiple distinct expressions are supported aggregate_name (DISTINCT expression [, expression] ) While previous docs just listed one: aggregate_name (DISTINCT expression) Still I'm using 8.3 and select count(distinct c1, c2) from table1; report: No function matches the given name and argument types. You might need to add explicit type casts. What should I write in spite of? select count(distinct c1, c2) from table1; -- Ivan Sergio Borgonovo http://www.webthatworks.it
Hello count has only one argument, try: postgres=# select * from fooa; a | b ----+---- 10 | 20 (1 row) postgres=# select count(distinct a,b) from fooa; ERROR: function count(integer, integer) does not exist LINE 1: select count(distinct a,b) from fooa; ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. postgres=# select count(distinct (a,b)) from fooa; count ------- 1 (1 row) regards Pavel Stehule 2008/12/26 Ivan Sergio Borgonovo <mail@webthatworks.it>: > I noticed that starting from 8.2 the documentation at > http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html > say that multiple distinct expressions are supported > > aggregate_name (DISTINCT expression [, expression] ) > > While previous docs just listed one: > > aggregate_name (DISTINCT expression) > > Still I'm using 8.3 and > > select count(distinct c1, c2) from table1; > > report: > > No function matches the given name and argument types. You might > need to add explicit type casts. > > What should I write in spite of? > > select count(distinct c1, c2) from table1; > > -- > Ivan Sergio Borgonovo > http://www.webthatworks.it > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
On Fri, 26 Dec 2008 15:46:48 +0100 "Pavel Stehule" <pavel.stehule@gmail.com> wrote: > count has only one argument, then what was changed between 8.1 and 8.2 to change the docs? None of the functions listed in: http://www.postgresql.org/docs/8.2/static/functions-aggregate.html seems to support aggregate(distinct exp [,exp]) Does the change reflect the change in the possibility to write user defined aggregates that support more then one distinct expression? The first thing that comes to my mind to emulate count(distinct a,b) would be to create table test.dist (a int, b int); insert into test.dist values(1,0); insert into test.dist values(1,0); insert into test.dist values(1,1); insert into test.dist values(0,0); select count(*) from (select distinct a,b from test.dist ) a; but still I can't think of anything that would work with aggregate(distinct a,b) not just count. -- Ivan Sergio Borgonovo http://www.webthatworks.it
2008/12/26 Ivan Sergio Borgonovo <mail@webthatworks.it>: > On Fri, 26 Dec 2008 15:46:48 +0100 > "Pavel Stehule" <pavel.stehule@gmail.com> wrote: > >> count has only one argument, > > then what was changed between 8.1 and 8.2 to change the docs? > None of the functions listed in: > http://www.postgresql.org/docs/8.2/static/functions-aggregate.html > seems to support > aggregate(distinct exp [,exp]) http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE > > Does the change reflect the change in the possibility to write user > defined aggregates that support more then one distinct expression? CREATE AGGREGATE name ( input_data_type [ , ... ] ) ( SFUNC = sfunc, STYPE = state_data_type [ , FINALFUNC = ffunc ] [ , INITCOND = initial_condition ] [ , SORTOP = sort_operator ] ) you are able to write multi param aggregates. regards Pavel Stehule > > The first thing that comes to my mind to emulate > count(distinct a,b) > would be to > > create table test.dist (a int, b int); > insert into test.dist values(1,0); > insert into test.dist values(1,0); > insert into test.dist values(1,1); > insert into test.dist values(0,0); > select count(*) from (select distinct a,b from test.dist ) a; > > but still I can't think of anything that would work with > aggregate(distinct a,b) > not just count. > > -- > Ivan Sergio Borgonovo > http://www.webthatworks.it > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
On Fri, 26 Dec 2008 16:23:52 +0100 "Pavel Stehule" <pavel.stehule@gmail.com> wrote: > 2008/12/26 Ivan Sergio Borgonovo <mail@webthatworks.it>: > > On Fri, 26 Dec 2008 15:46:48 +0100 > > "Pavel Stehule" <pavel.stehule@gmail.com> wrote: > > > >> count has only one argument, > > > > then what was changed between 8.1 and 8.2 to change the docs? > > None of the functions listed in: > > http://www.postgresql.org/docs/8.2/static/functions-aggregate.html > > seems to support > > aggregate(distinct exp [,exp]) > > http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html > > http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE yeah but no function seems to support aggregate(distinct x, y) > CREATE AGGREGATE name ( input_data_type [ , ... ] ) ( > SFUNC = sfunc, > STYPE = state_data_type > [ , FINALFUNC = ffunc ] > [ , INITCOND = initial_condition ] > [ , SORTOP = sort_operator ] > ) OK... but how am I going to implement an user defined aggregate that support without resorting to C? myaggfunc(distinct x, y)? Otherwise to what is it referring http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html aggregate_name (DISTINCT expression [ , ... ] ) -- Ivan Sergio Borgonovo http://www.webthatworks.it
On Fri, Dec 26, 2008 at 03:34:33PM +0100, Ivan Sergio Borgonovo wrote: > I noticed that starting from 8.2 the documentation at > http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html > say that multiple distinct expressions are supported > > aggregate_name (DISTINCT expression [, expression] ) In 8.4, you'll be able to do: WITH d AS ( SELECT DISTINCT c1, c2 FROM table1 ) SELECT count(*) FROM d; and very likely an OLAP version. :) Cheers, David. > > While previous docs just listed one: > > aggregate_name (DISTINCT expression) > > Still I'm using 8.3 and > > select count(distinct c1, c2) from table1; > > report: > > No function matches the given name and argument types. You might > need to add explicit type casts. > > What should I write in spite of? > > select count(distinct c1, c2) from table1; > > -- > Ivan Sergio Borgonovo > http://www.webthatworks.it > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
2008/12/26 Ivan Sergio Borgonovo <mail@webthatworks.it>: > On Fri, 26 Dec 2008 16:23:52 +0100 > "Pavel Stehule" <pavel.stehule@gmail.com> wrote: > >> 2008/12/26 Ivan Sergio Borgonovo <mail@webthatworks.it>: >> > On Fri, 26 Dec 2008 15:46:48 +0100 >> > "Pavel Stehule" <pavel.stehule@gmail.com> wrote: >> > >> >> count has only one argument, >> > >> > then what was changed between 8.1 and 8.2 to change the docs? >> > None of the functions listed in: >> > http://www.postgresql.org/docs/8.2/static/functions-aggregate.html >> > seems to support >> > aggregate(distinct exp [,exp]) >> >> http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html >> >> http://www.postgresql.org/docs/8.2/interactive/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE > > > yeah but no function seems to support > > aggregate(distinct x, y) > >> CREATE AGGREGATE name ( input_data_type [ , ... ] ) ( >> SFUNC = sfunc, >> STYPE = state_data_type >> [ , FINALFUNC = ffunc ] >> [ , INITCOND = initial_condition ] >> [ , SORTOP = sort_operator ] >> ) > > > OK... but how am I going to implement an user defined aggregate that > support without resorting to C? > > myaggfunc(distinct x, y)? > > Otherwise to what is it referring > > http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html > > aggregate_name (DISTINCT expression [ , ... ] ) > ok, I tested and it isn't supported yet. This is documentation bug. DISTINCT is allowed only for single argument aggregate. Regards Pavel Stehule > -- > Ivan Sergio Borgonovo > http://www.webthatworks.it > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
On Fri, 26 Dec 2008 10:43:25 -0800 David Fetter <david@fetter.org> wrote: > On Fri, Dec 26, 2008 at 03:34:33PM +0100, Ivan Sergio Borgonovo > wrote: > > I noticed that starting from 8.2 the documentation at > > http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html > > say that multiple distinct expressions are supported > > > > aggregate_name (DISTINCT expression [, expression] ) > In 8.4, you'll be able to do: > WITH d AS ( > SELECT DISTINCT c1, c2 FROM table1 > ) > SELECT count(*) FROM d; Nice, but what will be the difference from select count(*) from (select distinct c1, c2 from t); ? Optimisation? Furthermore... I was actually looking at docs because I needed to find a way supported by both postgresql and mysql and I've heard that mysql is not that good at subselect and I doubt it supports WITH AS. (OK not really a postgresql problem...). Meanwhile what would you suggest as a general approach to stuff like select count(distinct c1, c2) from t; regardless of mysql support? and considering mysql support? I was thinking to find some way to exploit group by, but I didn't come to anything useful yet. > and very likely an OLAP version. :) What's "an OLAP version" of WITH d AS... -- Ivan Sergio Borgonovo http://www.webthatworks.it
On Fri, Dec 26, 2008 at 08:03:30PM +0100, Ivan Sergio Borgonovo wrote: > On Fri, 26 Dec 2008 10:43:25 -0800 > David Fetter <david@fetter.org> wrote: > > > On Fri, Dec 26, 2008 at 03:34:33PM +0100, Ivan Sergio Borgonovo > > wrote: > > > I noticed that starting from 8.2 the documentation at > > > http://www.postgresql.org/docs/8.2/interactive/sql-expressions.html > > > say that multiple distinct expressions are supported > > > > > > aggregate_name (DISTINCT expression [, expression] ) > > > In 8.4, you'll be able to do: > > > WITH d AS ( > > SELECT DISTINCT c1, c2 FROM table1 > > ) > > SELECT count(*) FROM d; > > Nice, but what will be the difference from > select count(*) from (select distinct c1, c2 from t); > ? > Optimisation? None especially. > Furthermore... I was actually looking at docs because I needed to > find a way supported by both postgresql and mysql Generally, it's *not* a good idea to try to support more than one back-end. You wind up maintaining several disparate code bases, all of which must do exactly the same thing, or you create your own RDBMS in your client code, or worst of all, some of each. Unless the most important attribute of the software, i.e. you can jettison any other feature to support it, is to support more than one RDBMS back-end, don't even try. Examples of software which needs to support multiple RDBMS back-ends include, and are pretty much limited to, ERD generators and migration tools. > > and very likely an OLAP version. :) > > What's "an OLAP version" of WITH d AS... OLAP includes clauses like WINDOW() and OVER(), but since it's not committed yet, I don't want to get too far into it :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
WITH AS vs subselect was: count (DISTINCT expression [ , ... ] ) and documentation
From
Ivan Sergio Borgonovo
Date:
On Fri, 26 Dec 2008 12:04:48 -0800 David Fetter <david@fetter.org> wrote: > On Fri, Dec 26, 2008 at 08:03:30PM +0100, Ivan Sergio Borgonovo > wrote: > > > > aggregate_name (DISTINCT expression [, expression] ) > > > In 8.4, you'll be able to do: > > > WITH d AS ( > > > SELECT DISTINCT c1, c2 FROM table1 > > > ) > > > SELECT count(*) FROM d; > > Nice, but what will be the difference from > > select count(*) from (select distinct c1, c2 from t); > > ? > > Optimisation? > None especially. So what would be the advantage compared to subselect? > > Furthermore... I was actually looking at docs because I needed to > > find a way supported by both postgresql and mysql > > Generally, it's *not* a good idea to try to support more than one > back-end. You wind up maintaining several disparate code bases, Not really my main target... I was just investigating if it could come for free ;) -- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: WITH AS vs subselect was: count (DISTINCT expression [ , ... ] ) and documentation
From
Tom Lane
Date:
Ivan Sergio Borgonovo <mail@webthatworks.it> writes: >>> David Fetter <david@fetter.org> wrote: >>>> In 8.4, you'll be able to do: >>>> WITH d AS ( >>>> SELECT DISTINCT c1, c2 FROM table1 >>>> ) >>>> SELECT count(*) FROM d; >>> Nice, but what will be the difference from >>> select count(*) from (select distinct c1, c2 from t); >>> ? >>> Optimisation? >> None especially. > So what would be the advantage compared to subselect? None, David just has WITH on the brain ;-) The subselect syntax certainly seems like the one most likely to work across different SQL implementations. WITH is a pretty recent addition to the SQL spec, and DISTINCT with multiple aggregate arguments isn't in the spec at all. The COUNT(DISTINCT ROW(x,y)) hack is a cute idea but I'm dubious that that's portable either (it certainly doesn't work in pre-8.4 PG). regards, tom lane
Re: WITH AS vs subselect was: count (DISTINCT expression [ , ... ] ) and documentation
From
David Fetter
Date:
On Fri, Dec 26, 2008 at 07:13:48PM -0500, Tom Lane wrote: > Ivan Sergio Borgonovo <mail@webthatworks.it> writes: > >>> David Fetter <david@fetter.org> wrote: > >>>> In 8.4, you'll be able to do: > > >>>> WITH d AS ( > >>>> SELECT DISTINCT c1, c2 FROM table1 > >>>> ) > >>>> SELECT count(*) FROM d; > > >>> Nice, but what will be the difference from > >>> select count(*) from (select distinct c1, c2 from t); > >>> ? > >>> Optimisation? > > >> None especially. > > > So what would be the advantage compared to subselect? > > None, David just has WITH on the brain ;-) LOL! You're only saying that because it's true ;) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Fri, 26 Dec 2008 19:13:48 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > The subselect syntax certainly seems like the one most likely to > work across different SQL implementations. WITH is a pretty subselects actually works on mysql too but on a 1M table with about 300K unique columns it performs more than 4 times slower than select (distinct a,b) from table 18sec vs. 4sec Times were similar for innodb and myisam. Postgresql needs 17sec with subselect. I didn't try to see how both db could perform with indexes. mysql performance is impressive. I thought that most of the time would be spent on "distinct" where postgresql shouldn't suffer from its "count" implementation. But well still 300K rows to count on 1M aren't few. -- Ivan Sergio Borgonovo http://www.webthatworks.it