Thread: order by and aggregate

order by and aggregate

From

Tomasz Myrta

Date:

06 January 2003, 07:41:06

Hi
I created my own aggregate function working as max(sum(value))
It adds positive and negative values and finds maximum of this sum.
To work properly this function needs data to be sorted.

select  maxsum(value)
from some_table  order by some_field

doesn't work:
ERROR:  Attribute some_table.some_field must be GROUPed or used in an 
aggregate function


I found I can obey this using subselect:
select  maxsum(X.value)
from (select value  from some_table  order by some_field) X

I can't create subselect, because I want to change this query into a 
view. In my case postgresql doesn't use indexes on subselect inside a view.

What should I do?

Regards,
Tomasz Myrta

Re: order by and aggregate

From

Richard Huxton

Date:

06 January 2003, 08:41:35

On Monday 06 Jan 2003 12:44 pm, Tomasz Myrta wrote:
> Hi
> I created my own aggregate function working as max(sum(value))
> It adds positive and negative values and finds maximum of this sum.
> To work properly this function needs data to be sorted.

I'm not sure that an aggregate function should require data to be sorted
before processing. Could you show details of your function - there may be a
way to rewrite it to handle unsorted data.

> select
>    maxsum(value)
> from some_table
>    order by some_field
>
> doesn't work:
> ERROR:  Attribute some_table.some_field must be GROUPed or used in an
> aggregate function

The "order by" isn't necessarily handled before calculating maxsum() anyway.

--  Richard Huxton

Re: order by and aggregate

From

Tomasz Myrta

Date:

06 January 2003, 09:05:47

Richard Huxton wrote:

> On Monday 06 Jan 2003 12:44 pm, Tomasz Myrta wrote:
>
> >Hi
> >I created my own aggregate function working as max(sum(value))
> >It adds positive and negative values and finds maximum of this sum.
> >To work properly this function needs data to be sorted.
>
>
> I'm not sure that an aggregate function should require data to be sorted
> before processing. Could you show details of your function - there may 
> be a
> way to rewrite it to handle unsorted data.

Standard Postgresql aggregate functions don't need sorted data, but my 
function needs. Look at the data:
<value>   <sum>
3      3
-2        1
6         7 *** max_sum=7
-3        4
2         6

For example, if you inverse your data, you have:
<value>   <sum>
2      2
-3        -1
6         5
-2        3
3         6 *** max_sum=6

As you see, data order is very important in this aggregate.

The function is very easy:
CREATE OR REPLACE FUNCTION maxsum_counter(_int4, int4) RETURNS _int4 AS '
DECLARE  old_val     ALIAS for $1;  curr_val    ALIAS for $2;  new_max    int4;  new_sum    int4;
BEGIN  new_sum=old_val[1]+curr_val;  if new_sum > old_val[1] then    new_max=new_sum;  else    new_max=old_val[2];  end
if; return ''{'' || new_sum || '','' || new_max || ''}'';
 
END;
' LANGUAGE 'plpgsql';

OR REPLACE FUNCTION extract_maxsum(_int4) RETURNS "int4" AS '
DECLARE  old_val    ALIAS for $1;
BEGIN  return old_val[2];
END;
' LANGUAGE 'plpgsql';

DROP AGGREGATE maxsum int4;
CREATE AGGREGATE maxsum(
BASETYPE = int4,
SFUNC = maxsum_counter,
STYPE = _int4,
FINALFUNC = extract_maxsum,
INITCOND = '{0,0}');

> >select
> >   maxsum(value)
> >from some_table
> >   order by some_field
> >
> >doesn't work:
> >ERROR:  Attribute some_table.some_field must be GROUPed or used in an
> >aggregate function
>
>
> The "order by" isn't necessarily handled before calculating maxsum() 
> anyway.

Nice point.
Anyway it doesn't matter, because it isn't handled at all.

Tomasz Myrta

Sorry, to many clients already

From

"cristi"

Date:

06 January 2003, 09:14:04

When I'm trying to connect I have this error message:

Something unusual has occured to cause the driver to fail.Please report this
exception: java.sql.SQLException: Sorry, to many clients already.


What should I do?

Re: order by and aggregate

From

dev@archonet.com

Date:

06 January 2003, 10:59:51

> Richard Huxton wrote:
>
>> On Monday 06 Jan 2003 12:44 pm, Tomasz Myrta wrote:
>>
>> >Hi
>> >I created my own aggregate function working as max(sum(value))
>> >It adds positive and negative values and finds maximum of this sum.
>> >To work properly this function needs data to be sorted.
>>
>>
>> I'm not sure that an aggregate function should require data to be sorted
>> before processing. Could you show details of your function - there may
>> be a
>> way to rewrite it to handle unsorted data.
>
> Standard Postgresql aggregate functions don't need sorted data, but my
> function needs. Look at the data:
> <value>   <sum>
> 3      3
> -2        1
> 6         7 *** max_sum=7
> -3        4
> 2         6
>
> For example, if you inverse your data, you have:
> <value>   <sum>
> 2      2
> -3        -1
> 6         5
> -2        3
> 3         6 *** max_sum=6
>
> As you see, data order is very important in this aggregate.

Ah - so it's maximum of a running-total rather than a sum.

AFAIK you are out of luck with aggregate functions. The order data is
supplied to them is *not* defined - the "order by" operates just before
results are output. Not much you can do about this, the whole basis of
relational DBs are sets and sets don't have any idea of ordering.

However, there are three options:

You could define a cursor (or a table-function in 7.3) which would handle
the order-by and then calculate the running-total on the fly. You then
just need a standard max(running_total) call to pick out the value.
Actually, if you use the cursor you might need to implement the max() in
the application.

Alternatively, you could add a running_total column and use a trigger to
ensure the value is kept up to date.

Finally, you could do the work in the application.

Difficult to say which is the best for you. If you have 7.3, don't need
these figures often and do a lot of updates/inserts I'd recommend option
1. If you're using 7.2, don't do a lot of inserts and want the figures
frequently I'd choose option 2.

HTH

- Richard Huxton

Re: order by and aggregate

From

Tomasz Myrta

Date:

06 January 2003, 11:32:12

dev@archonet.com wrote:

> Ah - so it's maximum of a running-total rather than a sum.

Sorry, my english still has a lot of black-holes :-(

> AFAIK you are out of luck with aggregate functions. The order data is
> supplied to them is *not* defined - the "order by" operates just before
> results are output. Not much you can do about this, the whole basis of
> relational DBs are sets and sets don't have any idea of ordering.

You are right, but if it is possible to use some tricks, why not to use 
them?

> However, there are three options:
>
> You could define a cursor (or a table-function in 7.3) which would handle
> the order-by and then calculate the running-total on the fly. You then
> just need a standard max(running_total) call to pick out the value.
> Actually, if you use the cursor you might need to implement the max() in
> the application.

>
>
> Alternatively, you could add a running_total column and use a trigger to
> ensure the value is kept up to date.

>
>
> Finally, you could do the work in the application.
>
> Difficult to say which is the best for you. If you have 7.3, don't need
> these figures often and do a lot of updates/inserts I'd recommend option
> 1. If you're using 7.2, don't do a lot of inserts and want the figures
> frequently I'd choose option 2.

All of these options look a bit difficult.
Currently I use pl/pgsql function and query with sorted subselect:
select maxsum(X.val) from (select val from some_table order by key) X
It isn't a very big problem for me. I was just wondering if I can change 
this pl/pgsql function in a view. Few weeks ago I asked on 
pgsql-performance about views and subselects. The conclusion was that 
postgresql planner doesn't work well when joining subselects - it wastes 
time on querying all rows of subselect.

1. I think it could rather slow down than speed up my solution.
2. I can't store this value, because each time data range changes.
3. I want to do as much as possible inside postgres.

I think I will have to stay with pl/pgsql function and sorted subquery.

Thanks for your help,
Tomasz Myrta

Re: order by and aggregate

From

Tom Lane

Date:

06 January 2003, 11:34:52

Tomasz Myrta <jasiek@klaster.net> writes:
> I found I can obey this using subselect:
> select
>    maxsum(X.value)
> from
>   (select value
>    from some_table
>    order by some_field) X

> I can't create subselect, because I want to change this query into a 
> view. In my case postgresql doesn't use indexes on subselect inside a view.

I don't understand what you think is wrong with this solution.  It works
fine for me:

regression=# create view vv as
regression-# select max(unique1) from (select unique1 from tenk1
regression(# order by unique2) x;
CREATE VIEW
regression=# explain select * from vv;                                           QUERY PLAN

---------------------------------------------------------------------------------------------------Subquery Scan vv
(cost=523.00..523.00rows=1 width=8)  ->  Aggregate  (cost=523.00..523.00 rows=1 width=8)        ->  Subquery Scan x
(cost=0.00..498.00rows=10000 width=8)              ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.00..498.00
rows=10000width=8)
 
(4 rows)
        regards, tom lane

Re: order by and aggregate

From

Tom Lane

Date:

06 January 2003, 11:37:02

Tomasz Myrta <jasiek@klaster.net> writes:
> Standard Postgresql aggregate functions don't need sorted data, but my 
> function needs. Look at the data:
> <value>   <sum>
> 3      3
> -2        1
> 6         7 *** max_sum=7
> -3        4
> 2         6

But if the input data is sorted into increasing order, then the largest
running sum value is always at the end.  Therefore max(sum()) is
equivalent to sum(); therefore you do not need the special aggregate,
nor the ordering.
        regards, tom lane

Re: order by and aggregate

From

Tomasz Myrta

Date:

06 January 2003, 12:05:01

Tom Lane wrote:

> Tomasz Myrta  writes:
>
> >Standard Postgresql aggregate functions don't need sorted data, but my
> >function needs. Look at the data:
> >
> >3      3
> >-2        1
> >6         7 *** max_sum=7
> >-3        4
> >2         6
>
>
> But if the input data is sorted into increasing order, then the largest
> running sum value is always at the end.  Therefore max(sum()) is
> equivalent to sum(); therefore you do not need the special aggregate,
> nor the ordering.
>
>             regards, tom lane

Sorting data by "value" was only an example,
In my case data is sorted by another key and has mixed positive and 
negative values.
Tomasz Myrta

Re: order by and aggregate

From

Tomasz Myrta

Date:

06 January 2003, 12:05:10

Tom Lane wrote:

> I don't understand what you think is wrong with this solution.  It works
> fine for me:
>
> regression=# create view vv as
> regression-# select max(unique1) from (select unique1 from tenk1
> regression(# order by unique2) x;
> CREATE VIEW
> regression=# explain select * from vv;
>                                             QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------
>  Subquery Scan vv  (cost=523.00..523.00 rows=1 width=8)
>    ->  Aggregate  (cost=523.00..523.00 rows=1 width=8)
>          ->  Subquery Scan x  (cost=0.00..498.00 rows=10000 width=8)
>                ->  Index Scan using tenk1_unique2 on tenk1 
> (cost=0.00..498.00 rows=10000 width=8)
> (4 rows)
>
OK, your view is ok.

I think, you don't remember my questions about "joining views" and 
"sub-select with aggregate" dated on 2002-10-23, so I qoute your answer 
on that question (joining views):>I think this is the same issue that Stephan identified in his response>to your other
posting("sub-select with aggregate").  When you write>    FROM x join y using (col) WHERE x.col = const>the
WHERE-restrictionis only applied to x.  I'm afraid you'll need>to write>    FROM x join y using (col) WHERE x.col =
constAND y.col = const>Ideally you should be able to write just>    FROM x join y using (col) WHERE col = const>but I
thinkthat will be taken the same as "x.col = const" :-(

I have the same problem here, query is a bit complicated and has 
some_table join (subselect) X using (some_field) which doesn't work.

Regards,
Tomasz Myrta

Re: Sorry, to many clients already

From

Robert Treat

Date:

06 January 2003, 13:20:44

On Mon, 2003-01-06 at 09:12, cristi wrote:
> When I'm trying to connect I have this error message:
> 
> Something unusual has occured to cause the driver to fail.Please report this
> exception: java.sql.SQLException: Sorry, to many clients already.
> 
> 
> What should I do?
> 

I might suggest posting to the -jdbc list in case this is something
different, but generally speaking the "too many clients" error means
that your application is attempting to connect to the db and the db is
responding that it already has created the maximum number of connections
that are allowed.  This might be an indication of other problems, but if
you simply want to raise the limit you'll need to modify max_connections
in the postgresql.conf

Robert Treat