Thread: order by and aggregate
Hi I created my own aggregate function working as max(sum(value)) It adds positive and negative values and finds maximum of this sum. To work properly this function needs data to be sorted. select maxsum(value) from some_table order by some_field doesn't work: ERROR: Attribute some_table.some_field must be GROUPed or used in an aggregate function I found I can obey this using subselect: select maxsum(X.value) from (select value from some_table order by some_field) X I can't create subselect, because I want to change this query into a view. In my case postgresql doesn't use indexes on subselect inside a view. What should I do? Regards, Tomasz Myrta
On Monday 06 Jan 2003 12:44 pm, Tomasz Myrta wrote: > Hi > I created my own aggregate function working as max(sum(value)) > It adds positive and negative values and finds maximum of this sum. > To work properly this function needs data to be sorted. I'm not sure that an aggregate function should require data to be sorted before processing. Could you show details of your function - there may be a way to rewrite it to handle unsorted data. > select > maxsum(value) > from some_table > order by some_field > > doesn't work: > ERROR: Attribute some_table.some_field must be GROUPed or used in an > aggregate function The "order by" isn't necessarily handled before calculating maxsum() anyway. -- Richard Huxton
Richard Huxton wrote: > On Monday 06 Jan 2003 12:44 pm, Tomasz Myrta wrote: > > >Hi > >I created my own aggregate function working as max(sum(value)) > >It adds positive and negative values and finds maximum of this sum. > >To work properly this function needs data to be sorted. > > > I'm not sure that an aggregate function should require data to be sorted > before processing. Could you show details of your function - there may > be a > way to rewrite it to handle unsorted data. Standard Postgresql aggregate functions don't need sorted data, but my function needs. Look at the data: <value> <sum> 3 3 -2 1 6 7 *** max_sum=7 -3 4 2 6 For example, if you inverse your data, you have: <value> <sum> 2 2 -3 -1 6 5 -2 3 3 6 *** max_sum=6 As you see, data order is very important in this aggregate. The function is very easy: CREATE OR REPLACE FUNCTION maxsum_counter(_int4, int4) RETURNS _int4 AS ' DECLARE old_val ALIAS for $1; curr_val ALIAS for $2; new_max int4; new_sum int4; BEGIN new_sum=old_val[1]+curr_val; if new_sum > old_val[1] then new_max=new_sum; else new_max=old_val[2]; end if; return ''{'' || new_sum || '','' || new_max || ''}''; END; ' LANGUAGE 'plpgsql'; OR REPLACE FUNCTION extract_maxsum(_int4) RETURNS "int4" AS ' DECLARE old_val ALIAS for $1; BEGIN return old_val[2]; END; ' LANGUAGE 'plpgsql'; DROP AGGREGATE maxsum int4; CREATE AGGREGATE maxsum( BASETYPE = int4, SFUNC = maxsum_counter, STYPE = _int4, FINALFUNC = extract_maxsum, INITCOND = '{0,0}'); > >select > > maxsum(value) > >from some_table > > order by some_field > > > >doesn't work: > >ERROR: Attribute some_table.some_field must be GROUPed or used in an > >aggregate function > > > The "order by" isn't necessarily handled before calculating maxsum() > anyway. Nice point. Anyway it doesn't matter, because it isn't handled at all. Tomasz Myrta
When I'm trying to connect I have this error message: Something unusual has occured to cause the driver to fail.Please report this exception: java.sql.SQLException: Sorry, to many clients already. What should I do?
> Richard Huxton wrote: > >> On Monday 06 Jan 2003 12:44 pm, Tomasz Myrta wrote: >> >> >Hi >> >I created my own aggregate function working as max(sum(value)) >> >It adds positive and negative values and finds maximum of this sum. >> >To work properly this function needs data to be sorted. >> >> >> I'm not sure that an aggregate function should require data to be sorted >> before processing. Could you show details of your function - there may >> be a >> way to rewrite it to handle unsorted data. > > Standard Postgresql aggregate functions don't need sorted data, but my > function needs. Look at the data: > <value> <sum> > 3 3 > -2 1 > 6 7 *** max_sum=7 > -3 4 > 2 6 > > For example, if you inverse your data, you have: > <value> <sum> > 2 2 > -3 -1 > 6 5 > -2 3 > 3 6 *** max_sum=6 > > As you see, data order is very important in this aggregate. Ah - so it's maximum of a running-total rather than a sum. AFAIK you are out of luck with aggregate functions. The order data is supplied to them is *not* defined - the "order by" operates just before results are output. Not much you can do about this, the whole basis of relational DBs are sets and sets don't have any idea of ordering. However, there are three options: You could define a cursor (or a table-function in 7.3) which would handle the order-by and then calculate the running-total on the fly. You then just need a standard max(running_total) call to pick out the value. Actually, if you use the cursor you might need to implement the max() in the application. Alternatively, you could add a running_total column and use a trigger to ensure the value is kept up to date. Finally, you could do the work in the application. Difficult to say which is the best for you. If you have 7.3, don't need these figures often and do a lot of updates/inserts I'd recommend option 1. If you're using 7.2, don't do a lot of inserts and want the figures frequently I'd choose option 2. HTH - Richard Huxton
dev@archonet.com wrote: > Ah - so it's maximum of a running-total rather than a sum. Sorry, my english still has a lot of black-holes :-( > AFAIK you are out of luck with aggregate functions. The order data is > supplied to them is *not* defined - the "order by" operates just before > results are output. Not much you can do about this, the whole basis of > relational DBs are sets and sets don't have any idea of ordering. You are right, but if it is possible to use some tricks, why not to use them? > However, there are three options: > > You could define a cursor (or a table-function in 7.3) which would handle > the order-by and then calculate the running-total on the fly. You then > just need a standard max(running_total) call to pick out the value. > Actually, if you use the cursor you might need to implement the max() in > the application. > > > Alternatively, you could add a running_total column and use a trigger to > ensure the value is kept up to date. > > > Finally, you could do the work in the application. > > Difficult to say which is the best for you. If you have 7.3, don't need > these figures often and do a lot of updates/inserts I'd recommend option > 1. If you're using 7.2, don't do a lot of inserts and want the figures > frequently I'd choose option 2. All of these options look a bit difficult. Currently I use pl/pgsql function and query with sorted subselect: select maxsum(X.val) from (select val from some_table order by key) X It isn't a very big problem for me. I was just wondering if I can change this pl/pgsql function in a view. Few weeks ago I asked on pgsql-performance about views and subselects. The conclusion was that postgresql planner doesn't work well when joining subselects - it wastes time on querying all rows of subselect. 1. I think it could rather slow down than speed up my solution. 2. I can't store this value, because each time data range changes. 3. I want to do as much as possible inside postgres. I think I will have to stay with pl/pgsql function and sorted subquery. Thanks for your help, Tomasz Myrta
Tomasz Myrta <jasiek@klaster.net> writes: > I found I can obey this using subselect: > select > maxsum(X.value) > from > (select value > from some_table > order by some_field) X > I can't create subselect, because I want to change this query into a > view. In my case postgresql doesn't use indexes on subselect inside a view. I don't understand what you think is wrong with this solution. It works fine for me: regression=# create view vv as regression-# select max(unique1) from (select unique1 from tenk1 regression(# order by unique2) x; CREATE VIEW regression=# explain select * from vv; QUERY PLAN ---------------------------------------------------------------------------------------------------Subquery Scan vv (cost=523.00..523.00rows=1 width=8) -> Aggregate (cost=523.00..523.00 rows=1 width=8) -> Subquery Scan x (cost=0.00..498.00rows=10000 width=8) -> Index Scan using tenk1_unique2 on tenk1 (cost=0.00..498.00 rows=10000width=8) (4 rows) regards, tom lane
Tomasz Myrta <jasiek@klaster.net> writes: > Standard Postgresql aggregate functions don't need sorted data, but my > function needs. Look at the data: > <value> <sum> > 3 3 > -2 1 > 6 7 *** max_sum=7 > -3 4 > 2 6 But if the input data is sorted into increasing order, then the largest running sum value is always at the end. Therefore max(sum()) is equivalent to sum(); therefore you do not need the special aggregate, nor the ordering. regards, tom lane
Tom Lane wrote: > Tomasz Myrta writes: > > >Standard Postgresql aggregate functions don't need sorted data, but my > >function needs. Look at the data: > > > >3 3 > >-2 1 > >6 7 *** max_sum=7 > >-3 4 > >2 6 > > > But if the input data is sorted into increasing order, then the largest > running sum value is always at the end. Therefore max(sum()) is > equivalent to sum(); therefore you do not need the special aggregate, > nor the ordering. > > regards, tom lane Sorting data by "value" was only an example, In my case data is sorted by another key and has mixed positive and negative values. Tomasz Myrta
Tom Lane wrote: > I don't understand what you think is wrong with this solution. It works > fine for me: > > regression=# create view vv as > regression-# select max(unique1) from (select unique1 from tenk1 > regression(# order by unique2) x; > CREATE VIEW > regression=# explain select * from vv; > QUERY PLAN > > --------------------------------------------------------------------------------------------------- > Subquery Scan vv (cost=523.00..523.00 rows=1 width=8) > -> Aggregate (cost=523.00..523.00 rows=1 width=8) > -> Subquery Scan x (cost=0.00..498.00 rows=10000 width=8) > -> Index Scan using tenk1_unique2 on tenk1 > (cost=0.00..498.00 rows=10000 width=8) > (4 rows) > OK, your view is ok. I think, you don't remember my questions about "joining views" and "sub-select with aggregate" dated on 2002-10-23, so I qoute your answer on that question (joining views):>I think this is the same issue that Stephan identified in his response>to your other posting("sub-select with aggregate"). When you write> FROM x join y using (col) WHERE x.col = const>the WHERE-restrictionis only applied to x. I'm afraid you'll need>to write> FROM x join y using (col) WHERE x.col = constAND y.col = const>Ideally you should be able to write just> FROM x join y using (col) WHERE col = const>but I thinkthat will be taken the same as "x.col = const" :-( I have the same problem here, query is a bit complicated and has some_table join (subselect) X using (some_field) which doesn't work. Regards, Tomasz Myrta
On Mon, 2003-01-06 at 09:12, cristi wrote: > When I'm trying to connect I have this error message: > > Something unusual has occured to cause the driver to fail.Please report this > exception: java.sql.SQLException: Sorry, to many clients already. > > > What should I do? > I might suggest posting to the -jdbc list in case this is something different, but generally speaking the "too many clients" error means that your application is attempting to connect to the db and the db is responding that it already has created the maximum number of connections that are allowed. This might be an indication of other problems, but if you simply want to raise the limit you'll need to modify max_connections in the postgresql.conf Robert Treat