Thread: order function in aggregate

order function in aggregate

From
Michael Toews
Date:
Hi,

I'm relatively new to object oriented databases and postgres (~ 4 
months), so I'm not sure what is wrong with my custom aggregate 
function. I'm using Postgres 8.3.3 (results are same on Windows and 
Ubuntu versions).

Here is my data required to explain my problem:
------------------------------------------
CREATE TABLE aggr_test
(id serial NOT NULL,sub text,grp text,CONSTRAINT aggr_test_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);

INSERT INTO aggr_test (id, sub, grp) VALUES (1, 'one', 'pom pom');
INSERT INTO aggr_test (id, sub, grp) VALUES (2, 'two', 'pom pom');
INSERT INTO aggr_test (id, sub, grp) VALUES (3, 'three', 'pom pom');
INSERT INTO aggr_test (id, sub, grp) VALUES (4, 'two', 'la la');
INSERT INTO aggr_test (id, sub, grp) VALUES (5, 'three', 'la la');
INSERT INTO aggr_test (id, sub, grp) VALUES (6, 'one', 'la la');

CREATE OR REPLACE FUNCTION concat(text, text)RETURNS text AS
$BODY$DECLAREt text;
BEGINIF $1 ISNULL OR $2 ISNULL THEN  t = COALESCE($1,$2);ELSIF character_length($1) > 0 THEN  t = $1 ||', '|| $2;ELSE
t= $2;END IF;RETURN t;
 
END;$BODY$LANGUAGE 'plpgsql' IMMUTABLECOST 100;

CREATE AGGREGATE commacat("text") (SFUNC=concat,STYPE=text,SORTOP="<"
);
------------------------------------------
Now, I would like to group a summary of the strings in "sub", but they 
must be ordered alphabetically. Here I attempt the SQL:
  select commacat(sub), grp from aggr_test group by grp;

However, on my system it will output the "commacat" column as:
  "one, two, three";"pom pom"  "two, three, one";"la la"

(notice that the items in the first column are ordered differently, 
since the input values were entered in non-alphabetically)

This is where I get confused, since in the aggregate function, I have 
specified `SORTOP="<"`, which according to the documentation 
(http://www.postgresql.org/docs/8.3/interactive/sql-createaggregate.html) 
"must be equivalent to":
  SELECT sub FROM aggr_test ORDER BY sub USING <; -- LIMIT 1;

or
  SELECT sub FROM aggr_test ORDER BY sub USING >; -- LIMIT 1;

(I've modified the example to show the desired effect of ASC and DESC 
sorting, respectively)

So my question is: why is `SORTOP="<"` in my "commacat" aggregate 
function not working? Any suggestions? Is this a bug?

Note: I am aware that I can achieve my goal using a subquery:
  SELECT commacat(sub), grp FROM          (SELECT * FROM aggr_test ORDER BY grp, sub) AS foo  GROUP BY grp;

however, I'm looking to see if this ordering can done naturally within 
the aggregate function.

Thanks in advance!
+mt



Re: order function in aggregate

From
Richard Huxton
Date:
Michael Toews wrote:
> Now, I would like to group a summary of the strings in "sub", but they 
> must be ordered alphabetically. Here I attempt the SQL:
> 
>   select commacat(sub), grp from aggr_test group by grp;
> 
> However, on my system it will output the "commacat" column as:
> 
>   "one, two, three";"pom pom"
>   "two, three, one";"la la"
> 
> (notice that the items in the first column are ordered differently, 
> since the input values were entered in non-alphabetically)
> 
> This is where I get confused, since in the aggregate function, I have 
> specified `SORTOP="<"`, which according to the documentation 
> (http://www.postgresql.org/docs/8.3/interactive/sql-createaggregate.html) 
> "must be equivalent to":
> 
>   SELECT sub FROM aggr_test ORDER BY sub USING <; -- LIMIT 1;

That's for things like min()/max() where you only want the first/last 
value of some set. You want all of them.

You could accumulate the values in an array and then sort that with the 
final-func that create aggregate supports.


--   Richard Huxton  Archonet Ltd


Re: order function in aggregate

From
Tom Lane
Date:
Richard Huxton <dev@archonet.com> writes:
> Michael Toews wrote:
>> This is where I get confused, since in the aggregate function, I have 
>> specified `SORTOP="<"`, which according to the documentation 
>> (http://www.postgresql.org/docs/8.3/interactive/sql-createaggregate.html) 
>> "must be equivalent to":
>> 
>> SELECT sub FROM aggr_test ORDER BY sub USING <; -- LIMIT 1;

> That's for things like min()/max() where you only want the first/last 
> value of some set. You want all of them.

Right: sortop is an assertion that the aggregate behaves like min() or
max(), not a request for the system to do something strange to the
aggregate's input.

> You could accumulate the values in an array and then sort that with the 
> final-func that create aggregate supports.

The traditional way to get presorted input to an aggregate is

select myagg(x) from (select whatever as x from ... order by ...) ss;

You can avoid that with an internal sort in the aggregate, as Richard
suggests, but it might not be worth the trouble --- and in any case
it'd be hard to sort internally on anything except the exact values
being aggregated.  With the subselect approach you can order on anything
at all.
        regards, tom lane


Re: order function in aggregate

From
Mike Toews
Date:
Richard Huxton wrote:
> Michael Toews wrote:
>
> You could accumulate the values in an array and then sort that with 
> the final-func that create aggregate supports.

Thanks for the help. Here was my final functions to aggregate things 
into a comma serpented text element (if anyone is interested):

CREATE OR REPLACE FUNCTION commacat_fn(anyarray) RETURNS text AS
$BODY$select array_to_string(sort($1),', ');$BODY$ LANGUAGE 'sql' IMMUTABLE STRICT COST 100;
ALTER FUNCTION commacat_fn(anyarray) OWNER TO postgres;

CREATE AGGREGATE commacat(anyelement) ( SFUNC=array_append, STYPE=anyarray, FINALFUNC=commacat_fn, INITCOND='{}'
);

---------------

Lastly a random quick example:

select attrelid, commacat(attname) as attnames from pg_attribute group 
by attrelid order by attrelid;

Certainly there are far better examples that can be used to distill 
information in a table to a comma-separated list.

In some specific applications, a "sort(myarraytype[])" function will 
need to be created if the data type in the aggregate column does not 
have a sort function (fortunately, most data types already have this 
function defined).

Regards,
+mt