Thread: BUG #18126: Ordered set aggregate: result does not depend on sort order
BUG #18126: Ordered set aggregate: result does not depend on sort order
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 18126 Logged by: Richard Wollhofen Email address: wollhofen@unisoftwareplus.com PostgreSQL version: 16.0 Operating system: Debian 16.0-1.pgdg120+1 Description: Probably related to bug report #18125. Using a custom ordered set aggregate, the sort order in the ORDER BY after the WITHIN GROUP does not show an influence on the results, as demonstrated in the script below. There are three possible ways to achieve the same ascending or descending sort order in the test data, respectively. However, the result depends on WHAT is used for sorting (columns, whole row) instead of the actual ordered result set. Tested for postgres major versions 10 - 16. Link to test script https://dbfiddle.uk/2G7NgXPn -- custom md5 aggregate sfunc CREATE FUNCTION md5_agg_sfunc(text, anyelement) RETURNS text LANGUAGE sql AS $$ SELECT md5($1 || $2::text) $$; -- custom md5 aggregate function CREATE AGGREGATE md5_agg (ORDER BY anyelement) ( STYPE = text, SFUNC = md5_agg_sfunc, INITCOND = '' ); -- test data, two correlated columns CREATE TABLE t_test (a int, b int); INSERT INTO t_test SELECT x, x + 10 FROM generate_series(1, 100000) AS x; -- Result for ASC sort order is the same for all three criteria! SELECT * FROM t_test ORDER BY t_test ASC LIMIT 3; SELECT * FROM t_test ORDER BY t_test.a ASC LIMIT 3; SELECT * FROM t_test ORDER BY t_test.b ASC LIMIT 3; -- Result for DESC sort order is the same for all three criteria! SELECT * FROM t_test ORDER BY t_test DESC LIMIT 3; SELECT * FROM t_test ORDER BY t_test.a DESC LIMIT 3; SELECT * FROM t_test ORDER BY t_test.b DESC LIMIT 3; -- expected 2 hashes, hash H1 for columns 1-3 for ASC order, -- hash H2 for columns 4-6 for DESC order SELECT md5_agg() WITHIN GROUP (ORDER BY t_test ASC), md5_agg() WITHIN GROUP (ORDER BY t_test.a ASC), md5_agg() WITHIN GROUP (ORDER BY t_test.b ASC), md5_agg() WITHIN GROUP (ORDER BY t_test DESC), md5_agg() WITHIN GROUP (ORDER BY t_test.a DESC), md5_agg() WITHIN GROUP (ORDER BY t_test.b DESC) FROM t_test; -- received 3 hashes: -- H3 for col. 1, 4 -- H4 for col. 2, 5 -- H5 for col. 3, 6 -- Sort order has no influence. -- Specification of order criterion (whole record or different columns) determines the result.
PG Bug reporting form <noreply@postgresql.org> writes: > Probably related to bug report #18125. > Using a custom ordered set aggregate, the sort order in the ORDER BY after > the WITHIN GROUP does not show an influence on the results, as demonstrated > in the script below. I think the problem in both of these bugs is that you're expecting the system to sort the input for you. As per [1], this is not the case for ordered-set aggregates: Unlike the case for normal aggregates, the sorting of input rows for an ordered-set aggregate is not done behind the scenes, but is the responsibility of the aggregate's support functions. The typical implementation approach is to keep a reference to a “tuplesort” object in the aggregate's state value, feed the incoming rows into that object, and then complete the sorting and read out the data in the final function. This design allows the final function to perform special operations such as injecting additional “hypothetical” rows into the data to be sorted. While normal aggregates can often be implemented with support functions written in PL/pgSQL or another PL language, ordered-set aggregates generally have to be written in C, since their state values aren't definable as any SQL data type. Perhaps it would be useful to have some kind of shim whereby a "dumb" transition function could be the basis of an ordered-set aggregate. But I'm not quite seeing why that would be an improvement over a traditional aggregate that you use with an ORDER BY option. regards, tom lane [1] https://www.postgresql.org/docs/current/xaggr.html#XAGGR-ORDERED-SET-AGGREGATES