Thread: Sorting an aggregated column
Hi all, I ran these commands: create temp table dw_survey(survey int,question int,answer_bool boolean,answer_int int,answer_char varchar); insert into dw_survey values(1,1,'t',null,null); insert into dw_survey values(1,2,'f',null,null); insert into dw_survey values(1,3,'t',null,null); insert into dw_survey values(1,4,null,123,null); insert into dw_survey values(1,5,null,21,null); insert into dw_survey values(1,6,null,1,null); insert into dw_survey values(2,1,'t',null,null); insert into dw_survey values(2,2,'t',null,null); insert into dw_survey values(2,3,'t',null,null); insert into dw_survey values(2,4,null,3,null); insert into dw_survey values(2,5,null,2,null); insert into dw_survey values(2,6,null,1,null); and I now have a table with data like this: Table "pg_temp_5.dw_survey" Column | Type | Modifiers -------------+-------------------+-----------survey | integer | question | integer | answer_bool| boolean | answer_int | integer | answer_char | character varying | survey | question | answer_bool | answer_int | answer_char --------+----------+-------------+------------+------------- 1 | 1 | t | | 1 | 2 | f | | 1 | 3 | t | | 1 | 4 | | 123 | 1 | 5 | | 21 | 1 | 6 | | 1 | 2 | 1 | t | | 2 | 2 | t | | 2 | 3 | t | | 2 | 4 | | 3 | 2 | 5 | | 2 | 2 | 6 | | 1 | Answers to a survey can be one of three types - boolean, integer or varchar. There can be any number of questions in a survey. I want to summarise the results of the survey like this: survey | answer1 | answer2 | answer3 | answer4 | answer5 | answer6 -------+---------+---------+---------+---------+---------+-------- 1 | t | f | t | 123 | 21 | 1 2 | t | t | t | 3 | 2 | 1 Or even like this: survey | answers -------+--------------- 1 | t,f,t,123,21,1 2 | t,t,t,3,2,1 In both cases the order of the answers must be ordered by the "question" column. I can do the second case with a user-defined string concatenating aggregate: select survey, list ( case when answer_bool = 't' then 'y'::varchar when answer_bool = 'f' then 'n'::varchar when answer_int is notnull then answer_int::varchar when answer_char is not null then answer_char::varchar end ) from dw_survey group by survey order by survey; survey | list --------+--------------------- 1 | y, n, y, 123, 21, 1 2 | y, y, y, 3, 2, 1 This output is correct in this case but there is no guarantee that the answers will come out in "question" order. I can'tsee how to incorporate sorting by the "question" column using this approach. Can anyone suggest either how to improve my current approach or a different approach to get the desired result? Thanks, David Witham Telephony Platforms Architect Unidial Pty Ltd Level 1, 174 Peel St North Melbourne,VIC 3051 Australia Ph: 03 8628 3383 Fax: 03 8628 3399
"David Witham" <davidw@unidial.com.au> writes: > This output is correct in this case but there is no guarantee that the > answers will come out in "question" order. I can't see how to > incorporate sorting by the "question" column using this approach. As of PG 7.4 you can reliably use a sorted sub-select to determine the order of inputs to a user-defined aggregate function. See for instance http://archives.postgresql.org/pgsql-general/2003-02/msg00917.php regards, tom lane
Hi Tom, Thanks for the advice. I was planning to upgrade from 7.3.2 to 7.4 soon but this adds a bit more impetus. Under 7.3.2 I rewrote the query as your example suggested: explain select survey, list ( case when answer_bool = 't' then 'y'::varchar when answer_bool = 'f' then 'n'::varchar when answer_int is not null then answer_int::varchar when answer_char is not null then answer_char::varchar end ) as answers from (select survey, answer_bool, answer_int, answer_char from dw_survey order by survey,question) as dws group by survey order by survey; --------------------------------------------------------------------------------------------Aggregate (cost=122.16..129.66rows=100 width=45) -> Group (cost=122.16..127.16 rows=1000 width=45) -> Sort (cost=122.16..124.66rows=1000 width=45) Sort Key: survey -> Subquery Scan dws (cost=69.83..72.33rows=1000 width=45) -> Sort (cost=69.83..72.33 rows=1000 width=45) Sort Key: survey, question -> Seq Scan on dw_survey (cost=0.00..20.00 rows=1000 width=45) So I see that there is the extra sort above the sub-query that wouldn't be there using 7.4. Are you saying that the sortby survey after the sort by survey,question would potentially reorder the records initially sorted by survey,question?If the sub-query had already sorted by survey (along with question), would the sort by survey bother to reorderany of the rows? E.g. if the subselect returned (assuming 1 answer from the 3 answer columns): 1,t 1,f 1,t 1,123 1,21 1,1 2,t 2,t 2,t 2,3 2,2 2,1 would the sort by survey potentially reorder these rows even though they don't need to be? Regards, David -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Tuesday, 23 March 2004 16:17 To: David Witham Cc: pgsql-sql@postgresql.org Subject: Re: [SQL] Sorting an aggregated column "David Witham" <davidw@unidial.com.au> writes: > This output is correct in this case but there is no guarantee that the > answers will come out in "question" order. I can't see how to > incorporate sorting by the "question" column using this approach. As of PG 7.4 you can reliably use a sorted sub-select to determine the order of inputs to a user-defined aggregate function. See for instance http://archives.postgresql.org/pgsql-general/2003-02/msg00917.php regards, tom lane
"David Witham" <davidw@unidial.com.au> writes: > So I see that there is the extra sort above the sub-query that > wouldn't be there using 7.4. Are you saying that the sort by survey > after the sort by survey,question would potentially reorder the > records initially sorted by survey,question? Exactly. Most implementations of qsort() aren't "stable", which means they might reorder equal keys in some random way. So to preserve the sub-ordering on question, you need the upper sort to be skipped. The 7.4 planner is smart enough to notice it doesn't need the upper sort, prior releases were not. regards, tom lane