Thread: Sorting an aggregated column

Sorting an aggregated column

From
"David Witham"
Date:
Hi all,

I ran these commands:

create temp table dw_survey(survey int,question int,answer_bool boolean,answer_int int,answer_char varchar);
insert into dw_survey values(1,1,'t',null,null);
insert into dw_survey values(1,2,'f',null,null);
insert into dw_survey values(1,3,'t',null,null);
insert into dw_survey values(1,4,null,123,null);
insert into dw_survey values(1,5,null,21,null);
insert into dw_survey values(1,6,null,1,null);
insert into dw_survey values(2,1,'t',null,null);
insert into dw_survey values(2,2,'t',null,null);
insert into dw_survey values(2,3,'t',null,null);
insert into dw_survey values(2,4,null,3,null);
insert into dw_survey values(2,5,null,2,null);
insert into dw_survey values(2,6,null,1,null);

and I now have a table with data like this:
        Table "pg_temp_5.dw_survey"  Column    |       Type        | Modifiers
-------------+-------------------+-----------survey      | integer           | question    | integer           |
answer_bool| boolean           | answer_int  | integer           | answer_char | character varying |  
survey | question | answer_bool | answer_int | answer_char
--------+----------+-------------+------------+-------------     1 |        1 | t           |            |      1 |
  2 | f           |            |      1 |        3 | t           |            |      1 |        4 |             |
123 |      1 |        5 |             |         21 |      1 |        6 |             |          1 |      2 |        1 |
t          |            |      2 |        2 | t           |            |      2 |        3 | t           |            |
    2 |        4 |             |          3 |      2 |        5 |             |          2 |      2 |        6 |
    |          1 |  

Answers to a survey can be one of three types - boolean, integer or varchar.
There can be any number of questions in a survey.

I want to summarise the results of the survey like this:

survey | answer1 | answer2 | answer3 | answer4 | answer5 | answer6
-------+---------+---------+---------+---------+---------+--------
1      |    t    |    f    |    t    |   123   |    21   |    1
2      |    t    |    t    |    t    |     3   |     2   |    1

Or even like this:

survey | answers
-------+---------------
1      | t,f,t,123,21,1
2      | t,t,t,3,2,1

In both cases the order of the answers must be ordered by the "question" column.

I can do the second case with a user-defined string concatenating aggregate:

select survey, list (
case when answer_bool = 't' then 'y'::varchar       when answer_bool = 'f' then 'n'::varchar       when answer_int is
notnull then answer_int::varchar       when answer_char is not null then answer_char::varchar   end 
)
from dw_survey
group by survey
order by survey;
survey |        list
--------+---------------------     1 | y, n, y, 123, 21, 1     2 | y, y, y, 3, 2, 1

This output is correct in this case but there is no guarantee that the answers will come out in "question" order. I
can'tsee how to incorporate sorting by the "question" column using this approach. 

Can anyone suggest either how to improve my current approach or a different approach to get the desired result?

Thanks,

David Witham
Telephony Platforms Architect
Unidial Pty Ltd
Level 1, 174 Peel St North Melbourne,VIC 3051
Australia
Ph: 03 8628 3383
Fax: 03 8628 3399


Re: Sorting an aggregated column

From
Tom Lane
Date:
"David Witham" <davidw@unidial.com.au> writes:
> This output is correct in this case but there is no guarantee that the
> answers will come out in "question" order. I can't see how to
> incorporate sorting by the "question" column using this approach.

As of PG 7.4 you can reliably use a sorted sub-select to determine the
order of inputs to a user-defined aggregate function.  See for instance
http://archives.postgresql.org/pgsql-general/2003-02/msg00917.php
        regards, tom lane


Re: Sorting an aggregated column

From
"David Witham"
Date:
Hi Tom,

Thanks for the advice. I was planning to upgrade from 7.3.2 to 7.4 soon but this adds a bit more impetus.

Under 7.3.2 I rewrote the query as your example suggested:

explain select survey, list (   case when answer_bool = 't' then 'y'::varchar       when answer_bool = 'f' then
'n'::varchar      when answer_int is not null then answer_int::varchar       when answer_char is not null then
answer_char::varchar  end 
) as answers
from (select survey, answer_bool, answer_int, answer_char from dw_survey order by survey,question)
as dws
group by survey
order by survey;

--------------------------------------------------------------------------------------------Aggregate
(cost=122.16..129.66rows=100 width=45)  ->  Group  (cost=122.16..127.16 rows=1000 width=45)        ->  Sort
(cost=122.16..124.66rows=1000 width=45)              Sort Key: survey              ->  Subquery Scan dws
(cost=69.83..72.33rows=1000 width=45)                    ->  Sort  (cost=69.83..72.33 rows=1000 width=45)
          Sort Key: survey, question                          ->  Seq Scan on dw_survey  (cost=0.00..20.00 rows=1000
width=45)

So I see that there is the extra sort above the sub-query that wouldn't be there using 7.4. Are you saying that the
sortby survey after the sort by survey,question would potentially reorder the records initially sorted by
survey,question?If the sub-query had already sorted by survey (along with question), would the sort by survey bother to
reorderany of the rows? E.g. if the subselect returned (assuming 1 answer from the 3 answer columns): 
1,t
1,f
1,t
1,123
1,21
1,1
2,t
2,t
2,t
2,3
2,2
2,1
would the sort by survey potentially reorder these rows even though they don't need to be?

Regards,
David

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, 23 March 2004 16:17
To: David Witham
Cc: pgsql-sql@postgresql.org
Subject: Re: [SQL] Sorting an aggregated column


"David Witham" <davidw@unidial.com.au> writes:
> This output is correct in this case but there is no guarantee that the
> answers will come out in "question" order. I can't see how to
> incorporate sorting by the "question" column using this approach.

As of PG 7.4 you can reliably use a sorted sub-select to determine the
order of inputs to a user-defined aggregate function.  See for instance
http://archives.postgresql.org/pgsql-general/2003-02/msg00917.php
        regards, tom lane


Re: Sorting an aggregated column

From
Tom Lane
Date:
"David Witham" <davidw@unidial.com.au> writes:
> So I see that there is the extra sort above the sub-query that
> wouldn't be there using 7.4. Are you saying that the sort by survey
> after the sort by survey,question would potentially reorder the
> records initially sorted by survey,question?

Exactly.  Most implementations of qsort() aren't "stable", which means
they might reorder equal keys in some random way.  So to preserve the
sub-ordering on question, you need the upper sort to be skipped.  The
7.4 planner is smart enough to notice it doesn't need the upper sort,
prior releases were not.
        regards, tom lane