BUG #5294: Sorts on more than just the order-by clause - Mailing list pgsql-bugs

From Allen Johnson
Subject BUG #5294: Sorts on more than just the order-by clause
Date
Msg-id 201001211619.o0LGJp5I043918@wwwmaster.postgresql.org
Whole thread Raw
Responses Re: BUG #5294: Sorts on more than just the order-by clause  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
The following bug has been logged online:

Bug reference:      5294
Logged by:          Allen Johnson
Email address:      akjohnson78@gmail.com
PostgreSQL version: 8.4.2
Operating system:   Red Hat ES 5.4
Description:        Sorts on more than just the order-by clause
Details:

I've been porting our app from Oracle to Postgres and keeping an eye on
performance. Largely, Postgres is performing just as well or better!

I did run into an issue where we are performing a group-by on about 10
columns followed by an order-by of about 5 columns. This query was taking
twice as long as Oracle. When looking at the explain plan, Postgres seems to
be using all the columns in the group-by for sorting instead of _only_ using
what is in the order-by.

While the results are correct this seems to be causing a performance problem
since Postgres is sorting on more columns than it is being asked to. I
reworked the query to get rid of the extra sorting columns but this seems
like a hack. Below is an example query followed by the ugly hack.

Note: The execution times in this example don't mean anything because they
are running on a blank test db. On the production database there was a huge
difference in execution time in favor of the hack query. I just wanted to
illustrate that the sort keys seem incorrect.

Example Query:

select
  ct.name, c.lname, c.fname, c.mname,
  c.email, c.address1, c.address2,
  c.city, c.state, c.zip_code,
  count(a.id)
from
  contacts c
  inner join contact_types ct on (ct.code = c.contact_type_code)
  left join attachments a on (a.contact_id = c.id)
where
  c.company_id = 1
group by
  ct.name, c.lname, c.fname, c.mname,
  c.email, c.address1, c.address2,
  c.city, c.state, c.zip_code
order by
  ct.name, c.lname, c.fname, c.mname;

Example Explain:
                                                                       QUERY
PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------

 GroupAggregate  (cost=27.98..28.11 rows=3 width=1864) (actual
time=0.037..0.037 rows=0 loops=1)
   ->  Sort  (cost=27.98..27.99 rows=3 width=1864) (actual time=0.035..0.035
rows=0 loops=1)
         Sort Key: ct.name, c.lname, c.fname, c.mname, c.email, c.address1,
c.address2, c.city, c.state, c.zip_code
         Sort Method:  quicksort  Memory: 17kB
         ->  Nested Loop Left Join  (cost=4.27..27.96 rows=3 width=1864)
(actual time=0.017..0.017 rows=0 loops=1)
               ->  Nested Loop  (cost=0.00..16.55 rows=1 width=1864) (actual
time=0.014..0.014 rows=0 loops=1)
                     ->  Index Scan using contacts_company on contacts c
(cost=0.00..8.27 rows=1 width=1760) (actual time=0.012..0.012 rows=0
loops=1)
                           Index Cond: (company_id = 1)
                     ->  Index Scan using contact_types_pkey on
contact_types ct  (cost=0.00..8.27 rows=1 width=152) (never executed)
                           Index Cond: ((ct.code)::text =
(c.contact_type_code)::text)
               ->  Bitmap Heap Scan on attachments a  (cost=4.27..11.37
rows=3 width=12) (never executed)
                     Recheck Cond: (a.contact_id = c.id)
                     ->  Bitmap Index Scan on attachments_contact
(cost=0.00..4.27 rows=3 width=0) (never executed)
                           Index Cond: (a.contact_id = c.id)
 Total runtime: 0.192 ms
(15 rows)

* Notice how the sort key is using many more columns than the order-by has
specified.

Hack Query:
select * from (
  select
    ct.name as ct_name, c.lname, c.fname, c.mname,
    c.email, c.address1, c.address2,
    c.city, c.state, c.zip_code,
    count(a.id)
  from
    contacts c
    inner join contact_types ct on (ct.code = c.contact_type_code)
    left join attachments a on (a.contact_id = c.id)
  where
    c.company_id = 1
  group by
    ct.name, c.lname, c.fname, c.mname,
    c.email, c.address1, c.address2,
    c.city, c.state, c.zip_code
) as results
order by
  ct_name, lname, fname, mname;

Hack Explain:
                                                                       QUERY
PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------

 Sort  (cost=28.13..28.14 rows=3 width=1868) (actual time=0.054..0.054
rows=0 loops=1)
   Sort Key: ct.name, c.lname, c.fname, c.mname
   Sort Method:  quicksort  Memory: 17kB
   ->  HashAggregate  (cost=28.04..28.08 rows=3 width=1864) (actual
time=0.021..0.021 rows=0 loops=1)
         ->  Nested Loop Left Join  (cost=4.27..27.96 rows=3 width=1864)
(actual time=0.018..0.018 rows=0 loops=1)
               ->  Nested Loop  (cost=0.00..16.55 rows=1 width=1864) (actual
time=0.016..0.016 rows=0 loops=1)
                     ->  Index Scan using contacts_company on contacts c
(cost=0.00..8.27 rows=1 width=1760) (actual time=0.013..0.013 rows=0
loops=1)
                           Index Cond: (company_id = 1)
                     ->  Index Scan using contact_types_pkey on
contact_types ct  (cost=0.00..8.27 rows=1 width=152) (never executed)
                           Index Cond: ((ct.code)::text =
(c.contact_type_code)::text)
               ->  Bitmap Heap Scan on attachments a  (cost=4.27..11.37
rows=3 width=12) (never executed)
                     Recheck Cond: (a.contact_id = c.id)
                     ->  Bitmap Index Scan on attachments_contact
(cost=0.00..4.27 rows=3 width=0) (never executed)
                           Index Cond: (a.contact_id = c.id)
 Total runtime: 0.259 ms
(15 rows)

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: add primary key doesn't block?
Next
From: Tom Lane
Date:
Subject: Re: BUG #5294: Sorts on more than just the order-by clause