Thread: extending ORDER BY make query much slower

extending ORDER BY make query much slower

From
"Dan Langille"
Date:
NOTE: I've also put the analyse output at 
http://www.freshports.org/tmp/faster.txt in case it's easier to read 
than the format below.

I've found that adding another field to the ORDER BY clause, times go 
through the roof.

This query is pretty fast.

explain analyse SELECT element.id,        commit_log.id,        commit_log.message_id,        commit_log.committer,
  commit_log.commit_date,        commit_log.description,        commit_log.encoding_losses,
commit_log.message_subject,       element.name,        commit_log_elements.revision_name,
element_pathname(element.id)AS element_pathname   FROM commit_log, commit_log_elements, element  WHERE
commit_log_elements.commit_log_id= commit_log.id    AND commit_log_elements.element_id    = element.id
 
ORDER BY commit_log.commit_date DESC
LIMIT 100;

Limit  (cost=0.00..468.79 rows=100 width=261) (actual 
time=1.80..78.10 rows=100 loops=1)  ->  Nested Loop  (cost=0.00..1248045.68 rows=266228 width=261) 
(actual time=1.78..77.19 rows=101 loops=1)        ->  Nested Loop  (cost=0.00..307745.31 rows=266228 
width=244) (actual time=0.36..5.41 rows=101 loops=1)              ->  Index Scan Backward using commit_log_commit_date

on commit_log  (cost=0.00..34305.03 rows=70543 width=227) (actual 
time=0.30..1.17 rows=32 loops=1)              ->  Index Scan using commit_log_elements_clid on 
commit_log_elements  (cost=0.00..3.46 rows=33 width=17) (actual 
time=0.03..0.07 rows=3 loops=32)                    Index Cond: (commit_log_elements.commit_log_id = 
"outer".id)        ->  Index Scan using element_pkey on element  
(cost=0.00..3.52 rows=1 width=17) (actual time=0.08..0.09 rows=1 
loops=101)              Index Cond: ("outer".element_id = element.id)Total runtime: 79.52 msec

If I add another item to the ORDER BY, the times sky rocket.  I 
haven't let the query run to completion, but here's the analyse 
output:

explain analyse SELECT element.id,        commit_log.id AS my_commit_log_id,        commit_log.message_id,
to_char(commit_log.commit_date- SystemTimeAdjust(), 'DD Mon 
 
YYYY')  AS commit_date,        to_char(commit_log.commit_date - SystemTimeAdjust(), 
'HH24:MI')      AS commit_time,        commit_log.committer,        commit_log.description,
commit_log.encoding_losses,       commit_log.message_subject,        element.name,
commit_log_elements.revision_name,       element_pathname(element.id) AS element_pathname   FROM commit_log,
commit_log_elements,element  WHERE commit_log_elements.commit_log_id = commit_log.id    AND
commit_log_elements.element_id   = element.id
 
ORDER BY commit_log.commit_date DESC,         my_commit_log_id
LIMIT 100;


explain  SELECT element.id,        commit_log.id AS my_commit_log_id,        commit_log.message_id,
commit_log.committer,       commit_log.description,        commit_log.encoding_losses,
commit_log.message_subject,       element.name,        commit_log_elements.revision_name,
element_pathname(element.id)AS element_pathname   FROM commit_log, commit_log_elements, element  WHERE
commit_log_elements.commit_log_id= commit_log.id    AND commit_log_elements.element_id    = element.id
 
ORDER BY commit_log.commit_date DESC,         my_commit_log_id
LIMIT 100;

                                            QUERY PLANLimit  (cost=196855.28..196855.53 rows=100 width=261)  ->  Sort
(cost=196855.28..197520.85rows=266228 width=261)        Sort Key: commit_log.commit_date, commit_log.id        ->  Hash
Join (cost=9478.25..38465.07 rows=266228 
 
width=261)              Hash Cond: ("outer".commit_log_id = "inner".id)              ->  Hash Join
(cost=3503.91..19098.46rows=266228 
 
width=34)                    Hash Cond: ("outer".element_id = "inner".id)                    ->  Seq Scan on
commit_log_elements 
 
(cost=0.00..4703.28 rows=266228 width=17)                    ->  Hash  (cost=2463.73..2463.73 rows=132073 
width=17)                          ->  Seq Scan on element  
(cost=0.00..2463.73 rows=132073 width=17)              ->  Hash  (cost=3133.43..3133.43 rows=70543 width=227)
        ->  Seq Scan on commit_log  (cost=0.00..3133.43 
 
rows=70543 width=227)
(12 rows)

If I restructure the query, I can get times pretty close to the 
original query:

explain analyse SELECT * FROM ( SELECT element.id,        commit_log.id AS my_commit_log_id,
commit_log.message_id,       commit_log.commit_date,        commit_log.committer,        commit_log.description,
commit_log.encoding_losses,       commit_log.message_subject,        element.name,
commit_log_elements.revision_name,       element_pathname(element.id) AS element_pathname   FROM commit_log,
commit_log_elements,element  WHERE commit_log_elements.commit_log_id = commit_log.id    AND
commit_log_elements.element_id   = element.id
 
ORDER BY commit_log.commit_date DESC 
LIMIT 100) AS TEMP
order by commit_date, my_commit_log_id;

Sort  (cost=472.11..472.36 rows=100 width=261) (actual 
time=80.01..80.40 rows=100 loops=1)  Sort Key: commit_date, my_commit_log_id  ->  Subquery Scan "temp"
(cost=0.00..468.79rows=100 width=261) 
 
(actual time=1.66..78.64 rows=100 loops=1)        ->  Limit  (cost=0.00..468.79 rows=100 width=261) (actual 
time=1.63..76.75 rows=100 loops=1)              ->  Nested Loop  (cost=0.00..1248045.68 rows=266228 
width=261) (actual time=1.62..75.85 rows=101 loops=1)                    ->  Nested Loop  (cost=0.00..307745.31 
rows=266228 width=244) (actual time=0.34..5.38 rows=101 loops=1)                          ->  Index Scan Backward using

commit_log_commit_date on commit_log  (cost=0.00..34305.03 rows=70543 
width=227) (actual time=0.27..1.17 rows=32 loops=1)                          ->  Index Scan using 
commit_log_elements_clid on commit_log_elements  (cost=0.00..3.46 
rows=33 width=17) (actual time=0.03..0.07 rows=3 loops=32)                                Index Cond: 
(commit_log_elements.commit_log_id = "outer".id)                    ->  Index Scan using element_pkey on element  
(cost=0.00..3.52 rows=1 width=17) (actual time=0.08..0.09 rows=1 
loops=101)                          Index Cond: ("outer".element_id = 
element.id)Total runtime: 81.04 msec
(12 rows)

Any comments?
-- 
Dan Langille : http://www.langille.org/



Re: extending ORDER BY make query much slower

From
Stephan Szabo
Date:
On Wed, 12 Mar 2003, Dan Langille wrote:

> explain analyse
>   SELECT element.id,
>          commit_log.id AS my_commit_log_id,
>          commit_log.message_id,
>          to_char(commit_log.commit_date - SystemTimeAdjust(), 'DD Mon
> YYYY')  AS commit_date,
>          to_char(commit_log.commit_date - SystemTimeAdjust(),
> 'HH24:MI')      AS commit_time,
>          commit_log.committer,
>          commit_log.description,
>          commit_log.encoding_losses,
>          commit_log.message_subject,
>          element.name,
>          commit_log_elements.revision_name,
>          element_pathname(element.id) AS element_pathname
>     FROM commit_log, commit_log_elements, element
>    WHERE commit_log_elements.commit_log_id = commit_log.id
>      AND commit_log_elements.element_id    = element.id
> ORDER BY commit_log.commit_date DESC,
>          my_commit_log_id
> LIMIT 100;

As a question, is it faster if both are marked DESC?





Re: extending ORDER BY make query much slower

From
Tom Lane
Date:
"Dan Langille" <dan@langille.org> writes:
> I've found that adding another field to the ORDER BY clause, times go 
> through the roof.

Why does that surprise you?  The original query is using an index to
achieve the required ordering, so it can give you the first 100 rows
without bothering to compute the remainder.  The modified query has to
actually compute all the rows, and sort them, before it knows which are
the first 100.

If you had an index matching the second ORDER BY clause, you'd probably
get a plan similar to the first case.
        regards, tom lane


Re: extending ORDER BY make query much slower

From
Dan Langille
Date:
On Wed, 12 Mar 2003, Tom Lane wrote:

> "Dan Langille" <dan@langille.org> writes:
> > I've found that adding another field to the ORDER BY clause, times go
> > through the roof.
>
> Why does that surprise you?

Because I wasn't using the logic you describe below.

> The original query is using an index to
> achieve the required ordering, so it can give you the first 100 rows
> without bothering to compute the remainder.  The modified query has to
> actually compute all the rows, and sort them, before it knows which are
> the first 100.

Where as I was thinking, well, it would sort the columns by the first bit,
then by the second bit... wrong....

Thank you, I was looking at it from my narrow point of view: sort them by
commit_date, but if there's two identical, sort those two by commit-id...
not the big picture solution.

> If you had an index matching the second ORDER BY clause, you'd probably
> get a plan similar to the first case.

Understood.  FWIW, I created an index which uses both items, then did an
ORDER BY commit_date DESC, commit_log_id desc.  Yep, I got good times from
that.

But I don't think I'm going to keep this index around for just this one
query.  It's used often, for the main page of the website, but given that
I can get around it with slight modification to the select, I think I'll
use that instead.


Re: extending ORDER BY make query much slower

From
Dan Langille
Date:
On Wed, 12 Mar 2003, Stephan Szabo wrote:

> As a question, is it faster if both are marked DESC?

FWIW, no.  But see my reply to Tom.