Re: Optimize query: time of "single * IN(many)" > time of "many * - Mailing list pgsql-general

From Paul Janssen
Subject Re: Optimize query: time of "single * IN(many)" > time of "many *
Date
Msg-id BAY1-F25zXBAKpPZ92d00016aca@hotmail.com
Whole thread Raw
List pgsql-general
>Tom Lane writes:
>>Paul Janssen writes:
>>Can anyone help me out with the following situation:
>>    (a) a single query with 550 id's in the IN-clause resulting into 800+
>>seconds;
>>    (b) 550 queries with a single id in the IN-clause resulting into
>>overall time of <60 seconds;
>>The table consists of 950.000 records, and the resultset consists of
>>205.000 records.
>
>>Why is there such an extreme difference in time?
>
>Most likely the planner is opting not to use an indexscan in the first
>case.  Could we see EXPLAIN ANALYZE results for both cases?  Also, try
>"SET enable_seqscan TO OFF" and then repeat EXPLAIN ANALYZE for case (a).
>
>            regards, tom lane

Thanks all for your suggestions!

As suggested, I run EXPLAIN ANALYZE for the initial situation and the
situation
that you suggested ("seqscan OFF"). The results of this change is already a
decrease of total runtime by 78%. Very nice!
In respect to the total runtime of the "many * IN(single)" there was a
differ-
ence of 54x, that's now down to 11x. But still >10 times slower... Hope you
can
use the details below to help me close the gap. Thx!

__ SITUATION 0 : INITIAL__
EXPLAIN ANALYZE "single * IN(many)"
Unique  (cost=2120494.74..2139985.87 rows=129941 width=24) (actual
time=818313.20..820605.09 rows=335311 loops=1)
  ->  Sort  (cost=2120494.74..2123743.26 rows=1299409 width=24) (actual
time=818313.19..819327.09 rows=335311 loops=1)
        Sort Key: a_id, questions_id, answer, order_id, uid
        ->  Append  (cost=0.00..1916403.49 rows=1299409 width=24) (actual
time=421673.91..795948.40 rows=335311 loops=1)
              ->  Subquery Scan "*SELECT* 1"  (cost=0.00..1344006.15
rows=911298 width=24) (actual time=421673.91..556486.08 rows=207809 loops=1)
                    ->  Seq Scan on tbl_scores  (cost=0.00..1344006.15
rows=911298 width=24) (actual time=421673.88..554797.94 rows=207809 loops=1)
                          Filter: ((a_id = 1) OR (a_id = 2) OR ... OR (a_id
= 550))
              ->  Subquery Scan "*SELECT* 2"  (cost=0.00..572397.34
rows=388111 width=24) (actual time=157088.99..238498.40 rows=127502 loops=1)
                    ->  Seq Scan on tbl_scores_extra  (cost=0.00..572397.34
rows=388111 width=24) (actual time=157088.96..237474.23 rows=127502 loops=1)
                          Filter: ((a_id = 1) OR (a_id = 2) OR ... OR (a_id
= 550))
Total runtime: 821062.84 msec

EXPLAIN ANALYZE "many * IN(single)"
Unique  (cost=24501.02..24605.30 rows=695 width=24) (actual
time=51.20..57.93 rows=1349 loops=1)
  ->  Sort  (cost=24501.02..24518.40 rows=6952 width=24) (actual
time=51.20..52.95 rows=1349 loops=1)
        Sort Key: a_id, questions_id, answer, order_id, uid
        ->  Append  (cost=0.00..24057.38 rows=6952 width=24) (actual
time=0.62..38.04 rows=1349 loops=1)
              ->  Subquery Scan "*SELECT* 1"  (cost=0.00..17442.04 rows=4875
width=24) (actual time=0.61..22.39 rows=931 loops=1)
                    ->  Index Scan using tbl_scores_idx_a on tbl_scores
(cost=0.00..17442.04 rows=4875 width=24) (actual time=0.60..16.46 rows=931
loops=1)
                          Index Cond: (a_id = 1233)
              ->  Subquery Scan "*SELECT* 2"  (cost=0.00..6615.34 rows=2076
width=24) (actual time=0.67..12.10 rows=418 loops=1)
                    ->  Index Scan using tbl_scores_extra_idx_a on
tbl_scores  (cost=0.00..6615.34 rows=2076 width=24) (actual time=0.65..9.42
rows=418 loops=1)
                          Index Cond: (a_id = 1233)
Total runtime: 60.59 msec

The subsequent 550 executions of the query lead to...
  * total runtime varies from 0,93..163.62 msec;
  * total runtime sums up to 15107msec.

__SITUATION 1 : SET enable_seqscan TO OFF__
EXPLAIN ANALYZE "single * IN(many)"
Unique  (cost=18425133.86..18444624.99 rows=129941 width=24) (actual
time=174020.84..176301.67 rows=335311 loops=1)
  ->  Sort  (cost=18425133.86..18428382.38 rows=1299409 width=24) (actual
time=174020.82..175090.09 rows=335311 loops=1)
        Sort Key: a_id, questions_id, answer, order_id, uid
        ->  Append  (cost=0.00..18221042.61 rows=1299409 width=24) (actual
time=191.87..159763.68 rows=335311 loops=1)
              ->  Subquery Scan "*SELECT* 1"  (cost=0.00..13088911.15
rows=911298 width=24) (actual time=191.86..97162.20 rows=207809 loops=1)
                    ->  Index Scan using tbl_scores_idx_a, .(548x).. ,
tbl_scores_idx_a on tbl_scores (cost=0.00..13088911.15 rows=911298 width=24)
(actual time=191.84..95416.34 rows=207809 loops=1)
                          Index Cond: ((a_id = 1) OR (a_id = 2) OR ... OR
(a_id = 550))
              ->  Subquery Scan "*SELECT* 2"  (cost=0.00..5132131.46
rows=388111 width=24) (actual time=59.60..61179.24 rows=127502 loops=1)
                    ->  Index Scan using tbl_scores_extra_idx_a, .(548x).. ,
tbl_scores_extra_idx_a on tbl_scores_extra (cost=0.00..5132131.46
rows=388111 width=24) (actual time=59.58..59871.50 rows=127502 loops=1)
                          Index Cond: ((a_id = 1) OR (a_id = 2) OR ... OR
(a_id = 550))
Total runtime: 176785.19 msec

_________________________________________________________________
Chatten met je online vrienden via MSN Messenger. http://messenger.msn.nl/


pgsql-general by date:

Previous
From: David Helgason
Date:
Subject: array faults?
Next
From: jeffrey rivero
Date:
Subject: Re: Natural upgrade path for RedHat 9?