Re: Strange workaround for slow query - Mailing list pgsql-performance

From Yeb Havinga
Subject Re: Strange workaround for slow query
Date
Msg-id 4B977CD3.5030402@gmail.com
In response to Re: Strange workaround for slow query  (sverhagen@wps-nl.com)
Responses Re: Strange workaround for slow query  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-performance
sverhagen@wps-nl.com wrote:
>
> > Thanks - I'm sorry that I was not more specific earlier, but what would
> > be *really* helpful is the output of explain analyze, since that also
> > shows actual time, # rows and # loops of the inner nestloop.
>
> No problem at all.
>
>
> EXPLAIN ANALYZE SELECT * FROM events_events LEFT OUTER JOIN
> events_event_types ON
> eventType_id=events_event_types.id WHERE severity=20 AND (eventType_id
> IN (71)) ORDER BY datetime DESC limit 50;
>
>                 QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=3.23..200.31 rows=50 width=131) (actual
> time=0.070..0.341 rows=50 loops=1)
>    ->  Nested Loop  (cost=3.23..49139.16 rows=12466 width=131) (actual
> time=0.069..0.309 rows=50 loops=1)
>          ->  Index Scan Backward using
> events_events_eventtype_id_datetime_ind on events_events
>  (cost=0.00..48886.61 rows=12466 width=93) (actual time=0.037..0.144
> rows=50 loops=1)
>                Index Cond: (eventtype_id = 71)
>          ->  Materialize  (cost=3.23..3.24 rows=1 width=38) (actual
> time=0.001..0.001 rows=1 loops=50)
>                ->  Seq Scan on events_event_types  (cost=0.00..3.23
> rows=1 width=38) (actual time=0.024..0.029 rows=1 loops=1)
>                      Filter: ((id = 71) AND (severity = 20))
>  Total runtime: 0.415 ms
> (8 rows)
>
> Time: 1.290 ms
>
>
> EXPLAIN ANALYZE SELECT * FROM events_events LEFT OUTER JOIN
> events_event_types ON
> eventType_id=events_event_types.id WHERE severity=70 AND (eventType_id
> IN (71)) ORDER BY datetime DESC limit 50;
>
>                    QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=3.23..200.31 rows=50 width=131) (actual
> time=11641.775..11641.775 rows=0 loops=1)
>    ->  Nested Loop  (cost=3.23..49139.16 rows=12466 width=131) (actual
> time=11641.773..11641.773 rows=0 loops=1)
>          ->  Index Scan Backward using
> events_events_eventtype_id_datetime_ind on events_events
>  (cost=0.00..48886.61 rows=12466 width=93) (actual
> time=0.035..11573.320 rows=50389 loops=1)
>                Index Cond: (eventtype_id = 71)
>          ->  Materialize  (cost=3.23..3.24 rows=1 width=38) (actual
> time=0.000..0.000 rows=0 loops=50389)
>                ->  Seq Scan on events_event_types  (cost=0.00..3.23
> rows=1 width=38) (actual time=0.028..0.028 rows=0 loops=1)
>                      Filter: ((id = 71) AND (severity = 70))
>  Total runtime: 11641.839 ms
> (8 rows)
>
> Time: 11642.902 ms
>
>
> EXPLAIN ANALYZE SELECT * FROM events_events LEFT OUTER JOIN
> events_event_types ON
> eventType_id=events_event_types.id WHERE severity=70 AND (eventType_id
> IN (71, 999)) ORDER BY datetime DESC LIMIT 50;
>
>           QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=27290.26..27290.38 rows=50 width=131) (actual
> time=0.118..0.118 rows=0 loops=1)
>    ->  Sort  (cost=27290.26..27303.17 rows=5164 width=131) (actual
> time=0.117..0.117 rows=0 loops=1)
>          Sort Key: events_events.datetime
>          Sort Method:  quicksort  Memory: 17kB
>          ->  Nested Loop  (cost=22.95..27118.71 rows=5164 width=131)
> (actual time=0.112..0.112 rows=0 loops=1)
>                ->  Seq Scan on events_event_types  (cost=0.00..3.02
> rows=17 width=38) (actual time=0.016..0.041 rows=16 loops=1)
>                      Filter: (severity = 70)
>                ->  Bitmap Heap Scan on events_events
>  (cost=22.95..1589.94 rows=408 width=93) (actual time=0.002..0.002
> rows=0 loops=16)
>                      Recheck Cond: ((events_events.eventtype_id = ANY
> ('{71,999}'::bigint[])) AND (events_events.eventtype_id =
> events_event_types.id))
>                      ->  Bitmap Index Scan on
> test_events_events_eventtype_id_severity_ind  (cost=0.00..22.85
> rows=408 width=0) (actual time=0.001..0.001 rows=0 loops=16)
>                            Index Cond: ((events_events.eventtype_id =
> ANY ('{71,999}'::bigint[])) AND (events_events.eventtype_id =
> events_event_types.id))
>  Total runtime: 0.179 ms
> (12 rows)
>
> Time: 1.510 ms
>
>
> > I'm
> > wondering though why you do a left outer join. From the \d output in
> the
> > previous mail, events_event.eventtype_id has a not null constraint
> and a
> > fk to events_event_types.id, so an inner join would be appropriate.
> > Outer joins limits the amount of join orders the planner considers,
> so a
> > better plan might arise when the join is changed to inner.
>
> I do agree with this assessment. I'm sort of improving the performance
> of an existing implementation of ours, for which I'm not aware why
> they chose for LEFT OUTER. I did, however, test things with INNER as
> well, with the same results, so I decided to stick with what I
> encountered in the existing implementation. But it's on my mind as
> well ;-)
>
Thanks for the formatted output. The difference in speed is caused by the slow plan having to read 50k rows from the index, whose index condition is only eventtype_id = 71 (the severity filter is applied only in the inner Materialize, which returns 0 rows every time). The fast plan has the extra knowledge from the scan of events_event_types in the nestloop's outer loop, so its inner scan of events_events returns 0 rows in all 16 calls and is hence a lot faster, even though that scan is called 16 times.

But the big question is: why does the planner choose plan 1 in the
first case, and how can that be fixed? My $0.02 would be to 'help' the planner
find a better plan. Of course you did ANALYZE, but maybe another idea is
to increase default_statistics_target if it is still at the default
value of 10. More info at
http://www.postgresql.org/docs/8.3/static/runtime-config-query.html. And
also to 'help' the planner: I'd just change the query to an inner join
in this case, since there cannot be null tuples in the right-hand side here.
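A minimal sketch of both suggestions, using the table and column names from the plans above (the statistics value of 100 is only an illustrative choice, and setting it in the session affects columns analyzed afterwards):

```sql
-- Raise the statistics target (the 8.3 default is 10) so the planner
-- gets better row estimates, then re-gather statistics:
SET default_statistics_target = 100;  -- or change it in postgresql.conf
ANALYZE events_events;
ANALYZE events_event_types;

-- Rewrite the LEFT OUTER JOIN as an inner join; the NOT NULL constraint
-- on eventtype_id plus its FK to events_event_types.id mean the result
-- set is the same, but the planner can consider more join orders:
EXPLAIN ANALYZE
SELECT *
FROM events_events
JOIN events_event_types ON eventtype_id = events_event_types.id
WHERE severity = 70
  AND eventtype_id IN (71)
ORDER BY datetime DESC
LIMIT 50;
```

Per-column targets are also possible with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS, which avoids changing the global default.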

regards,
Yeb Havinga

