Re: SQL select query becomes slow when using limit (with no offset) - Mailing list pgsql-performance

From Russell Smith
Subject Re: SQL select query becomes slow when using limit (with no offset)
Date
Msg-id 4A795C09.6060301@pws.com.au
In response to Re: SQL select query becomes slow when using limit (with no offset)  (Kees van Dieren <keesvandieren@gmail.com>)
List pgsql-performance
Kees van Dieren wrote:
> Hi Folks,
>
> Thanks for your response.
>
> I have added the following index (suggested by other post):
>
> CREATE INDEX events_events_cleared_eventtype
>   ON events_events
>   USING btree
>   (eventtype_id, cleared)
>   WHERE cleared = false;
>
> Also with columns in reversed order.
>
> No changes in response time noticed.
>
> Index on cleared column already is there (indices are in sql file
> attached to initial post.). eventtype_id has a foreign key constraint,
> which adds an index automatically I believe?
>
> The explain analyze results for both queries:
> explain analyze select events_events.id FROM
> events_events
> left join events_event_types on
> events_events.eventType_id=events_event_types.id
> where events_event_types.severity=70
> and not events_events.cleared
> order by events_events.dateTime DESC LIMIT 100
> >>>
> "Limit  (cost=0.00..125.03 rows=100 width=16) (actual
> time=0.046..3897.094 rows=77 loops=1)"
> "  ->  Nested Loop  (cost=0.00..120361.40 rows=96269 width=16) (actual
> time=0.042..3896.881 rows=77 loops=1)"
> "        ->  Index Scan Backward using events_events_datetime_ind on
> events_events  (cost=0.00..18335.76 rows=361008 width=24) (actual
> time=0.025..720.345 rows=360637 loops=1)"
> "              Filter: (NOT cleared)"
> "        ->  Index Scan using events_event_types_pkey on
> events_event_types  (cost=0.00..0.27 rows=1 width=8) (actual
> time=0.003..0.003 rows=0 loops=360637)"
> "              Index Cond: (events_event_types.id =
> events_events.eventtype_id)"
> "              Filter: (events_event_types.severity = 70)"
> "Total runtime: 3897.268 ms"
>
The plan here guesses that it will find the 100 rows we want fairly
quickly by scanning backward along the dateTime index, since it doesn't
expect to have to look through many rows before finding 100 that match
the filter criteria.  With no cross-column statistics it's more a guess
than a good calculation.  Here the guess is bad, and we end up scanning
360k rows from the index before finding what we want.  My skills aren't
up to giving specific advice on how to avert this problem; maybe
somebody else can help there.
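
One mitigation worth trying (my sketch, not something from the thread so
far) is to raise the per-column statistics targets on the filtered
columns, so the planner's single-column estimates at least get sharper,
even though it still can't model the cross-column correlation; the
target value 1000 below is an arbitrary illustration:

```sql
-- Sketch only: larger statistics targets mean ANALYZE samples more rows
-- for these columns, improving selectivity estimates (but not the
-- cross-column correlation the planner is missing here).
ALTER TABLE events_events ALTER COLUMN eventtype_id SET STATISTICS 1000;
ALTER TABLE events_events ALTER COLUMN cleared SET STATISTICS 1000;
ANALYZE events_events;
```

After re-analyzing, re-run EXPLAIN ANALYZE and see whether the row
estimates on the index scan get closer to reality.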
> explain analyze select events_events.id FROM
> events_events
> left join events_event_types on
> events_events.eventType_id=events_event_types.id
> where events_event_types.severity=70
> and not events_events.cleared
> order by events_events.dateTime DESC
> >>>
> "Sort  (cost=20255.18..20495.85 rows=96269 width=16) (actual
> time=1084.842..1084.951 rows=77 loops=1)"
> "  Sort Key: events_events.datetime"
> "  Sort Method:  quicksort  Memory: 20kB"
> "  ->  Hash Join  (cost=2.09..12286.62 rows=96269 width=16) (actual
> time=1080.789..1084.696 rows=77 loops=1)"
> "        Hash Cond: (events_events.eventtype_id =
> events_event_types.id)"
> "        ->  Seq Scan on events_events  (cost=0.00..9968.06
> rows=361008 width=24) (actual time=0.010..542.946 rows=360637 loops=1)"
> "              Filter: (NOT cleared)"
> "        ->  Hash  (cost=1.89..1.89 rows=16 width=8) (actual
> time=0.077..0.077 rows=16 loops=1)"
> "              ->  Seq Scan on events_event_types  (cost=0.00..1.89
> rows=16 width=8) (actual time=0.010..0.046 rows=16 loops=1)"
> "                    Filter: (severity = 70)"
> "Total runtime: 1085.145 ms"
>
> Any suggestions?
This plan is faster because it avoids the index scan: the planner
prefers a sequential scan of the table, which is much faster than 360k
random-I/O index lookups.  You can force this type of plan with a
subquery and the OFFSET 0 trick, but I'm not sure it's the best
solution.

eg

explain analyze SELECT * FROM
    (SELECT events_events.id FROM events_events
         LEFT JOIN events_event_types
             ON events_events.eventType_id = events_event_types.id
         WHERE events_event_types.severity = 70
             AND NOT events_events.cleared
         ORDER BY events_events.dateTime DESC
         OFFSET 0) AS a
LIMIT 100;
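
A related observation (my note, not a claim about what the planner
requires): because the WHERE clause filters on
events_event_types.severity, any NULL-extended rows produced by the
LEFT JOIN are discarded anyway, so the query is equivalent to a plain
inner join.  Writing it that way makes the intent explicit, although
recent PostgreSQL releases perform this join reduction automatically:

```sql
-- Equivalent formulation: the severity filter already rejects
-- NULL-extended rows, so LEFT JOIN can be written as an inner JOIN.
EXPLAIN ANALYZE
SELECT events_events.id
FROM events_events
JOIN events_event_types
  ON events_events.eventType_id = events_event_types.id
WHERE events_event_types.severity = 70
  AND NOT events_events.cleared
ORDER BY events_events.dateTime DESC
LIMIT 100;
```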

Regards

Russell
