Thread: Query gets slow when where clause increases

Query gets slow when where clause increases

From
James Antill
Date:
 I have a problem where a query gets _much_ slower if I add statements
to the where clause than if I just get everything and filter in the
code. I expected the query to be faster as the where clause gets
bigger (and thus the data to return gets lower).

 I've put inline the SQL and explain analyze of both the general and
specific queries. I assume the problem is something to do with the
fact that in the specific query "ticket_groups" is uses an index and
is assumed to return 5 rows, but actually returns 604 and in the
generic case it doesn't use an index and is assumed to return 3.5-4
thousand and does? Is this right, and if so is it possible to get
postgres to re-think using an index in this case (or possible up it's
estimated row count)?
 Any help appreciated.

----------------------- GENERIC -------------------------------------

SELECT   g.assigned_to, g.opened AS escalated, t.tid, t.opened,
         e.userid AS commiter, e.eid, e.performed_on, e.status
  FROM (events e JOIN (ticket_groups g JOIN tickets t USING(tid)) USING(tid))
   WHERE (g.gid = '37' AND
          (e.performed_on <= CAST('1088480270' AS bigint)) AND
          ((t.modified >= CAST('1086875491' AS bigint)) OR
           (t.status != '4' AND
            t.status != '5' AND
            t.status != '9')))
   ORDER BY e.userid, t.tid, e.performed_on;

 Sort  (cost=28017.06..28054.97 rows=15166 width=58) (actual time=2057.25..2076.12 rows=26594 loops=1)
   Sort Key: e.userid, t.tid, e.performed_on
   ->  Merge Join  (cost=1251.59..26963.93 rows=15166 width=58) (actual time=231.81..1972.29 rows=26594 loops=1)
         Merge Cond: ("outer".tid = "inner".tid)
         ->  Index Scan using idx_tid_events on events e  (cost=0.00..18943.61 rows=268784 width=26) (actual
time=11.48..1358.15rows=268803 loops=1) 
               Filter: (performed_on <= 1088480270::bigint)
         ->  Materialize  (cost=7160.00..7160.00 rows=1725 width=32) (actual time=217.14..237.94 rows=26592 loops=1)
               ->  Merge Join  (cost=1251.59..7160.00 rows=1725 width=32) (actual time=63.14..214.75 rows=983 loops=1)
                     Merge Cond: ("outer".tid = "inner".tid)
                     ->  Index Scan using idx_tickets_tid on tickets t  (cost=0.00..5820.13 rows=18823 width=12)
(actualtime=2.97..135.57 rows=6020 loops=1) 
                           Filter: (((status <> 4::smallint) OR (modified >= 1086875491::bigint)) AND ((status <>
5::smallint)OR (modified >= 1086875491::bigint)) AND ((status <> 9::smallint) OR (modified >= 1086875491::bigint))) 
                     ->  Sort  (cost=1251.59..1261.38 rows=3915 width=20) (actual time=60.13..62.96 rows=3699 loops=1)
                           Sort Key: g.tid
                           ->  Seq Scan on ticket_groups g  (cost=0.00..1017.94 rows=3915 width=20) (actual
time=0.05..53.02rows=3699 loops=1) 
                                 Filter: (gid = 37)
 Total runtime: 2100.75 msec

--------------------- SPECIFIC -------------------------------------

SELECT   g.assigned_to, g.opened AS escalated, t.tid, t.opened,
         e.userid AS commiter, e.eid, e.performed_on, e.status
  FROM (events e JOIN (ticket_groups g JOIN tickets t USING(tid)) USING(tid))
   WHERE (g.gid = '37' AND
          (e.performed_on <= CAST('1088480270' AS bigint)) AND
          ((t.modified >= CAST('1086875491' AS bigint)) OR
           (t.status != '4' AND
            t.status != '5' AND
            t.status != '9')) AND (g.assigned_to IS NOT NULL AND g.assigned_to='1540') AND e.userid='1540')
   ORDER BY e.userid, t.tid, e.performed_on;

 Sort  (cost=5079.17..5079.18 rows=1 width=58) (actual time=218121.00..218122.02 rows=1441 loops=1)
   Sort Key: e.userid, t.tid, e.performed_on
   ->  Nested Loop  (cost=0.00..5079.16 rows=1 width=58) (actual time=85.28..218115.36 rows=1441 loops=1)
         Join Filter: ("outer".tid = "inner".tid)
         ->  Nested Loop  (cost=0.00..305.53 rows=1 width=46) (actual time=0.22..261.78 rows=2420 loops=1)
               ->  Index Scan using idx_ticket_groups_assigned on ticket_groups g  (cost=0.00..241.76 rows=5 width=20)
(actualtime=0.13..12.67 rows=604 loops=1) 
                     Index Cond: (assigned_to = 1540)
                     Filter: ((gid = 37) AND (assigned_to IS NOT NULL))
               ->  Index Scan using idx_tid_events on events e  (cost=0.00..12.50 rows=1 width=26) (actual
time=0.11..0.38rows=4 loops=604) 
                     Index Cond: (e.tid = "outer".tid)
                     Filter: ((performed_on <= 1088480270::bigint) AND (userid = 1540))
         ->  Seq Scan on tickets t  (cost=0.00..4538.35 rows=18823 width=12) (actual time=0.16..83.53 rows=6020
loops=2420)
               Filter: (((status <> 4::smallint) OR (modified >= 1086875491::bigint)) AND ((status <> 5::smallint) OR
(modified>= 1086875491::bigint)) AND ((status <> 9::smallint) OR (modified >= 1086875491::bigint))) 
 Total runtime: 218123.24 msec


--
James Antill -- james@and.org
Need an efficient and powerful string library for C?
http://www.and.org/vstr/

Re: Query gets slow when where clause increases

From
Dennis Bjorklund
Date:
On Tue, 29 Jun 2004, James Antill wrote:

>                ->  Index Scan using idx_ticket_groups_assigned on ticket_groups g  (cost=0.00..241.76 rows=5
width=20)(actual time=0.13..12.67 rows=604 loops=1) 
>                      Index Cond: (assigned_to = 1540)

Here the planner estimated that it would find 5 rows, but it did find 604.
I take that as a sign that you have not ran VACUUM ANALYZE recently?

If you done that, then maybe you need to change the statistics target for
that column. Before you set it on that column you could try to just alter
the default statistics target for one session like this:

SET default_statistics_target TO 100;
ANALYZE;

and then see if you get a better plan when you run the query afterwards.

If it helps you can either set the default_statistics_target in
postgresql.conf or set it just for some column using ALTER TABLE.

--
/Dennis Björklund