Re: complex query performance assistance request

From: Tom Lane
Subject: Re: complex query performance assistance request
Date: ,
Msg-id: 20830.1124741718@sss.pgh.pa.us
(view: Whole thread, Raw)
In response to: Re: complex query performance assistance request  (John Mendenhall)
Responses: Re: complex query performance assistance request  (John Mendenhall)
List: pgsql-performance

Tree view

complex query performance assistance request  (John Mendenhall, )
 Re: complex query performance assistance request  (John Mendenhall, )
  Re: complex query performance assistance request  (Tom Lane, )
   Re: complex query performance assistance request  (John Mendenhall, )
    Re: complex query performance assistance request  (Tom Lane, )
     Re: complex query performance assistance request  (John Mendenhall, )

John Mendenhall <> writes:
> The issue is the query plan is different, and thus,
> not up to the performance we need.

No, the issue is that you've got eight times as much data in the
production server; so it's hardly surprising that it takes about
eight times longer.

The production query is spending most of its time on the subplan
attached to the contacts table:

>                      ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..161018.18 rows=20120
width=85)(actual time=2.769..6188.886 rows=1548 loops=1) 
>                            Filter: ((lead_deleted IS NULL) AND (subplan))
>                            SubPlan
>                              ->  Nested Loop  (cost=1.16..6.57 rows=2 width=10) (actual time=0.129..0.129 rows=0
loops=40262)

0.129 * 40262 = 5193.798, so about five seconds in the subplan and
another one second in the indexscan proper.  The problem is that the
subplan (the EXISTS clause) is iterated for each of 40262 rows of
contacts --- basically, every contacts row that has null lead_deleted.

On the dev server the same scan shows these numbers:

>                                              ->  Index Scan using contacts_partner_id_idx on contacts c
(cost=0.00..130157.20rows=93 width=85) (actual time=0.366..739.783 rows=453 loops=1) 
>                                                    Filter: ((lead_deleted IS NULL) AND (subplan))
>                                                    SubPlan
>                                                      ->  Nested Loop  (cost=0.00..6.75 rows=2 width=10) (actual
time=0.103..0.103rows=0 loops=5576) 

Here the subplan is iterated only 5576 times for 574 total msec.  It's
still the bulk of the runtime though; the fact that the upper levels
of the plan are a bit different has got little to do with where the time
is going.

I'd suggest trying to get rid of the EXISTS clause --- can you refactor
that into something that joins at the top query level?

Or, if this is 7.4 or later (and you should ALWAYS mention which version
you are using in a performance question, because it matters), try to
convert the EXISTS into an IN.  "x IN (subselect)" is planned much better
than "EXISTS(subselect-using-x)" these days.

            regards, tom lane


pgsql-performance by date:

From: Philip Pinkerton
Date:
Subject: pgbench
From: Alan Stange
Date:
Subject: unused item pointers?