Home > mailing lists

Re: complex query performance assistance request - Mailing list pgsql-performance

From	John Mendenhall
Subject	Re: complex query performance assistance request
Date	August 22, 2005 18:55:41
Msg-id	20050822210751.GA32479@calvin.surfutopia.net Whole thread Raw
In response to	Re: complex query performance assistance request (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: complex query performance assistance request
List	pgsql-performance

Tree view

Tom,

> No, the issue is that you've got eight times as much data in the
> production server; so it's hardly surprising that it takes about
> eight times longer.
>
> The production query is spending most of its time on the subplan
> attached to the contacts table:
>
> >                      ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..161018.18 rows=20120
width=85)(actual time=2.769..6188.886 rows=1548 loops=1) 
> >                            Filter: ((lead_deleted IS NULL) AND (subplan))
> >                            SubPlan
> >                              ->  Nested Loop  (cost=1.16..6.57 rows=2 width=10) (actual time=0.129..0.129 rows=0
loops=40262)
>
> 0.129 * 40262 = 5193.798, so about five seconds in the subplan and
> another one second in the indexscan proper.  The problem is that the
> subplan (the EXISTS clause) is iterated for each of 40262 rows of
> contacts --- basically, every contacts row that has null lead_deleted.
>
> On the dev server the same scan shows these numbers:
>
> >                                              ->  Index Scan using contacts_partner_id_idx on contacts c
(cost=0.00..130157.20rows=93 width=85) (actual time=0.366..739.783 rows=453 loops=1) 
> >                                                    Filter: ((lead_deleted IS NULL) AND (subplan))
> >                                                    SubPlan
> >                                                      ->  Nested Loop  (cost=0.00..6.75 rows=2 width=10) (actual
time=0.103..0.103rows=0 loops=5576) 
>
> I'd suggest trying to get rid of the EXISTS clause --- can you refactor
> that into something that joins at the top query level?
>
> Or, if this is 7.4 or later (and you should ALWAYS mention which version
> you are using in a performance question, because it matters), try to
> convert the EXISTS into an IN.  "x IN (subselect)" is planned much better
> than "EXISTS(subselect-using-x)" these days.

We are using version 7.4.6.

The number of contacts in the dev env is 37080.
The number of contacts in the production env is 40307.
The amount of data is statistically about the same.

However, the number of lead_requests are much different.
The dev env has 1438 lead_requests, the production env
has 15554 lead_requests.  Each contacts row can have
multiple lead_requests, each lead_requests entry can
have an open or closed status.  We are trying to select
the contacts with an open lead_request.

Would it be best to attempt to rewrite it for IN?
Or, should we try to tie it in with a join?  I would
probably need to GROUP so I can just get a count of those
contacts with open lead_requests.  Unless you know of a
better way?

Thanks for your assistance.  This is helping a lot.
BTW, what does the Materialize query plan element mean?

Thanks again.

JohnM

--
John Mendenhall
john@surfutopia.net
surf utopia
internet services

pgsql-performance by date:

From: Tom Lane
Date: 22 August 2005, 18:13:03
Subject: Re: complex query performance assistance request

From: "Jignesh Shah"
Date: 22 August 2005, 19:37:36
Subject: Re: MemoryContextSwitchTo during table scan?

Re: complex query performance assistance request - Mailing list pgsql-performance

Previous

Next