Re: Performance trouble finding records through related records - Mailing list pgsql-performance

From Andy Colson
Subject Re: Performance trouble finding records through related records
Date
Msg-id 4D6F054A.2090208@squeakycode.net
In response to Re: Performance trouble finding records through related records  (sverhagen <sverhagen@wps-nl.com>)
Responses Re: Performance trouble finding records through related records  (sverhagen <sverhagen@wps-nl.com>)
List pgsql-performance
On 03/02/2011 06:12 PM, sverhagen wrote:
> Thanks for your help already!
> Hope you're up for some more :-)
>
>
> Andy Colson wrote:
>>
>> First off, excellent detail.
>>
>> Second, your explain analyze was hard to read... but since you are not
>> really interested in your posted query, I won't worry about looking at
>> it... but... have you seen:
>>
>> http://explain.depesz.com/
>>
>
> Thanks for that. Using it below :-)
>
>
> Andy Colson wrote:
>>
>> If you run the individual queries, without the union, are the part's slow
>> too?
>>
>
> The only problem is the second part, so that part can safely be isolated. Also,
> the following does not play a role at this point: WHERE events2.eventtype_id
> IN
> (100,103,105,...
>
> Then I went ahead and denormalized the transactionId on both ends, so that
> both events_events records and events_eventdetails records have the
> transactionId (or NULL). That simplifies the query to this:
>
>     SELECT events_events.* FROM events_events WHERE transactionid IN (
>         SELECT transactionid FROM events_eventdetails customerDetails
>         WHERE customerDetails.keyname='customer_id'
>         AND substring(customerDetails.value,0,32)='1957'
>         AND transactionid IS NOT NULL
>     ) ORDER BY id LIMIT 50;
>
> To no avail. Also, changing the above WHERE IN into an implicit/explicit JOIN
> makes no more than a marginal difference. Shouldn't joining be very
> efficient here?
>
> http://explain.depesz.com/s/Pnb
>
> The above link nicely shows the hotspots, but I am at a loss now as how to
> approach them.
>
>
> Andy Colson wrote:
>>
>> Looked like your row counts (the estimate vs the actual) were way off,
>> have you analyzed lately?
>>
>
> Not sure what that means.
> Isn't all the maintenance nicely automated through my config?
>
>

In the explain analyze you'll see stuff like:
Append  (cost=0.00..3256444445.93 rows=115469434145 width=52) (actual time=0.304..58763.738 rows=222 loops=1)

This is taken from your first email.  Red flags should go off when the row counts are not close.  The first set is the
planner's guess.  The second set is what actually happened.  The planner thought there would be 115,469,434,145 rows...
but it turned out to only be 222.  That's usually caused by bad stats.
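
If the stats are stale, a manual ANALYZE is a cheap first step (table names taken
from your query; autovacuum normally handles this, but running it by hand rules
stale stats out):

    -- refresh the planner's statistics for the two tables in your query
    ANALYZE events_events;
    ANALYZE events_eventdetails;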

> Isn't all the maintenance nicely automated through my config?
>

I'd never assume.  But the numbers in the plan you posted:

> http://explain.depesz.com/s/Pnb

look fine to me (well, the row counts), and I didn't look too much at that plan in the first email, so we can probably
ignore it.
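
If you want to check rather than assume, the standard stats view will tell you when
each table was last analyzed, whether by hand or by autovacuum:

    -- when were these tables last analyzed?
    SELECT relname, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname IN ('events_events', 'events_eventdetails');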


> Andy Colson wrote:
>>
>> I could not tell from the explain analyze if an index was used, but I
>> notice you have a ton of indexes on events_events table.
>>
>
> Yes, a ton of indexes, but still not the right one :-)

But... many indexes will slow down updates/inserts.  And an index on an unselective field can cause more problems than
it would help, especially if the stats are off.  If PG has lots and lots of options, it'll take longer to plan queries
too.  If it picks an index that it thinks is selective, but in reality is not, you are in for a world of hurt.
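
One way to spot the dead weight is the index usage counters.  A rough sketch
(pg_stat_user_indexes is a standard view; an idx_scan of 0 since the last stats
reset means the planner never picks that index):

    -- which indexes on events_events are actually being used?
    SELECT indexrelname, idx_scan, idx_tup_read
    FROM pg_stat_user_indexes
    WHERE relname = 'events_events'
    ORDER BY idx_scan;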

For your query, I think a join would be the best bet, can we see its explain analyze?
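
Something along these lines is what I mean (untested sketch; the aliases are mine,
and you may need a DISTINCT if one transactionid matches several detail rows):

    EXPLAIN ANALYZE
    SELECT e.*
    FROM events_events e
    JOIN events_eventdetails d ON d.transactionid = e.transactionid
    WHERE d.keyname = 'customer_id'
      AND substring(d.value, 0, 32) = '1957'
    ORDER BY e.id
    LIMIT 50;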

-Andy
