Re: [PERFORM] Inaccurate Explain Cost - Mailing list pgsql-general

From Edson Richter
Subject Re: [PERFORM] Inaccurate Explain Cost
Date
Msg-id BLU0-SMTP46616B9B80F299E088F5165CF9C0@phx.gbl
Whole thread Raw
In response to Re: [PERFORM] Inaccurate Explain Cost  (Shaun Thomas <sthomas@optionshouse.com>)
Responses Re: [PERFORM] Inaccurate Explain Cost
List pgsql-general
Em 26/09/2012 17:03, Shaun Thomas escreveu:
> On 09/26/2012 01:38 PM, Robert Sosinski wrote:
>
>> I seem to be getting an inaccurate cost from explain.  Here are two
>> examples for one query with two different query plans:
>
> Well, there's this:
>
> Nested Loop  (cost=0.00..151986.53 rows=2817 width=4) (actual
> time=163.275..186869.844 rows=43904 loops=1)
>
> If anything's a smoking gun, that is. I could see why you'd want to
> turn off nested loops to get better execution time. But the question
> is: why did it think it would match so few rows in the first place?
> The planner probably would have thrown away this query plan had it
> known it would loop 20x more than it thought.
>
> I think we need to know what your default_statistics_target is set at,
> and really... all of your relevant postgresql settings.
>
> Please see this:
>
> http://wiki.postgresql.org/wiki/Slow_Query_Questions
>
> But you also may need to look a lot more into your query itself. The
> difference between a 2 or a 3 minute query isn't going to help you
> much. Over here, we tend to spend more of our time turning 2 or 3
> minute queries into 20 or 30ms queries. But judging by your date
> range, getting the last 2-months of data from a table that large
> generally won't be fast by any means.
>
> That said, looking at your actual query:
>
> SELECT COUNT(DISTINCT eu.id)
>   FROM exchange_uploads eu
>   JOIN upload_destinations ud ON ud.id = eu.upload_destination_id
>   LEFT JOIN uploads u ON u.id = eu.upload_id
>   LEFT JOIN import_errors ie ON ie.exchange_upload_id = eu.id
>   LEFT JOIN exchanges e ON e.id = ud.exchange_id
>  WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
>    AND ud.office_id = 6;
>
> Doesn't need half of these joins. They're left joins, and never used
> in the query results or where criteria. You could just use this:

Interesting. I've similar situation, where user can choose a set of
filters, and then the query must have several left joins "just in case"
(user need in the filer).
I know other database that is able to remove unnecessary outer joins
from queries when they are not relevant and for instance become faster.
Can't PostgreSQL do the same?

Regards,

Edson.

>
> SELECT COUNT(DISTINCT eu.id)
>   FROM exchange_uploads eu
>   JOIN upload_destinations ud ON (ud.id = eu.upload_destination_id)
>  WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
>    AND ud.office_id = 6;
>
> Though I presume this is just a count precursor to a query that
> fetches the actul results and does need the left join. Either way, the
> index scan from your second example matches 3.3M rows by using the
> created_at index on exchange_uploads. That's not really very
> restrictive, and so you have two problems:
>
> 1. Your nested loop stats from office_id are somehow wrong. Try
> increasing your stats on that column, or just
> default_statistics_target in general, and re-analyze.
> 2. Your created_at criteria above match way too many rows, and will
> also take a long time to process.
>
> Those are your two actual problems. We can probably get your query to
> run faster, but those are pretty significant hurdles.
>
>



pgsql-general by date:

Previous
From: Shaun Thomas
Date:
Subject: Re: [PERFORM] Inaccurate Explain Cost
Next
From: hubert depesz lubaczewski
Date:
Subject: Re: [PERFORM] Inaccurate Explain Cost