Re: [PERFORM] Inaccurate Explain Cost - Mailing list pgsql-general
From | Edson Richter
---|---
Subject | Re: [PERFORM] Inaccurate Explain Cost
Date |
Msg-id | BLU0-SMTP46616B9B80F299E088F5165CF9C0@phx.gbl
In response to | Re: [PERFORM] Inaccurate Explain Cost (Shaun Thomas <sthomas@optionshouse.com>)
Responses | Re: [PERFORM] Inaccurate Explain Cost
List | pgsql-general
On 26/09/2012 17:03, Shaun Thomas wrote:
> On 09/26/2012 01:38 PM, Robert Sosinski wrote:
>
>> I seem to be getting an inaccurate cost from explain. Here are two
>> examples for one query with two different query plans:
>
> Well, there's this:
>
> Nested Loop (cost=0.00..151986.53 rows=2817 width=4) (actual
> time=163.275..186869.844 rows=43904 loops=1)
>
> If anything's a smoking gun, that is. I could see why you'd want to
> turn off nested loops to get better execution time. But the question
> is: why did it think it would match so few rows in the first place?
> The planner probably would have thrown away this query plan had it
> known it would loop 20x more than it thought.
>
> I think we need to know what your default_statistics_target is set at,
> and really... all of your relevant postgresql settings.
>
> Please see this:
>
> http://wiki.postgresql.org/wiki/Slow_Query_Questions
>
> But you also may need to look a lot more into your query itself. The
> difference between a 2 or a 3 minute query isn't going to help you
> much. Over here, we tend to spend more of our time turning 2 or 3
> minute queries into 20 or 30ms queries. But judging by your date
> range, getting the last 2 months of data from a table that large
> generally won't be fast by any means.
>
> That said, looking at your actual query:
>
> SELECT COUNT(DISTINCT eu.id)
>   FROM exchange_uploads eu
>   JOIN upload_destinations ud ON ud.id = eu.upload_destination_id
>   LEFT JOIN uploads u ON u.id = eu.upload_id
>   LEFT JOIN import_errors ie ON ie.exchange_upload_id = eu.id
>   LEFT JOIN exchanges e ON e.id = ud.exchange_id
>  WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
>    AND ud.office_id = 6;
>
> Doesn't need half of these joins. They're left joins, and never used
> in the query results or where criteria. You could just use this:

Interesting. I have a similar situation, where the user can choose a set of filters, so the query must carry several left joins "just in case" (in case the user needs them in the filter).
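Shaun's point above can be checked directly: a LEFT JOIN never removes rows from the left side, and any duplicate rows it introduces are collapsed by COUNT(DISTINCT ...), so dropping the unreferenced left joins cannot change the count. A minimal sketch using a hypothetical in-memory SQLite copy of the thread's tables (the data values are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE exchange_uploads (id INTEGER PRIMARY KEY,
                               upload_destination_id INT, upload_id INT);
CREATE TABLE upload_destinations (id INTEGER PRIMARY KEY, office_id INT);
CREATE TABLE uploads (id INTEGER PRIMARY KEY);
CREATE TABLE import_errors (id INTEGER PRIMARY KEY, exchange_upload_id INT);

INSERT INTO upload_destinations VALUES (1, 6), (2, 7);
INSERT INTO exchange_uploads VALUES (10, 1, NULL), (11, 1, 20), (12, 2, 21);
INSERT INTO uploads VALUES (20), (21);
-- two errors for the same upload: the LEFT JOIN duplicates row 10
INSERT INTO import_errors VALUES (1, 10), (2, 10);
""")

with_joins = con.execute("""
    SELECT COUNT(DISTINCT eu.id)
      FROM exchange_uploads eu
      JOIN upload_destinations ud ON ud.id = eu.upload_destination_id
      LEFT JOIN uploads u ON u.id = eu.upload_id
      LEFT JOIN import_errors ie ON ie.exchange_upload_id = eu.id
     WHERE ud.office_id = 6
""").fetchone()[0]

without_joins = con.execute("""
    SELECT COUNT(DISTINCT eu.id)
      FROM exchange_uploads eu
      JOIN upload_destinations ud ON ud.id = eu.upload_destination_id
     WHERE ud.office_id = 6
""").fetchone()[0]

# Despite import_errors duplicating one row, DISTINCT collapses it:
print(with_joins, without_joins)  # 2 2
```

Note this equivalence holds for COUNT(DISTINCT ...); a plain COUNT(*) over the same joins would differ, which is one reason automatic outer-join removal needs uniqueness guarantees on the join keys.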
I know of other databases that can remove unnecessary outer joins from queries when they are not relevant, which makes such queries faster. Can't PostgreSQL do the same?

Regards,

Edson.

> SELECT COUNT(DISTINCT eu.id)
>   FROM exchange_uploads eu
>   JOIN upload_destinations ud ON (ud.id = eu.upload_destination_id)
>  WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
>    AND ud.office_id = 6;
>
> Though I presume this is just a count precursor to a query that
> fetches the actual results and does need the left joins. Either way,
> the index scan from your second example matches 3.3M rows by using
> the created_at index on exchange_uploads. That's not really very
> restrictive, and so you have two problems:
>
> 1. Your nested loop stats from office_id are somehow wrong. Try
>    increasing your stats on that column, or just
>    default_statistics_target in general, and re-analyze.
> 2. Your created_at criteria above match way too many rows, and will
>    also take a long time to process.
>
> Those are your two actual problems. We can probably get your query to
> run faster, but those are pretty significant hurdles.
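For problem 1 above, the per-column fix Shaun describes could be sketched as follows (table and column names are from the thread; the target value of 500 is an arbitrary illustration, and the server-wide default for default_statistics_target is 100):

```sql
-- Raise the statistics target for the column feeding the bad estimate,
-- then re-analyze so the planner builds a finer histogram for it.
ALTER TABLE upload_destinations ALTER COLUMN office_id SET STATISTICS 500;
ANALYZE upload_destinations;

-- Alternatively, raise the global default and re-analyze everything:
-- SET default_statistics_target = 500;
-- ANALYZE;
```

A higher target makes ANALYZE sample more rows and keep more most-common values, at the cost of slower ANALYZE and planning, so it is usually raised selectively on skewed columns like this one.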