Re: Suggestions wanted for 7.2.4 query - Mailing list pgsql-performance

From Tom Lane
Subject Re: Suggestions wanted for 7.2.4 query
Date
Msg-id 4612.1052068989@sss.pgh.pa.us
Whole thread Raw
In response to Re: Suggestions wanted for 7.2.4 query  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Suggestions wanted for 7.2.4 query
List pgsql-performance
Josh Berkus <josh@agliodbs.com> writes:
>> This at least gives you some glimmer of a chance that the restriction on
>> event_day can be used to avoid computing the entire join represented by
>> sv_events.  With the exists() form, there's no chance...

> Hmmm.

I have to take that back (must have been out too late last night ;-)).
The EXISTS subquery *is* getting pushed down to become a restriction on
events alone; that's what the "SubPlan" is.  However, it'd still be
worth looking for another way to express it, because the planner is
pretty clueless about the selectivity of EXISTS restrictions.  That's
what's causing it to drastically overestimate the number of rows taken
from "events" (14812 vs 1919), which in turn drives it away from using
the nestloop-with-inner-indexscan join style for joining to "cases".

> Are you saying that the planner being vague about what will be
> returned from the EXISTS clause is what's triggering the seq scan on
> "cases"?

Right.  The nestloop/indexscan style only wins if there are not too many
outer rows.  If the EXISTS constraint actually did succeed for 14812
"events" rows, the planner would probably be making the right choice to
use a hash join.

BTW, have you tried lowering the value of "random_page_cost"?  Looking
at the relative costs in these examples makes me think most of your
tables are cached in memory.  Of course, if that's not true during
day-to-day production then you need to be wary about reducing the setting.

            regards, tom lane


pgsql-performance by date:

Previous
From: "Jim C. Nasby"
Date:
Subject: Re: More tablescanning fun
Next
From: Josh Berkus
Date:
Subject: Re: Suggestions wanted for 7.2.4 query