Re: Suggestions wanted for 7.2.4 query - Mailing list pgsql-performance

From Josh Berkus
Subject Re: Suggestions wanted for 7.2.4 query
Date
Msg-id 200305040907.03306.josh@agliodbs.com
Whole thread Raw
In response to Re: Suggestions wanted for 7.2.4 query  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Suggestions wanted for 7.2.4 query  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Tom,

> > SELECT sv_events.*, FALSE AS fuzzy_team FROM sv_events WHERE  EXISTS (
> > SELECT event_id FROM event_days
> >  WHERE event_days.event_id = sv_events.event_id AND  (event_day BETWEEN
> > ('2003-04-08'::TIMESTAMP WITHOUT TIME ZONE)
> >   AND ('2003-06-17 23:59'::TIMESTAMP WITHOUT TIME ZONE) ) );
>
> Is event_days.event_id unique?  If so, try

Regrettably, no.   Event_days is an iterative list of all of the days covered
by the event.   What's unique is event_days_pk, which is event_id, event_day.
If I did a direct join to event_days, multi-day events would appear on the
search results more than once .... which we *don't* want.

> This at least gives you some glimmer of a chance that the restriction on
> event_day can be used to avoid computing the entire join represented by
> sv_events.  With the exists() form, there's no chance...

Hmmm.  There are other ways I can get at the date limit for sv_events; I'll
try that.  Unfortunately, those ways require a seq scan on events, so I'm not
sure we have a net gain here (that is, I can't imagine that  a two-column
date calculation between two parameters could be indexed)

 However, by my reading, 75% of the cost of the query is the unindexed join
between "events" and "cases".    Are you saying that the planner being vague
about what will be returned from the EXISTS clause is what's triggering the
seq scan on "cases"?

--
Josh Berkus
Aglio Database Solutions
San Francisco


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Suggestions wanted for 7.2.4 query
Next
From: "Jim C. Nasby"
Date:
Subject: Re: More tablescanning fun