Re: Recognizing range constraints (was Re: Plan for - Mailing list pgsql-performance

From Simon Riggs
Subject Re: Recognizing range constraints (was Re: Plan for
Date
Msg-id 1112829886.16721.1104.camel@localhost.localdomain
Whole thread Raw
In response to Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Wed, 2005-04-06 at 18:09 -0400, Tom Lane wrote:
> I wrote:
> > Arjen van der Meijden <acmmailing@vulcanus.its.tudelft.nl> writes:
> >> SELECT COUNT(*) FROM
> >> data_main AS dm,
> >> postcodes AS p
> >> WHERE dm.range BETWEEN p.range_from AND p.range_till
>
> > Planner error ... because it doesn't have any good way to estimate the
> > number of matching rows, it thinks that way is a bit more expensive than
> > data_main as the outside, but in reality it seems a good deal cheaper:
>
> BTW, it would get the right answer if it had recognized the WHERE clause
> as a range restriction --- it still doesn't know exactly what fraction
> of rows will match, but its default estimate is a great deal tighter for
> "WHERE x > something AND x < somethingelse" than it is for two unrelated
> inequality constraints.  Enough tighter that it would have gone for the
> correct plan.
>
> The problem is that it doesn't recognize the WHERE as a range constraint
> on dm.range.

> Can anyone suggest a more general rule?  Do we need for example to
> consider whether the relation membership is the same in two clauses
> that might be opposite sides of a range restriction?  It seems like
>
>     a.x > b.y AND a.x < b.z

Not sure we need a more general rule. There's only three ways to view
this pair of clauses:
i) its a range constraint i.e. BETWEEN
ii) its the complement of that i.e. NOT BETWEEN
iii) its a mistake, but we're not allowed to take that path

Arjen's query and your generalisation of it above is a common type of
query - using a lookup of a reference data table with begin/end
effective dates. It would be very useful if this was supported.

> probably can be treated as a range restriction on a.x for this purpose,
> but I'm much less sure that the same is true of
>
>     a.x > b.y AND a.x < c.z

I can't think of a query that would use such a construct, and might even
conclude that it was very poorly normalised model. I would suggest that
this is much less common in practical use.

Best Regards, Simon Riggs


pgsql-performance by date:

Previous
From: Adam Palmblad
Date:
Subject: Tweaking a C Function I wrote
Next
From: Neil Conway
Date:
Subject: Re: Tweaking a C Function I wrote