Re: IN, BETWEEN, spec compliance, and odd operator names - Mailing list pgsql-hackers

From Tom Lane
Subject Re: IN, BETWEEN, spec compliance, and odd operator names
Date
Msg-id 20145.1219675410@sss.pgh.pa.us
Whole thread Raw
In response to Re: IN, BETWEEN, spec compliance, and odd operator names  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: IN, BETWEEN, spec compliance, and odd operator names  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Sun, Aug 24, 2008 at 09:24:23PM -0400, Tom Lane wrote:
>> My own feeling is that we should avoid imputing particular semantics
>> to particular operator names, and so these constructs should always be
>> defined by reference to operators found in a default opclass for the
>> datatype, rather than by specific operator names.

> ISTM the problem is that there's no easy way to refer to "operators
> found in a default opclass", so perhaps we could invent a construct:

> A OPERATOR(btree,2) B

Huh?  I don't understand why you think we need to expose this to users.
A user would presumably just write the name of the operator he wants,
if he's writing out a direct operator call.

To me the issue is what we consider IN and BETWEEN and similar
constructs to "mean", which in a datatype world boils down to choosing
which of the datatype's operators to implement the construct with.

> The problem is inferring the type, if A and B are
> different types, which operator class do you use?

Yeah, the cross-type problem occurred to me this morning too.  For
instance consider
int4_var BETWEEN int8_var AND numeric_var

The current implementation of BETWEEN acts fairly sanely in this case;
although it ends up choosing two entirely unrelated operators (int48_ge
and numeric_le, which are not in any of the same operator families),
which is not great for subsequent optimization purposes.  I'm not sure
about how we'd make a good choice using an opclass-driven approach.
Would we want to insist that both operators are found in the same
family?  Perhaps so, because otherwise it's not real clear that you've
created a meaningful range constraint.  Yet it's definitely possible
that such a requirement would cause the query to fail where it used to
work (for some value of "work").

Another way to approach it would be to consider the problem as being
similar to overloaded-function resolution, ie think of it as trying
to match an implicit or explicit function between(anyelement,
anyelement, anyelement).  But that would fail to take into account
whether there are actually any suitable comparison operators for
whichever common type it chooses.

There's also an issue of not wanting to coerce variables unnecessarily.
In the above example, suppose int4_var has an index.  The derived
clause int4_var int48_ge int8_var will be indexable using that index,
whereas int4_var::numeric numeric_le numeric_var won't be,  If we coerce
all three inputs to numeric to enforce that we have consistent semantics
on both ends of the range check, then neither end will be indexable;
which seems like a step backwards.

So it's all a bit harder than it looks.  Thoughts?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: IN, BETWEEN, spec compliance, and odd operator names
Next
From: Alvaro Herrera
Date:
Subject: Re: can't stop autovacuum by HUP'ing the server