Thread: Tightening selection of default sort/group operators

Tightening selection of default sort/group operators

From
Tom Lane
Date:
I noticed that the system is really pretty shaky about how it chooses
the datatype-specific operators to implement sorting and grouping.
In the GROUP BY case, for example, the parser looks up an operator
named '<' for the column datatype, and then sometime later the executor
looks up an operator named '=' for that datatype, and we blithely assume
that these operators play together and have the expected semantics.
This seems dangerous in a world of user-definable operators.  (I think
it's already broken by the standard datatype "tinterval", in fact,
because tinterval's "=" operator doesn't have the semantics of full
equality.)

What I'm thinking of doing instead is always looking up the "=" operator
by name, and accepting this as actually being equality if it is marked
mergejoinable or hashjoinable or has eqsel() as its restriction
selectivity estimator (oprrest).  If we are looking for a "<" operator
to implement sorting/grouping, then we require "=" to be mergejoinable,
and we use its lsortop operator (regardless of name).

The only standard datatypes for which this would change the behavior
are tinterval, path, lseg, and line --- none of which could be sorted/grouped
correctly with the available operators, anyhow.  User-defined datatypes
would stop working as sort/group columns unless the author were careful
to mark the equality operator as mergejoinable, but that's a simple
addition to the operator definition.

Comments, objections?
        regards, tom lane


Re: Tightening selection of default sort/group operators

From
Peter Eisentraut
Date:
Tom Lane writes:

> What I'm thinking of doing instead is always looking up the "=" operator
> by name, and accepting this as actually being equality if it is marked
> mergejoinable or hashjoinable or has eqsel() as its restriction
> selectivity estimator (oprrest).  If we are looking for a "<" operator
> to implement sorting/grouping, then we require "=" to be mergejoinable,
> and we use its lsortop operator (regardless of name).

My first thought is that this seems to be an awefully backwards way to
define operator semantic metadata.  I think we either have to flag
operators explicitly ("this is the less-than operator"), or we just
require that < <= = >= > have certain semantics.  I could be happy with
both.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: Tightening selection of default sort/group operators

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> My first thought is that this seems to be an awefully backwards way to
> define operator semantic metadata.

Why?  The property we are interested in is that two operators '<' and
'=' will work for grouping --- ie, if you order by '<' and then combine
adjacent values for which '=' succeeds, you will get sane results.
A link between the two pg_operator entries seems a perfectly sensible
way to represent that.  The problem I've got is that the code doesn't
(or didn't, till this afternoon) make use of the available information.

> I think we either have to flag operators explicitly ("this is the
> less-than operator"), or we just require that < <= = >= > have certain
> semantics.  I could be happy with both.

I'm not totally thrilled with assuming that '=' is the name of the
equality operator.  It would be cleaner, probably, to add a column to
pg_type to point to the datatype's equality operator.  However, doing
that would pretty much break every existing user-defined type (since
they'd not know they need to specify this additional info) and there are
some circularity problems as well (operator won't exist yet when you do
CREATE TYPE).

Given those problems, I'm willing to stick with the existing assumption
that '=' names an equality operator for grouping.  The main point of
this change is to avoid getting burnt by using unrelated '=' and '<'
operators in a context where they need to play together.
        regards, tom lane