Thread: Constraint Type Coercion issue?

Constraint Type Coercion issue?

From
Josh Berkus
Date:
Folks,

Bob Ippolito found this while testing Bizgres.

It *seems* like our smarter type coercion rules do not apply when 
constraints are being generated.  That is, the types of constants in 
constraints, if not coerced, still default to the old "dumb" casting where 
the type of the comparing column isn't checked.

This is visible if you run a simple test on constraint exclusion:

CE not used
----------------------------
set constraint_exclusion=on;
create table a ( a bigint, b text );
create table a1 () inherits (a);
create table a2 () inherits (a);
create table a3 () inherits (a);
alter table a1 add constraint a1_a check ( a between 1 and 3);
alter table a2 add constraint a2_a check ( a between 4 and 6);
alter table a3 add constraint a3_a check ( a between 7 and 9);
insert into a1 values ( 1, 'B' );
insert into a2 values ( 5, 'F' );
insert into a3 values ( 8, 'G' );
explain analyze select * from a where a.a between 5 and 6;

CE used
---------------------------------
create table a ( a bigint, b text );
create table a1 () inherits (a);
create table a2 () inherits (a);
create table a3 () inherits (a);
alter table a1 add constraint a1_a check ( a between 1::BIGINT and 
3::BIGINT);
alter table a2 add constraint a2_a check ( a between 4::BIGINT and 
6::BIGINT);
alter table a3 add constraint a3_a check ( a between 7::BIGINT and 
9::BIGINT);
insert into a1 values ( 1, 'B' );
insert into a2 values ( 5, 'F' );
insert into a3 values ( 8, 'G' );
explain analyze select * from a where a.a between 5 and 6;


So, is this a real bug in constraints or does the problem lie somewhere 
else?   Is it fixable?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: Constraint Type Coercion issue?

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> So, is this a real bug in constraints or does the problem lie somewhere 
> else?   Is it fixable?

Not readily.  The problem is here:
    * We must find a btree opclass that contains both operators, else the    * implication can't be determined.  Also,
thepred_op has to be of    * default subtype (implying left and right input datatypes are the    * same); otherwise
it'sunsafe to put the pred_const on the left side    * of the test.  Also, the opclass must contain a suitable test
*operator matching the clause_const's type (which we take to mean    * that it has the same subtype as the original
clause_operator).

The predtest code depends on btree operator classes to tell it the
semantics of comparisons, and the operator class infrastructure just
doesn't support making cross-type inferences very well.  Given, say,int8var <= int4const
we'd like to determine whetherint8var <= otherint4const
is implied (or refuted), but to do this we need to compare the two
int4 constants, for which we need the int4 vs int4 comparison operator,
which has no relationship whatever to the int8 operator class in which
we find the int8 <= int4 operators that are present in the clauses.

There are some related cases in btree index search startup that would
be nice to fix too.

I've been thinking about this off and on, and would like to solve it
in the 8.2 time frame, but it's not happening for 8.1.  At a minimum
it'll require some significant changes in our concept of what an
operator class is.  The half-jelled ideas I have involve inventing
families of operator classes, so that we could for instance represent
the idea that "int2_ops, int4_ops, and int8_ops are all compatible,
and you can get consistent results from any of these operators".
There are some related problems involving mergejoin and the ability
to deal with reverse-sort indexes that also need to be dealt with,
and seem to require extensions to the operator class concept.
        regards, tom lane


Re: Constraint Type Coercion issue?

From
Martijn van Oosterhout
Date:
On Wed, Sep 14, 2005 at 02:23:29PM -0400, Tom Lane wrote:
> I've been thinking about this off and on, and would like to solve it
> in the 8.2 time frame, but it's not happening for 8.1.  At a minimum
> it'll require some significant changes in our concept of what an
> operator class is.  The half-jelled ideas I have involve inventing
[snip]

How much discussion has there been on this? I've been working my way
through COLLATE support and indexes and realised that what I really
want is to allow the comparison functions in operator classes to be
three argument functions. The two things to compare and the collate
order. A descending index is really just another collate order, albeit
one easily imposed from the outside.

Although numbers tend not to have many interesting collate orders,
complex numbers do, as do obviously strings. To some extent, collate
implies a sort of parameterised operator class...

Definitly 8.2 stuff, and it's not simple stuff either...

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Constraint Type Coercion issue?

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> How much discussion has there been on this?

None yet; I had a few half-baked ideas but nothing worth presenting to
the list.

> To some extent, collate
> implies a sort of parameterised operator class...

Hmm.  But an index couldn't support more than one collation order
AFAICS.  It'd probably make more sense to create operators and an
operator class for each collation you want to support; the mapping
to a call of a common support function would be embedded inside the
operator definition.  Otherwise we have to pass around an additional
parameter through an awful lot of places...
        regards, tom lane


Re: Constraint Type Coercion issue?

From
Martijn van Oosterhout
Date:
On Wed, Sep 14, 2005 at 05:28:42PM -0400, Tom Lane wrote:
> > To some extent, collate
> > implies a sort of parameterised operator class...
>
> Hmm.  But an index couldn't support more than one collation order
> AFAICS.  It'd probably make more sense to create operators and an
> operator class for each collation you want to support; the mapping
> to a call of a common support function would be embedded inside the
> operator definition.  Otherwise we have to pass around an additional
> parameter through an awful lot of places...

Well yes, but given the number of possible locales, creating one class
for each seems excessive. And each class would have to create 5
operators (with underlying functions) and 1 comparitor function. Unless
you could shortcut something like:

CREATE OPERATOR CLASS ...  OPERATOR 1 <(text,text,'en_US')
...  FUNCTION 1 mycompare(text,text,'en_US')
...  COLLATE en_us;

Otherwise you end up with lots of functions which have be created on
the fly as the user decides what collate orders he wants. Invoking SQL
functions in the btree index create cycle doesn't seem efficient.

You would have to come up with new names for the operators each time
because the argument types are going to be the same. Although I guess
you could call it OPERATOR(<en_us), it's not like people are going to
use it directly.

Maybe it should be that we allow users to specify three argument
operators, and have an extra entry in the operator class which defines
the extra argument to pass.

It's not easy, like you say, there are a lot of places where an extra
argument would need to be passed...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Constraint Type Coercion issue?

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> Well yes, but given the number of possible locales, creating one class
> for each seems excessive. And each class would have to create 5
> operators (with underlying functions) and 1 comparitor function. Unless
> you could shortcut something like:

> CREATE OPERATOR CLASS ...
>    OPERATOR 1 <(text,text,'en_US')
> ...
>    FUNCTION 1 mycompare(text,text,'en_US')
> ...
>    COLLATE en_us;

The thing that's still fairly unclear to me is whether the collation
information is attached to the operators/functions or to the data.
I recall there's been some discussion of sticking collation IDs into
individual text Datums, which is a completely different path than what
you are positing above.  Does the SQL spec mandate one or the other of
these approaches?  If it does, do we want to believe it?  (The more I
read of SQL2003, the less impressed I get...)
        regards, tom lane


Re: Constraint Type Coercion issue?

From
Martijn van Oosterhout
Date:
On Wed, Sep 14, 2005 at 10:42:36PM -0400, Tom Lane wrote:
> The thing that's still fairly unclear to me is whether the collation
> information is attached to the operators/functions or to the data.
> I recall there's been some discussion of sticking collation IDs into
> individual text Datums, which is a completely different path than what
> you are positing above.  Does the SQL spec mandate one or the other of
> these approaches?  If it does, do we want to believe it?  (The more I
> read of SQL2003, the less impressed I get...)

The standard doesn't care I think. My reading is that the COLLATE
status is determined at parse time. From there you can plan however you
like.

AFAIUI, collate is a parameter/property of fields and domains and
affects operators and function, not the data. It only applies to
comparisons, not the output. You could add it as a property to the
data. I wrote a module, taggedtypes [1], which basically implemented
this. My main issue with it is that for the '<' operator, the same
collate property has to be on both arguments or it has to bail. The
only reason why you can attach COLLATE to fields and domains is to give
a default in case the user doesn't specify anything. But if the COLLATE
is given explicitly, it overrides anything.

By way of example:

CREATE TABLE t (a text collate c1, b text collate c2);

select * from t where a < b;              -- ERROR: Indeterminate collate
select * from t where a < b COLLATE c3;   -- Order by c3

My worry about adding the collate to the Datum is that your execution
tree becomes more complex. The two types, with and without COLLATE data
are not interchangable and you would have to add or remove them
continuously.

In the above example, you would have to create an executor node whose
sole purpose is to add the collate bit to a and b before passing it to
the '<' operator. An index can only support one collation at a time
also.

I don't think there is an easy way out...

[1] http://svana.org/kleptog/pgsql/taggedtypes.html
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.