Implicit coercions, choosing types for constants, etc (yet again) - Mailing list pgsql-hackers

From Tom Lane
Subject Implicit coercions, choosing types for constants, etc (yet again)
Date
Msg-id 21045.1036954737@sss.pgh.pa.us
Whole thread Raw
Responses Re: Implicit coercions, choosing types for constants, etc
Re: Implicit coercions, choosing types for constants, etc
List pgsql-hackers
In the last episode of this long-running issue, we decided that life
would be better if we make small-enough integer constants be initially
typed as int2 rather than int4, and then auto-promote them up to int4
when necessary.  This would solve problems with, for example,SELECT ... WHERE smallintcol = 42
not using an index.

I tried doing this and immediately got a bunch of failures in the
regression tests.  It looks like we still have more work to do in
thinking out how the system should deal with this stuff.

One instructive example:

regression=# SELECT SUBSTRING('1234567890' FROM 3);substring
-----------34567890
(1 row)

regression=# SELECT SUBSTRING('1234567890' FROM 3::int2);substring
-----------3
(1 row)

Why did that happen?  Well, the parser has to choose among four possible
candidates for 2-parameter substring():
bit              | pg_catalog | substring | bit, integerbytea            | pg_catalog | substring | bytea, integertext
          | pg_catalog | substring | text, integertext             | pg_catalog | substring | text, text
 

When the second argument is initially typed as 'integer', the fourth
candidate is immediately dropped because it has no exact matches, and
then the third candidate ultimately wins because the unknown-literal
input is preferentially assumed to be of string category.  But when
the second argument is initially 'int2', we fail to eliminate any
candidates on the exact-match test, and then the fourth candidate
wins on the test that counts the number of preferred types at positions
requiring coercion --- text is a preferred type and integer isn't.
So we end up using the pattern-match interpretation of substring(),
even though we had to apply an int2-to-text implicit coercion to do it.

I think this is pretty clearly a bug, or at least undesirable behavior,
even if we weren't thinking of changing the initial typing of constants.
You can get this behavior today in existing releases, if you happen to
use a smallint variable as the second argument of substring().

One way to fix this is to make cross-category coercions to text not be
implicit casts.  (I've opined before that that's a bad idea, but not
gotten a lot of support for changing it.  Now that we have a distinction
between implicit and assignment casts, perhaps we could consider making
coercions to text be assignment casts, as a compromise?)

Alternatively we could try to rejigger the rules followed by
func_select_candidate(), which seem a little ad-hoc anyway.  I don't
have a better proposal to make at the moment, though.

Here's another problem:

regression=# select to_hex(42);to_hex
--------2a
(1 row)

regression=# select to_hex(42::int2);
ERROR:  Function to_hex(smallint) does not exist       Unable to identify a function that satisfies the given argument
types      You may need to add explicit typecasts
 

The problem here is that the parser can't choose between to_hex(integer)
and to_hex(bigint).  Neither integer nor bigint is a preferred type, so
there's no basis to prefer one over the other.

I suppose we could fix this particular case by eliminating
to_hex(integer), or adding to_hex(smallint), but that seems a very
unsatisfying answer.  I am wondering about adding some notion of
"conversion distance" associated with casts, and preferring choices
that require a smaller conversion distance; perhaps this could replace
the concept of "preferred type", too.  But again I don't have a specific
proposal to make.  Any thoughts?

The test case that was actually in the regression tests was
 select to_hex(256*256*256 - 1) AS "ffffff";
! ERROR:  Function to_hex(smallint) does not exist
!     Unable to identify a function that satisfies the given argument types
!     You may need to add explicit typecasts

Even had the parser resolved the overloaded to_hex call, this test would
have failed, because int2 multiplication overflows:

regression=# select 256::int2*256::int2*256::int2;?column?
----------       0
(1 row)

I am thinking that it might be good to eliminate all the basic
arithmetic operators on int2, so that you get int4 as the minimum
width for arithmetic.  But this cannot work unless we have some concept
like conversion distance, or the parser will just fail to make a choice
between int4, int8, etc alternatives.

In short: we still have a bunch of shortcomings in the type-resolution
process, and we can't allow small integers to be typed as int2 until
we fix 'em.

Comments?  Ideas?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Michael Meskes
Date:
Subject: Re: c/sql
Next
From: Bruce Momjian
Date:
Subject: Re: [PATCHES] MemSet inline for newNode