Implicit coercions, choosing types for constants, etc (yet again) - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Implicit coercions, choosing types for constants, etc (yet again) |
Date | |
Msg-id | 21045.1036954737@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Implicit coercions, choosing types for constants, etc
Re: Implicit coercions, choosing types for constants, etc |
List | pgsql-hackers |
In the last episode of this long-running issue, we decided that life would be better if we make small-enough integer constants be initially typed as int2 rather than int4, and then auto-promote them up to int4 when necessary. This would solve problems with, for example,SELECT ... WHERE smallintcol = 42 not using an index. I tried doing this and immediately got a bunch of failures in the regression tests. It looks like we still have more work to do in thinking out how the system should deal with this stuff. One instructive example: regression=# SELECT SUBSTRING('1234567890' FROM 3);substring -----------34567890 (1 row) regression=# SELECT SUBSTRING('1234567890' FROM 3::int2);substring -----------3 (1 row) Why did that happen? Well, the parser has to choose among four possible candidates for 2-parameter substring(): bit | pg_catalog | substring | bit, integerbytea | pg_catalog | substring | bytea, integertext | pg_catalog | substring | text, integertext | pg_catalog | substring | text, text When the second argument is initially typed as 'integer', the fourth candidate is immediately dropped because it has no exact matches, and then the third candidate ultimately wins because the unknown-literal input is preferentially assumed to be of string category. But when the second argument is initially 'int2', we fail to eliminate any candidates on the exact-match test, and then the fourth candidate wins on the test that counts the number of preferred types at positions requiring coercion --- text is a preferred type and integer isn't. So we end up using the pattern-match interpretation of substring(), even though we had to apply an int2-to-text implicit coercion to do it. I think this is pretty clearly a bug, or at least undesirable behavior, even if we weren't thinking of changing the initial typing of constants. You can get this behavior today in existing releases, if you happen to use a smallint variable as the second argument of substring(). One way to fix this is to make cross-category coercions to text not be implicit casts. (I've opined before that that's a bad idea, but not gotten a lot of support for changing it. Now that we have a distinction between implicit and assignment casts, perhaps we could consider making coercions to text be assignment casts, as a compromise?) Alternatively we could try to rejigger the rules followed by func_select_candidate(), which seem a little ad-hoc anyway. I don't have a better proposal to make at the moment, though. Here's another problem: regression=# select to_hex(42);to_hex --------2a (1 row) regression=# select to_hex(42::int2); ERROR: Function to_hex(smallint) does not exist Unable to identify a function that satisfies the given argument types You may need to add explicit typecasts The problem here is that the parser can't choose between to_hex(integer) and to_hex(bigint). Neither integer nor bigint is a preferred type, so there's no basis to prefer one over the other. I suppose we could fix this particular case by eliminating to_hex(integer), or adding to_hex(smallint), but that seems a very unsatisfying answer. I am wondering about adding some notion of "conversion distance" associated with casts, and preferring choices that require a smaller conversion distance; perhaps this could replace the concept of "preferred type", too. But again I don't have a specific proposal to make. Any thoughts? The test case that was actually in the regression tests was select to_hex(256*256*256 - 1) AS "ffffff"; ! ERROR: Function to_hex(smallint) does not exist ! Unable to identify a function that satisfies the given argument types ! You may need to add explicit typecasts Even had the parser resolved the overloaded to_hex call, this test would have failed, because int2 multiplication overflows: regression=# select 256::int2*256::int2*256::int2;?column? ---------- 0 (1 row) I am thinking that it might be good to eliminate all the basic arithmetic operators on int2, so that you get int4 as the minimum width for arithmetic. But this cannot work unless we have some concept like conversion distance, or the parser will just fail to make a choice between int4, int8, etc alternatives. In short: we still have a bunch of shortcomings in the type-resolution process, and we can't allow small integers to be typed as int2 until we fix 'em. Comments? Ideas? regards, tom lane
pgsql-hackers by date: