Thread: Implicit coercions, choosing types for constants, etc (yet again)
In the last episode of this long-running issue, we decided that life would be better if we make small-enough integer constants be initially typed as int2 rather than int4, and then auto-promote them up to int4 when necessary. This would solve problems with, for example,SELECT ... WHERE smallintcol = 42 not using an index. I tried doing this and immediately got a bunch of failures in the regression tests. It looks like we still have more work to do in thinking out how the system should deal with this stuff. One instructive example: regression=# SELECT SUBSTRING('1234567890' FROM 3);substring -----------34567890 (1 row) regression=# SELECT SUBSTRING('1234567890' FROM 3::int2);substring -----------3 (1 row) Why did that happen? Well, the parser has to choose among four possible candidates for 2-parameter substring(): bit | pg_catalog | substring | bit, integerbytea | pg_catalog | substring | bytea, integertext | pg_catalog | substring | text, integertext | pg_catalog | substring | text, text When the second argument is initially typed as 'integer', the fourth candidate is immediately dropped because it has no exact matches, and then the third candidate ultimately wins because the unknown-literal input is preferentially assumed to be of string category. But when the second argument is initially 'int2', we fail to eliminate any candidates on the exact-match test, and then the fourth candidate wins on the test that counts the number of preferred types at positions requiring coercion --- text is a preferred type and integer isn't. So we end up using the pattern-match interpretation of substring(), even though we had to apply an int2-to-text implicit coercion to do it. I think this is pretty clearly a bug, or at least undesirable behavior, even if we weren't thinking of changing the initial typing of constants. You can get this behavior today in existing releases, if you happen to use a smallint variable as the second argument of substring(). One way to fix this is to make cross-category coercions to text not be implicit casts. (I've opined before that that's a bad idea, but not gotten a lot of support for changing it. Now that we have a distinction between implicit and assignment casts, perhaps we could consider making coercions to text be assignment casts, as a compromise?) Alternatively we could try to rejigger the rules followed by func_select_candidate(), which seem a little ad-hoc anyway. I don't have a better proposal to make at the moment, though. Here's another problem: regression=# select to_hex(42);to_hex --------2a (1 row) regression=# select to_hex(42::int2); ERROR: Function to_hex(smallint) does not exist Unable to identify a function that satisfies the given argument types You may need to add explicit typecasts The problem here is that the parser can't choose between to_hex(integer) and to_hex(bigint). Neither integer nor bigint is a preferred type, so there's no basis to prefer one over the other. I suppose we could fix this particular case by eliminating to_hex(integer), or adding to_hex(smallint), but that seems a very unsatisfying answer. I am wondering about adding some notion of "conversion distance" associated with casts, and preferring choices that require a smaller conversion distance; perhaps this could replace the concept of "preferred type", too. But again I don't have a specific proposal to make. Any thoughts? The test case that was actually in the regression tests was select to_hex(256*256*256 - 1) AS "ffffff"; ! ERROR: Function to_hex(smallint) does not exist ! Unable to identify a function that satisfies the given argument types ! You may need to add explicit typecasts Even had the parser resolved the overloaded to_hex call, this test would have failed, because int2 multiplication overflows: regression=# select 256::int2*256::int2*256::int2;?column? ---------- 0 (1 row) I am thinking that it might be good to eliminate all the basic arithmetic operators on int2, so that you get int4 as the minimum width for arithmetic. But this cannot work unless we have some concept like conversion distance, or the parser will just fail to make a choice between int4, int8, etc alternatives. In short: we still have a bunch of shortcomings in the type-resolution process, and we can't allow small integers to be typed as int2 until we fix 'em. Comments? Ideas? regards, tom lane
On Sun, 10 Nov 2002, Tom Lane wrote: > In the last episode of this long-running issue, we decided that life > would be better if we make small-enough integer constants be initially > typed as int2 rather than int4, and then auto-promote them up to int4 > when necessary. What kind of type system do postgresql (or SQL in general) use? I don't know much about the postgresql codebase but I do know something about type inference of functional languages. It seems to me that one would like to have a polymorphic typesystem with constraints to handle overloading, subtyping or whatever is needed in SQL. > This would solve problems with, for example, > SELECT ... WHERE smallintcol = 42 > not using an index. Using a suitable typesystem 42 could be said to have a type like isInt z => z which should be read that z is the type and isInt z is a constraint on z saying that z is an integer type (that means for example that z can never be instantiated to Bool). Then the use of smallintcol = 42 where smallintcol has type int2 and where equality = is overloaded for types Int2 -> Int2 -> Bool, Int4 -> Int4 -> Bool (and so on) would force 42 to be the type Int2 in this case, since the first argument of = had type Int2. Is there anywhere I can read about the typesystem in SQL in general and postgresql in particular? There are a number of type systems in the functional world with properties like this. Some very advanced and some simpler. I have a feeling from reading this list that the type inference in postgresql as a bit ad hook. But i've not read the source at all so it might be unfair to say such a thing. In the functional language Haskell there is also a defaulting rule that is used if you end up with constants like 42 still with type isInt z => z. If the type was left like this it just imply that any numeric type for 42 would do. In this case maybe z is defaulted to Int4. In most cases they way you use the constant would force it to be a particular type. Of the examples you gave in the letter I don't see anything that shouldn't work with a more advanced typesystem like this. But I'm sure there are other strange constructs in postgresql that I don't know about. I would love to make some small toy implementation to try out things but in the nearest future I don't have time for that. But this interests me so maybe I'll give it a try some day (like next summer :-). At least I would like to know more about how it works in postgresql today. It's possible that what I talk about do not apply to SQL or that postgresql already implements similar things. -- /Dennis
Dennis Björklund <db@zigo.dhs.org> writes: > It seems to me that one would like to have a polymorphic typesystem with > constraints to handle overloading, subtyping or whatever is needed in > SQL. What we have now is indeed pretty ad-hoc, but a full-blown type inference system might be overkill. Not sure. It would be interesting to look into it. > Is there anywhere I can read about the typesystem in SQL in general and > postgresql in particular? There's the user's guide http://developer.postgresql.org/docs/postgres/typeconv.html and there's the source code (see src/backend/parser/, particularly parse_coerce.c, parse_func.c, parse_oper.c). Not much in between I'm afraid, but the source code is reasonably well-commented. regards, tom lane
Tom Lane writes: > One way to fix this is to make cross-category coercions to text not be > implicit casts. (I've opined before that that's a bad idea, but not > gotten a lot of support for changing it. Now that we have a distinction > between implicit and assignment casts, perhaps we could consider making > coercions to text be assignment casts, as a compromise?) I thought we had agreed to make those explicit. In fact, I vaguely recall you not liking that notion ... > I suppose we could fix this particular case by eliminating > to_hex(integer), or adding to_hex(smallint), but that seems a very > unsatisfying answer. I am wondering about adding some notion of > "conversion distance" associated with casts, and preferring choices > that require a smaller conversion distance; perhaps this could replace > the concept of "preferred type", too. But again I don't have a specific > proposal to make. Any thoughts? A couple of months ago I played around with the notion of adding a numeric preference ("distance", as you call it) to the casts, but in the end this solved only a small number of cases and created a big mess at the same time. When you have to pick arbitrary distances, any algorithm will give you arbitrary answers, after all. I think we can construct a type precedence list using the existing catalog information. Considering the example of choosing between int2->int4 and int2->int8, the system could notice that there is an implicit cast int4->int8 (and no implicit cast the other way), so int4 is "less" than int8. (If there were an implicit cast int8->int4, then the system would have to consider int4 and int8 to be equivalent and picking one at random would be reasonable.) -- Peter Eisentraut peter_e@gmx.net