Thread: lexing small ints as int2
Hi, I'm researching if smallint can be made a higher-class citizen of our type system than currently. Does anyone know where to find the discussion referred to here? http://archives.postgresql.org/pgsql-hackers/2008-10/msg01485.php I did some searches on the archives, but none of the keywords I tried yielded useful results. -- Álvaro Herrera <alvherre@alvh.no-ip.org>
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > I'm researching if smallint can be made a higher-class citizen of our type > system than currently. > Does anyone know where to find the discussion referred to here? > http://archives.postgresql.org/pgsql-hackers/2008-10/msg01485.php I think this was the last time I tried it: http://archives.postgresql.org/pgsql-hackers/2002-11/msg00468.php At the time, the main motivation for worrying about it was that cases like "WHERE smallintcol = 42" couldn't be indexed, because 42 is int4 not int2. We've since fixed that by allowing cross-type operators to be indexable. I also notice that one of the failure cases I cited might no longer be an issue now that we don't have implicit casts to text, but that change isn't going to do anything for the other cases. On the whole I'm still afraid that changing the initial typing of integer constants is going to break a lot of code while buying not much. Do you have a specific reason for reopening the issue? Or is your concern something different? regards, tom lane
Excerpts from Tom Lane's message of Fri Sep 03 19:36:06 -0400 2010: > > Does anyone know where to find the discussion referred to here? > > http://archives.postgresql.org/pgsql-hackers/2008-10/msg01485.php > > I think this was the last time I tried it: > http://archives.postgresql.org/pgsql-hackers/2002-11/msg00468.php Interesting, thanks. I also had some vague thought about conversion distance today, thinking it could help solve the problem. > At the time, the main motivation for worrying about it was that cases > like "WHERE smallintcol = 42" couldn't be indexed, because 42 is int4 > not int2. We've since fixed that by allowing cross-type operators > to be indexable. Yeah, that's no longer an issue fortunately. > I also notice that one of the failure cases I cited might no longer be > an issue now that we don't have implicit casts to text, but that change > isn't going to do anything for the other cases. Right. > On the whole I'm still afraid that changing the initial typing of > integer constants is going to break a lot of code while buying not much. > Do you have a specific reason for reopening the issue? Or is your > concern something different? The problem I'm facing is functions declared to take type smallint not working unless the integer literal has an explicit cast. Currently the best answer is simply to avoid using smallint in functions, but this isn't completely satisfying. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Sep 3, 2010 at 9:19 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > The problem I'm facing is functions declared to take type smallint not > working unless the integer literal has an explicit cast. Currently the > best answer is simply to avoid using smallint in functions, but this > isn't completely satisfying. Maybe the lexer isn't the right place to fix this. The problem here (or so I gather) is that if I say foo(1), then 1 is an integer and we'll do an "implicit" cast to bigint, real, double precision, numeric, oid, or reg*, but the cast to smallint is assignment-only. But I wonder if we shouldn't allow implicit casting anyway when there is a unique best match. If the only foo(x) function is foo(smallint) and the user tries to call foo with one argument, what else can it mean? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
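A minimal sketch of the situation described above, as a toy model rather than PostgreSQL's actual resolver. The cast table is a hand-picked subset; the 'i' and 'a' codes follow the implicit/assignment contexts stored in pg_cast.castcontext, and the names arg_matches and candidates are hypothetical.

```python
# Toy model of argument matching; NOT PostgreSQL's real lookup code.
# 'i' = implicit cast, 'a' = assignment-only cast (illustrative subset).
CASTS = {
    ("int4", "int8"): "i",
    ("int4", "float8"): "i",
    ("int4", "numeric"): "i",
    ("int4", "int2"): "a",  # assignment-only, so unusable in a function call
}


def arg_matches(actual, declared):
    """An argument matches if the types are equal or an implicit cast exists."""
    return actual == declared or CASTS.get((actual, declared)) == "i"


def candidates(functions, arg_types):
    """Return the signatures that a call with arg_types could resolve to."""
    return [
        sig
        for sig in functions
        if len(sig) == len(arg_types)
        and all(arg_matches(a, d) for a, d in zip(arg_types, sig))
    ]


# The literal 1 is already typed int4 by the time function lookup happens.
print(candidates([("int2",)], ("int4",)))  # foo(smallint) is not a candidate
print(candidates([("int8",)], ("int4",)))  # foo(bigint) matches implicitly
```

In this model foo(smallint) is rejected before any "best match" ranking takes place, which is why the call fails even though no other foo exists.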
Robert Haas <robertmhaas@gmail.com> writes: > Maybe the lexer isn't the right place to fix this. The problem here > (or so I gather) is that if I say foo(1), then 1 is an integer and > we'll do an "implicit" cast to bigint, real, double precision, > numeric, oid, or reg*, but the cast to smallint is assignment-only. > But I wonder if we shouldn't allow implicit casting anyway when there > is a unique best match. If the only foo(x) function is foo(smallint) > and the user tries to call foo with one argument, what else can it > mean? Well, the devil is in the details. A key point here is that as things stand, there isn't a "unique best match" for this example. foo(smallint) isn't an allowable match at all. We'd have to first define some way of making it an allowable match, say that assignment casts are an allowed way of matching to a function's or operator's arguments; and then define some rules that make sure that that new behavior doesn't break all the cases that work now. For instance, if assignment casts are less desirable than implicit casts or exact matches, how much less desirable? Is, say, a match involving four exact type matches and one assignment cast better or worse than one involving one exact match and four implicit casts? The rules we have for this now are already pretty ad-hoc, and I'm afraid they'll get worse fast when there are several levels of match. This ties in to the comment Peter made about the "conversion distance" idea in that 2002 thread: there's no obvious principled way to assign the distances, and arbitrarily-chosen distances will lead to arbitrary behaviors. Anyway, if you think you can come up with something workable, have at it. I'm just here to tell you it's not as easy as it looks. regards, tom lane
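The four-exact-plus-one-assignment question can be made concrete with a rough sketch. It assumes a simple additive "conversion distance" score, which is not how PostgreSQL ranks candidates today; the weights and the names total_cost and winner are invented for illustration.

```python
# Hypothetical additive scoring; the weights below are arbitrary on purpose.
def total_cost(match_kinds, weights):
    """Sum the per-argument cost of one candidate's matches."""
    return sum(weights[kind] for kind in match_kinds)


def winner(cands, weights):
    """Pick the candidate with the lowest total cost."""
    return min(cands, key=lambda name: total_cost(cands[name], weights))


CANDS = {
    "A": ["exact"] * 4 + ["assignment"],  # four exact matches, one assignment cast
    "B": ["exact"] + ["implicit"] * 4,    # one exact match, four implicit casts
}

# Assignment cast costs 3: A totals 3, B totals 4, so A wins.
print(winner(CANDS, {"exact": 0, "implicit": 1, "assignment": 3}))
# Assignment cast costs 5: A totals 5, B totals 4, so B wins.
print(winner(CANDS, {"exact": 0, "implicit": 1, "assignment": 5}))
```

Nothing in the type system says whether 3 or 5 is the right number, so the outcome depends entirely on an arbitrary choice, which is the objection raised against conversion distances.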
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Tom Lane's message of Fri Sep 03 19:36:06 -0400 2010: >> On the whole I'm still afraid that changing the initial typing of >> integer constants is going to break a lot of code while buying not much. >> Do you have a specific reason for reopening the issue? Or is your >> concern something different? > The problem I'm facing is functions declared to take type smallint not > working unless the integer literal has an explicit cast. Currently the > best answer is simply to avoid using smallint in functions, but this > isn't completely satisfying.
Agreed, but there's a very small limit to how much I'm willing to break to fix that, because it seems like just an annoyance rather than any significant functionality or performance limitation.
I'm not certain whether that 2002 message described all the types of problems I saw in the regression test failures. (It might be worth trying the experiment again, even if that was a complete catalog then.) But for the sake of argument let's suppose that we just have these two problems to fix:
1. Not failing when a smallint constant is used and there are both int4 and int8 potential matches. I experimented just now with whether this could be fixed by labeling int4 as a preferred type. That turns out to have some bad consequences though: the regression tests show failures on cases like VALUES (1),(1.2) because select_common_type thinks it should resolve to int4 rather than numeric. With the current semantics of preferred types, it seems to be a bad idea to label something a preferred type unless everything in its type category can be implicitly cast to it. (Which again brings up the question of whether we need the concept at all...) We could maybe get around that by rejiggering the way that the type resolution rules make use of preferred types, but the further we go the more likely it is we'll break applications.
2. Not causing cases like "select 256*256" to throw overflow errors when they didn't before. In 2002 I suggested attacking this by dropping the int2 arithmetic operators. Today I'd suggest keeping them but making them deliver int4, so that they can't report overflow. Either way, there's some risk of changing the behavior seen by applications that do arithmetic on int2 columns. For example, foo(int2a + int2b) will currently call foo(int2) if it exists, but that would no longer happen. That might well be a corner-y enough case to pass as acceptable collateral damage --- except that if the whole reason for doing this is to eliminate surprising failures to call functions taking int2, breaking a case like that that used to work hardly seems like an acceptable tradeoff.
So it all looks pretty messy, and any fix carries considerable risk of changing behaviors that existing applications depend on. Without some new idea(s) that I'm not seeing now, it doesn't seem like it'd be worth taking the risk. regards, tom lane
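The overflow point in item 2 can be illustrated with a small simulation. This is a sketch under the assumption that both literals have been lexed as int2; the function names are hypothetical and only model the range checks, not the server's operator implementations.

```python
INT2_MIN, INT2_MAX = -32768, 32767
INT4_MIN, INT4_MAX = -2**31, 2**31 - 1


def mul_int2_returning_int2(a, b):
    """Model an int2 * int2 operator whose result type is also int2."""
    result = a * b
    if not INT2_MIN <= result <= INT2_MAX:
        raise OverflowError("smallint out of range")
    return result


def mul_int2_returning_int4(a, b):
    """Model the suggested variant: int2 inputs, int4 result."""
    # The largest possible magnitude is 32768 * 32768 = 2**30, which always
    # fits in int4, so no range check can fail here.
    return a * b


# 256 fits in int2, but 256 * 256 = 65536 does not.
try:
    mul_int2_returning_int2(256, 256)
except OverflowError as exc:
    print("int2 result:", exc)

print("int4 result:", mul_int2_returning_int4(256, 256))
```

The widened result type removes the overflow, but it also means the expression is no longer int2, which is the foo(int2a + int2b) hazard mentioned above.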
On Sat, Sep 4, 2010 at 12:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Maybe the lexer isn't the right place to fix this. The problem here >> (or so I gather) is that if I say foo(1), then 1 is an integer and >> we'll do an "implicit" cast to bigint, real, double precision, >> numeric, oid, or reg*, but the cast to smallint is assignment-only. >> But I wonder if we shouldn't allow implicit casting anyway when there >> is a unique best match. If the only foo(x) function is foo(smallint) >> and the user tries to call foo with one argument, what else can it >> mean? > > Well, the devil is in the details. A key point here is that as things > stand, there isn't a "unique best match" for this example. foo(smallint) > isn't an allowable match at all. We'd have to first define some way of > making it an allowable match, say that assignment casts are an allowed > way of matching to a function's or operator's arguments; and then define > some rules that make sure that that new behavior doesn't break all the > cases that work now. For instance, if assignment casts are less > desirable than implicit casts or exact matches, how much less desirable? > Is, say, a match involving four exact type matches and one assignment > cast better or worse than one involving one exact match and four > implicit casts? The rules we have for this now are already pretty > ad-hoc, and I'm afraid they'll get worse fast when there are several > levels of match. Yeah, that's a problem. At the same time, I can't help feeling that we ought to have a goal that if there is only one function with a particular name that is compatible with the specified number of arguments, we ought to have a fairly good reason not to call it. 
I am not too sure that the distinction between implicit casts and assignment casts is all that useful; it's not too easy to remember which casts are of which type, nor is it obvious that the current catalog entries follow any particularly consistent rule. One idea I had was to ditch that distinction and instead generalize typispreferred to a 1-byte integer, typpreference. When there is more than one candidate, we look for one that is strictly better than all the others, meaning that it is better than each other candidate in at least one argument position and at least as good in every argument position. Better in a given argument position means either an exact match rather than an inexact one, or a higher typpreference value among types in the same category. Then you could fix the smallint case by giving all the typcategory-N data types ascending typpreference values: int2, int4, int8, float4, float8, numeric. We'll prefer to cast up rather than down, but we'll try to cast down if the alternative is to fail outright. If there's no candidate that dominates all the others, then we complain of ambiguity and give up. I guess you could also keep the notion of implicit vs. assignment casts but just make more of them implicit and rely on differing typpreference values to disambiguate. One big problem in this area is that it's pretty easy to change things in a way that's not backward compatible, so it's not really worth changing anything at all unless we're pretty confident that we've got it right. The whole pg_cast catalog looks to me like kind of a mess. For example, one unintended consequence of lexing small integers as int2 would be that foo(0::bool) would start failing. Why? Because there's an explicit cast from int to bool, but not from smallint or bigint. There's not much principle there, just expediency. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
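A rough sketch of the dominance rule proposed above, read literally. typpreference is the proposed (not existing) column; the numeric values, the function names, and the assumption that every type involved is in the same category are all illustrative.

```python
# Hypothetical ascending preference values for the numeric category.
TYPPREFERENCE = {"int2": 1, "int4": 2, "int8": 3,
                 "float4": 4, "float8": 5, "numeric": 6}


def position_rank(actual, declared):
    """Rank one argument position: exact match first, then typpreference."""
    if actual == declared:
        return (1, 0)
    return (0, TYPPREFERENCE[declared])


def dominates(a, b, args):
    """True if a is at least as good as b everywhere and better somewhere."""
    ranks_a = [position_rank(x, d) for x, d in zip(args, a)]
    ranks_b = [position_rank(x, d) for x, d in zip(args, b)]
    return (all(ra >= rb for ra, rb in zip(ranks_a, ranks_b))
            and any(ra > rb for ra, rb in zip(ranks_a, ranks_b)))


def resolve(cands, args):
    """Return the single candidate that dominates all others, else fail."""
    cands = [c for c in cands if len(c) == len(args)]
    if len(cands) == 1:
        return cands[0]  # only one choice: cast down if we have to
    winners = [c for c in cands
               if all(dominates(c, o, args) for o in cands if o != c)]
    if len(winners) == 1:
        return winners[0]
    raise LookupError("function call is ambiguous or has no candidate")


print(resolve([("int2",)], ("int4",)))             # lone foo(smallint) is used
print(resolve([("int2",), ("int8",)], ("int4",)))  # casting up is preferred
```

With candidates foo(int2, int8) and foo(int8, int2) and two int4 arguments, neither dominates the other, so resolve raises LookupError, matching the "complain of ambiguity and give up" case.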
Robert Haas <robertmhaas@gmail.com> writes: > I am not too sure that the distinction between implicit casts and > assignment casts is all that useful; We've been there and done that; it doesn't work. The current scheme was invented specifically because a two-way design didn't work. http://archives.postgresql.org/pgsql-hackers/2002-09/msg00900.php regards, tom lane
On Sun, Sep 5, 2010 at 1:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I am not too sure that the distinction between implicit casts and >> assignment casts is all that useful; > > We've been there and done that; it doesn't work. The current scheme > was invented specifically because a two-way design didn't work. > > http://archives.postgresql.org/pgsql-hackers/2002-09/msg00900.php Well, sure, if you remove the distinction between implicit and assignment casts *without doing anything else*, it's not going to work. But that's not what I proposed. And as Peter said in one of his responses: "Finally, I believe this paints over the real problems, namely the inadequate and hardcoded type category preferences and the inadequate handling of numerical constants. Both of these issues have had adequate approaches proposed in the past and would solve this an a number of other issues." I agree. We pride ourselves on having an extensible database product, but our current type system is fairly hostile to extension. The typispreferred stuff works OK for deciding between two types (which is not coincidentally the number of distinct values that can be represented by a Boolean column) but after that it breaks down pretty quickly. If you're adding specialized types to represent zoo animals or constellations or six-dimensional polyhedra, it works OK, but if you try to add additional stringy or numbery things, there are problems. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company