Thread: lexing small ints as int2

lexing small ints as int2

From: Alvaro Herrera
Hi,

I'm researching whether smallint can be made a higher-class citizen of
our type system than it currently is.

Does anyone know where to find the discussion referred to here?
http://archives.postgresql.org/pgsql-hackers/2008-10/msg01485.php

I searched the archives, but none of the keywords I tried yielded
useful results.

-- 
Álvaro Herrera <alvherre@alvh.no-ip.org>


Re: lexing small ints as int2

From: Tom Lane
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> I'm researching whether smallint can be made a higher-class citizen of
> our type system than it currently is.

> Does anyone know where to find the discussion referred to here?
> http://archives.postgresql.org/pgsql-hackers/2008-10/msg01485.php

I think this was the last time I tried it:
http://archives.postgresql.org/pgsql-hackers/2002-11/msg00468.php

At the time, the main motivation for worrying about it was that cases
like "WHERE smallintcol = 42" couldn't be indexed, because 42 is int4
not int2.  We've since fixed that by allowing cross-type operators
to be indexable.

I also notice that one of the failure cases I cited might no longer be
an issue now that we don't have implicit casts to text, but that change
isn't going to do anything for the other cases.

On the whole I'm still afraid that changing the initial typing of
integer constants is going to break a lot of code while buying not much.
Do you have a specific reason for reopening the issue?  Or is your
concern something different?
        regards, tom lane


Re: lexing small ints as int2

From: Alvaro Herrera
Excerpts from Tom Lane's message of Fri Sep 03 19:36:06 -0400 2010:

> > Does anyone know where to find the discussion referred to here?
> > http://archives.postgresql.org/pgsql-hackers/2008-10/msg01485.php
> 
> I think this was the last time I tried it:
> http://archives.postgresql.org/pgsql-hackers/2002-11/msg00468.php

Interesting, thanks.  I also had some vague thought about conversion
distance today, thinking it could help solve the problem.

> At the time, the main motivation for worrying about it was that cases
> like "WHERE smallintcol = 42" couldn't be indexed, because 42 is int4
> not int2.  We've since fixed that by allowing cross-type operators
> to be indexable.

Yeah, that's no longer an issue fortunately.

> I also notice that one of the failure cases I cited might no longer be
> an issue now that we don't have implicit casts to text, but that change
> isn't going to do anything for the other cases.

Right.

> On the whole I'm still afraid that changing the initial typing of
> integer constants is going to break a lot of code while buying not much.
> Do you have a specific reason for reopening the issue?  Or is your
> concern something different?

The problem I'm facing is functions declared to take type smallint not
working unless the integer literal has an explicit cast.  Currently the
best answer is simply to avoid using smallint in functions, but this
isn't completely satisfying.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: lexing small ints as int2

From: Robert Haas
On Fri, Sep 3, 2010 at 9:19 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> The problem I'm facing is functions declared to take type smallint not
> working unless the integer literal has an explicit cast.  Currently the
> best answer is simply to avoid using smallint in functions, but this
> isn't completely satisfying.

Maybe the lexer isn't the right place to fix this.  The problem here
(or so I gather) is that if I say foo(1), then 1 is an integer and
we'll do an "implicit" cast to bigint, real, double precision,
numeric, oid, or reg*, but the cast to smallint is assignment-only.
But I wonder if we shouldn't allow implicit casting anyway when there
is a unique best match.  If the only foo(x) function is foo(smallint)
and the user tries to call foo with one argument, what else can it
mean?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: lexing small ints as int2

From: Tom Lane
Robert Haas <robertmhaas@gmail.com> writes:
> Maybe the lexer isn't the right place to fix this.  The problem here
> (or so I gather) is that if I say foo(1), then 1 is an integer and
> we'll do an "implicit" cast to bigint, real, double precision,
> numeric, oid, or reg*, but the cast to smallint is assignment-only.
> But I wonder if we shouldn't allow implicit casting anyway when there
> is a unique best match.  If the only foo(x) function is foo(smallint)
> and the user tries to call foo with one argument, what else can it
> mean?

Well, the devil is in the details.  A key point here is that as things
stand, there isn't a "unique best match" for this example.  foo(smallint)
isn't an allowable match at all.  We'd have to first define some way of
making it an allowable match, say that assignment casts are an allowed
way of matching to a function's or operator's arguments; and then define
some rules that make sure that that new behavior doesn't break all the
cases that work now.  For instance, if assignment casts are less
desirable than implicit casts or exact matches, how much less desirable?
Is, say, a match involving four exact type matches and one assignment
cast better or worse than one involving one exact match and four
implicit casts?  The rules we have for this now are already pretty
ad-hoc, and I'm afraid they'll get worse fast when there are several
levels of match.

This ties in to the comment Peter made about the "conversion distance"
idea in that 2002 thread: there's no obvious principled way to assign
the distances, and arbitrarily-chosen distances will lead to arbitrary
behaviors.
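
To make the concern concrete, here is a minimal sketch (an illustration
only, not PostgreSQL code and not part of the original message).  The
match kinds, the three cost tables, and the helpers total_cost and pick
are all hypothetical.  It scores the two candidates from the example
above (four exact matches plus one assignment cast, versus one exact
match plus four implicit casts) and shows the winner changing with the
numbers chosen:

```python
# Hypothetical labels for how one argument matches one parameter.
EXACT, IMPLICIT, ASSIGNMENT = "exact", "implicit", "assignment"


def total_cost(match_kinds, costs):
    """Sum the per-argument 'conversion distance' of one candidate."""
    return sum(costs[kind] for kind in match_kinds)


def pick(candidates, costs):
    """Return the name of the cheapest candidate, or None on a tie."""
    scored = sorted((total_cost(kinds, costs), name)
                    for name, kinds in candidates.items())
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None  # ambiguous: no unique best match
    return scored[0][1]


# The two candidates from the question in the text.
candidates = {
    "A": [EXACT, EXACT, EXACT, EXACT, ASSIGNMENT],
    "B": [EXACT, IMPLICIT, IMPLICIT, IMPLICIT, IMPLICIT],
}

# Three equally arbitrary distance assignments (assumptions, not catalog data).
costs_mild = {EXACT: 0, IMPLICIT: 1, ASSIGNMENT: 2}
costs_tie = {EXACT: 0, IMPLICIT: 1, ASSIGNMENT: 4}
costs_harsh = {EXACT: 0, IMPLICIT: 1, ASSIGNMENT: 10}

winner_mild = pick(candidates, costs_mild)    # A costs 2, B costs 4
winner_tie = pick(candidates, costs_tie)      # both cost 4
winner_harsh = pick(candidates, costs_harsh)  # A costs 10, B costs 4
```

Nothing in the type system says which of the three tables is the right
one, which is the sense in which arbitrary distances give arbitrary
behavior.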

Anyway, if you think you can come up with something workable, have at
it.  I'm just here to tell you it's not as easy as it looks.
        regards, tom lane


Re: lexing small ints as int2

From: Tom Lane
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of Fri Sep 03 19:36:06 -0400 2010:
>> On the whole I'm still afraid that changing the initial typing of
>> integer constants is going to break a lot of code while buying not much.
>> Do you have a specific reason for reopening the issue?  Or is your
>> concern something different?

> The problem I'm facing is functions declared to take type smallint not
> working unless the integer literal has an explicit cast.  Currently the
> best answer is simply to avoid using smallint in functions, but this
> isn't completely satisfying.

Agreed, but there's a very small limit to how much I'm willing to break
to fix that, because it seems like just an annoyance rather than any
significant functionality or performance limitation.

I'm not certain whether that 2002 message described all the types of
problems I saw in the regression test failures.  (It might be worth
trying the experiment again, even if that was a complete catalog then.)
But for the sake of argument let's suppose that we just have these
two problems to fix:

1.  Not failing when a smallint constant is used and there are
both int4 and int8 potential matches.

I experimented just now with whether this could be fixed by labeling
int4 as a preferred type.  That turns out to have some bad consequences
though: the regression tests show failures on cases like VALUES (1),(1.2)
because select_common_type thinks it should resolve to int4 rather than
numeric.  With the current semantics of preferred types, it seems to be
a bad idea to label something a preferred type unless everything in its
type category can be implicitly cast to it.  (Which again brings up the
question of whether we need the concept at all...)  We could maybe get
around that by rejiggering the way that the type resolution rules make
use of preferred types, but the further we go the more likely it is
we'll break applications.

2.  Not causing cases like "select 256*256" to throw overflow errors
when they didn't before.

In 2002 I suggested attacking this by dropping the int2 arithmetic
operators.  Today I'd suggest keeping them but making them deliver int4,
so that they can't report overflow.  Either way, there's some risk
of changing the behavior seen by applications that do arithmetic on
int2 columns.  For example, foo(int2a + int2b) will currently call
foo(int2) if it exists, but that would no longer happen.  That might
well be a corner-y enough case to pass as acceptable collateral damage
--- except that if the whole reason for doing this is to eliminate
surprising failures to call functions taking int2, breaking a case
like that that used to work hardly seems like an acceptable tradeoff.
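
A small sketch of problem 2, written as a plain Python model rather
than anything taken from the PostgreSQL sources (the function names and
the OutOfRange exception are hypothetical).  It contrasts an int2
multiply that returns int2 with one that returns int4, under the
assumption that both inputs have already been typed as int2:

```python
INT2_MIN, INT2_MAX = -2**15, 2**15 - 1
INT4_MIN, INT4_MAX = -2**31, 2**31 - 1


class OutOfRange(Exception):
    """Stand-in for a 'smallint out of range' error."""


def check_int2(value):
    if not INT2_MIN <= value <= INT2_MAX:
        raise OutOfRange("smallint out of range")
    return value


def mul_int2_returning_int2(a, b):
    """int2 * int2 -> int2: the product itself must fit in 16 bits."""
    check_int2(a)
    check_int2(b)
    return check_int2(a * b)


def mul_int2_returning_int4(a, b):
    """int2 * int2 -> int4: the largest product is 2**30, which fits."""
    check_int2(a)
    check_int2(b)
    result = a * b
    assert INT4_MIN <= result <= INT4_MAX  # always true for int2 inputs
    return result


# "select 256*256" with both literals lexed as int2:
try:
    mul_int2_returning_int2(256, 256)
    overflowed = False
except OutOfRange:
    overflowed = True

widened = mul_int2_returning_int4(256, 256)
```

The widening variant never overflows, but its result is int4, which is
exactly why foo(int2a + int2b) would stop resolving to foo(int2).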

So it all looks pretty messy, and any fix carries considerable risk
of changing behaviors that existing applications depend on.  Without
some new idea(s) that I'm not seeing now, it doesn't seem like it'd
be worth taking the risk.
        regards, tom lane


Re: lexing small ints as int2

From: Robert Haas
On Sat, Sep 4, 2010 at 12:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Maybe the lexer isn't the right place to fix this.  The problem here
>> (or so I gather) is that if I say foo(1), then 1 is an integer and
>> we'll do an "implicit" cast to bigint, real, double precision,
>> numeric, oid, or reg*, but the cast to smallint is assignment-only.
>> But I wonder if we shouldn't allow implicit casting anyway when there
>> is a unique best match.  If the only foo(x) function is foo(smallint)
>> and the user tries to call foo with one argument, what else can it
>> mean?
>
> Well, the devil is in the details.  A key point here is that as things
> stand, there isn't a "unique best match" for this example.  foo(smallint)
> isn't an allowable match at all.  We'd have to first define some way of
> making it an allowable match, say that assignment casts are an allowed
> way of matching to a function's or operator's arguments; and then define
> some rules that make sure that that new behavior doesn't break all the
> cases that work now.  For instance, if assignment casts are less
> desirable than implicit casts or exact matches, how much less desirable?
> Is, say, a match involving four exact type matches and one assignment
> cast better or worse than one involving one exact match and four
> implicit casts?  The rules we have for this now are already pretty
> ad-hoc, and I'm afraid they'll get worse fast when there are several
> levels of match.

Yeah, that's a problem.   At the same time, I can't help feeling that
we ought to have a goal that if there is only one function with a
particular name that is compatible with the specified number of
arguments, we ought to have a fairly good reason not to call it.

I am not too sure that the distinction between implicit casts and
assignment casts is all that useful; it's not too easy to remember
which casts are of which type, nor is it obvious that the current
catalog entries follow any particularly consistent rule.  One idea I
had was to ditch that distinction and instead generalize
typispreferred to a 1-byte integer, typpreference.  When there is more
than one candidate, we look for one that is strictly better than all
the others, meaning that, compared with each other candidate, it is
better in at least one argument position and at least as good in every
argument position.  Better in a given argument position means either
an exact match rather than an inexact one, or a higher typpreference
value among types in the same category.  Then you could fix the
smallint case by giving all the typcategory-N data types ascending
typpreference values: int2, int4, int8, float4, float8, numeric.
We'll prefer to cast up rather than down, but we'll try to cast down
if the alternative is to fail outright.  If there's no candidate that
dominates all the others, then we complain of ambiguity and give up.
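
A rough sketch of that dominance rule, as an illustration only: this
is not PostgreSQL code, the numeric typpreference values are made up,
and the helpers position_score, dominates, and resolve are
hypothetical.  It also assumes every type involved is in the same
category and that any cast between them is permitted:

```python
# Hypothetical ascending typpreference values for the numeric category.
TYPPREFERENCE = {
    "int2": 1, "int4": 2, "int8": 3,
    "float4": 4, "float8": 5, "numeric": 6,
}


def position_score(arg_type, param_type):
    """Score one argument position; an exact match beats any cast."""
    if arg_type == param_type:
        return (1, 0)
    return (0, TYPPREFERENCE[param_type])


def dominates(a, b, arg_types):
    """True if candidate a is at least as good as b in every position
    and strictly better in at least one."""
    score_a = [position_score(t, p) for t, p in zip(arg_types, a)]
    score_b = [position_score(t, p) for t, p in zip(arg_types, b)]
    at_least_as_good = all(x >= y for x, y in zip(score_a, score_b))
    strictly_better = any(x > y for x, y in zip(score_a, score_b))
    return at_least_as_good and strictly_better


def resolve(arg_types, candidates):
    """Return the candidate that dominates all others, else None."""
    for i, cand in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        if all(dominates(cand, other, arg_types) for other in others):
            return cand
    return None  # ambiguous: complain and give up


# foo(1) with only foo(smallint) defined: the lone candidate wins.
only_smallint = resolve(("int4",), [("int2",)])

# foo(1) with foo(smallint) and foo(bigint): casting up is preferred.
cast_up = resolve(("int4",), [("int2",), ("int8",)])

# Two candidates that each win one position: nobody dominates.
ambiguous = resolve(("int4", "int4"),
                    [("int4", "int2"), ("int2", "int4")])
```

With a single candidate there is nothing to dominate, so it is chosen
outright, which is the behavior wanted for the smallint case.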

I guess you could also keep the notion of implicit vs. assignment
casts but just make more of them implicit and rely on differing
typpreference values to disambiguate.  One big problem in this area is
that it's pretty easy to change things in a way that's not backward
compatible, so it's not really worth changing anything at all unless
we're pretty confident that we've got it right. The whole pg_cast
catalog looks to me like kind of a mess.  For example, one unintended
consequence of lexing small integers as int2 would be that
foo(0::bool) would start failing.  Why?  Because there's an explicit
cast from int to bool, but not from smallint or bigint.  There's not
much principle there, just expediency.

--
Robert Haas


Re: lexing small ints as int2

From: Tom Lane
Robert Haas <robertmhaas@gmail.com> writes:
> I am not too sure that the distinction between implicit casts and
> assignment casts is all that useful;

We've been there and done that; it doesn't work.  The current scheme
was invented specifically because a two-way design didn't work.

http://archives.postgresql.org/pgsql-hackers/2002-09/msg00900.php
        regards, tom lane


Re: lexing small ints as int2

From: Robert Haas
On Sun, Sep 5, 2010 at 1:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I am not too sure that the distinction between implicit casts and
>> assignment casts is all that useful;
>
> We've been there and done that; it doesn't work.  The current scheme
> was invented specifically because a two-way design didn't work.
>
> http://archives.postgresql.org/pgsql-hackers/2002-09/msg00900.php

Well, sure, if you remove the distinction between implicit and
assignment casts *without doing anything else*, it's not going to
work.  But that's not what I proposed.

And as Peter said in one of his responses: "Finally, I believe this
paints over the real problems, namely the inadequate and hardcoded
type category preferences and the inadequate handling of numerical
constants.  Both of these issues have had adequate approaches proposed
in the past and would solve this and a number of other issues."  I
agree.  We pride ourselves on having an extensible database product,
but our current type system is fairly hostile to extension.  The
typispreferred stuff works OK for deciding between two types (which is
not coincidentally the number of distinct values that can be
represented by a Boolean column) but after that it breaks down pretty
quickly.  If you're adding specialized types to represent zoo animals
or constellations or six-dimensional polyhedra, it works OK, but if
you try to add additional stringy or numbery things, there are problems.

--
Robert Haas