Thread: Strict-typing benefits/costs

Strict-typing benefits/costs

From

Ken Johanson

Date:

14 February 2008, 18:55:39

I continue to hope that 8.4 later (or 8.3 patched) will provide behavior
  / compatibility controls..

I personally have requested others in past, but one outstanding one now,
seems to be an ability to turn back-on implicit type conversion. I'm
struggling to see Mysql, Mssql, Oracle, and others following PG and
turning off their implicit conversion. People will have to choose
whether to re-code (explicit casts) their apps for sake of PG.

Granted, some of the other databases have bugs in their CAST
implementations (see http://bugs.mysql.com/bug.php?id=34562 &
http://bugs.mysql.com/bug.php?id=34564).... bugs which makes writing PG
8.3 portable code arduous or impossible when dealing with legacy table
designs and program which counted on implicit casts. But even when mysql
gets fixed, one will ask: "should I re-code my apps just so they will
work with PG>=8.3 (and test the code on other DBs), or should I only
support PG<8.3?

From
http://www.postgresql.org/docs/8.3/static/release-8-3.html:

"these expressions formerly worked:
   1) substr(current_date, 1, 4)
   2) 23 LIKE '2%'
but will now draw "function does not exist" and "operator does not
exist" errors respectively. Use an explicit cast instead.
   3) current_date < 2017-11-17
"

(the section also prominently cites a non-portable cast syntax)

Questions:

For case 1, regarding type safety: we know use of LIKE (and SUBSTR)
requires *implicit or explicit* conversion to a text type. Why require
that explicitly? Closet answer I can see, is that for (example #1)
text-formed dates in some servers are not iso8601 format. But, even
explicit datetime->text cast has this same hazard -- it doesn't require
a format argument.

For case #2, explicit or implicit again have same risk: that base-10 and
with/out decimal, should be the default format.

For case #3, I see "in the presence of automatic casts both sides were
cast to text", but can that not be fixed? Operand 2 yields an integer,
and integer and date compares should failfast, they are not comparable.
(I believe sql requires delimited iso8601 fields, so both 20080414120000
and '20080414120000' should failfast)

int compare(datetime a, object b)
{
   if (typeof(b=="charsequence")
     return compare(cast(b as datetime), a);
   if (typeof(b)=="date")
     return compare(cast(b as datetime), a);
   if (typeof(b)=="time") //allowed?
     return compare(cast(b as datetime), a);
   throw "cannot compare "+a.getClass()+" and "+b.getClass();
}

WHERE (current_date < 2017-11-17) -> FAIL
WHERE (current_date < CAST(2017-11-17 AS datetime)) -> FAIL
WHERE (current_date < '2017-11-17') -> PASS
WHERE (current_date < CAST('2017-11-17' AS datetime)) -> PASS

Sure, we know stricter typing will help performance, by encouraging
more-correct design. For example it may be prudent to ALWAYS failfast
when trying to join key/indexed table columns of differing types (one
char to another's int). But on non-key where conditions? Or make that a
behavior option. Perhaps An SQLWarning could be set when mismatches
occur, yet make a best effort at autocast.

A char to int key comparison affects performance horridly and indicates
(though not always) a potential mis-design, yet it MAY be safe to
autocast the text to numeric before compare, because the narrowing
conversion will assert that only base-10 chars exist.

So, are there other examples of why, esp for runtime/type safety, we
should force explicit conversion?

Re: Strict-typing benefits/costs

From

Tom Lane

Date:

14 February 2008, 19:34:06

Ken Johanson <pg-user@kensystem.com> writes:
> For case 1, regarding type safety: we know use of LIKE (and SUBSTR)
> requires *implicit or explicit* conversion to a text type. Why require
> that explicitly?

Because it's way too easy to burn yourself with implicit conversions.
Cases comparable to the one mentioned (current_date < 2017-11-17
silently doing something very different than the user expected) have
been cropping up every month or two for *years* --- try trolling the
PG list archives for awhile for examples.  After you've wasted a day or
three chasing a problem like that, or pehaps had your app fail in the
field because of a problem like that, you'll realize that having to
write a few explicit casts is a small price to pay for not having such
booby-traps in the system.

As for the lack of a compatibility switch, we would probably have
provided one if we could do so reasonably; but a large part of the
change consisted of altering the initial contents of pg_cast and some
other system catalogs.  There isn't any good way to flip that on and
off.

            regards, tom lane

Re: Strict-typing benefits/costs

From

Jeff Davis

Date:

14 February 2008, 20:07:53

On Thu, 2008-02-14 at 15:55 -0700, Ken Johanson wrote:
> Granted, some of the other databases have bugs in their CAST
> implementations (see http://bugs.mysql.com/bug.php?id=34562 &
> http://bugs.mysql.com/bug.php?id=34564).... bugs which makes writing PG
> 8.3 portable code arduous or impossible when dealing with legacy table
> designs and program which counted on implicit casts. But even when mysql
> gets fixed, one will ask: "should I re-code my apps just so they will
> work with PG>=8.3 (and test the code on other DBs), or should I only
> support PG<8.3?

If postgresql were to revert to 8.2 implicit casting behavior, would
that actually improve compatibility with other DBMSs? Every DBMS
probably has it's own rules for implicit casting, different from every
other DBMS.

So are you sure it wouldn't just introduce more compatibility problems
somewhere else? Or worse, it could hide the problems during
migration/testing, and they could surface after you put it into
production.

> From
> http://www.postgresql.org/docs/8.3/static/release-8-3.html:
>
> "these expressions formerly worked:
>    1) substr(current_date, 1, 4)
>    2) 23 LIKE '2%'
> but will now draw "function does not exist" and "operator does not
> exist" errors respectively. Use an explicit cast instead.
>    3) current_date < 2017-11-17
> "
>
> (the section also prominently cites a non-portable cast syntax)
>
> Questions:
>
> For case 1, regarding type safety: we know use of LIKE (and SUBSTR)
> requires *implicit or explicit* conversion to a text type. Why require

Not this substr() function:

create function substr(date, int, int) returns text as $$ begin return
'foo'; end; $$ language plpgsql;

Are you saying we make special cases for all of the "obvious" functions,
and they should behave differently from user-defined functions?

> For case #3, I see "in the presence of automatic casts both sides were
> cast to text", but can that not be fixed? Operand 2 yields an integer,
> and integer and date compares should failfast, they are not comparable.
> (I believe sql requires delimited iso8601 fields, so both 20080414120000
> and '20080414120000' should failfast)
>
> int compare(datetime a, object b)
> {
>    if (typeof(b=="charsequence")
>      return compare(cast(b as datetime), a);
>    if (typeof(b)=="date")
>      return compare(cast(b as datetime), a);
>    if (typeof(b)=="time") //allowed?
>      return compare(cast(b as datetime), a);
>    throw "cannot compare "+a.getClass()+" and "+b.getClass();
> }
>
> WHERE (current_date < 2017-11-17) -> FAIL
> WHERE (current_date < CAST(2017-11-17 AS datetime)) -> FAIL
> WHERE (current_date < '2017-11-17') -> PASS
> WHERE (current_date < CAST('2017-11-17' AS datetime)) -> PASS
>
> Sure, we know stricter typing will help performance, by encouraging

I don't think performance was the goal of removing implicit type casts.

> more-correct design. For example it may be prudent to ALWAYS failfast
> when trying to join key/indexed table columns of differing types (one
> char to another's int). But on non-key where conditions? Or make that a
> behavior option. Perhaps An SQLWarning could be set when mismatches
> occur, yet make a best effort at autocast.

A few comments:
* Keys should not behave differently from non-keys in comparisons.
* I think the distinction you're trying to make is the casting of
literals versus the casting of variables. All of the examples you gave
involved literals.
* If it's making its "best effort" to autocast, and it fails, how could
it merely issue a warning? If it can't find a match it needs to error,
because there's nothing it can do to continue even if we wanted it to.
* It can almost always find a match, the question is whether it is the
match that the user intended or not.

Regards,
    Jeff Davis

Re: Strict-typing benefits/costs

From

Tom Lane

Date:

15 February 2008, 01:57:21

Jeff Davis <pgsql@j-davis.com> writes:
> Are you saying we make special cases for all of the "obvious" functions,
> and they should behave differently from user-defined functions?

It's possible to special-case any particular function you really feel
you need this behavior for.  We did special-case || (the string
concatenation operator), and there was some discussion of also putting
in a built-in special case for LIKE, but we desisted from sliding any
further down that slippery slope.  Since it's possible for users to
install such hacks for themselves, as in the example here,
http://archives.postgresql.org/pgsql-general/2007-11/msg00538.php
there didn't seem to be a convincing case for weakening the type
checking for everybody.

            regards, tom lane

Re: Strict-typing benefits/costs

From

Ken Johanson

Date:

15 February 2008, 19:08:26

Jeff Davis wrote:
> If postgresql were to revert to 8.2 implicit casting behavior, would
> that actually improve compatibility with other DBMSs? Every DBMS
> probably has it's own rules for implicit casting, different from every
> other DBMS.
>
> So are you sure it wouldn't just introduce more compatibility problems
> somewhere else? Or worse, it could hide the problems during
> migration/testing, and they could surface after you put it into
> production.

In my opinion the autocasting behaviors of the database are probably
more consistent and fined tuned, than their explicit cast function. Or
in the least, they may actually *save* mistakes that (lay) programmers
would make (by adding casts for the sake of PG).

Re: Strict-typing benefits/costs

From

Ken Johanson

Date:

15 February 2008, 19:11:23

Tom Lane wrote:
>
> It's possible to special-case any particular function you really feel
> you need this behavior for.  We did special-case || (the string
> concatenation operator), and there was some discussion of also putting
> in a built-in special case for LIKE, but we desisted from sliding any
> further down that slippery slope.  Since it's possible for users to
> install such hacks for themselves, as in the example here,
> http://archives.postgresql.org/pgsql-general/2007-11/msg00538.php
> there didn't seem to be a convincing case for weakening the type
> checking for everybody.
>

Tom, is it accurate to assume that newer PG versions will further
tighten type-strictness (say, '2008-01-01' presently being comparable to
a datetime)? Also, do you know of any other vendors that are heading in
this direction (removing by default the autocasts)?

Re: Strict-typing benefits/costs

From

Michael Glaesemann

Date:

16 February 2008, 11:07:18

On Feb 15, 2008, at 18:11 , Ken Johanson wrote:

> Tom, is it accurate to assume that newer PG versions will further
> tighten type-strictness (say, '2008-01-01' presently being
> comparable to a datetime)? Also, do you know of any other vendors
> that are heading in this direction (removing by default the
> autocasts)?

'2008-01-01' does not indicate some kind of string: it's just an
untyped literal. Postgres will determine its type in context. For
example:

test=# select 'today is ' || '2008-02-16';
       ?column?
---------------------
  today is 2008-02-16
(1 row)

|| is a string concatenation operator, so '2008-02-16' is treated as
a string: it's not cast from date to text: from Postgres' eyes, it
never was a date.

test=# select '2008-02-15' > CURRENT_DATE;
  ?column?
----------
  f
(1 row)

Here, we're comparing against a date type, so Postgres treats
'2008-02-15' as a date.

Michael Glaesemann
grzm seespotcode net

Re: Strict-typing benefits/costs

From

Tom Lane

Date:

16 February 2008, 12:01:24

Michael Glaesemann <grzm@seespotcode.net> writes:
> test=# select '2008-02-15' > CURRENT_DATE;

> Here, we're comparing against a date type, so Postgres treats
> '2008-02-15' as a date.

It might be worth pointing out that this is not magic,
but an application of the general rule mentioned at step 2a here:
http://www.postgresql.org/docs/8.3/static/typeconv-oper.html

In any case where Postgres is trying to interpret a binary operator,
and one input has a known type while the other is an unknown-type
literal constant, the preferred interpretation will be that the
constant has the same type as the known-type input.

In a case like 'today is ' || '2008-02-16', *both* inputs are
initially unknown-type literals.  There is a fallback heuristic
that prefers to resolve such cases as type text, which is why
you get text concatenation rather than a "couldn't resolve
operator" error.

In no case does Postgres look at the content of an unknown literal
to determine its type.  '2008-02-16' is not treated differently
from 'foobar'.

            regards, tom lane

Re: Strict-typing benefits/costs

From

Ken Johanson

Date:

16 February 2008, 12:18:55

Michael Glaesemann wrote:
>
> On Feb 15, 2008, at 18:11 , Ken Johanson wrote:
>
>> Tom, is it accurate to assume that newer PG versions will further
>> tighten type-strictness (say, '2008-01-01' presently being comparable
>> to a datetime)? Also, do you know of any other vendors that are
>> heading in this direction (removing by default the autocasts)?
>
> '2008-01-01' does not indicate some kind of string: it's just an untyped
> literal. Postgres will determine its type in context.

Exactly, it is performing a context based auto conversion, what some
will call a cast.

select 5<'6' -> true
select 5>'6' -> false
select 15<'60' -> true
select 15>'60' -> false

So one can argue that is is convenient, and safe, to perform the same
implicit/auto conversion for many functions which no longer do that. And
that even if looses-typing / auto cast it allows/encourages bad design,
that does not mean that the all designs cases will be bad. Some users
prefer convenience over type safety, and some of those same users *will*
produce error free code.

On the other hand, should we go the extra mile and failfast when
comparing 5 and '6'? No, because there is clearly only one appropriate
conversion path (cast string to numeric) for the context. Or, some might
argue we should not allow that comparison.

select position('5' in 5)
select position('.' in 5.1)
select position('2008' in current_timestamp)

Numbers and datetime in sql have exactly prescribed standard char
representations (even if others dbs don't use them for datetimes). So
one can argue implicit conversion to char IS safe for these types and
any char-consuming functions.

Re: Strict-typing benefits/costs

From

Tom Lane

Date:

16 February 2008, 12:47:24

Ken Johanson <pg-user@kensystem.com> writes:
> select 5<'6' -> true
> select 5>'6' -> false
> select 15<'60' -> true
> select 15>'60' -> false

These examples miss the point, because they'd give the same answer
whether you think the values are text or integer.  Consider instead
these cases:

regression=# select 7 > '60';             -- int > int
 ?column?
----------
 f
(1 row)

regression=# select '7' > '60';           -- text > text
 ?column?
----------
 t
(1 row)

regression=# select 7 > '08';             -- int > int
 ?column?
----------
 f
(1 row)

regression=# select '7' > '08';           -- text > text
 ?column?
----------
 t
(1 row)

All of a sudden it seems much more important to be clear about
what data type is involved, no?

> Numbers and datetime in sql have exactly prescribed standard char
> representations (even if others dbs don't use them for datetimes).

See the datestyle parameter before you maintain that Postgres
should assume that.

            regards, tom lane

Re: Strict-typing benefits/costs

From

Ken Johanson

Date:

16 February 2008, 13:04:47

Tom Lane wrote:
> Ken Johanson <pg-user@kensystem.com> writes:
>> select 5<'6' -> true
>> select 5>'6' -> false
>> select 15<'60' -> true
>> select 15>'60' -> false
>
> These examples miss the point, because they'd give the same answer
> whether you think the values are text or integer.  Consider instead
> these cases:
>
> regression=# select 7 > '60';             -- int > int
>  ?column?
> ----------
>  f
> (1 row)
>
> regression=# select '7' > '60';           -- text > text
>  ?column?
> ----------
>  t
> (1 row)
>
> regression=# select 7 > '08';             -- int > int
>  ?column?
> ----------
>  f
> (1 row)
>
> regression=# select '7' > '08';           -- text > text
>  ?column?
> ----------
>  t
> (1 row)
>
> All of a sudden it seems much more important to be clear about
> what data type is involved, no?

Agreed, so should we disallow 7 > '08'? Because that is (tell me if you
disagree), much more hazardous than allowing, say TRIM(7) or
POSITION('7' IN 7). Or for non-failfast comparison of two columns of
dissimilar types (say bigint, integer, real, char).

select 'ba'>'ab' -> true
select 'ab'>'ba' -> false
select '0.5'=.5 -> true (is char comparator or numeric to laymen?)
select '7a'<'070' -> true (is char comparator or numeric to laymen?)
select '7a'<70 -> failfast, good.

>
>> Numbers and datetime in sql have exactly prescribed standard char
>> representations (even if others dbs don't use them for datetimes).
>
> See the datestyle parameter before you maintain that Postgres
> should assume that.
>
>

I agree. Unless the date style is know to always be iso8601, which is
not true owed to datestyle. Unless sql spec allows for it, could this be
an argument for removing the datestyle implict (non-iso8601) feature?

Re: Strict-typing benefits/costs

From

Tom Lane

Date:

16 February 2008, 22:40:46

Ken Johanson <pg-user@kensystem.com> writes:
> Tom Lane wrote:
>> These examples miss the point, because they'd give the same answer
>> whether you think the values are text or integer. ...

> Agreed, so should we disallow 7 > '08'?

Maybe, but the usability ramifications would be enormous --- you'd
also be talking about breaking sale_date > '2008-01-20' and other
cases where "leave off the quotes" is not an available option.

In practice the current handling of one-known-and-one-unknown-input
has not seemed to cause problems for people; it almost always does
what they're expecting it to do.  It's the cases where the values are
of *known and incompatible* types that silently casting to make them
match has proven to be dangerous.

>>> Numbers and datetime in sql have exactly prescribed standard char
>>> representations (even if others dbs don't use them for datetimes).
>>
>> See the datestyle parameter before you maintain that Postgres
>> should assume that.

> I agree. Unless the date style is know to always be iso8601, which is
> not true owed to datestyle. Unless sql spec allows for it, could this be
> an argument for removing the datestyle implict (non-iso8601) feature?

I doubt we'll be doing that.  It would not fix the problem anyway,
since now that I think about it, your argument doesn't hold for
numbers either.  Consider

regression=# select 2.0 / 3.0;
        ?column?
------------------------
 0.66666666666666666667
(1 row)

regression=# select 10 ^ 14;
    ?column?
-----------------
 100000000000000
(1 row)

regression=# select 10 ^ 15;
 ?column?
----------
    1e+15
(1 row)

regression=# select 1.230e15;
     ?column?
------------------
 1230000000000000
(1 row)

There are a large number of details here that are not prescribed by the
SQL standard (much less actually standardized across systems).  They
make little or no difference in the numeric domain, but they sure do as
soon as you start doing textual operations.

            regards, tom lane