Thread: DISTINCT with NULLs and INT fields

DISTINCT with NULLs and INT fields

From
"George Pavlov"
Date:
What does The SQL Standard say about this one?

create table foo (a varchar, b int);

insert into foo (a, b) select null, null from bar;
-- no problem

insert into foo (a, b) select distinct null, null from bar;
-- ERROR:  column "b" is of type integer but expression is of type text
-- HINT:  You will need to rewrite or cast the expression.

It seems that applying DISTINCT makes the NULL be of type TEXT... I just
tried the exact same thing in Ms. SQL Server 2005 and it works with no
errors. Looks like SQL Server makes the NULLs be INTs, but is probably
better at doing the implicit conversion from INT. Any thoughts on what
the "correct" behavior should be here?

George


Re: DISTINCT with NULLs and INT fields

From
Tom Lane
Date:
"George Pavlov" <gpavlov@mynewplace.com> writes:
> What does The SQL Standard say about this one?

> insert into foo (a, b) select distinct null, null from bar;
> -- ERROR:  column "b" is of type integer but expression is of type text

According to the SQL spec that query is illegal on its face --- the spec
does not allow a bare NULL keyword except in contexts where the datatype
is immediately evident.  You'd have had to writeSELECT DISTINCT CAST(NULL AS varchar), CAST(NULL AS int)
to be fully spec-conformant.

Postgres handles NULL the same way as untyped literals ('foo'), so we
tend to prefer resolving to type text when things are ambiguous.  In
this case the DISTINCT operation forces a type choice (if you don't know
what type the data is, you can hardly decide whether two values are equal
or not) and so by the time the INSERT sees it, it looks like a SELECT
of two text columns.

> Looks like SQL Server makes the NULLs be INTs, but is probably
> better at doing the implicit conversion from INT.

"Better" is in the eye of the beholder.  It surprises me not at all that
Microsoft would be lax about implicit coercions, but that doesn't make
it a good idea to coerce anything to anything else without complaint.
You might as well not have a type system at all, if you're going to
destroy its ability to detect mistakes that way.
        regards, tom lane


Re: DISTINCT with NULLs and INT fields

From
"George Pavlov"
Date:
> "Better" is in the eye of the beholder.

sorry for the value-laden term. "laxer" is more appropriate, of course!
the funny thing is that had they cast the NULLs to TEXT it would have
failed there too (they do not do implicit TEXT to INT).

> It surprises me not  at all that
> Microsoft would be lax about implicit coercions, but that doesn't make
> it a good idea to coerce anything to anything else without complaint.
> You might as well not have a type system at all, if you're going to
> destroy its ability to detect mistakes that way.

indeed! but, wait, doesn't our favorite dbms do some implicit casting
too? continuing with my table foo (a varchar, b int):

test=# delete from foo;
DELETE 2
test=# insert into foo values (4,4);
INSERT 0 1
test=# insert into foo values ('4','4');
INSERT 0 1
test=# insert into foo values ('oh','no');
ERROR:  invalid input syntax for integer: "no"
test=# select * from foo;a | b
---+---4 | 44 | 4
(2 rows)


Re: DISTINCT with NULLs and INT fields

From
Tom Lane
Date:
"George Pavlov" <gpavlov@mynewplace.com> writes:
> indeed! but, wait, doesn't our favorite dbms do some implicit casting
> too? continuing with my table foo (a varchar, b int):

> test=# delete from foo;
> DELETE 2
> test=# insert into foo values (4,4);
> INSERT 0 1
> test=# insert into foo values ('4','4');
> INSERT 0 1
> test=# insert into foo values ('oh','no');
> ERROR:  invalid input syntax for integer: "no"

Sure, but in this example the required type of the value is clear from
immediate context (ie, the INSERT).  This is one of the cases where
the SQL spec requires a bare NULL to be accepted.  (BTW, 'no' and '4'
in this example are *not* values of type text; they are untyped literals
which we eventually resolve as varchar or int.)

The other case you mentioned is one where we are going out on a limb a
bit:
INSERT INTO foo SELECT NULL, NULL;

In this case we allow the SELECT to not force a type choice, so that the
INSERT sees the raw untyped values, but I think that this query is
probably illegal per spec --- I believe the spec requires a SELECT to
deliver well-defined data types.  Too lazy to go look up chapter and
verse at the moment.
        regards, tom lane


Re: DISTINCT with NULLs and INT fields

From
"George Pavlov"
Date:
> Sure, but in this example the required type of the value is clear from
> immediate context (ie, the INSERT).  This is one of the cases where
> the SQL spec requires a bare NULL to be accepted.  (BTW, 'no' and '4'
> in this example are *not* values of type text; they are
> untyped literals which we eventually resolve as varchar or int.)

hmmm... with the risk of boring everyone to tears:

test=# insert into foo values (4::int,4::int);
INSERT 0 1
test=# insert into foo values (4::text,4::text);
ERROR:  column "b" is of type integer but expression is of type text
HINT:  You will need to rewrite or cast the expression.
test=# insert into foo values (cast(4 as int),cast(4 as int));
INSERT 0 1
test=# insert into foo values (cast(4 as text),cast(4 as text));
ERROR:  column "b" is of type integer but expression is of type text
HINT:  You will need to rewrite or cast the expression.

(i actually think it is important to understand how this stuff works and
thanks for explaining!)


Re: DISTINCT with NULLs and INT fields

From
Alvaro Herrera
Date:
George Pavlov wrote:

> test=# insert into foo values (4::int,4::int);
> INSERT 0 1
> test=# insert into foo values (4::text,4::text);
> ERROR:  column "b" is of type integer but expression is of type text
> HINT:  You will need to rewrite or cast the expression.
> test=# insert into foo values (cast(4 as int),cast(4 as int));
> INSERT 0 1
> test=# insert into foo values (cast(4 as text),cast(4 as text));
> ERROR:  column "b" is of type integer but expression is of type text
> HINT:  You will need to rewrite or cast the expression.

The last sentence (and the second one as well -- they are exactly
equivalent) exemplifies that there isn't an implicit typecase from text
to integer.  If you use single quotes instead of an explicit cast, the
literal is assumed to be of type "unknown", which _can_ be casted
automatically to integer.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: DISTINCT with NULLs and INT fields

From
"George Pavlov"
Date:
> > test=# insert into foo values (4::int,4::int);
> > INSERT 0 1

> > test=# insert into foo values (4::text,4::text);
> > ERROR:  column "b" is of type integer but expression is of type text
> > HINT:  You will need to rewrite or cast the expression.

> > test=# insert into foo values (cast(4 as int),cast(4 as int));
> > INSERT 0 1

> > test=# insert into foo values (cast(4 as text),cast(4 as text));
> > ERROR:  column "b" is of type integer but expression is of type text
> > HINT:  You will need to rewrite or cast the expression.
>
> The last sentence (and the second one as well -- they are exactly
> equivalent) exemplifies that there isn't an implicit typecase
> from text
> to integer.  If you use single quotes instead of an explicit cast, the
> literal is assumed to be of type "unknown", which _can_ be casted
> automatically to integer.

yes, #1 is the same as #3, and #2 is the same as #4. i was trying to
provide a counter example to tom's statement that pgsql does not do type
coercions. the interesting thing here is that 4::int gets into a text
field whereas 4::text does not get into an integer field. seems to me
like there is an implicit int-to-text cast (without a symmetrical
text-to-int one) unless 4::int is somehow *still* considered an 'untyped
literal' (4 and '4' seem untyped to me, but 4::int does not)??


Re: DISTINCT with NULLs and INT fields

From
Tom Lane
Date:
"George Pavlov" <gpavlov@mynewplace.com> writes:
> the interesting thing here is that 4::int gets into a text
> field whereas 4::text does not get into an integer field. seems to me
> like there is an implicit int-to-text cast (without a symmetrical
> text-to-int one)

Yeah, there is.  You can easily see the list of implicit casts for
yourself:
select castsource::regtype, casttarget::regtype from pg_castwhere castcontext = 'i' and castsource != casttarget;

There's 90 of them in current CVS tip, and most of 'em are unsurprising
within-type-category casts, such as implicit promotion of int to bigint.
However we have about a dozen implicit casts to text from the numeric
and datetime type categories.  Personally I would dearly love to get
rid of these, because they are accidents waiting to happen (and they
do regularly bite people, see the mail list archives for evidence).
But it seems people expect to be able to do things likenumber || ' string'
without explicitly casting the number to text.
        regards, tom lane