Thread: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail

The following bug has been logged online:

Bug reference:      5490
Logged by:          Farid Zidan
Email address:      farid@zidsoft.com
PostgreSQL version: 8.4.1
Operating system:   Windows XP 32-bit
Description:        Using distinct for select list causes insert of
timestamp string literal to fail
Details:

If you use keyword 'distinct' for the source select of the insert statement
the insert fails. Insert succeeds if 'distinct' is not used in select list.
Example,

create table test_insert (
col1 char(8) not null,
col2 TIMESTAMP not null default CURRENT_TIMESTAMP);

insert into
test_insert
(col1, col2) values
('a', '2010-04-30 00:00:00');

insert into test_insert
(col1, col2)
select distinct
'b',
'2010-04-30 00:00:00'


ERROR:  column "col2" is of type timestamp without time zone but expression
is of type text
LINE 16: '2010-04-30 00:00:00'
         ^
HINT:  You will need to rewrite or cast the expression.

********** Error **********

ERROR: column "col2" is of type timestamp without time zone but expression
is of type text
SQL state: 42804
Hint: You will need to rewrite or cast the expression.
Character: 282
"Farid Zidan" <farid@zidsoft.com> writes:
> If you use keyword 'distinct' for the source select of the insert statement
> the insert fails. Insert succeeds if 'distinct' is not used in select list.

This isn't a bug, it's a consequence of the fact that you're not
specifying the types of the literal constants.  DISTINCT forces
the parser to assign a data type to the constants (otherwise there
is no way to understand what duplicate-elimination means) and what
it will fall back to is "text".  Try attaching an explicit cast,
eg
    '2010-04-30 00:00:00'::timestamp

            regards, tom lane
"Farid Zidan" <farid@zidsoft.com> wrote:

> insert into test_insert
> (col1, col2)
> select distinct
> 'b',
> '2010-04-30 00:00:00'
>
>
> ERROR:  column "col2" is of type timestamp without time zone but
> expression is of type text
> LINE 16: '2010-04-30 00:00:00'
>          ^
> HINT:  You will need to rewrite or cast the expression.

Try using a timestamp literal instead of a bare literal:

insert into test_insert
(col1, col2)
select distinct
'b',
timestamp '2010-04-30 00:00:00'

This is actually working as intended in all the cases you showed, so
it isn't a bug.  If we were strictly complying with the SQL
standard, your first example would also fail, but we are more
lenient than the standard where we can be, to allow an unadorned
literal to be an UNKNOWN type until something causes it to be
resolved, to allow people to omit the type decoration in many cases.
To determine that something is a distinct value, you have to
determine a type for it (otherwise you won't know if '2010-04-30
00:00:00' is the same as '2010-04-30 00:00:00.0', for example), so
if you don't tell it otherwise, it will assume text -- leading to
the behavior you saw.

-Kevin
>Farid Zidan <farid@zidsoft.com> wrote:

>> If we were strictly complying with the SQL standard,

> Considering the statement works in all the 9 DBMS systems+ that I
> have tested so far as mentioned above, I would say PostgreSQL is
> not compliant with SQL standard in this regard.

The SQL standard is a document published by the International
Standards Organization (ISO) and also adopted by the American
National Standards Institute (ANSI).  Those documents don't require
a query in either of the forms you presented to work.  Because of
the convenience factor, most database products have non-standard
extensions to omit type specification in some places.  PostgreSQL's
extensions are oriented more toward user-installable data types
(such as geometric shapes or global coordinates), so the particulars
of our non-standard extensions differ so that use of those features
is as easy as practicable.  That does result in some non-standard
extensions which work in other products not working in PostgreSQL.

I think you'll find that the syntax I suggested (using the standard
timestamp literal instead of a bare character string literal) will
work in all of the databases you mentioned; if you want portable
code, it is best to follow the standard rather than some inferred
popular convention.

I hope this helps.

-Kevin
>Farid Zidan <farid@zidsoft.com> wrote:

> can be eliminated by appropriately handling the distinct keyword
> and does not have to occur.

Based on previous discussions around our approaching data types, I
don't think any of the regular PostgreSQL developers are likely to
agree with you; but if you see a way to make it work, feel free to
submit a patch.  See this page for the process:

http://wiki.postgresql.org/wiki/Submitting_a_Patch

> The ISO-datetime string literal format I am using the most
> general/standard for datetime/timestamp and is not the issue here.

The format in your string literal is the portable one; however, a
timestamp literal requires the TIMESTAMP keyword ahead of the string
literal, which you have chosen to omit.  Did you try the query with
a proper timestamp literal, as I suggested, against all these
databases?  If using standard syntax works, why not use it?

> The 'distinct' keyword is causing the error.

No, non-standard syntax is causing the error in the case of
DISTINCT, because our extension to the standard does not cover that
case, even though it covers the other.  There are good reasons for
that, which you'll probably discover in short order if you work on a
patch for the issue.

-Kevin
<meta content="text/html; charset=ISO-8859-1"
 http-equiv="Content-Type">



On 6/4/2010 11:53 AM, Tom Lane wrote:

  DISTINCT forces
the parser to assign a data type to the constants (otherwise there
is no way to understand what duplicate-elimination means) and what
it will fall back to is "text"

I am including the column list for the insert, so parser knows col2
data type is TIMESTAMP and it has to convert from text to timestamp to
do the insert.

It should be able to do that without generating an error. It is the
same select list, the same data types, nothing has changed except using
the 'distinct' keyword to eliminate duplicates. The parse behavior
after duplicates have been eliminated should be the same as when
'distinct' is not used.

Whether 'distinct' is used or not should not affect the semantics of
the insert statement (it should only remove duplicate rows).

I have used this statement in Firebrid, MS SQL Server, Oracle, MySQL,
SQLAnywhere, DB2, Derby, Informix, etc, and all of them do not generate
an error
because I need to use 'distinct' to eliminate duplicates from being
inserted.


  If we were strictly complying with the SQL
standard,

Considering the statement works in all the 9 DBMS systems+ that I have
tested so far as mentioned above, I would say PostgreSQL is not
compliant with SQL standard in this regard.

I guess, what I am saying, is that what the parser is doing is not the
desired behavior. I understand there are technical things going on
behind
the scene, but that's what needs to be fixed to ensure PostgreSQL
compatibility with SQL standard and interoperability with generic sql
statements.

best regards,
Farid

On 6/4/2010 11:57 AM, Kevin Grittner wrote:
<blockquote cite="mid:4C08DC140200002500031F7C@gw.wicourts.gov"
 type="cite">
  "Farid Zidan" <farid@zidsoft.com> wrote:



    insert into test_insert
(col1, col2)
select distinct
'b',
'2010-04-30 00:00:00'


ERROR:  column "col2" is of type timestamp without time zone but
expression is of type text
LINE 16: '2010-04-30 00:00:00'
         ^
HINT:  You will need to rewrite or cast the expression.



Try using a timestamp literal instead of a bare literal:

insert into test_insert
(col1, col2)
select distinct
'b',
timestamp '2010-04-30 00:00:00'

This is actually working as intended in all the cases you showed, so
it isn't a bug.  If we were strictly complying with the SQL
standard, your first example would also fail, but we are more
lenient than the standard where we can be, to allow an unadorned
literal to be an UNKNOWN type until something causes it to be
resolved, to allow people to omit the type decoration in many cases.
To determine that something is a distinct value, you have to
determine a type for it (otherwise you won't know if '2010-04-30
00:00:00' is the same as '2010-04-30 00:00:00.0', for example), so
if you don't tell it otherwise, it will assume text -- leading to
the behavior you saw.

-Kevin





--

Signature

www.zidsoft.com
CompareData:  compare
and synchronize SQL DBMS data visually <font
 size="-1">between two databases
using ODBC drivers
<meta content="text/html; charset=ISO-8859-1"
 http-equiv="Content-Type">


Hello Kevin,

My bug report is about using 'distinct' in the select list which is
causing a side-effect. That's why I classify this as a bug. Distinct
should not have unintended side-effects.

This side-effect is implementation-dependent and is manifested in the
current PostgreSQL query processing but can be eliminated by
appropriately handling the distinct keyword and does not have to occur.


The ISO-datetime string literal format I am using the most
general/standard for datetime/timestamp and is not the issue here. The
'distinct' keyword is causing the error.

Farid

On 6/4/2010 12:52 PM, Kevin Grittner wrote:
<blockquote cite="mid:4C08E8F10200002500031FAD@gw.wicourts.gov"
 type="cite">

    Farid Zidan <farid@zidsoft.com> wrote:






      If we were strictly complying with the SQL standard,






    Considering the statement works in all the 9 DBMS systems+ that I
have tested so far as mentioned above, I would say PostgreSQL is
not compliant with SQL standard in this regard.



The SQL standard is a document published by the International
Standards Organization (ISO) and also adopted by the American
National Standards Institute (ANSI).  Those documents don't require
a query in either of the forms you presented to work.  Because of
the convenience factor, most database products have non-standard
extensions to omit type specification in some places.  PostgreSQL's
extensions are oriented more toward user-installable data types
(such as geometric shapes or global coordinates), so the particulars
of our non-standard extensions differ so that use of those features
is as easy as practicable.  That does result in some non-standard
extensions which work in other products not working in PostgreSQL.

I think you'll find that the syntax I suggested (using the standard
timestamp literal instead of a bare character string literal) will
work in all of the databases you mentioned; if you want portable
code, it is best to follow the standard rather than some inferred
popular convention.

I hope this helps.

-Kevin





--

Signature

www.zidsoft.com
CompareData:  compare
and synchronize SQL DBMS data visually <font
 size="-1">between two databases
using ODBC drivers
<meta content="text/html; charset=ISO-8859-1"
 http-equiv="Content-Type">



Hello Kevin,

I strongly disagree with your analysis of this issue. Like I said, this
syntax works with 9 different databases, so obviously whatever
PosgreSQL query procesor is doing in this case is not the desired
behavior.

To ensure PosgreSQL success, the query processor must behave in a
compliant manner with established standards whether those standards are
set by SQL ISO specs or are de facto standards.

It is too much asking developers to change their sql to overcome
implementation-dependent side-effects of PostgreSQL query processor. If
a simple SQL statement works on 9+ different databases, then it should
also work in PostreSQL with no need for developers to special-code for
PostgreSQL. Very basic feature is converting a string literal to a
datetime/timestamp value and developers should not do any
special coding to accomplish this simple conversion. '2010-04-30
00:00:00' should convert to timestamp in PostgreSQL with no other flags
or syntax
decoration (it already does except when 'distinct' is used).

Compatibility is very high on desired features for a DBMS and is a
requirement for smooth porting of applications from other databases to
PostreSQL and cross-dbms applications. It really boils down to making
it work, technical details are what developers love and I am sure
PostgreSQL developers can make this simple sql insert work on PostreSQL
just like all the other developers have done for the other DBMSs.

Anyway, I have reported this issue because I encountered it and it
negatively impacts my project. I don't
expect it to be fixed right now, that's something that PostgreSQL
developers can debate and prioritize. I only ask that this issue is
identified, since it does not work in my case when the target dbms is
PostgreSQL and I am sure it can impact other developers projects and it
would need to be addressed at
some point in the future with a solution where it just work like it
does in all the other DBMSs.

Farid


On 6/4/2010 1:36 PM, Kevin Grittner wrote:
<blockquote cite="mid:4C08F33E0200002500031FBA@gw.wicourts.gov"
 type="cite">

    Farid Zidan <farid@zidsoft.com> wrote:





    can be eliminated by appropriately handling the distinct keyword
and does not have to occur.



Based on previous discussions around our approaching data types, I
don't think any of the regular PostgreSQL developers are likely to
agree with you; but if you see a way to make it work, feel free to
submit a patch.  See this page for the process:

http://wiki.postgresql.org/wiki/Submitting_a_Patch



    The ISO-datetime string literal format I am using the most
general/standard for datetime/timestamp and is not the issue here.



The format in your string literal is the portable one; however, a
timestamp literal requires the TIMESTAMP keyword ahead of the string
literal, which you have chosen to omit.  Did you try the query with
a proper timestamp literal, as I suggested, against all these
databases?  If using standard syntax works, why not use it?



    The 'distinct' keyword is causing the error.



No, non-standard syntax is causing the error in the case of
DISTINCT, because our extension to the standard does not cover that
case, even though it covers the other.  There are good reasons for
that, which you'll probably discover in short order if you work on a
patch for the issue.

-Kevin





--

Signature

www.zidsoft.com
CompareData:  compare
and synchronize SQL DBMS data visually <font
 size="-1">between two databases
using ODBC drivers
On Fri, Jun 4, 2010 at 7:18 PM, Farid Zidan <farid@zidsoft.com> wrote:
> If a simple SQL statement works on 9+ different databases

For what it's worth are you sure it works as you expect in these other
databases?

I suspect what they're doing is doing a DISTINCT of the text values
and then converting the results. That's not the same as what you're
suggesting it do (insert distinct timestamp values) since different
text values can represent the same timestamp. For example look at what
this does:

 select cast(x as timestamp with time zone) from (select distinct x
from (values ('2010-01-01 12:00PM UTC'), ('2010-01-01 7:00AM EST')) as
x(x)) as y;

If you inserted those values into a table with a timestamp with time
zone column you would get duplicate values even with the distinct.

This is the problem with depending on non-standard extensions. You're
never really sure that they're working. They be working on some
systems but doing something unexpected on other systems.

--
greg
>Farid Zidan <farid@zidsoft.com> wrote:

> but when it gets to use the resultset of the subquery in the
> insert it "forgets" how to convert '2010-04-30 00:00:00' to
> timestamp value

Not really.  In versions prior to 8.3 it did automagically convert
like that.  PostgreSQL has some pretty fancy features involving
custom data types where this magic caused problems, so a deliberate
decision was taken to no longer provide automatic casts from text to
other data types.

> (but forgets only when 'distinct' is used in the subquery!)

That is because (as I tried to explain earlier, but apparently
didn't do a good job of communicating), an unadorned literal in
single quotes is *not* taken to be a character string in PostgreSQL.
Its type is held as "unknown" until it is forced to be resolved in
some operation.  This allows easier coding of custom data types, but
does create a few deviations from standard behavior in corner cases,
and breaks from the non-standard "conventional" behavior of many
other databases.  Because of this design choice, for example, the
FAA can more easily write the code they use to map their runways and
other airport facilities.

The cost is that in situations such as you describe, you need to
force the type before it is used in the comparisons necessary to
determine a distinct value.  The only way to get the behavior you
want without breaking a great many useful cases, would be to
determine where the result was going to be later used, and use that
information to force the type to something other than text (the
default, when no other information is available).  That would be a
*major* and destabilizing change.

For those reasons, the chance of getting *anybody* here to consider
this a bug are close to nil.  The choice to more conveniently handle
advanced cases at the expense of occasionally needing to specify a
type is unlikely to be reversed, to put it mildly.

I can't help but wonder why you resist using the standard syntax.
The reason the standard exists is to help those trying to write
portable code, so they don't have to count on the vagaries of
"parallel evolution."

-Kevin
<meta content="text/html; charset=ISO-8859-1"
 http-equiv="Content-Type">


Hello Greg,


  I suspect what they're doing is doing a DISTINCT of the text values
and then converting the results. That's not the same as what you're
suggesting it do (insert distinct timestamp values) since different
text values can represent the same timestamp. For example look at what


That's a good point. I think you are correct. When the query parser is
in the nested subselect it only sees string literals for the timestamp
column values (does not know it is a timestamp yet). However, when it
gets to do the insert it then must convert the string literals to
timestamp values because at that point it knows that the string literal
is to be inserted into a timestamp column.

Since I am using a constant string literal for the timestamp
it really does not matter when the conversion takes place.

select distinct
<col1>,
<col2>,
..
'2010-04-30 00:00:00'
from
....

the timestamp string literal is a constant and really does not
affect the distinct resultset in anyway. I do need to stamp all the
inserts with a specific timestamp value and that's why I am using a
constant string literal. If I used an expression such as
current_timestamp/(ODBC {fn now()}, then that would factor into the
distinct clause and pollute the distinctness of subquery reulsultset.

Here is actual statements I am running and like I said they work for
all 9+ DBMSs (I use ODBC and{fn user()} is the ODBC cross-dbms syntax
for the current user ID):
insert into
in_sync_node_toolbar
(node_no, sync_cd, toolbar_cd,
ctrl_ins_dtm, ctrl_upd_dtm, ctrl_usr_id)
select distinct
isnr.node_no,
case
  when isr.rs_type_cd = 'TABLELS' then 'CMPTS'
  when isr.rs_type_cd = 'PROCLS' then 'CMPPROCS'
  when isr.rs_type_cd = 'SEQLS' then 'CMPSEQS'
  else null
end,
'TBCSCPT1',
'2010-04-30 00:00:00',
'2010-04-30 00:00:00',
{fn user() }
from
in_sync_node_resultset isnr,
in_sync_object_pattern isop,
in_sync_resultset isr
where
(isnr.rs_oid = isr.rs_oid or
 isnr.rs_oid_other = isr.rs_oid) and
isr.rs_oid = isop.rs_oid and
isr.rs_type_cd in ('TABLELS', 'PROCLS', 'SEQLS');

insert into
in_sync_node_toolbar
(node_no, sync_cd, toolbar_cd,
ctrl_ins_dtm, ctrl_upd_dtm, ctrl_usr_id)
select distinct
isnr.node_no,
case
  when isr.rs_type_cd = 'TBLVIEW' then 'CMPTABLE'
  when isr.rs_type_cd = 'PROC' then 'CMPPROC'
  when isr.rs_type_cd = 'SEQ' then 'CMPSEQ'
  else null
end,
'TBCSCPT1',
'2010-04-30 00:00:00',
'2010-04-30 00:00:00',
{fn user() }
from
in_sync_node_resultset isnr,
in_sync_object iso,
in_sync_resultset isr
where
(isnr.rs_oid = isr.rs_oid or
 isnr.rs_oid_other = isr.rs_oid) and
isr.rs_oid = iso.rs_oid and
isr.rs_type_cd in ('TBLVIEW', 'PROC', 'SEQ');

This is the problem with depending on non-standard extensions. You're
never really sure that they're working. They be working on some
systems but doing something unexpected on other systems.

All the other DBMSs doing is a select distinct on the subquery that has
the constant timestamp string literals. There is nothing
non-standard or ambiguous there. As far as the DBMS is concerned the
constant string expression is just a string literal and can represent
anything.

Now the issue is that when the other DBMSs get to do the insert part
they are able, as one would expect, to convert the subquery resultset
string literal column to a timestamp column. I think PostreSQL is doing
the first part (subquery with distinct clause correctly), but when it
gets to use the resultset of the subquery in the insert it "forgets"
how to convert
'2010-04-30 00:00:00' to timestamp value (but forgets only when
'distinct' is used in the subquery!)

Farid

On 6/4/2010 4:18 PM, Greg Stark wrote:
<blockquote
 cite="mid:AANLkTilLvx4m4TlHxeFERL60Xubiz0IhTjUXqTkoH6Le@mail.gmail.com"
 type="cite">
  On Fri, Jun 4, 2010 at 7:18 PM, Farid Zidan <farid@zidsoft.com> wrote:


    If a simple SQL statement works on 9+ different databases



For what it's worth are you sure it works as you expect in these other
databases?

I suspect what they're doing is doing a DISTINCT of the text values
and then converting the results. That's not the same as what you're
suggesting it do (insert distinct timestamp values) since different
text values can represent the same timestamp. For example look at what
this does:

 select cast(x as timestamp with time zone) from (select distinct x
from (values ('2010-01-01 12:00PM UTC'), ('2010-01-01 7:00AM EST')) as
x(x)) as y;

If you inserted those values into a table with a timestamp with time
zone column you would get duplicate values even with the distinct.

This is the problem with depending on non-standard extensions. You're
never really sure that they're working. They be working on some
systems but doing something unexpected on other systems.




--

Signature

www.zidsoft.com
CompareData:  compare
and synchronize SQL DBMS data visually <font
 size="-1">between two databases
using ODBC drivers
<meta content="text/html; charset=ISO-8859-1"
 http-equiv="Content-Type">


Hello Kevin,


  I can't help but wonder why you resist using the standard syntax.


I am using the standard syntax. Single quote in sql denotes a string.
so '2010-04-30 00:00:00' is string literal. That's universal. Now you
want me to use PG-specific timestamps and that's like I said is not
standard/cross-dbms.

I have just finished testing with Ingre 9.2 and it works there too.
That's 10 DBMSs systems that use single quotes to denote a string
literal and can covert ISO-standard datetime string literal to
timestamp.

You can't not interpret string literals one way in one statement and
just because user uses the word 'distinct' decide to switch paradigms.
That's not good design or planning. Of course you can decide to do
whatever you want, just do not expect developers to start
special-coding just for PostreSQL because you decide to cast correctly
or not correctly depending on whim.

Let me reiterate the example, maybe it was too terse and you did not
read it carefully,

create table test_insert (
col1 char(8) not null,
col2 TIMESTAMP not null default CURRENT_TIMESTAMP);

>create the test table. No issue.

insert into
test_insert
(col1, col2) values
('a', '2010-04-30 00:00:00');

>Works like expected, PG correctly converts standard ISO-datetime
string literal to timestamp. No issue.

insert into test_insert
(col1, col2)
select
'b',
'2010-04-30 00:00:00'

>That works too. No issue.

insert into test_insert
(col1, col2)
select distinct
'b',
'2010-04-30 00:00:00'

>Does not work. That's a bug.

Now this not rocket science, it's simple insert statement where we do
not want duplicates inserted. Works on 10 other DBMSs.

FAA stuff and other is not related to this bug. I would think the FAA
and other organizations want a standard-compliant DBMS system that
knows how to convert a simple ISO-formatted valid string literal to a
timestamp value in more than one variation of sql statement.

You can ignore this bug report and do whatever you want, just do not
say this is an accepted, standard or desired behavior of the server or
is by design. It's not by design that the error happens it is by faulty
handling of the distinct keyword.

I think you have all the information you need to debate and resolve
this issue. If you need any other information you can contact me and I
will be happy to oblige.

Farid


On 6/4/2010 5:40 PM, Kevin Grittner wrote:
<blockquote cite="mid:4C092C650200002500032027@gw.wicourts.gov"
 type="cite">
  I can't help but wonder why you resist using the standard syntax.
The reason the standard exists is to help those trying to write
portable code, so they don't have to count on the vagaries of
"parallel evolution."



--

Signature

www.zidsoft.com
CompareData:  compare
and synchronize SQL DBMS data visually <font
 size="-1">between two databases
using ODBC drivers
On Fri, 4 Jun 2010, Farid Zidan wrote:

> Here is actual statements I am running and like I said they work for all 9+
> DBMSs (I use ODBC and{fn user()} is the ODBC cross-dbms syntax for the
> current user ID):
>
> '2010-04-30 00:00:00',
> '2010-04-30 00:00:00',
> {fn user() }
>

If you're into using standard ODBC escapes for portability, shouldn't you
be using {ts '2010-04-30 00:00:00'}?

http://msdn.microsoft.com/en-us/library/ms712360%28VS.85%29.aspx

Kris Jurka
On Fri, Jun 4, 2010 at 11:15 PM, Farid Zidan <farid@zidsoft.com> wrote:
> Now this not rocket science, it's simple insert statement where we do not
> want duplicates inserted. Works on 10 other DBMSs.
>

I find usually when one person is arguing something is complex and
someone else is arguing it's simple it's the person who's claiming
it's simple who is wrong.

The other databases are not, I believe, preventing duplicates from
being inserted as you describe. They are removing duplicates from the
string constants and then silently converting to a different datatype
before inserting. When postgres removed these default casts to text it
turned up many instances where users had buggy code and Postgres had
been hiding from them by silently using string operators which was not
what users were expecting. In other words, while it might not matter
in this case, in general if you code in this style your code is buggy
and these other database implementations are not doing you any favours
by making it appear to work correctly most of the time.

--
greg
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, Jun 04, 2010 at 06:15:09PM -0400, Farid Zidan wrote:

[...]
> Let me reiterate the example, maybe it was too terse and you did not
> read it carefully,<br>

No. I think most readers here have understood your problem perfectly.
Don't underestimate the folks here.

[...]

> Now this not rocket science, it's simple insert statement where we do
> not want duplicates inserted. Works on 10 other DBMSs.<br>

Except on those "other 10 DBMSs" you are most probably getting
(silently!) something different as you'd expect (DSTINCT interpreted as
text, whereas you are "seeing" timestamps). How is that better?

> FAA stuff and other is not related to this bug. I would think the FAA
> and other organizations want a standard-compliant DBMS system that
> knows how to convert a simple ISO-formatted valid string literal to a
> timestamp value in more than one variation of sql statement.<br>

Except that the behaviour of those "other 10 DBMSs" is *beyond
standard*, the standard just rules the case where you state explicitly
the type of the constant. You will find multitude of cases where DMBSs
differ on those cases beyond standard -- that's due to different design
decisions.

What Kevin was trying to convey is that PostgreSQL's design decisions
allow its users to do things other DBMSs can't -- and that's the price
we'll have to pay. Note that behaviour is still within the standard
(and not, as you seem to suggest), so not really a problem: you can
write the query in a way which will work on "all those 11 DBMSs": just
stick to the standard.

> You can ignore this bug report and do whatever you want, just do not
> say this is an accepted, standard or desired behavior of the server or
> is by design. It's not by design that the error happens it is by faulty
> handling of the distinct keyword.<br>

Accepted -- by whom? Standard -- which standard? (because it is not
required by ISO/ANSI, and there is no other "SQL standard" that I'm
aware of).

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFMCdHgBcgs9XrR2kYRAjfsAJ0WVvm3AiFfN2jqIc24dqHVbyXM0QCeJqiQ
I31OBlckZ7go48bXZx+YRpQ=
=a7Pw
-----END PGP SIGNATURE-----
On 05/06/10 06:15, Farid Zidan wrote:
> insert into test_insert
> (col1, col2)
> select *distinct*
> 'b',
> '2010-04-30 00:00:00'
>
>>Does not work. That's a bug.

Not really.

select distinct * from (VALUES
('b','2010-04-30 00:00:00'),
('b','2010-04-30  00:00:00'),
('b','20100430 000000')
) AS x(a,b);

Does that produce the result you expected? It certainly didn't
deduplicate the timestamps, yet it's doing exactly the correct thing.

So this won't work:

create table test_insert (
col1 char(8) not null,
col2 TIMESTAMP not null default CURRENT_TIMESTAMP,
UNIQUE(col2)
);

insert into test_insert
(col1, col2)
select a, b::timestamp from (
select distinct * from (VALUES
('b','2010-04-30 00:00:00'),
('b','2010-04-30  00:00:00'),
('b','20100430 000000')
) AS x(a,b)) AS y;


... which is why your example is unsafe, and even if it appears to work
on other databases it is buggy. Instead, write:

insert into test_insert
(col1, col2)
select distinct
'b',
CAST('2010-04-30 00:00:00' AS timestamp);

... which will be safe on any database, is (AFAIK) perfectly standard,
and is fuss free.

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/
<meta content="text/html; charset=ISO-8859-1"
 http-equiv="Content-Type">



I only use ODBC escape sequences when necessary. Obviously I want to
use standard sql syntax as much as possible. {fn user() } is handy
because it works in all the databases that I work with and there is no
substitute standard sql function for getting current userid that is
cross-dbms.

I also use {fn now()} which works across most ODBC drivers, but I can't
in this case because I need to use a constant timestamp value so as not
to change distinctness of the subquery that is the source for the
insert.

The datetime ISO-standard string format I am using works in all the
databases I use 14+ (including PG), except in this case where
'distinct' is used with subquery in PG.

Also not all PG clients use ODBC, so other PG clients will encounter
this issue using standard ISO datetime string format when not using
ODBC. I don't want to limit users to using ODBC for loading/updating
the database by running sql scripts (which is what the sql for this
issue is used for) so almost all of the database update/load scripts
use generic sql where timestamp/datetime values are are written as ISO
datetime format strings same format as '2010-04-30 00:00:00'

BTW, I have also tested the sql in question with SQLite, MS Access, MS
Excel and Sybase Adaptive Server 15 and it works with no error, so now
that's 14 different DBMSs that have no issue with the ISO standard
string format and distinct keyword.

I guess I can find some workaround for this to work with ODBC just for
the specific sql statements causing errors with PG, but that does not
resolve the issue for PG clients not using ODBC.

Like I said, I am reporting this issue so it can be identified and
hopefully addressed at some point in the future, it is not critical for
me for it to work right now, but that would be nice otherwise user will
see a bunch of one-time errors and lose some ease of use but otherwise
will not be too badly affected.

Farid

On 6/4/2010 9:42 PM, Kris Jurka wrote:
<blockquote
 cite="mid:alpine.BSO.2.00.1006042140040.6416@leary.csoft.net"
 type="cite">

On Fri, 4 Jun 2010, Farid Zidan wrote:


  Here is actual statements I am running and
like I said they work for all 9+

DBMSs (I use ODBC and{fn user()} is the ODBC cross-dbms syntax for the

current user ID):


'2010-04-30 00:00:00',

'2010-04-30 00:00:00',

{fn user() }




If you're into using standard ODBC escapes for portability, shouldn't
you be using {ts '2010-04-30 00:00:00'}?


http://msdn.microsoft.com/en-us/library/ms712360%28VS.85%29.aspx


Kris Jurka





--

Signature

www.zidsoft.com
CompareData:  compare
and synchronize SQL DBMS data visually <font
 size="-1">between two databases
using ODBC drivers
<meta content="text/html; charset=ISO-8859-1"
 http-equiv="Content-Type">


Greg,

Obviously I do not agree. When 14 different databases by 14 different
DBMS vendors from the largest to the smallest in the market can do a
simple thing as a using a subquery that has distinct keyword and your
DBMS can't, I would say your DBMS is at fault and is not better, rather
is lacking in this respect. I am not expecting favors from the DBMS by
its doing what I expect it to do.

I do not want to beat an already dead horse, but if you review my
example, you will see that it is very simple, PG already does
conversion correctly from ISO string to timestamp column for inserting
so you can't say we removed all conversions and that is a good thing,
it is not. Basic feature of DBMS is allowing data entry into different
data type columns using plain string literals. PG already does that and
all other DBMS do that as well. For reference, although ODBC is not a
DBMS, ODBC specification requires that an ODBC driver can
convert all source DBMS data types from/to chars. This is not
by accident, it is a necessity and is by design. I can understand that
having multiple data formats for conversion to native data types from
text can cause bugs and that's why we have established standards such
as ISO for datetime/timestamp string formats and PG supports the
conversion already.

The issue is the PG is not doing it correctly when 'distinct' keyword
is used in the select statement. There is nothing buggy with using ISO
datetime string literals to insert into a table timestamp column. There
is no behind the scene magic going on.

1 Execute subquery: string literals are just that can be 'aa', 'bb',
'2010-04-30 00:00:00', whatever, it does not matter what the string
literal is.

2 Eliminate duplicates

3 Now a string literal is being inserted into a timestamp column, you
have a string literal and you are asked to insert into a timestamp
colum -> convert string literal to timestamp and do the insert

As you can see there is nothing buggy or heinous here, just simple
select with distinct keyword in step 1, 2 and conversion from string
literal to timestamp value in step 3

There is no ambiguity or magic to happen. Obviously in PG case there is
some design or fault somewhere in this use-case when distinct keyword
is used and is processed in step 2, that's all.

Farid

On 6/4/2010 10:41 PM, Greg Stark wrote:
<blockquote
 cite="mid:AANLkTinvpWLi3CoBWYgNPxVdFbbLhEHZuYdp9buvDubV@mail.gmail.com"
 type="cite">
  On Fri, Jun 4, 2010 at 11:15 PM, Farid Zidan <farid@zidsoft.com> wrote:


    Now this not rocket science, it's simple insert statement where we do not
want duplicates inserted. Works on 10 other DBMSs.




I find usually when one person is arguing something is complex and
someone else is arguing it's simple it's the person who's claiming
it's simple who is wrong.

The other databases are not, I believe, preventing duplicates from
being inserted as you describe. They are removing duplicates from the
string constants and then silently converting to a different datatype
before inserting. When postgres removed these default casts to text it
turned up many instances where users had buggy code and Postgres had
been hiding from them by silently using string operators which was not
what users were expecting. In other words, while it might not matter
in this case, in general if you code in this style your code is buggy
and these other database implementations are not doing you any favours
by making it appear to work correctly most of the time.




--

Signature

www.zidsoft.com
CompareData:  compare
and synchronize SQL DBMS data visually <font
 size="-1">between two databases
using ODBC drivers
<meta content="text/html; charset=ISO-8859-1"
 http-equiv="Content-Type">


Craig

I am not asking you to re-write my sql so the bug will not show. I am
presenting you with sql that fails and shows the bug. If every time
someone reported a bug you ask them to re-write their sql so the bug is
not hit, that would not eliminate the bug.

Also, you are using different timestamp string literals in your
subquery. I am using the same constant datetime string literal
in my example that the query processor does not need to cast to
timestamp or anything to do the distinct part and eliminate duplicates.
insert into test_insert
(col1, col2)
select distinct
'b',
cast('2010-04-30 00:00:00' as timestamp)


>This works as expected. However is not an option because it is not
generic sql. In PG timestamp data type is called 'timestamp' but in
another DBMS it may be called  'datetime', etc.


  ... which is why your example is unsafe, and even if it appears to work
on other databases it is buggy. Instead, write:

My example is safe and is cross-dbms. I am not doing anything
extra-ordinary just select distinct where a constant string expression
is used in the select list.
select distinct
'b',
'2010-04-30 00:00:00'


Why is the sql above unsafe? It is not. It is simple select statement
with two constant string expressions and distinct keyword. Now use the
result of the sql above as source for inserting into test_table (col1,
col2):
insert into test_insert
(col1, col2)
select distinct
'b',
'2010-04-30 00:00:00'


There is nothing unsafe here. You have a resultset that has one row
with the values 'b',
'2010-04-30 00:00:00' being used to insert int col1, col2. Why would
you say that's unsafe? '2010-04-30 00:00:00' is an ISO string literal
being inserted into col2 whose data type is timestamp, perfectly safe.

Farid

On 6/5/2010 3:26 AM, Craig Ringer wrote:
<blockquote cite="mid:4C09FC41.4000304@postnewspapers.com.au"
 type="cite">
  On 05/06/10 06:15, Farid Zidan wrote:


    insert into test_insert
(col1, col2)
select *distinct*
'b',
'2010-04-30 00:00:00'



      Does not work. That's a bug.




Not really.

select distinct * from (VALUES
('b','2010-04-30 00:00:00'),
('b','2010-04-30  00:00:00'),
('b','20100430 000000')
) AS x(a,b);

Does that produce the result you expected? It certainly didn't
deduplicate the timestamps, yet it's doing exactly the correct thing.

So this won't work:

create table test_insert (
col1 char(8) not null,
col2 TIMESTAMP not null default CURRENT_TIMESTAMP,
UNIQUE(col2)
);

insert into test_insert
(col1, col2)
select a, b::timestamp from (
select distinct * from (VALUES
('b','2010-04-30 00:00:00'),
('b','2010-04-30  00:00:00'),
('b','20100430 000000')
) AS x(a,b)) AS y;


... which is why your example is unsafe, and even if it appears to work
on other databases it is buggy. Instead, write:

insert into test_insert
(col1, col2)
select distinct
'b',
CAST('2010-04-30 00:00:00' AS timestamp);

... which will be safe on any database, is (AFAIK) perfectly standard,
and is fuss free.




--

Signature

www.zidsoft.com
CompareData:  compare
and synchronize SQL DBMS data visually <font
 size="-1">between two databases
using ODBC drivers
Farid Zidan <farid@zidsoft.com> writes:
> I am not asking you to re-write my sql so the bug will not show. I am
> presenting you with sql that fails and shows the bug. If every time
> someone reported a bug you ask them to re-write their sql so the bug
> is not hit, that would not eliminate the bug.

You're not reading us.
Your example makes no sense at all, and hides the problem you have. You
are failing to understand the situation because of this.

If you want to understand the problem you have, please go test your 14
databases with the meaningful tests that have been offered to you.

Until and unless you do so, I don't think posting here again on this
issue has any value, for you nor for us. You're abusing the time of lots
of people who are both busy and helpful, so please try to understand the
advices and analysis they've been offering to you.

Regards,
--
dim
Dimitri

This is the last you will hear from me about this issue. I would have
stopped repsonding long time ago, but I kept getting responses that
required reply. Do not knock on the door if you don't want someone to
answer the door.

There is no point in spending more of my time explaining to you simple
things you seem incapable of understanding. You make no technical point,
you make a personal opinion. You are entitled to your opinion. Software
is not built on personal opinions, software is built on good
technical/logical analysis of issues of which you offer none.

I really did not expect a debate. I reported an issue that is clearly a
bug. I expected "yes, we see, we will address this at some point, thank
you." I really don't mind explaining the issue, but what I do mind is
people like you who don't have a technical understanding of the issue
and nonetheless appoint themselves to speak for PostgreSQL/MySQL or any
organization about things they do not understand.

Dmitri, it is good you don't work for me, for if you did you would not
have a job. The plight of the software industry is people who when
confronted with a logical argument revert to ad hominem response because
they have no valid technical response.

And of course you should speak only for yourself because you are not
PostgreSQL and you don't not represent PostgreSQL or anbody else in the
collaborative software development spirit. The arrogance you show is
appalling.

> Until and unless you do so, I don't think posting here again on this
> issue has any value, for you nor for us. You're abusing the time of lots
> of people who are both busy and helpful, so please try to understand the
> advices and analysis they've been offering to you
I see no technical analysis in your response. If you too busy to engage
in logical debate, you should remove yourself from the bug list.
Software is built on logical analysis. You are too busy, do not participate.

Farid

On 6/5/2010 4:16 PM, Dimitri Fontaine wrote:
> Farid Zidan<farid@zidsoft.com>  writes:
>
>> I am not asking you to re-write my sql so the bug will not show. I am
>> presenting you with sql that fails and shows the bug. If every time
>> someone reported a bug you ask them to re-write their sql so the bug
>> is not hit, that would not eliminate the bug.
>>
> You're not reading us.
> Your example makes no sense at all, and hides the problem you have. You
> are failing to understand the situation because of this.
>
> If you want to understand the problem you have, please go test your 14
> databases with the meaningful tests that have been offered to you.
>
> Until and unless you do so, I don't think posting here again on this
> issue has any value, for you nor for us. You're abusing the time of lots
> of people who are both busy and helpful, so please try to understand the
> advices and analysis they've been offering to you.
>
> Regards,
>

--
www.zidsoft.com <http://www.zidsoft.com/> CompareData: compare and
synchronize SQL DBMS data visually between two databases using ODBC drivers

Re: Re: BUG #5490: INSERT doesn't force cast from text to timestamp

From
Craig Ringer
Date:
On 5/06/2010 9:39 PM, Farid Zidan wrote:
> Craig
>
> I am not asking you to re-write my sql so the bug will not show. I am
> presenting you with sql that fails and shows the bug.

Before writing this off completely, please read on. I suspect I may've
misunderstood your argument, and on further examination think there's
something that bears design consideration here. It's not a bug, as it's
working exactly as designed, but it _is_ something where the design
might benefit from a tweak.

In the mean time, you have a system that doesn't work how you want it do
- design choice or bug, workaround or bug-hiding, the effect is the
same. You must already be handing differing names of timestamp/datetime
types in your DDL, so why not use the same mapping in your DML?

As for the root of the issue: Here's why implicitly casting those
literals to timestamps would be a bad idea for solid, technical reasons:

- The semantics of a nested/sub query should not be affected by the
calling context, ie the surrounding query. To have a function affected
by what the caller is doing would be bizarre and confusing; so it is
with a subquery. You cannot test functional units or rely on any kind of
consistent behaviour if calling context changes callee behaviour.

- A SELECT as a value-supplier to an INSERT is a type of subquery

- If your SELECT ran differently in the context of the INSERT to how it
ran standalone, that would not only be incredibly confusing but also
clearly a bug.


... so we clearly can't use type information from the surrounding INSERT
to determine the data type of the literals used in the SELECT. With no
type information to the contrary they must be interpreted as text. So,
after SELECT evaluation your query goes from:

insert into test_insert
(col1, col2)
select distinct
'b',
'2010-04-30 00:00:00';

to effectively:

insert into test_insert
(col1, col2)
values ('b'::text, '2010-04-30 00:00:00'::text);

Now, when executed with Pg both those queries result in:

ERROR:  column "col2" is of type timestamp without time zone but
expression is of type text

showing that your issue isn't actually with DISTINCT at all, but with
Pg's unwillingness to *implicitly* cast a value of explict text type to
another type.

This is the real core of your complaint. You want Pg to cast from 'text'
to 'timestamp' on INSERT. Pg doesn't. In  the general case (not specific
to INSERT) there are some pretty good reasons for that, but for
INSERTs... should it? In the context of an INSERT, where the desired
data type is explicitly specified and obvious, I'm not sure, but I
suspect that it should.

Anyone familiar with the details of the implicit-text-cast bugs want to
pipe up with whether it'd be safe to force a cast in the context of an
INSERT?


( As for why I adjusted your example: Your example query was
unrealistically simplistic and clearly couldn't be what you were using
in your app. Using 'DISTINCT' with one value is pointless. So, my
example added some dummy values to illustrate why it might be unsafe to
use it how you're trying to. As it turns out, in your app's case you can
guarantee input formatting consistency, so it's safe for you so long as
you stick very closely to timestamp formatting specifics, but I'm sure
you can see that a database's behaviour depending on the formatting of
timestamps is probably not something that most people with most apps
would be happy with. Even if INSERT did convert the results of the
SELECT DISTINCT subquery to timestamps, it'd still be pretty unsafe.)

--
Craig Ringer
Farid Zidan <farid@zidsoft.com> writes:
> I see no technical analysis in your response.

I made a clear reference to what others have been saying, and that you
have been refusing to read. I don't see any point in getting technical
again, as I'm sure you will refuse to understand what is happening to
you here.

But as you insist, let's try once more:

 - your consider a bug where PostgreSQL wants to know more about the
   data type you're using in the query in order to be able to enforce a
   DISTINCT restriction

 - other products are happy to solve the DISTINCT restriction without
   any hint as far as what the datatype really is

 - the error message is perfectly clear about what PostgreSQL needs from
   you

 - the reason why PostgreSQL wants you to give it details is clear to:
   what means DISTINCT depends on the datatype, you can easily have two
   different text representations of the same timestamptz, for example

 - it could be considered a possible area of improvement in the system
   that the SELECT part of the INSERT INTO ... SELECT ... could
   determine the SELECT output columns type by looking at the INSERT
   target table definition

 - it would be asking for a new feature to be considered, not for a bug
   to be solved: the system currently works exactly as designed.

Now if you keep wanting not to understand how the system is currently
designed, that makes it your problem, no one else.

Regards,
--
dim
On 7/06/2010 3:51 AM, Dimitri Fontaine wrote:

>   - other products are happy to solve the DISTINCT restriction without
>     any hint as far as what the datatype really is

... and so is Pg. That's not the problem; Pg complains after resolving
the SELECT DISTINCT, when it finds that it's trying to insert values of
type 'text' into a column with type 'timestamp'.

You'll get exactly the same error if you replace the OP's SELECT
DISTINCT subquery with a VALUES list that explicitly specifies TEXT type.

The other clue as to what's happening is that if you run the SELECT part
of the query standalone, it executes fine, treating the passed values as
'text'.

Personally, I do think this is a bit of a wart. I know why the explicit
casts around text were removed, but in the case of INSERT I'm not sure
the current behaviour is desirable.

I initially thought the OP was asking for Pg to infer the type of the
timestamp literals from the surrounding INSERT, and for that reason was
very much against the idea. After realizing that what they really expect
is for the SELECT to interpret the literals as 'text' (just as it does)
then Pg to implicitly cast the 'text' query results to 'timestamp', I
can see why they want it and why they're frustrated with the current
behaviour.

>   - the error message is perfectly clear about what PostgreSQL needs from
>     you

Apparently not, as you seem to have misunderstood it ;-)

>   - the reason why PostgreSQL wants you to give it details is clear to:
>     what means DISTINCT depends on the datatype, you can easily have two
>     different text representations of the same timestamptz, for example

That's not why Pg reports an error. If it was, then the following query
would not be legal:

SELECT DISTINCT x.* FROM (VALUES ('a'),('b'),('c')) AS x;

... since there's no explicit type info provided.

Pg follows the SQL rules and interprets literals as text if there's no
explicit type info provided and no immediate context that requires a
particular type. So the above work fine, treating 'a', 'b', and 'c' as
if they were written:

    (TEXT 'a'), (TEXT 'b'), (TEXT 'c')

>   - it could be considered a possible area of improvement in the system
>     that the SELECT part of the INSERT INTO ... SELECT ... could
>     determine the SELECT output columns type by looking at the INSERT
>     target table definition

I'd say that'd be a ghastly "solution". It'd change the meaning of the
SELECT based on the surrounding INSERT. Imagine trying to figure out
what was going on with a query that wasn't doing what you wanted when
you couldn't run it standalone and know the results were the same!

--
Craig Ringer
I know it's not what you want (Pg behaving how you expect out of the
box) but creating implicit casts to the desired types will resolve your
immediate issue. You still have to run some Pg-specific code, but it can
be restricted to your DDL where there's (presumably) already plenty.

See:

http://www.depesz.com/index.php/2008/05/05/error-operator-does-not-exist-integer-text-how-to-fix-it/

http://petereisentraut.blogspot.com/2008/03/readding-implicit-casts-in-postgresql.html

http://wiki.postgresql.org/images/d/d1/Pg83-implicit-casts.sql

--
Craig Ringer
Craig Ringer <craig@postnewspapers.com.au> writes:
>>   - other products are happy to solve the DISTINCT restriction without
>>     any hint as far as what the datatype really is
>
> ... and so is Pg. That's not the problem; Pg complains after resolving the
> SELECT DISTINCT, when it finds that it's trying to insert values of type
> text' into a column with type 'timestamp'.

Ah yes. I've been paying more attention to the energy people have been
willing to put into helping the OP than into what the problem really
is. Thanks for clarifying.

--
dim
Craig,

> I know it's not what you want (Pg behaving how you expect out of the
> box) but creating implicit casts to the desired types will resolve
> your immediate issue. You still have to run some Pg-specific code, but
> it can be restricted to your DDL where there's (presumably) already
> plenty.
My main concern is identifying this issue by PosgtreSQL developers so it
can be addressed as some point in the future. I believe that is done now
and the issue is identified.

As it happens the sql in question is part of a common generic sql script
file where the same script file is run against 10+ different DBMSs to
update the database data from a previous release (all 10+ DBMS schemas
have the same table/column names, logical structure, logical data types,
etc). Having said that, I can introduce a PG-only data update file as a
workaround for PG in this case but not without complicating things quite
a bit. However, due to limited time and schedule, I will be just
accepting the error (based on cost-analysis of the error) in this case.

Best regards,
Farid


On 6/6/2010 11:17 PM, Craig Ringer wrote:
> I know it's not what you want (Pg behaving how you expect out of the
> box) but creating implicit casts to the desired types will resolve
> your immediate issue. You still have to run some Pg-specific code, but
> it can be restricted to your DDL where there's (presumably) already
> plenty.
>
> See:
>
> http://www.depesz.com/index.php/2008/05/05/error-operator-does-not-exist-integer-text-how-to-fix-it/
>
>
> http://petereisentraut.blogspot.com/2008/03/readding-implicit-casts-in-postgresql.html
>
>
> http://wiki.postgresql.org/images/d/d1/Pg83-implicit-casts.sql
>
> --
> Craig Ringer
>
>

--
www.zidsoft.com <http://www.zidsoft.com/> CompareData: compare and
synchronize SQL DBMS data visually between two databases using ODBC drivers