Thread: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
"Farid Zidan"
Date:
The following bug has been logged online: Bug reference: 5490 Logged by: Farid Zidan Email address: farid@zidsoft.com PostgreSQL version: 8.4.1 Operating system: Windows XP 32-bit Description: Using distinct for select list causes insert of timestamp string literal to fail Details: If you use keyword 'distinct' for the source select of the insert statement the insert fails. Insert succeeds if 'distinct' is not used in select list. Example, create table test_insert ( col1 char(8) not null, col2 TIMESTAMP not null default CURRENT_TIMESTAMP); insert into test_insert (col1, col2) values ('a', '2010-04-30 00:00:00'); insert into test_insert (col1, col2) select distinct 'b', '2010-04-30 00:00:00' ERROR: column "col2" is of type timestamp without time zone but expression is of type text LINE 16: '2010-04-30 00:00:00' ^ HINT: You will need to rewrite or cast the expression. ********** Error ********** ERROR: column "col2" is of type timestamp without time zone but expression is of type text SQL state: 42804 Hint: You will need to rewrite or cast the expression. Character: 282
Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Tom Lane
Date:
"Farid Zidan" <farid@zidsoft.com> writes: > If you use keyword 'distinct' for the source select of the insert statement > the insert fails. Insert succeeds if 'distinct' is not used in select list. This isn't a bug, it's a consequence of the fact that you're not specifying the types of the literal constants. DISTINCT forces the parser to assign a data type to the constants (otherwise there is no way to understand what duplicate-elimination means) and what it will fall back to is "text". Try attaching an explicit cast, eg '2010-04-30 00:00:00'::timestamp regards, tom lane
Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
"Kevin Grittner"
Date:
"Farid Zidan" <farid@zidsoft.com> wrote: > insert into test_insert > (col1, col2) > select distinct > 'b', > '2010-04-30 00:00:00' > > > ERROR: column "col2" is of type timestamp without time zone but > expression is of type text > LINE 16: '2010-04-30 00:00:00' > ^ > HINT: You will need to rewrite or cast the expression. Try using a timestamp literal instead of a bare literal: insert into test_insert (col1, col2) select distinct 'b', timestamp '2010-04-30 00:00:00' This is actually working as intended in all the cases you showed, so it isn't a bug. If we were strictly complying with the SQL standard, your first example would also fail, but we are more lenient than the standard where we can be, to allow an unadorned literal to be an UNKNOWN type until something causes it to be resolved, to allow people to omit the type decoration in many cases. To determine that something is a distinct value, you have to determine a type for it (otherwise you won't know if '2010-04-30 00:00:00' is the same as '2010-04-30 00:00:00.0', for example), so if you don't tell it otherwise, it will assume text -- leading to the behavior you saw. -Kevin
Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
"Kevin Grittner"
Date:
>Farid Zidan <farid@zidsoft.com> wrote: >> If we were strictly complying with the SQL standard, > Considering the statement works in all the 9 DBMS systems+ that I > have tested so far as mentioned above, I would say PostgreSQL is > not compliant with SQL standard in this regard. The SQL standard is a document published by the International Standards Organization (ISO) and also adopted by the American National Standards Institute (ANSI). Those documents don't require a query in either of the forms you presented to work. Because of the convenience factor, most database products have non-standard extensions to omit type specification in some places. PostgreSQL's extensions are oriented more toward user-installable data types (such as geometric shapes or global coordinates), so the particulars of our non-standard extensions differ so that use of those features is as easy as practicable. That does result in some non-standard extensions which work in other products not working in PostgreSQL. I think you'll find that the syntax I suggested (using the standard timestamp literal instead of a bare character string literal) will work in all of the databases you mentioned; if you want portable code, it is best to follow the standard rather than some inferred popular convention. I hope this helps. -Kevin
Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
"Kevin Grittner"
Date:
>Farid Zidan <farid@zidsoft.com> wrote: > can be eliminated by appropriately handling the distinct keyword > and does not have to occur. Based on previous discussions around our approaching data types, I don't think any of the regular PostgreSQL developers are likely to agree with you; but if you see a way to make it work, feel free to submit a patch. See this page for the process: http://wiki.postgresql.org/wiki/Submitting_a_Patch > The ISO-datetime string literal format I am using the most > general/standard for datetime/timestamp and is not the issue here. The format in your string literal is the portable one; however, a timestamp literal requires the TIMESTAMP keyword ahead of the string literal, which you have chosen to omit. Did you try the query with a proper timestamp literal, as I suggested, against all these databases? If using standard syntax works, why not use it? > The 'distinct' keyword is causing the error. No, non-standard syntax is causing the error in the case of DISTINCT, because our extension to the standard does not cover that case, even though it covers the other. There are good reasons for that, which you'll probably discover in short order if you work on a patch for the issue. -Kevin
Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Farid Zidan
Date:
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type"> On 6/4/2010 11:53 AM, Tom Lane wrote: DISTINCT forces the parser to assign a data type to the constants (otherwise there is no way to understand what duplicate-elimination means) and what it will fall back to is "text" I am including the column list for the insert, so parser knows col2 data type is TIMESTAMP and it has to convert from text to timestamp to do the insert. It should be able to do that without generating an error. It is the same select list, the same data types, nothing has changed except using the 'distinct' keyword to eliminate duplicates. The parse behavior after duplicates have been eliminated should be the same as when 'distinct' is not used. Whether 'distinct' is used or not should not affect the semantics of the insert statement (it should only remove duplicate rows). I have used this statement in Firebrid, MS SQL Server, Oracle, MySQL, SQLAnywhere, DB2, Derby, Informix, etc, and all of them do not generate an error because I need to use 'distinct' to eliminate duplicates from being inserted. If we were strictly complying with the SQL standard, Considering the statement works in all the 9 DBMS systems+ that I have tested so far as mentioned above, I would say PostgreSQL is not compliant with SQL standard in this regard. I guess, what I am saying, is that what the parser is doing is not the desired behavior. I understand there are technical things going on behind the scene, but that's what needs to be fixed to ensure PostgreSQL compatibility with SQL standard and interoperability with generic sql statements. best regards, Farid On 6/4/2010 11:57 AM, Kevin Grittner wrote: <blockquote cite="mid:4C08DC140200002500031F7C@gw.wicourts.gov" type="cite"> "Farid Zidan" <farid@zidsoft.com> wrote: insert into test_insert (col1, col2) select distinct 'b', '2010-04-30 00:00:00' ERROR: column "col2" is of type timestamp without time zone but expression is of type text LINE 16: '2010-04-30 00:00:00' ^ HINT: You will need to rewrite or cast the expression. Try using a timestamp literal instead of a bare literal: insert into test_insert (col1, col2) select distinct 'b', timestamp '2010-04-30 00:00:00' This is actually working as intended in all the cases you showed, so it isn't a bug. If we were strictly complying with the SQL standard, your first example would also fail, but we are more lenient than the standard where we can be, to allow an unadorned literal to be an UNKNOWN type until something causes it to be resolved, to allow people to omit the type decoration in many cases. To determine that something is a distinct value, you have to determine a type for it (otherwise you won't know if '2010-04-30 00:00:00' is the same as '2010-04-30 00:00:00.0', for example), so if you don't tell it otherwise, it will assume text -- leading to the behavior you saw. -Kevin -- Signature www.zidsoft.com CompareData: compare and synchronize SQL DBMS data visually <font size="-1">between two databases using ODBC drivers
Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Farid Zidan
Date:
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type"> Hello Kevin, My bug report is about using 'distinct' in the select list which is causing a side-effect. That's why I classify this as a bug. Distinct should not have unintended side-effects. This side-effect is implementation-dependent and is manifested in the current PostgreSQL query processing but can be eliminated by appropriately handling the distinct keyword and does not have to occur. The ISO-datetime string literal format I am using the most general/standard for datetime/timestamp and is not the issue here. The 'distinct' keyword is causing the error. Farid On 6/4/2010 12:52 PM, Kevin Grittner wrote: <blockquote cite="mid:4C08E8F10200002500031FAD@gw.wicourts.gov" type="cite"> Farid Zidan <farid@zidsoft.com> wrote: If we were strictly complying with the SQL standard, Considering the statement works in all the 9 DBMS systems+ that I have tested so far as mentioned above, I would say PostgreSQL is not compliant with SQL standard in this regard. The SQL standard is a document published by the International Standards Organization (ISO) and also adopted by the American National Standards Institute (ANSI). Those documents don't require a query in either of the forms you presented to work. Because of the convenience factor, most database products have non-standard extensions to omit type specification in some places. PostgreSQL's extensions are oriented more toward user-installable data types (such as geometric shapes or global coordinates), so the particulars of our non-standard extensions differ so that use of those features is as easy as practicable. That does result in some non-standard extensions which work in other products not working in PostgreSQL. I think you'll find that the syntax I suggested (using the standard timestamp literal instead of a bare character string literal) will work in all of the databases you mentioned; if you want portable code, it is best to follow the standard rather than some inferred popular convention. I hope this helps. -Kevin -- Signature www.zidsoft.com CompareData: compare and synchronize SQL DBMS data visually <font size="-1">between two databases using ODBC drivers
Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Farid Zidan
Date:
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type"> Hello Kevin, I strongly disagree with your analysis of this issue. Like I said, this syntax works with 9 different databases, so obviously whatever PosgreSQL query procesor is doing in this case is not the desired behavior. To ensure PosgreSQL success, the query processor must behave in a compliant manner with established standards whether those standards are set by SQL ISO specs or are de facto standards. It is too much asking developers to change their sql to overcome implementation-dependent side-effects of PostgreSQL query processor. If a simple SQL statement works on 9+ different databases, then it should also work in PostreSQL with no need for developers to special-code for PostgreSQL. Very basic feature is converting a string literal to a datetime/timestamp value and developers should not do any special coding to accomplish this simple conversion. '2010-04-30 00:00:00' should convert to timestamp in PostgreSQL with no other flags or syntax decoration (it already does except when 'distinct' is used). Compatibility is very high on desired features for a DBMS and is a requirement for smooth porting of applications from other databases to PostreSQL and cross-dbms applications. It really boils down to making it work, technical details are what developers love and I am sure PostgreSQL developers can make this simple sql insert work on PostreSQL just like all the other developers have done for the other DBMSs. Anyway, I have reported this issue because I encountered it and it negatively impacts my project. I don't expect it to be fixed right now, that's something that PostgreSQL developers can debate and prioritize. I only ask that this issue is identified, since it does not work in my case when the target dbms is PostgreSQL and I am sure it can impact other developers projects and it would need to be addressed at some point in the future with a solution where it just work like it does in all the other DBMSs. Farid On 6/4/2010 1:36 PM, Kevin Grittner wrote: <blockquote cite="mid:4C08F33E0200002500031FBA@gw.wicourts.gov" type="cite"> Farid Zidan <farid@zidsoft.com> wrote: can be eliminated by appropriately handling the distinct keyword and does not have to occur. Based on previous discussions around our approaching data types, I don't think any of the regular PostgreSQL developers are likely to agree with you; but if you see a way to make it work, feel free to submit a patch. See this page for the process: http://wiki.postgresql.org/wiki/Submitting_a_Patch The ISO-datetime string literal format I am using the most general/standard for datetime/timestamp and is not the issue here. The format in your string literal is the portable one; however, a timestamp literal requires the TIMESTAMP keyword ahead of the string literal, which you have chosen to omit. Did you try the query with a proper timestamp literal, as I suggested, against all these databases? If using standard syntax works, why not use it? The 'distinct' keyword is causing the error. No, non-standard syntax is causing the error in the case of DISTINCT, because our extension to the standard does not cover that case, even though it covers the other. There are good reasons for that, which you'll probably discover in short order if you work on a patch for the issue. -Kevin -- Signature www.zidsoft.com CompareData: compare and synchronize SQL DBMS data visually <font size="-1">between two databases using ODBC drivers
Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Greg Stark
Date:
On Fri, Jun 4, 2010 at 7:18 PM, Farid Zidan <farid@zidsoft.com> wrote: > If a simple SQL statement works on 9+ different databases For what it's worth are you sure it works as you expect in these other databases? I suspect what they're doing is doing a DISTINCT of the text values and then converting the results. That's not the same as what you're suggesting it do (insert distinct timestamp values) since different text values can represent the same timestamp. For example look at what this does: select cast(x as timestamp with time zone) from (select distinct x from (values ('2010-01-01 12:00PM UTC'), ('2010-01-01 7:00AM EST')) as x(x)) as y; If you inserted those values into a table with a timestamp with time zone column you would get duplicate values even with the distinct. This is the problem with depending on non-standard extensions. You're never really sure that they're working. They be working on some systems but doing something unexpected on other systems. -- greg
Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
"Kevin Grittner"
Date:
>Farid Zidan <farid@zidsoft.com> wrote: > but when it gets to use the resultset of the subquery in the > insert it "forgets" how to convert '2010-04-30 00:00:00' to > timestamp value Not really. In versions prior to 8.3 it did automagically convert like that. PostgreSQL has some pretty fancy features involving custom data types where this magic caused problems, so a deliberate decision was taken to no longer provide automatic casts from text to other data types. > (but forgets only when 'distinct' is used in the subquery!) That is because (as I tried to explain earlier, but apparently didn't do a good job of communicating), an unadorned literal in single quotes is *not* taken to be a character string in PostgreSQL. Its type is held as "unknown" until it is forced to be resolved in some operation. This allows easier coding of custom data types, but does create a few deviations from standard behavior in corner cases, and breaks from the non-standard "conventional" behavior of many other databases. Because of this design choice, for example, the FAA can more easily write the code they use to map their runways and other airport facilities. The cost is that in situations such as you describe, you need to force the type before it is used in the comparisons necessary to determine a distinct value. The only way to get the behavior you want without breaking a great many useful cases, would be to determine where the result was going to be later used, and use that information to force the type to something other than text (the default, when no other information is available). That would be a *major* and destabilizing change. For those reasons, the chance of getting *anybody* here to consider this a bug are close to nil. The choice to more conveniently handle advanced cases at the expense of occasionally needing to specify a type is unlikely to be reversed, to put it mildly. I can't help but wonder why you resist using the standard syntax. The reason the standard exists is to help those trying to write portable code, so they don't have to count on the vagaries of "parallel evolution." -Kevin
Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Farid Zidan
Date:
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type"> Hello Greg, I suspect what they're doing is doing a DISTINCT of the text values and then converting the results. That's not the same as what you're suggesting it do (insert distinct timestamp values) since different text values can represent the same timestamp. For example look at what That's a good point. I think you are correct. When the query parser is in the nested subselect it only sees string literals for the timestamp column values (does not know it is a timestamp yet). However, when it gets to do the insert it then must convert the string literals to timestamp values because at that point it knows that the string literal is to be inserted into a timestamp column. Since I am using a constant string literal for the timestamp it really does not matter when the conversion takes place. select distinct <col1>, <col2>, .. '2010-04-30 00:00:00' from .... the timestamp string literal is a constant and really does not affect the distinct resultset in anyway. I do need to stamp all the inserts with a specific timestamp value and that's why I am using a constant string literal. If I used an expression such as current_timestamp/(ODBC {fn now()}, then that would factor into the distinct clause and pollute the distinctness of subquery reulsultset. Here is actual statements I am running and like I said they work for all 9+ DBMSs (I use ODBC and{fn user()} is the ODBC cross-dbms syntax for the current user ID): insert into in_sync_node_toolbar (node_no, sync_cd, toolbar_cd, ctrl_ins_dtm, ctrl_upd_dtm, ctrl_usr_id) select distinct isnr.node_no, case when isr.rs_type_cd = 'TABLELS' then 'CMPTS' when isr.rs_type_cd = 'PROCLS' then 'CMPPROCS' when isr.rs_type_cd = 'SEQLS' then 'CMPSEQS' else null end, 'TBCSCPT1', '2010-04-30 00:00:00', '2010-04-30 00:00:00', {fn user() } from in_sync_node_resultset isnr, in_sync_object_pattern isop, in_sync_resultset isr where (isnr.rs_oid = isr.rs_oid or isnr.rs_oid_other = isr.rs_oid) and isr.rs_oid = isop.rs_oid and isr.rs_type_cd in ('TABLELS', 'PROCLS', 'SEQLS'); insert into in_sync_node_toolbar (node_no, sync_cd, toolbar_cd, ctrl_ins_dtm, ctrl_upd_dtm, ctrl_usr_id) select distinct isnr.node_no, case when isr.rs_type_cd = 'TBLVIEW' then 'CMPTABLE' when isr.rs_type_cd = 'PROC' then 'CMPPROC' when isr.rs_type_cd = 'SEQ' then 'CMPSEQ' else null end, 'TBCSCPT1', '2010-04-30 00:00:00', '2010-04-30 00:00:00', {fn user() } from in_sync_node_resultset isnr, in_sync_object iso, in_sync_resultset isr where (isnr.rs_oid = isr.rs_oid or isnr.rs_oid_other = isr.rs_oid) and isr.rs_oid = iso.rs_oid and isr.rs_type_cd in ('TBLVIEW', 'PROC', 'SEQ'); This is the problem with depending on non-standard extensions. You're never really sure that they're working. They be working on some systems but doing something unexpected on other systems. All the other DBMSs doing is a select distinct on the subquery that has the constant timestamp string literals. There is nothing non-standard or ambiguous there. As far as the DBMS is concerned the constant string expression is just a string literal and can represent anything. Now the issue is that when the other DBMSs get to do the insert part they are able, as one would expect, to convert the subquery resultset string literal column to a timestamp column. I think PostreSQL is doing the first part (subquery with distinct clause correctly), but when it gets to use the resultset of the subquery in the insert it "forgets" how to convert '2010-04-30 00:00:00' to timestamp value (but forgets only when 'distinct' is used in the subquery!) Farid On 6/4/2010 4:18 PM, Greg Stark wrote: <blockquote cite="mid:AANLkTilLvx4m4TlHxeFERL60Xubiz0IhTjUXqTkoH6Le@mail.gmail.com" type="cite"> On Fri, Jun 4, 2010 at 7:18 PM, Farid Zidan <farid@zidsoft.com> wrote: If a simple SQL statement works on 9+ different databases For what it's worth are you sure it works as you expect in these other databases? I suspect what they're doing is doing a DISTINCT of the text values and then converting the results. That's not the same as what you're suggesting it do (insert distinct timestamp values) since different text values can represent the same timestamp. For example look at what this does: select cast(x as timestamp with time zone) from (select distinct x from (values ('2010-01-01 12:00PM UTC'), ('2010-01-01 7:00AM EST')) as x(x)) as y; If you inserted those values into a table with a timestamp with time zone column you would get duplicate values even with the distinct. This is the problem with depending on non-standard extensions. You're never really sure that they're working. They be working on some systems but doing something unexpected on other systems. -- Signature www.zidsoft.com CompareData: compare and synchronize SQL DBMS data visually <font size="-1">between two databases using ODBC drivers
Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Farid Zidan
Date:
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type"> Hello Kevin, I can't help but wonder why you resist using the standard syntax. I am using the standard syntax. Single quote in sql denotes a string. so '2010-04-30 00:00:00' is string literal. That's universal. Now you want me to use PG-specific timestamps and that's like I said is not standard/cross-dbms. I have just finished testing with Ingre 9.2 and it works there too. That's 10 DBMSs systems that use single quotes to denote a string literal and can covert ISO-standard datetime string literal to timestamp. You can't not interpret string literals one way in one statement and just because user uses the word 'distinct' decide to switch paradigms. That's not good design or planning. Of course you can decide to do whatever you want, just do not expect developers to start special-coding just for PostreSQL because you decide to cast correctly or not correctly depending on whim. Let me reiterate the example, maybe it was too terse and you did not read it carefully, create table test_insert ( col1 char(8) not null, col2 TIMESTAMP not null default CURRENT_TIMESTAMP); >create the test table. No issue. insert into test_insert (col1, col2) values ('a', '2010-04-30 00:00:00'); >Works like expected, PG correctly converts standard ISO-datetime string literal to timestamp. No issue. insert into test_insert (col1, col2) select 'b', '2010-04-30 00:00:00' >That works too. No issue. insert into test_insert (col1, col2) select distinct 'b', '2010-04-30 00:00:00' >Does not work. That's a bug. Now this not rocket science, it's simple insert statement where we do not want duplicates inserted. Works on 10 other DBMSs. FAA stuff and other is not related to this bug. I would think the FAA and other organizations want a standard-compliant DBMS system that knows how to convert a simple ISO-formatted valid string literal to a timestamp value in more than one variation of sql statement. You can ignore this bug report and do whatever you want, just do not say this is an accepted, standard or desired behavior of the server or is by design. It's not by design that the error happens it is by faulty handling of the distinct keyword. I think you have all the information you need to debate and resolve this issue. If you need any other information you can contact me and I will be happy to oblige. Farid On 6/4/2010 5:40 PM, Kevin Grittner wrote: <blockquote cite="mid:4C092C650200002500032027@gw.wicourts.gov" type="cite"> I can't help but wonder why you resist using the standard syntax. The reason the standard exists is to help those trying to write portable code, so they don't have to count on the vagaries of "parallel evolution." -- Signature www.zidsoft.com CompareData: compare and synchronize SQL DBMS data visually <font size="-1">between two databases using ODBC drivers
Re: Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Kris Jurka
Date:
On Fri, 4 Jun 2010, Farid Zidan wrote: > Here is actual statements I am running and like I said they work for all 9+ > DBMSs (I use ODBC and{fn user()} is the ODBC cross-dbms syntax for the > current user ID): > > '2010-04-30 00:00:00', > '2010-04-30 00:00:00', > {fn user() } > If you're into using standard ODBC escapes for portability, shouldn't you be using {ts '2010-04-30 00:00:00'}? http://msdn.microsoft.com/en-us/library/ms712360%28VS.85%29.aspx Kris Jurka
Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Greg Stark
Date:
On Fri, Jun 4, 2010 at 11:15 PM, Farid Zidan <farid@zidsoft.com> wrote: > Now this not rocket science, it's simple insert statement where we do not > want duplicates inserted. Works on 10 other DBMSs. > I find usually when one person is arguing something is complex and someone else is arguing it's simple it's the person who's claiming it's simple who is wrong. The other databases are not, I believe, preventing duplicates from being inserted as you describe. They are removing duplicates from the string constants and then silently converting to a different datatype before inserting. When postgres removed these default casts to text it turned up many instances where users had buggy code and Postgres had been hiding from them by silently using string operators which was not what users were expecting. In other words, while it might not matter in this case, in general if you code in this style your code is buggy and these other database implementations are not doing you any favours by making it appear to work correctly most of the time. -- greg
Re: Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, Jun 04, 2010 at 06:15:09PM -0400, Farid Zidan wrote: [...] > Let me reiterate the example, maybe it was too terse and you did not > read it carefully,<br> No. I think most readers here have understood your problem perfectly. Don't underestimate the folks here. [...] > Now this not rocket science, it's simple insert statement where we do > not want duplicates inserted. Works on 10 other DBMSs.<br> Except on those "other 10 DBMSs" you are most probably getting (silently!) something different as you'd expect (DSTINCT interpreted as text, whereas you are "seeing" timestamps). How is that better? > FAA stuff and other is not related to this bug. I would think the FAA > and other organizations want a standard-compliant DBMS system that > knows how to convert a simple ISO-formatted valid string literal to a > timestamp value in more than one variation of sql statement.<br> Except that the behaviour of those "other 10 DBMSs" is *beyond standard*, the standard just rules the case where you state explicitly the type of the constant. You will find multitude of cases where DMBSs differ on those cases beyond standard -- that's due to different design decisions. What Kevin was trying to convey is that PostgreSQL's design decisions allow its users to do things other DBMSs can't -- and that's the price we'll have to pay. Note that behaviour is still within the standard (and not, as you seem to suggest), so not really a problem: you can write the query in a way which will work on "all those 11 DBMSs": just stick to the standard. > You can ignore this bug report and do whatever you want, just do not > say this is an accepted, standard or desired behavior of the server or > is by design. It's not by design that the error happens it is by faulty > handling of the distinct keyword.<br> Accepted -- by whom? Standard -- which standard? (because it is not required by ISO/ANSI, and there is no other "SQL standard" that I'm aware of). Regards - -- tomás -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFMCdHgBcgs9XrR2kYRAjfsAJ0WVvm3AiFfN2jqIc24dqHVbyXM0QCeJqiQ I31OBlckZ7go48bXZx+YRpQ= =a7Pw -----END PGP SIGNATURE-----
Re: Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Craig Ringer
Date:
On 05/06/10 06:15, Farid Zidan wrote: > insert into test_insert > (col1, col2) > select *distinct* > 'b', > '2010-04-30 00:00:00' > >>Does not work. That's a bug. Not really. select distinct * from (VALUES ('b','2010-04-30 00:00:00'), ('b','2010-04-30 00:00:00'), ('b','20100430 000000') ) AS x(a,b); Does that produce the result you expected? It certainly didn't deduplicate the timestamps, yet it's doing exactly the correct thing. So this won't work: create table test_insert ( col1 char(8) not null, col2 TIMESTAMP not null default CURRENT_TIMESTAMP, UNIQUE(col2) ); insert into test_insert (col1, col2) select a, b::timestamp from ( select distinct * from (VALUES ('b','2010-04-30 00:00:00'), ('b','2010-04-30 00:00:00'), ('b','20100430 000000') ) AS x(a,b)) AS y; ... which is why your example is unsafe, and even if it appears to work on other databases it is buggy. Instead, write: insert into test_insert (col1, col2) select distinct 'b', CAST('2010-04-30 00:00:00' AS timestamp); ... which will be safe on any database, is (AFAIK) perfectly standard, and is fuss free. -- Craig Ringer Tech-related writing: http://soapyfrogs.blogspot.com/
Re: Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Farid Zidan
Date:
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type"> I only use ODBC escape sequences when necessary. Obviously I want to use standard sql syntax as much as possible. {fn user() } is handy because it works in all the databases that I work with and there is no substitute standard sql function for getting current userid that is cross-dbms. I also use {fn now()} which works across most ODBC drivers, but I can't in this case because I need to use a constant timestamp value so as not to change distinctness of the subquery that is the source for the insert. The datetime ISO-standard string format I am using works in all the databases I use 14+ (including PG), except in this case where 'distinct' is used with subquery in PG. Also not all PG clients use ODBC, so other PG clients will encounter this issue using standard ISO datetime string format when not using ODBC. I don't want to limit users to using ODBC for loading/updating the database by running sql scripts (which is what the sql for this issue is used for) so almost all of the database update/load scripts use generic sql where timestamp/datetime values are are written as ISO datetime format strings same format as '2010-04-30 00:00:00' BTW, I have also tested the sql in question with SQLite, MS Access, MS Excel and Sybase Adaptive Server 15 and it works with no error, so now that's 14 different DBMSs that have no issue with the ISO standard string format and distinct keyword. I guess I can find some workaround for this to work with ODBC just for the specific sql statements causing errors with PG, but that does not resolve the issue for PG clients not using ODBC. Like I said, I am reporting this issue so it can be identified and hopefully addressed at some point in the future, it is not critical for me for it to work right now, but that would be nice otherwise user will see a bunch of one-time errors and lose some ease of use but otherwise will not be too badly affected. Farid On 6/4/2010 9:42 PM, Kris Jurka wrote: <blockquote cite="mid:alpine.BSO.2.00.1006042140040.6416@leary.csoft.net" type="cite"> On Fri, 4 Jun 2010, Farid Zidan wrote: Here is actual statements I am running and like I said they work for all 9+ DBMSs (I use ODBC and{fn user()} is the ODBC cross-dbms syntax for the current user ID): '2010-04-30 00:00:00', '2010-04-30 00:00:00', {fn user() } If you're into using standard ODBC escapes for portability, shouldn't you be using {ts '2010-04-30 00:00:00'}? http://msdn.microsoft.com/en-us/library/ms712360%28VS.85%29.aspx Kris Jurka -- Signature www.zidsoft.com CompareData: compare and synchronize SQL DBMS data visually <font size="-1">between two databases using ODBC drivers
Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Farid Zidan
Date:
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type"> Greg, Obviously I do not agree. When 14 different databases by 14 different DBMS vendors from the largest to the smallest in the market can do a simple thing as a using a subquery that has distinct keyword and your DBMS can't, I would say your DBMS is at fault and is not better, rather is lacking in this respect. I am not expecting favors from the DBMS by its doing what I expect it to do. I do not want to beat an already dead horse, but if you review my example, you will see that it is very simple, PG already does conversion correctly from ISO string to timestamp column for inserting so you can't say we removed all conversions and that is a good thing, it is not. Basic feature of DBMS is allowing data entry into different data type columns using plain string literals. PG already does that and all other DBMS do that as well. For reference, although ODBC is not a DBMS, ODBC specification requires that an ODBC driver can convert all source DBMS data types from/to chars. This is not by accident, it is a necessity and is by design. I can understand that having multiple data formats for conversion to native data types from text can cause bugs and that's why we have established standards such as ISO for datetime/timestamp string formats and PG supports the conversion already. The issue is the PG is not doing it correctly when 'distinct' keyword is used in the select statement. There is nothing buggy with using ISO datetime string literals to insert into a table timestamp column. There is no behind the scene magic going on. 1 Execute subquery: string literals are just that can be 'aa', 'bb', '2010-04-30 00:00:00', whatever, it does not matter what the string literal is. 2 Eliminate duplicates 3 Now a string literal is being inserted into a timestamp column, you have a string literal and you are asked to insert into a timestamp colum -> convert string literal to timestamp and do the insert As you can see there is nothing buggy or heinous here, just simple select with distinct keyword in step 1, 2 and conversion from string literal to timestamp value in step 3 There is no ambiguity or magic to happen. Obviously in PG case there is some design or fault somewhere in this use-case when distinct keyword is used and is processed in step 2, that's all. Farid On 6/4/2010 10:41 PM, Greg Stark wrote: <blockquote cite="mid:AANLkTinvpWLi3CoBWYgNPxVdFbbLhEHZuYdp9buvDubV@mail.gmail.com" type="cite"> On Fri, Jun 4, 2010 at 11:15 PM, Farid Zidan <farid@zidsoft.com> wrote: Now this not rocket science, it's simple insert statement where we do not want duplicates inserted. Works on 10 other DBMSs. I find usually when one person is arguing something is complex and someone else is arguing it's simple it's the person who's claiming it's simple who is wrong. The other databases are not, I believe, preventing duplicates from being inserted as you describe. They are removing duplicates from the string constants and then silently converting to a different datatype before inserting. When postgres removed these default casts to text it turned up many instances where users had buggy code and Postgres had been hiding from them by silently using string operators which was not what users were expecting. In other words, while it might not matter in this case, in general if you code in this style your code is buggy and these other database implementations are not doing you any favours by making it appear to work correctly most of the time. -- Signature www.zidsoft.com CompareData: compare and synchronize SQL DBMS data visually <font size="-1">between two databases using ODBC drivers
Re: Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Farid Zidan
Date:
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type"> Craig I am not asking you to re-write my sql so the bug will not show. I am presenting you with sql that fails and shows the bug. If every time someone reported a bug you ask them to re-write their sql so the bug is not hit, that would not eliminate the bug. Also, you are using different timestamp string literals in your subquery. I am using the same constant datetime string literal in my example that the query processor does not need to cast to timestamp or anything to do the distinct part and eliminate duplicates. insert into test_insert (col1, col2) select distinct 'b', cast('2010-04-30 00:00:00' as timestamp) >This works as expected. However is not an option because it is not generic sql. In PG timestamp data type is called 'timestamp' but in another DBMS it may be called 'datetime', etc. ... which is why your example is unsafe, and even if it appears to work on other databases it is buggy. Instead, write: My example is safe and is cross-dbms. I am not doing anything extra-ordinary just select distinct where a constant string expression is used in the select list. select distinct 'b', '2010-04-30 00:00:00' Why is the sql above unsafe? It is not. It is simple select statement with two constant string expressions and distinct keyword. Now use the result of the sql above as source for inserting into test_table (col1, col2): insert into test_insert (col1, col2) select distinct 'b', '2010-04-30 00:00:00' There is nothing unsafe here. You have a resultset that has one row with the values 'b', '2010-04-30 00:00:00' being used to insert int col1, col2. Why would you say that's unsafe? '2010-04-30 00:00:00' is an ISO string literal being inserted into col2 whose data type is timestamp, perfectly safe. Farid On 6/5/2010 3:26 AM, Craig Ringer wrote: <blockquote cite="mid:4C09FC41.4000304@postnewspapers.com.au" type="cite"> On 05/06/10 06:15, Farid Zidan wrote: insert into test_insert (col1, col2) select *distinct* 'b', '2010-04-30 00:00:00' Does not work. That's a bug. Not really. select distinct * from (VALUES ('b','2010-04-30 00:00:00'), ('b','2010-04-30 00:00:00'), ('b','20100430 000000') ) AS x(a,b); Does that produce the result you expected? It certainly didn't deduplicate the timestamps, yet it's doing exactly the correct thing. So this won't work: create table test_insert ( col1 char(8) not null, col2 TIMESTAMP not null default CURRENT_TIMESTAMP, UNIQUE(col2) ); insert into test_insert (col1, col2) select a, b::timestamp from ( select distinct * from (VALUES ('b','2010-04-30 00:00:00'), ('b','2010-04-30 00:00:00'), ('b','20100430 000000') ) AS x(a,b)) AS y; ... which is why your example is unsafe, and even if it appears to work on other databases it is buggy. Instead, write: insert into test_insert (col1, col2) select distinct 'b', CAST('2010-04-30 00:00:00' AS timestamp); ... which will be safe on any database, is (AFAIK) perfectly standard, and is fuss free. -- Signature www.zidsoft.com CompareData: compare and synchronize SQL DBMS data visually <font size="-1">between two databases using ODBC drivers
Re: Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Dimitri Fontaine
Date:
Farid Zidan <farid@zidsoft.com> writes: > I am not asking you to re-write my sql so the bug will not show. I am > presenting you with sql that fails and shows the bug. If every time > someone reported a bug you ask them to re-write their sql so the bug > is not hit, that would not eliminate the bug. You're not reading us. Your example makes no sense at all, and hides the problem you have. You are failing to understand the situation because of this. If you want to understand the problem you have, please go test your 14 databases with the meaningful tests that have been offered to you. Until and unless you do so, I don't think posting here again on this issue has any value, for you nor for us. You're abusing the time of lots of people who are both busy and helpful, so please try to understand the advices and analysis they've been offering to you. Regards, -- dim
Re: Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Farid Zidan
Date:
Dimitri This is the last you will hear from me about this issue. I would have stopped repsonding long time ago, but I kept getting responses that required reply. Do not knock on the door if you don't want someone to answer the door. There is no point in spending more of my time explaining to you simple things you seem incapable of understanding. You make no technical point, you make a personal opinion. You are entitled to your opinion. Software is not built on personal opinions, software is built on good technical/logical analysis of issues of which you offer none. I really did not expect a debate. I reported an issue that is clearly a bug. I expected "yes, we see, we will address this at some point, thank you." I really don't mind explaining the issue, but what I do mind is people like you who don't have a technical understanding of the issue and nonetheless appoint themselves to speak for PostgreSQL/MySQL or any organization about things they do not understand. Dmitri, it is good you don't work for me, for if you did you would not have a job. The plight of the software industry is people who when confronted with a logical argument revert to ad hominem response because they have no valid technical response. And of course you should speak only for yourself because you are not PostgreSQL and you don't not represent PostgreSQL or anbody else in the collaborative software development spirit. The arrogance you show is appalling. > Until and unless you do so, I don't think posting here again on this > issue has any value, for you nor for us. You're abusing the time of lots > of people who are both busy and helpful, so please try to understand the > advices and analysis they've been offering to you I see no technical analysis in your response. If you too busy to engage in logical debate, you should remove yourself from the bug list. Software is built on logical analysis. You are too busy, do not participate. Farid On 6/5/2010 4:16 PM, Dimitri Fontaine wrote: > Farid Zidan<farid@zidsoft.com> writes: > >> I am not asking you to re-write my sql so the bug will not show. I am >> presenting you with sql that fails and shows the bug. If every time >> someone reported a bug you ask them to re-write their sql so the bug >> is not hit, that would not eliminate the bug. >> > You're not reading us. > Your example makes no sense at all, and hides the problem you have. You > are failing to understand the situation because of this. > > If you want to understand the problem you have, please go test your 14 > databases with the meaningful tests that have been offered to you. > > Until and unless you do so, I don't think posting here again on this > issue has any value, for you nor for us. You're abusing the time of lots > of people who are both busy and helpful, so please try to understand the > advices and analysis they've been offering to you. > > Regards, > -- www.zidsoft.com <http://www.zidsoft.com/> CompareData: compare and synchronize SQL DBMS data visually between two databases using ODBC drivers
On 5/06/2010 9:39 PM, Farid Zidan wrote: > Craig > > I am not asking you to re-write my sql so the bug will not show. I am > presenting you with sql that fails and shows the bug. Before writing this off completely, please read on. I suspect I may've misunderstood your argument, and on further examination think there's something that bears design consideration here. It's not a bug, as it's working exactly as designed, but it _is_ something where the design might benefit from a tweak. In the mean time, you have a system that doesn't work how you want it do - design choice or bug, workaround or bug-hiding, the effect is the same. You must already be handing differing names of timestamp/datetime types in your DDL, so why not use the same mapping in your DML? As for the root of the issue: Here's why implicitly casting those literals to timestamps would be a bad idea for solid, technical reasons: - The semantics of a nested/sub query should not be affected by the calling context, ie the surrounding query. To have a function affected by what the caller is doing would be bizarre and confusing; so it is with a subquery. You cannot test functional units or rely on any kind of consistent behaviour if calling context changes callee behaviour. - A SELECT as a value-supplier to an INSERT is a type of subquery - If your SELECT ran differently in the context of the INSERT to how it ran standalone, that would not only be incredibly confusing but also clearly a bug. ... so we clearly can't use type information from the surrounding INSERT to determine the data type of the literals used in the SELECT. With no type information to the contrary they must be interpreted as text. So, after SELECT evaluation your query goes from: insert into test_insert (col1, col2) select distinct 'b', '2010-04-30 00:00:00'; to effectively: insert into test_insert (col1, col2) values ('b'::text, '2010-04-30 00:00:00'::text); Now, when executed with Pg both those queries result in: ERROR: column "col2" is of type timestamp without time zone but expression is of type text showing that your issue isn't actually with DISTINCT at all, but with Pg's unwillingness to *implicitly* cast a value of explict text type to another type. This is the real core of your complaint. You want Pg to cast from 'text' to 'timestamp' on INSERT. Pg doesn't. In the general case (not specific to INSERT) there are some pretty good reasons for that, but for INSERTs... should it? In the context of an INSERT, where the desired data type is explicitly specified and obvious, I'm not sure, but I suspect that it should. Anyone familiar with the details of the implicit-text-cast bugs want to pipe up with whether it'd be safe to force a cast in the context of an INSERT? ( As for why I adjusted your example: Your example query was unrealistically simplistic and clearly couldn't be what you were using in your app. Using 'DISTINCT' with one value is pointless. So, my example added some dummy values to illustrate why it might be unsafe to use it how you're trying to. As it turns out, in your app's case you can guarantee input formatting consistency, so it's safe for you so long as you stick very closely to timestamp formatting specifics, but I'm sure you can see that a database's behaviour depending on the formatting of timestamps is probably not something that most people with most apps would be happy with. Even if INSERT did convert the results of the SELECT DISTINCT subquery to timestamps, it'd still be pretty unsafe.) -- Craig Ringer
Re: Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Dimitri Fontaine
Date:
Farid Zidan <farid@zidsoft.com> writes: > I see no technical analysis in your response. I made a clear reference to what others have been saying, and that you have been refusing to read. I don't see any point in getting technical again, as I'm sure you will refuse to understand what is happening to you here. But as you insist, let's try once more: - your consider a bug where PostgreSQL wants to know more about the data type you're using in the query in order to be able to enforce a DISTINCT restriction - other products are happy to solve the DISTINCT restriction without any hint as far as what the datatype really is - the error message is perfectly clear about what PostgreSQL needs from you - the reason why PostgreSQL wants you to give it details is clear to: what means DISTINCT depends on the datatype, you can easily have two different text representations of the same timestamptz, for example - it could be considered a possible area of improvement in the system that the SELECT part of the INSERT INTO ... SELECT ... could determine the SELECT output columns type by looking at the INSERT target table definition - it would be asking for a new feature to be considered, not for a bug to be solved: the system currently works exactly as designed. Now if you keep wanting not to understand how the system is currently designed, that makes it your problem, no one else. Regards, -- dim
Re: Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Craig Ringer
Date:
On 7/06/2010 3:51 AM, Dimitri Fontaine wrote: > - other products are happy to solve the DISTINCT restriction without > any hint as far as what the datatype really is ... and so is Pg. That's not the problem; Pg complains after resolving the SELECT DISTINCT, when it finds that it's trying to insert values of type 'text' into a column with type 'timestamp'. You'll get exactly the same error if you replace the OP's SELECT DISTINCT subquery with a VALUES list that explicitly specifies TEXT type. The other clue as to what's happening is that if you run the SELECT part of the query standalone, it executes fine, treating the passed values as 'text'. Personally, I do think this is a bit of a wart. I know why the explicit casts around text were removed, but in the case of INSERT I'm not sure the current behaviour is desirable. I initially thought the OP was asking for Pg to infer the type of the timestamp literals from the surrounding INSERT, and for that reason was very much against the idea. After realizing that what they really expect is for the SELECT to interpret the literals as 'text' (just as it does) then Pg to implicitly cast the 'text' query results to 'timestamp', I can see why they want it and why they're frustrated with the current behaviour. > - the error message is perfectly clear about what PostgreSQL needs from > you Apparently not, as you seem to have misunderstood it ;-) > - the reason why PostgreSQL wants you to give it details is clear to: > what means DISTINCT depends on the datatype, you can easily have two > different text representations of the same timestamptz, for example That's not why Pg reports an error. If it was, then the following query would not be legal: SELECT DISTINCT x.* FROM (VALUES ('a'),('b'),('c')) AS x; ... since there's no explicit type info provided. Pg follows the SQL rules and interprets literals as text if there's no explicit type info provided and no immediate context that requires a particular type. So the above work fine, treating 'a', 'b', and 'c' as if they were written: (TEXT 'a'), (TEXT 'b'), (TEXT 'c') > - it could be considered a possible area of improvement in the system > that the SELECT part of the INSERT INTO ... SELECT ... could > determine the SELECT output columns type by looking at the INSERT > target table definition I'd say that'd be a ghastly "solution". It'd change the meaning of the SELECT based on the surrounding INSERT. Imagine trying to figure out what was going on with a query that wasn't doing what you wanted when you couldn't run it standalone and know the results were the same! -- Craig Ringer
Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Craig Ringer
Date:
I know it's not what you want (Pg behaving how you expect out of the box) but creating implicit casts to the desired types will resolve your immediate issue. You still have to run some Pg-specific code, but it can be restricted to your DDL where there's (presumably) already plenty. See: http://www.depesz.com/index.php/2008/05/05/error-operator-does-not-exist-integer-text-how-to-fix-it/ http://petereisentraut.blogspot.com/2008/03/readding-implicit-casts-in-postgresql.html http://wiki.postgresql.org/images/d/d1/Pg83-implicit-casts.sql -- Craig Ringer
Re: Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Dimitri Fontaine
Date:
Craig Ringer <craig@postnewspapers.com.au> writes: >> - other products are happy to solve the DISTINCT restriction without >> any hint as far as what the datatype really is > > ... and so is Pg. That's not the problem; Pg complains after resolving the > SELECT DISTINCT, when it finds that it's trying to insert values of type > text' into a column with type 'timestamp'. Ah yes. I've been paying more attention to the energy people have been willing to put into helping the OP than into what the problem really is. Thanks for clarifying. -- dim
Re: BUG #5490: Using distinct for select list causes insert of timestamp string literal to fail
From
Farid Zidan
Date:
Craig, > I know it's not what you want (Pg behaving how you expect out of the > box) but creating implicit casts to the desired types will resolve > your immediate issue. You still have to run some Pg-specific code, but > it can be restricted to your DDL where there's (presumably) already > plenty. My main concern is identifying this issue by PosgtreSQL developers so it can be addressed as some point in the future. I believe that is done now and the issue is identified. As it happens the sql in question is part of a common generic sql script file where the same script file is run against 10+ different DBMSs to update the database data from a previous release (all 10+ DBMS schemas have the same table/column names, logical structure, logical data types, etc). Having said that, I can introduce a PG-only data update file as a workaround for PG in this case but not without complicating things quite a bit. However, due to limited time and schedule, I will be just accepting the error (based on cost-analysis of the error) in this case. Best regards, Farid On 6/6/2010 11:17 PM, Craig Ringer wrote: > I know it's not what you want (Pg behaving how you expect out of the > box) but creating implicit casts to the desired types will resolve > your immediate issue. You still have to run some Pg-specific code, but > it can be restricted to your DDL where there's (presumably) already > plenty. > > See: > > http://www.depesz.com/index.php/2008/05/05/error-operator-does-not-exist-integer-text-how-to-fix-it/ > > > http://petereisentraut.blogspot.com/2008/03/readding-implicit-casts-in-postgresql.html > > > http://wiki.postgresql.org/images/d/d1/Pg83-implicit-casts.sql > > -- > Craig Ringer > > -- www.zidsoft.com <http://www.zidsoft.com/> CompareData: compare and synchronize SQL DBMS data visually between two databases using ODBC drivers