Thread: Junk date getting uploaded into date field
We have a csv file which we upload into postgres DB. If there are some errors, like a data mismatch with the database table columns, postgres should raise and error and upload should fail. What is happening now is that, in case we get some junk date in the upload file, postgres does auto-correction and does not raise an error! A wrong date like ,say, 33-Oct-2013 gets converted as 2-Nov-2013 when the data is loaded into a date field. No error raised.Data gets uploaded! How can I prevent this ? I don't want this junk data to get loaded. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Junk-date-getting-uploaded-into-date-field-tp5776969.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On 11/5/2013 1:04 AM, bsreejithin wrote: > A wrong date like ,say, 33-Oct-2013 gets converted as 2-Nov-2013 when the > data is loaded into a date field. No error raised.Data gets uploaded! ummmmm. postgresql won't do that conversion postgres=# select '33-Oct-2013'::date; ERROR: date/time field value out of range: "33-Oct-2013" LINE 1: select '33-Oct-2013'::date; ^ . how exactly are you inserting this CSV data into postgres ? -- john r pierce 37N 122W somewhere on the middle of the left coast
bsreejithin, 05.11.2013 10:04: > We have a csv file which we upload into postgres DB. If there are some > errors, like a data mismatch with the database table columns, postgres > should raise and error and upload should fail. > > What is happening now is that, in case we get some junk date in the upload > file, postgres does auto-correction and does not raise an error! > > A wrong date like ,say, 33-Oct-2013 gets converted as 2-Nov-2013 when the > data is loaded into a date field. No error raised.Data gets uploaded! > > How can I prevent this ? I don't want this junk data to get loaded. > I'm pretty sure Postgres will not allow that. If I had to guess: you are using a Java program which uses SimpleDateFormat and the "lenient" parsing was not disabled. Because that's exactly what happens with a SimpleDateFormat in it's default configuration. http://docs.oracle.com/javase/6/docs/api/java/text/DateFormat.html#setLenient%28boolean%29
Not able to post the attached details as a comment in the reply box, so attaching it as an image file : <http://postgresql.1045698.n5.nabble.com/file/n5776987/Untitled.png> -- View this message in context: http://postgresql.1045698.n5.nabble.com/Junk-date-getting-uploaded-into-date-field-tp5776969p5776987.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
bsreejithin, 05.11.2013 13:14: > Not able to post the attached details as a comment in the reply box, so > attaching it as an image file : > <http://postgresql.1045698.n5.nabble.com/file/n5776987/Untitled.png> It would have much easier if you had simply used copy & paste to post a text version of that SQL. Does your mail client not allow you to do that? But your test case is essentially this: select to_date('33-OCT-2013', 'dd-mon-yyyy') which indeed returns 2013-11-02 (using 9.3.1) I don't know if this is inteded or actually a bug - I can't find anything in the docs relating to that behaviour. Thomas
I am not using any mail client.I was directly trying to post the content I attached.Got a connection reset by peer error on submit.Thought some issue with the browser - so tried with firefox, chrome as well, in addition to IE - got the same error there also. Any way, that's not the issue. I am not sure why : select to_date('33-OCT-2013', 'dd-mon-yyyy') is returning 2013-11-02. For cases like the issue I am facing, where we need to raise an error saying the data is wrong, DB manipulating the data is not proper. May be there could be some uses cases for such data manipulation. But then, in that case, it would have been better to have a parameter which can switch ON/OFF this behavior. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Junk-date-getting-uploaded-into-date-field-tp5776969p5776992.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Em 05/11/2013 10:36, Thomas Kellerer escreveu: > bsreejithin, 05.11.2013 13:14: >> Not able to post the attached details as a comment in the reply box, so >> attaching it as an image file : >> <http://postgresql.1045698.n5.nabble.com/file/n5776987/Untitled.png> > It would have much easier if you had simply used copy & paste to post a text version of that SQL. > Does your mail client not allow you to do that? > > But your test case is essentially this: > > select to_date('33-OCT-2013', 'dd-mon-yyyy') > > which indeed returns 2013-11-02 (using 9.3.1) > > I don't know if this is inteded or actually a bug - I can't find anything in the docs relating to that behaviour. > > Thomas > > > > I also don't know if this is intended or a bug, but for me, it seems to be right: 2013-11-02 is the 33 day counting from 2013-10-01. Edson
Thomas Kellerer wrote: > bsreejithin, 05.11.2013 13:14: >> Not able to post the attached details as a comment in the reply box, so >> attaching it as an image file : >> <http://postgresql.1045698.n5.nabble.com/file/n5776987/Untitled.png> > > It would have much easier if you had simply used copy & paste to post a text version of that SQL. > Does your mail client not allow you to do that? > > But your test case is essentially this: > > select to_date('33-OCT-2013', 'dd-mon-yyyy') > > which indeed returns 2013-11-02 (using 9.3.1) > > I don't know if this is inteded or actually a bug - I can't find anything in the docs relating to that > behaviour. There is a comment in utils/adt/formatting.c: * This function does very little error checking, e.g. * to_timestamp('20096040','YYYYMMDD') works So at least this is not by accident. On the other hand, I have always thought that these functions are for Oracle compatibility, and sqlplus says: Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production SQL> SELECT to_date('20096040','YYYYMMDD') FROM dual; SELECT to_date('20096040','YYYYMMDD') FROM dual * ERROR at line 1: ORA-01843: not a valid month I don't know if that should be fixed, but fixing it might break SQL that deliberately uses the current behaviour. Yours, Laurenz Albe
On Tue, 2013-11-05 at 04:14 -0800, bsreejithin wrote: > Not able to post the attached details as a comment in the reply box, so > attaching it as an image file : > <http://postgresql.1045698.n5.nabble.com/file/n5776987/Untitled.png> > > > > -- > View this message in context: http://postgresql.1045698.n5.nabble.com/Junk-date-getting-uploaded-into-date-field-tp5776969p5776987.html > Sent from the PostgreSQL - general mailing list archive at Nabble.com. > > to achieve what you want.... bdate needs to be defined as a date, not a string. test=# create table temptabl(bdate date); CREATE TABLE Time: 239.358 ms test=# insert into temptabl values('33-OCT-2013'); ERROR: date/time field value out of range: "33-OCT-2013" LINE 1: insert into temptabl values('33-OCT-2013'); ^ HINT: Perhaps you need a different "datestyle" setting. Time: 0.288 ms test=#
Albe Laurenz <laurenz.albe@wien.gv.at> writes: > There is a comment in utils/adt/formatting.c: > * This function does very little error checking, e.g. > * to_timestamp('20096040','YYYYMMDD') works > So at least this is not by accident. No, it isn't. This is in fact the traditional behavior of Unix time conversion utilities such as mktime(3). The glibc man page gives the specific example that "40 October" will be converted to "9 November" rather than throwing an error. The POSIX and C standards are not so explicit, saying only that the inputs are not restricted to the normal ranges (which I think would entitle an implementation to change 40 October to 31 October instead; but I've never heard of anyone doing it that way). regards, tom lane
bsreejithin <bsreejithin@gmail.com> writes: > I am not sure why : select to_date('33-OCT-2013', 'dd-mon-yyyy') > is returning 2013-11-02. > For cases like the issue I am facing, where we need to raise an error saying > the data is wrong, DB manipulating the data is not proper. Then don't use to_date(). Just entering the string through the normal date type input function will do what you want. to_date() is meant for trying to extract data from weird input formats. regards, tom lane
On 11/5/13, bsreejithin <bsreejithin@gmail.com> wrote: > > I am not sure why : select to_date('33-OCT-2013', 'dd-mon-yyyy') > > is returning 2013-11-02. > > For cases like the issue I am facing, where we need to raise an error > saying > the data is wrong, DB manipulating the data is not proper. Try using a cast to date instead: select '33-oct-2013'::date throws an error. -- Mike Nolan
On 11/05/2013 05:29 AM, Albe Laurenz wrote: > Thomas Kellerer wrote: >> bsreejithin, 05.11.2013 13:14: >>> Not able to post the attached details as a comment in the reply box, so >>> attaching it as an image file : >>> <http://postgresql.1045698.n5.nabble.com/file/n5776987/Untitled.png> >> It would have much easier if you had simply used copy & paste to post a text version of that SQL. >> Does your mail client not allow you to do that? >> >> But your test case is essentially this: >> >> select to_date('33-OCT-2013', 'dd-mon-yyyy') >> >> which indeed returns 2013-11-02 (using 9.3.1) >> >> I don't know if this is inteded or actually a bug - I can't find anything in the docs relating to that >> behaviour. > There is a comment in utils/adt/formatting.c: > > * This function does very little error checking, e.g. > * to_timestamp('20096040','YYYYMMDD') works > > I think the place for such warnings in addition to the source-code is in the documentation. This or similar issues with to_date have popped up on the lists a number of times. Perhaps a "see warnings below" by the to_date description in table: http://www.postgresql.org/docs/9.3/static/functions-formatting.html#FUNCTIONS-FORMATTING-TABLE Then under usage notes something like: The to_date and to_timestamp functions do minimal input error-checking and are intended for conversion of non-standard formats that cannot be handled by casting. These functions will attempt to convert illegal dates to the best of their ability, e.g. to_date('33-OCT-2013', 'dd-mon-yyyy') will return 2013-11-02. Users of these functions are advised to perform whatever external error-checking they deem prudent. Cheers, Steve
Steve Crawford wrote: >> There is a comment in utils/adt/formatting.c: >> >> * This function does very little error checking, e.g. >> * to_timestamp('20096040','YYYYMMDD') works >> >> > I think the place for such warnings in addition to the source-code is in > the documentation. This or similar issues with to_date have popped up on > the lists a number of times. > > Perhaps a "see warnings below" by the to_date description in table: > http://www.postgresql.org/docs/9.3/static/functions-formatting.html#FUNCTIONS-FORMATTING-TABLE > > Then under usage notes something like: > > The to_date and to_timestamp functions do minimal input error-checking > and are intended for conversion of non-standard formats that cannot be > handled by casting. These functions will attempt to convert illegal > dates to the best of their ability, e.g. to_date('33-OCT-2013', > 'dd-mon-yyyy') will return 2013-11-02. Users of these functions are > advised to perform whatever external error-checking they deem prudent. I like that. Would you write a patch and add it to the commitfest? Yours, Laurenz Albe
On 11/5/2013 10:29 AM, Steve Crawford wrote: > The to_date and to_timestamp functions do minimal input error-checking > and are intended for conversion of non-standard formats that cannot be > handled by casting. These functions will attempt to convert illegal > dates to the best of their ability, e.g. to_date('33-OCT-2013', > 'dd-mon-yyyy') will return 2013-11-02. Users of these functions are > advised to perform whatever external error-checking they deem prudent. I think this should also point out that this behavior is different than cast, which is much stricter. the fact that these two methods are polar opposites in this behavior is troublesome. -- john r pierce 37N 122W somewhere on the middle of the left coast