Thread: Re: [BUGS] surprising to_timestamp behavior

Re: [BUGS] surprising to_timestamp behavior

From
Jeevan Chalke
Date:



On Thu, Oct 31, 2013 at 10:50 AM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:



On Tue, Oct 29, 2013 at 11:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Oct 29, 2013 at 12:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> It turns out that when you use the to_timestamp function, a space in
>> the format mask can result in skipping any character at all, even a
>> digit, in the input string.  Consider this example, where 10 hours are
>> lost:
>
>> rhaas=# select to_timestamp('2013-10-29 10:47:18', 'YYYY-MM-DD  HH24:MI:SS');
>>       to_timestamp
>> ------------------------
>>  2013-10-29 00:47:18-04
>> (1 row)
>
> And that's a bug why?  The format says to ignore two characters before the
> hours field.  I think you're proposing to remove important functionality.
>
> To refine the point a bit, it's absolutely stupid to be using to_timestamp
> at all for sane input data like this example.  Just cast the string to
> timestamp(tz), and the standard datatype input function will do a better
> job than to_timestamp ever would.  The point of to_timestamp, IMNSHO,
> is to extract data successfully from weirdly formatted input; which might
> well include cases where there are stray digits you don't want taken as
> data.  So I'm not on board with proposals to "fix" cases like this by
> making the format string's meaning squishier.

Well, you're the second person to react that way to this proposal, but
the current behavior seems mighty odd to me - even odder, now that I
realize that we'll happily match '"cat'" to 'dog'.  I just work here,
though.

Well, I agree with Tom that user provided two spaces to skip before hours
and this is what we are exactly doing.

Still here are few other observations:


(1) I don't see following as wrong output in postgresql as I already said
above and agreed with Tom. (in input, only one space between DD and HH24, but
in mask we have 2 spaces)

postgres=# select to_timestamp('2011-03-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
       to_timestamp      
---------------------------
 2011-03-18 03:38:15+05:30
(1 row)

(Note that, time 23 became 03, due to extra space in mask eating 2 in 23,
resulting in 3 for HH24. But fair enough, as expected and thus NO issues)


(2) But I see following buggy (both in input and mask we have 2 spaces
between DD and HH24)

postgres=# select to_timestamp('2011-03-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
       to_timestamp      
---------------------------
 2011-03-18 03:38:15+05:30
(1 row)

(Note that, this time we should not end up with eating 2 from 23 as we have
exact spaces in mask and input. NOT so good and NOT expected, looks like BUG)

So I think we need to resolve second case.

Attached patch which fixes this issue.

I have just tweaked the code around ignoring spaces in DCH_from_char() function.

Adding to CommitFest 2014-01 (Open).

Thanks


Thanks
 

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company




--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message.

Re: [BUGS] surprising to_timestamp behavior

From
Tom Lane
Date:
Jeevan Chalke <jeevan.chalke@enterprisedb.com> writes:
> Attached patch which fixes this issue.

> I have just tweaked the code around ignoring spaces in DCH_from_char()
> function.

> Adding to CommitFest 2014-01 (Open).

I went to review this, and found that there's not actually a patch
attached ...
        regards, tom lane



Re: [BUGS] surprising to_timestamp behavior

From
Jeevan Chalke
Date:


I went to review this, and found that there's not actually a patch
attached ...

                        regards, tom lane

Attached. Sorry for that.

--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachment

Re: [BUGS] surprising to_timestamp behavior

From
Tom Lane
Date:
Jeevan Chalke <jeevan.chalke@enterprisedb.com> writes:
>> I went to review this, and found that there's not actually a patch
>> attached ...

> Attached. Sorry for that.

This looks good to me except for one thing: if the upcoming node is a
DCH_FX action (ie, turn on fx_mode), I don't think we want to skip
whitespace.  The original coding had that bug too, but it was only
exposed if the FX prefix was immediately preceded by whitespace in
the format.

It's a bit hard to think of useful cases where this would make a
difference, because it turns out that from_char_parse_int contains
an internal skip-whitespace action that's applied regardless of FX
mode, so it really only matters for non-integer field types.  I
kinda think that maybe now we should remove that extra skip, or at
least count the skipped whitespace against the field width, but
did not experiment to see what the results would be.

Committed with that change and some cosmetic adjustments.
        regards, tom lane