Re: Issue in to_timestamp/to_date while handling the quoted literal string - Mailing list pgsql-hackers

From Brendan Jurd
Subject Re: Issue in to_timestamp/to_date while handling the quoted literal string
Date
Msg-id CADxJZo2WbU8w+vBHA9fZ7HkfVnwABMwe3N6+t5BwsLATi8vNbw@mail.gmail.com
Whole thread Raw
In response to Issue in to_timestamp/to_date while handling the quoted literal string  (Suraj Kharage <suraj.kharage@enterprisedb.com>)
Responses Re: Issue in to_timestamp/to_date while handling the quoted literal string  (Suraj Kharage <suraj.kharage@enterprisedb.com>)
List pgsql-hackers
Hi Suraj,

I think the documentation is reasonably clear about this behaviour, quote:

" In to_date, to_number, and to_timestamp, literal text and double-quoted strings result in skipping the number of characters contained in the string; for example "XX" skips two input characters (whether or not they are XX)."

I can appreciate that this isn't the behaviour you intuitively expected from to_timestamp, and I don't think you'd be the first or the last.  The purpose of these functions was never to validate that your input string precisely matches the non-coding parts of your format pattern.  For that, I think you'd be better served by using regular expressions.

Just as an aside, in the example you gave, the string '2019-05-24T23:12:45' will cast directly to timestamp just fine, so it isn't the kind of situation to_timestamp was really intended for.  It's more for when your input string is in an obscure (or ambiguous) format that is known to you in advance.

I hope that helps.

Cheers
Brendan

On Wed, 24 Jul 2019 at 21:38, Suraj Kharage <suraj.kharage@enterprisedb.com> wrote:
Hi,

I noticed the issue in to_timestamp()/to_date() while handling the double quote literal string. If any double quote literal characters found in format, we generate the NODE_TYPE_CHAR in parse format and store that actual character in FormatNode->character. n DCH_from_char, we just increment the input string by length of character for NODE_TYPE_CHAR.
We are actually not matching these characters in input string and because of this, date values get changed if quoted literal string is not identical in input and format string.

e.g:

postgres@78619=#select to_timestamp('2019-05-24T23:12:45', 'yyyy-mm-dd"TT"hh24:mi:ss');
       to_timestamp        
---------------------------
 2019-05-24 03:12:45+05:30
(1 row)


In above example, the quoted string is 'TT', so it just increment the input string by 2 while handling these characters and returned the wrong hour value.

My suggestion is to match the exact characters from quoted literal string in input string and if doesn't match then throw an error.

Attached is the POC patch which almost works for all scenarios except for whitespace - as a quote character.

Suggestions?
--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.

pgsql-hackers by date:

Previous
From: Jehan-Guillaume de Rorthais
Date:
Subject: Re: Fetching timeline during recovery
Next
From: "Daniel Verite"
Date:
Subject: Re: psql - add SHOW_ALL_RESULTS option