Re: Bug in to_timestamp(). - Mailing list pgsql-hackers

From Artur Zakirov
Subject Re: Bug in to_timestamp().
Date
Msg-id b2a39359-3282-b402-f4a3-057aae500ee7@postgrespro.ru
In response to Re: Bug in to_timestamp().  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: Bug in to_timestamp().  (amul sul <sul_amul@yahoo.co.in>)
Re: Bug in to_timestamp().  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hello,

On 14.07.2016 12:16, Pavel Stehule wrote:
>
> last point was discussed in thread related to to_date_valid function.
>
> Regards
>
> Pavel

Thank you.

Here is my patch. It is a proof of concept.

Date/Time Formatting
--------------------

There are changes in date/time formatting rules:

- to_timestamp() and to_date() now skip spaces in both the input string
and the formatting string unless the FX option is used, as Amul Sul
described in the first message of this thread. However, Ex.2 from that
message now raises an error with this patch (should we fix this too?).

- in the code, space characters and separator characters now have
different FormatNode types. Separator characters are ',', '-', '.',
'/' and ':'. This allows different formatting rules for spaces and
for separators.
If the FX option isn't used, PostgreSQL does not insist that a separator
in the formatting string match the separator in the input string.
But the number of separators must be equal with or without the FX option.

- PostgreSQL now checks that a quoted string in the formatting string
has a closing quote. Otherwise an error is raised.
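
The rules above can be sketched roughly as follows. This is only an
illustration, not code from the patch: the two NODE_TYPE_* values are
the ones listed under "Code changes", while NODE_TYPE_OTHER and the
functions classify_char(), separator_matches() and has_unclosed_quote()
are hypothetical names. It also assumes that FX requires separators to
match exactly and that quotes are never escaped.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define NODE_TYPE_SEPARATOR    4   /* value from the patch description */
#define NODE_TYPE_SPACE        5   /* value from the patch description */
#define NODE_TYPE_OTHER        0   /* hypothetical placeholder for everything else */

/* Separator characters as listed in the patch description. */
bool
is_separator_char(char c)
{
    /* strchr() would also match the terminating NUL, so exclude it. */
    return c != '\0' && strchr(",-./:", c) != NULL;
}

/* Decide which kind of node a single format character would become. */
int
classify_char(char c)
{
    if (isspace((unsigned char) c))
        return NODE_TYPE_SPACE;
    if (is_separator_char(c))
        return NODE_TYPE_SEPARATOR;
    return NODE_TYPE_OTHER;
}

/*
 * Compare a separator from the format with a character from the input.
 * Without FX any separator is accepted; with FX (assumption) the two
 * characters must be identical.
 */
bool
separator_matches(char fmt_char, char input_char, bool fx_mode)
{
    if (!is_separator_char(fmt_char) || !is_separator_char(input_char))
        return false;
    return fx_mode ? (fmt_char == input_char) : true;
}

/* Report whether the format string ends inside a double-quoted section. */
bool
has_unclosed_quote(const char *fmt)
{
    bool        in_quote = false;

    assert(fmt != NULL);
    for (; *fmt != '\0'; fmt++)
    {
        if (*fmt == '"')
            in_quote = !in_quote;
    }
    return in_quote;
}
```

With these helpers, '-' in the format would accept '/' in the input
when FX is off, and a format such as YYYY "abc would be rejected
because its quote is never closed.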

PostgreSQL still does not insist that a text character in the formatting
string match the text character in the input string. It is not
obvious whether this should be fixed, because the two may differ in
character case or in the presence of an accent mark.
But I suppose that it is not right to check only the text character
count. For example, there is U+00A0, the Unicode no-break space character.

Code changes
------------

- new defines:

#define NODE_TYPE_SEPARATOR    4
#define NODE_TYPE_SPACE        5

- DCH_cache_getnew() is now called after parse_format(), because
parse_format() can now raise an error, and on the next attempt
DCH_cache_search() could otherwise return a broken cache entry.
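
The reason for this ordering can be shown with a small stand-alone
sketch. It is not the code from formatting.c: CacheEntry,
parse_format_sketch(), cache_lookup() and cache_fetch() are
hypothetical names, the "parsing" is reduced to a quote check plus a
copy, and a false return value stands in for the error that the real
parse_format() would raise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE  4
#define FMT_MAX     64

typedef struct
{
    bool        valid;
    char        str[FMT_MAX];       /* format string used as the key */
    char        parsed[FMT_MAX];    /* stand-in for the parsed node array */
} CacheEntry;

static CacheEntry cache[CACHE_SIZE];
static int  next_slot = 0;

/* Stand-in parser: fails when a double quote is left unclosed. */
bool
parse_format_sketch(const char *fmt, char *out)
{
    bool        in_quote = false;
    const char *p;

    for (p = fmt; *p != '\0'; p++)
    {
        if (*p == '"')
            in_quote = !in_quote;
    }
    if (in_quote)
        return false;
    strcpy(out, fmt);
    return true;
}

/* Return the cached entry for fmt, or NULL if there is none. */
CacheEntry *
cache_lookup(const char *fmt)
{
    int         i;

    for (i = 0; i < CACHE_SIZE; i++)
    {
        if (cache[i].valid && strcmp(cache[i].str, fmt) == 0)
            return &cache[i];
    }
    return NULL;
}

/*
 * Parse first and claim a cache slot only after parsing succeeded, so a
 * failed parse can never leave a half-filled entry for a later lookup.
 */
CacheEntry *
cache_fetch(const char *fmt)
{
    char        parsed[FMT_MAX];
    CacheEntry *ent;

    assert(fmt != NULL);
    if (strlen(fmt) >= FMT_MAX)
        return NULL;

    ent = cache_lookup(fmt);
    if (ent != NULL)
        return ent;

    if (!parse_format_sketch(fmt, parsed))
        return NULL;            /* no slot has been touched yet */

    ent = &cache[next_slot];
    next_slot = (next_slot + 1) % CACHE_SIZE;
    strcpy(ent->str, fmt);
    strcpy(ent->parsed, parsed);
    ent->valid = true;
    return ent;
}
```

If the slot were claimed before parsing, a failure in the middle would
leave an entry whose key is set but whose parsed data is incomplete,
and a later lookup for the same format string would return it.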


This patch does not handle all of the issues noted in this thread, since
there is still no consensus about them. So this patch has
proof-of-concept status and can be changed.

Of course this patch can be completely wrong. But it tries to introduce
more formal rules for formatting.

I will be grateful for notes and remarks.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Attachment
