Home > mailing lists

Re: Allow to_date() and to_timestamp() to accept localized names - Mailing list pgsql-hackers

From	Juan José Santamaría Flecha
Subject	Re: Allow to_date() and to_timestamp() to accept localized names
Date	January 24, 2020 09:48:19
Msg-id	CAC+AXB1fQtg-iWH=4qZnV0xYm08WR6W7AhoApKA3tkFsWgf90A@mail.gmail.com Whole thread Raw
In response to	Re: Allow to_date() and to_timestamp() to accept localized names (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Allow to_date() and to_timestamp() to accept localized names Re: Allow to_date() and to_timestamp() to accept localized names
List	pgsql-hackers

Tree view

On Thu, Jan 23, 2020 at 11:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thank you for your time looking into this.

Here's a v7 patch, rebased over my recent hacking, and addressing
most of the complaints I raised in <31691.1579648824@sss.pgh.pa.us>.
However, I've got some new complaints now that I've looked harder
at the code:

* It's not exactly apparent to me why this code should be concerned
about non-normalized characters when noplace else in the backend is.
I think we should either rip that out entirely, or move the logic
into str_tolower (and hence also str_toupper etc). I'd tend to favor
the former, but of course I don't speak any languages where this would
be an issue. Perhaps a better idea is to invent a new SQL function
that normalizes a given string, and expect users to call that first
if they'd like to_date() to match unnormalized text.

There is an open patch that will make the normalization functionality user visible [1]. So, if a user can call to_date(normalize('01 ŞUB 2010'), 'DD TMMON YYYY') I would vote to drop the normalization logic inside this patch altogether.

* I have no faith in this calculation that decides how long the match
length was:

*len = element_len + name_len - norm_len;

I seriously doubt that this does the right thing if either the
normalization or the downcasing changed the byte length of the
string. I'm not actually sure how we can do that correctly.
There's no good way to know whether the changes happened within
the part of the "name" string that we matched, or the part beyond
what we matched, but we only want to count the former. So this
seems like a pretty hard problem, and even if this logic is somehow
correct as it stands, it needs a comment explaining why.

The proper logic would come from do_to_timestamp() receiving a normalized "date_txt" input, so we do not operate with unnormalize and normalize strings at the same time.

* I'm still concerned about the risk of integer overflow in the
string allocations in seq_search_localized(). Those need to be
adjusted to be more paranoid, similar to the code in e.g. str_tolower.

Please find attached a patch with the normalization logic removed, thus no direct allocations in seq_search_localized().

I would like to rise a couple of questions myself:

* When compiled with DEBUG_TO_FROM_CHAR, there is a warning "‘dump_node’ defined but not used". Should we drop this function or uncomment its usage?

* Would it be worth moving str_tolower(localized_name) from seq_search_localized() into cache_locale_time()?

[1] https://www.postgresql.org/message-id/014866c8-d7ff-2a4f-c185-cf7e3ceb7028%402ndquadrant.com

Regards,

Juan José Santamaría Flecha

Attachment

0001-Allow-localized-month-names-to_date-v8.patch

pgsql-hackers by date:

From: Konstantin Knizhnik
Date: 24 January 2020, 09:43:10
Subject: Re: [Proposal] Global temporary tables

From: Peter Eisentraut
Date: 24 January 2020, 10:04:25
Subject: Re: Preserve versions of initdb-created collations in pg_upgrade

Re: Allow to_date() and to_timestamp() to accept localized names - Mailing list pgsql-hackers

Attachment

Previous

Next