Re: Allow to_date() and to_timestamp() to accept localized names - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: Allow to_date() and to_timestamp() to accept localized names
Msg-id 2f83150e-a2c0-6318-3125-d9b86e421aa2@2ndquadrant.com
In response to Re: Allow to_date() and to_timestamp() to accept localized names  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Allow to_date() and to_timestamp() to accept localized names  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 2020-01-24 17:22, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> But that's a different POV.  The input to this function could come from
>> arbitrary user input from any application whatsoever.  So the only
>> reason we can get away with that is because the example regression case
>> Juan José added (which uses non-normals) does not conform to the
>> standard.
> 
> I'm unsure about "conforming to standard", but I think it's reasonable
> to put the onus of doing normalization when necessary on the user.
> Otherwise, we need to move normalization logic into basically all
> the string processing functions (even texteq), which seems like a
> pretty huge cost that will benefit only a small minority of people.
> (If it's not a small minority, then where's the bug reports complaining
> that we don't do it today?)

These reports do exist, and this behavior is known.  However, the impact 
is mostly that results "look wrong" (strings look the same but don't 
compare as equal) rather than inconsistency or corruption, so it's 
mostly shrugged off.  The nondeterministic collation feature was 
introduced in part to deal with this; the pending normalization patch 
is another way to address it.  However, this behavior is baked deeply 
into Unicode, so no single feature or facility will simply make it go away.
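
To illustrate the "looks the same but doesn't compare as equal" effect, 
here is a minimal Python sketch.  It is not PostgreSQL code; the sample 
string and the helper name same_after_nfc are assumptions made for 
illustration only.  It builds the same visible name in precomposed and 
decomposed form and compares the two before and after normalization.

```python
import unicodedata

# The same visible text, encoded two ways (sample value chosen for illustration).
precomposed = "Jos\u00e9"    # "é" as one code point, U+00E9
decomposed = "Jose\u0301"    # "e" followed by U+0301 COMBINING ACUTE ACCENT


def same_after_nfc(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    # Plain code point comparison of the two encodings.
    print(precomposed == decomposed)
    # Comparison after normalizing both sides to the same form.
    print(same_after_nfc(precomposed, decomposed))
```

Normalizing both sides is only one possible policy; whether the caller 
or the server should do it is exactly the question under discussion.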

AFAICT, we haven't so far had any code that does a lookup of non-ASCII 
strings in a table, so that's why we haven't had this discussion yet.

Now that I think about it, you could also make an argument that this 
should be handled through collation, so the function that looks up the 
string in the locale table should go through texteq.  However, this 
would mostly satisfy the purists but create a bizarre user experience.

Looking through the patch quickly, if you want to get Unicode-fancy, 
doing a case-insensitive comparison by lower-casing both strings is 
also wrong in corner cases.  All the Greek month names end in sigma, 
which has a distinct word-final lower-case form, so I suspect that this 
patch might not work correctly for them.
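
A small Python sketch of the sigma corner case follows.  It makes no 
claim about what the patch actually does: lower_per_char is a 
hypothetical stand-in for a context-free, character-by-character 
lower-casing, and the sample month name is chosen for illustration.  
Capital sigma (U+03A3) has two lower-case forms, U+03C3 and the 
word-final U+03C2, so lowering both strings one character at a time can 
leave them different, while Unicode case folding maps both forms to the 
same character.

```python
# "Μάρτιος" (March) in lower case, as a locale table might store it; it ends
# with the word-final sigma U+03C2. Sample value chosen for illustration.
stored = "\u03bc\u03ac\u03c1\u03c4\u03b9\u03bf\u03c2"
# The same name typed in upper case; capital sigma is always U+03A3.
typed = "\u039c\u0386\u03a1\u03a4\u0399\u039f\u03a3"


def lower_per_char(s: str) -> str:
    """Hypothetical stand-in: lower-case each character without word context."""
    return "".join(ch.lower() for ch in s)


def equal_lowered(a: str, b: str) -> bool:
    """Compare after context-free lower-casing of both strings."""
    return lower_per_char(a) == lower_per_char(b)


def equal_folded(a: str, b: str) -> bool:
    """Compare after Unicode case folding of both strings."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    print(equal_lowered(stored, typed))  # context-free lower-casing on both sides
    print(equal_folded(stored, typed))   # case folding on both sides
```

Case folding is shown here only as one way to sidestep the final-sigma 
distinction; it does not address normalization.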

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


