Thread: BUG #7913: TO_CHAR Function & Turkish collate

BUG #7913: TO_CHAR Function & Turkish collate

From
a_dursun@hotmail.com
Date:
The following bug has been logged on the website:

Bug reference:      7913
Logged by:          TO_CHAR Function & Turkish collate
Email address:      a_dursun@hotmail.com
PostgreSQL version: 9.2.0
Operating system:   Linux
Description:

prod=# SELECT TO_CHAR('2013-03-01'::date,'DAY');
 to_char
----------
 FRİDAY
(1 row)
But it should return FRIDAY.
Our database's lc_collate is tr_TR.UTF-8 and the encoding is UTF8.

Best regards,
Adnan DURSUN
Ankara/TURKEY

Re: BUG #7913: TO_CHAR Function & Turkish collate

From
Tom Lane
Date:
a_dursun@hotmail.com writes:
> prod=# SELECT TO_CHAR('2013-03-01'::date,'DAY');
>  to_char
> ----------
>  FRİDAY
> (1 row)
> But it should return FRIDAY.
> Our database's lc_collate is tr_TR.UTF-8 and the encoding is UTF8.

It looks like the cause of this is that the result is computed as
str_toupper("Friday"), and str_toupper() applies a collation-sensitive
upcasing rule.

I think the use of str_toupper() is appropriate when processing the
locale-specific string for a TMDAY specification; but plain DAY is not
supposed to be locale-dependent, so we probably should use an ASCII-only
upcasing rule in the non-TM code path.

Anybody have an opinion on whether to back-patch such a fix?  It seems
conceivable that somebody out there is relying on the current behavior.
OTOH, I believe that only Turkish UTF8 locales exhibit this behavior
(the single-byte-encoding code path in str_toupper acts differently for
historical reasons).  So it's pretty inconsistent as it stands.

            regards, tom lane
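
The difference between the two upcasing rules described above can be modeled in a few lines. This is only an illustrative Python sketch, not the PostgreSQL C code: turkish_upper and ascii_upper are hypothetical helpers that stand in for a collation-sensitive upcasing under a Turkish locale and for an ASCII-only upcasing, respectively.

```python
# Hypothetical helpers for illustration; neither is a PostgreSQL function.

def turkish_upper(text: str) -> str:
    """Model of locale-sensitive upcasing under Turkish rules.

    Turkish maps dotted 'i' to dotted capital 'İ' (U+0130) and
    dotless 'ı' (U+0131) to plain 'I'.
    """
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


def ascii_upper(text: str) -> str:
    """Upcase only the ASCII letters a-z; leave everything else untouched."""
    return "".join(
        chr(ord(ch) - 32) if "a" <= ch <= "z" else ch
        for ch in text
    )


day = "Friday"
print(turkish_upper(day))  # FRİDAY (the behavior reported in the bug)
print(ascii_upper(day))    # FRIDAY (the result expected for plain DAY)
```

Under this model, a TMDAY-style path would keep the locale-aware rule, while the plain DAY path would use the ASCII-only one, so the English day name no longer picks up a dotted capital İ.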

Re: BUG #7913: TO_CHAR Function & Turkish collate

From
Peter Eisentraut
Date:
On Sun, 2013-03-03 at 10:42 -0500, Tom Lane wrote:
> I think the use of str_toupper() is appropriate when processing the
> locale-specific string for a TMDAY specification; but plain DAY is not
> supposed to be locale-dependent, so we probably should use an
> ASCII-only upcasing rule in the non-TM code path.

Agreed.

> Anybody have an opinion on whether to back-patch such a fix?

I think it's a bug that should be backpatched.

Re: BUG #7913: TO_CHAR Function & Turkish collate

From
Euler Taveira
Date:
On 03-03-2013 12:42, Tom Lane wrote:
> Anybody have an opinion on whether to back-patch such a fix?  It seems
> conceivable that somebody out there is relying on the current behavior.
> OTOH, I believe that only Turkish UTF8 locales exhibit this behavior
> (the single-byte-encoding code path in str_toupper acts differently for
> historical reasons).  So it's pretty inconsistent as it stands.
>
Nope. I'm not aware of the weird Turkish rules. Mea culpa. :(

As you suggested, s/str_toupper/pg_toupper/ in the else block (no TM) is the
right fix. I'm not aware of another locale that would break if we apply such a
change in a stable branch. Do you want me to post a fix?


--
   Euler Taveira de Oliveira - Timbira       http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Re: BUG #7913: TO_CHAR Function & Turkish collate

From
Tom Lane
Date:
Euler Taveira <euler@timbira.com> writes:
> As you suggested, s/str_toupper/pg_toupper/ in the else block (no TM) is the
> right fix. I'm not aware of another locale that would break if we apply such a
> change in a stable branch. Do you want me to post a fix?

Thanks, but I have a fix mostly written already.

            regards, tom lane

Re: BUG #7913: TO_CHAR Function & Turkish collate

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Sun, 2013-03-03 at 10:42 -0500, Tom Lane wrote:
>> Anybody have an opinion on whether to back-patch such a fix?

> I think it's a bug that should be backpatched.

Done.  In addition to day/month names, I found that there were
case-folding hazards for timezone abbreviations ('tz' format)
and Roman numerals for numbers ('rn' format) ... though, curiously,
not for Roman numerals for months.

            regards, tom lane
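
The same hazard exists in the downcasing direction, which is presumably what affects lower-case formats such as 'rn' and 'tz'. The sketch below is a Python model under that assumption, not the actual fix: turkish_lower and ascii_lower are hypothetical helpers, and the sample strings "III" and "IST" are chosen only for illustration.

```python
# Hypothetical helpers for illustration; neither is a PostgreSQL function.

def turkish_lower(text: str) -> str:
    """Model of locale-sensitive downcasing under Turkish rules.

    Turkish maps plain 'I' to dotless 'ı' (U+0131) and dotted
    capital 'İ' (U+0130) to plain 'i'.
    """
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


def ascii_lower(text: str) -> str:
    """Downcase only the ASCII letters A-Z; leave everything else untouched."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in text
    )


roman = "III"  # upper-case Roman numeral for 3
zone = "IST"   # sample timezone abbreviation containing 'I'

print(turkish_lower(roman), ascii_lower(roman))  # ııı iii
print(turkish_lower(zone), ascii_lower(zone))    # ıst ist
```

Any format whose output is built by case-folding a fixed ASCII string containing 'i' or 'I' is exposed in the same way, which is why an ASCII-only rule is the safer choice outside the TM code path.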

Re: BUG #7913: TO_CHAR Function & Turkish collate

From
Devrim GÜNDÜZ
Date:
Hi,

On Tue, 2013-03-05 at 13:08 -0500, Tom Lane wrote:
> > I think it's a bug that should be backpatched.
>
> Done.  In addition to day/month names, I found that there were
> case-folding hazards for timezone abbreviations ('tz' format)
> and Roman numerals for numbers ('rn' format) ... though, curiously,
> not for Roman numerals for months.

Thanks!

Regards,
--
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz