Thread: BUG #7913: TO_CHAR Function & Turkish collate
The following bug has been logged on the website: Bug reference: 7913 Logged by: TO_CHAR Function & Turkish collate Email address: a_dursun@hotmail.com PostgreSQL version: 9.2.0 Operating system: Linux Description: prod=# SELECT TO_CHAR('2013-03-01'::date,'DAY'); to_char ---------- FRİDAY (1 row) But it must return as FRIDAY. Our database lc_collate is tr_TR.UTF-8 and encoding is UTF8. Best regards, Adnan DURSUN Ankar/TURKEY
a_dursun@hotmail.com writes: > prod=# SELECT TO_CHAR('2013-03-01'::date,'DAY'); > to_char > ---------- > FRİDAY > (1 row) > But it must return as FRIDAY. > Our database lc_collate is tr_TR.UTF-8 and encoding is UTF8. It looks like the cause of this is that the result is computed as str_toupper("Friday"), and str_toupper() applies a collation-sensitive upcasing rule. I think the use of str_toupper() is appropriate when processing the locale-specific string for a TMDAY specification; but plain DAY is not supposed to be locale-dependent, so we probably should use an ASCII-only upcasing rule in the non-TM code path. Anybody have an opinion on whether to back-patch such a fix? It seems conceivable that somebody out there is relying on the current behavior. OTOH, I believe that only Turkish UTF8 locales exhibit this behavior (the single-byte-encoding code path in str_toupper acts differently for historical reasons). So it's pretty inconsistent as it stands. regards, tom lane
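To make the difference between the two upcasing rules concrete, here is a minimal Python sketch. It is not PostgreSQL code: `ascii_upper` and `turkish_upper` are hypothetical helper names, and `turkish_upper` is a hand-written stand-in for the tr_TR case mapping (i to İ, ı to I) rather than a call into the C library's locale support.

```python
def ascii_upper(text: str) -> str:
    """Upcase only the ASCII letters a-z; leave every other character alone."""
    return "".join(chr(ord(ch) - 32) if "a" <= ch <= "z" else ch for ch in text)


def turkish_upper(text: str) -> str:
    """Illustrative stand-in for Turkish upcasing (assumption: only the two
    special pairs i -> U+0130 and U+0131 -> I differ from the default rules)."""
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


if __name__ == "__main__":
    print(ascii_upper("Friday"))    # locale-independent result
    print(turkish_upper("Friday"))  # dotted capital I appears, as in the report
```

The ASCII-only rule gives the same answer regardless of the database locale, which is the behavior expected for plain DAY; the Turkish-style rule reproduces the dotted capital İ seen in the bug report.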
On Sun, 2013-03-03 at 10:42 -0500, Tom Lane wrote: > I think the use of str_toupper() is appropriate when processing the > locale-specific string for a TMDAY specification; but plain DAY is not > supposed to be locale-dependent, so we probably should use an > ASCII-only upcasing rule in the non-TM code path. Agreed. > Anybody have an opinion on whether to back-patch such a fix? I think it's a bug that should be backpatched.
On 03-03-2013 12:42, Tom Lane wrote: > Anybody have an opinion on whether to back-patch such a fix? It seems > conceivable that somebody out there is relying on the current behavior. > OTOH, I believe that only Turkish UTF8 locales exhibit this behavior > (the single-byte-encoding code path in str_toupper acts differently for > historical reasons). So it's pretty inconsistent as it stands. Nope. I wasn't aware of the odd Turkish casing rules. Mea culpa. :( As you suggested, s/str_toupper/pg_toupper/ in the else block (no TM) is the right fix. I'm not aware of another locale that would break if we apply such a change in a stable branch. Do you want me to post a fix? -- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Euler Taveira <euler@timbira.com> writes: > As you suggested, s/str_toupper/pg_toupper/ in the else block (no TM) is the > right fix. I'm not aware of another locale that would break if we apply such a > change in a stable branch. Do you want me to post a fix? Thanks, but I have a fix mostly written already. regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes: > On Sun, 2013-03-03 at 10:42 -0500, Tom Lane wrote: >> Anybody have an opinion on whether to back-patch such a fix? > I think it's a bug that should be backpatched. Done. In addition to day/month names, I found that there were case-folding hazards for timezone abbreviations ('tz' format) and Roman numerals for numbers ('rn' format) ... though, curiously, not for Roman numerals for months. regards, tom lane
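The same hazard exists in the downcasing direction, which is presumably what affects the lower-case 'rn' and 'tz' formats: under Turkish rules, capital I lowers to dotless ı. The following Python sketch illustrates this under that assumption; `ascii_lower` and `turkish_lower` are hypothetical names, and `turkish_lower` only imitates the tr_TR mapping rather than using real locale support.

```python
def ascii_lower(text: str) -> str:
    """Downcase only the ASCII letters A-Z; leave every other character alone."""
    return "".join(chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch for ch in text)


def turkish_lower(text: str) -> str:
    """Illustrative stand-in for Turkish downcasing (assumption: only the two
    special pairs I -> U+0131 and U+0130 -> i differ from the default rules)."""
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


if __name__ == "__main__":
    print(ascii_lower("XIII"))    # Roman numeral stays plain ASCII
    print(turkish_lower("XIII"))  # every I becomes dotless
```

A Roman numeral or timezone abbreviation containing I would therefore come out with dotless ı under the locale-sensitive rule, while the ASCII-only rule keeps the output stable across locales.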
Hi, On Tue, 2013-03-05 at 13:08 -0500, Tom Lane wrote: > > I think it's a bug that should be backpatched. > > Done. In addition to day/month names, I found that there were > case-folding hazards for timezone abbreviations ('tz' format) > and Roman numerals for numbers ('rn' format) ... though, curiously, > not for Roman numerals for months. Thanks! Regards, -- Devrim GÜNDÜZ Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz