Thread: BUG #16207: localization functions upper() and lower() does not work for text returned by convert_from()
BUG #16207: localization functions upper() and lower() does not work for text returned by convert_from()
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 16207 Logged by: toha.bakanovsky Email address: im-scooter@yandex.ru PostgreSQL version: 12.0 Operating system: Ubuntu 16.04.11 64-bit Description: Localization functions upper() and lower() does not work for text returned by convert_from() for cyrillic symbols. select upper(convert_from(convert_to('абв', 'utf8'), 'utf8')) = upper('абв'); -- return false select upper(convert_from('абв'::bytea, 'utf8')), upper('абв'); -- return false hexdecimal presentation of text is identical select convert_to(convert_from('абв'::bytea, 'utf8'), 'utf8') = convert_to('абв', 'utf-8'); -- returns true SHOW lc_ctype; -- -- "en_US.UTF-8" On Postgresql 10.0 it works fine.
Re: BUG #16207: localization functions upper() and lower() does not work for text returned by convert_from()
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes: > Localization functions upper() and lower() does not work for text returned > by convert_from() for cyrillic symbols. > select upper(convert_from(convert_to('абв', 'utf8'), 'utf8')) = > upper('абв'); -- return false That happens because upper() depends on the collation that the parser determines for its input expression. When you write just upper('абв') you get the database's default collation attached to the literal, since it's implicitly typed as text. However the convert_from() invocation ends up with collation "C", in which non-ASCII characters aren't going to be changed by upper(). You can fix it by assigning another collation explicitly, e.g. # select upper(convert_from(convert_to('абв', 'utf8'), 'utf8')); upper -------- абв (1 row) # select upper(convert_from(convert_to('абв', 'utf8'), 'utf8') collate "default"); upper -------- АБВ (1 row) > On Postgresql 10.0 it works fine. This case sort of accidentally works in pre-v12 versions, because the convert_from call has no collatable inputs so it gets a default collation assignment anyway. However, the function's collation-name argument is of type name, and in v12 that type has collation "C" rather than no collation, so the convert_from result ends up with collation "C" not the database's default. I had not realized till just now that that change would have any side-effects on the behavior of convert_from (or more accurately, on the behavior of expressions involving convert_from as an input to a collation-sensitive function). I'm not sure whether to call it a bug or not --- it's not really obvious that convert_from() ought to be regarded as producing some particular collation rather than another one. In any case, there seems very little that we could do about it in the v12 release series, since those catalog entries are fixed now. Poking around, it seems that all of these functions would be similarly affected: # select oid::regprocedure from pg_proc where 'name'::regtype = any(proargtypes) and prorettype = 'text'::regtype; oid ----------------------------- text(name) obj_description(oid,name) shobj_description(oid,name) convert_from(bytea,name) to_ascii(text,name) (5 rows) regards, tom lane