Hi Анатолий, Tom
Thank you for reporting the issue.
I did some testing and see a similar issue. I'm seeing a behavior that suggests this isn't simply a C locale configuration issue, but rather an inconsistency in how collations are resolved within SQL functions.
My environment is configured with UTF-8 locale:
postgres=# show lc_collate;
lc_collate
-------------
en_US.UTF-8
(1 row)
postgres=# show lc_ctype;
lc_ctype
-------------
en_US.UTF-8
(1 row)
Test 1: SQL function with user parameter. Cyrillic is not uppercased
postgres=# create or replace function to_upper_first(param1 text, param2 text)
returns text as $
select upper(param1) || ' ' || upper(param2) || ' ' || param2;
$ language sql;
CREATE FUNCTION
postgres=# select to_upper_first('тест', user::text);
to_upper_first
------------------
тест SURYA surya
(1 row)
Test 2: PL/pgSQL function. Cyrillic is not uppercased
postgres=# create or replace function to_upper_first(param1 text, param2 text)
postgres-# returns text language plpgsql as $$
postgres$# begin
postgres$# return upper(param1) || ' ' || upper(param2) || ' ' || param2;
postgres$# end;
postgres$# $$;
CREATE FUNCTION
postgres=#
postgres=# select to_upper_first('тест', user::text);
to_upper_first
------------------
тест SURYA surya
(1 row)
Test 3: Explicit collation, uppercase works correctly
postgres=# create or replace function to_upper_first(param1 text, param2 text)
postgres-# returns text as $$
postgres$# select upper(param1 collate "en_US.UTF-8")
postgres$# || ' '
postgres$# || upper(param2 collate "en_US.UTF-8")
postgres$# || ' '
postgres$# || param2;
postgres$# $$ language sql;
CREATE FUNCTION
postgres=#
postgres=# select to_upper_first('тест', user::text);
to_upper_first
------------------
ТЕСТ SURYA surya
(1 row)
Test 4: Direct upper() calls - Work correctly
postgres=# select upper('тест');
upper
-------
ТЕСТ
(1 row)
postgres=# select upper('тест'::text);
upper
-------
ТЕСТ
(1 row)
postgres=# select upper(('тест')::text);
upper
-------
ТЕСТ
(1 row)
postgres=# select upper(('тест') collate "en_US.UTF-8");
upper
-------
ТЕСТ
(1 row)
postgres=# select upper(('тест') collate "C");
upper
-------
тест
(1 row)
My observations:
1. Direct calls to upper('тест') correctly uppercases the Cyrillic using the database's en_US.UTF-8 locale
2. Inside functions, upper(param1) fails to uppercase Cyrillic when user::text is passed as param2
i. The issue occurs in both SQL and PL/pgSQL functions
3. Explicitly specifying collate "en_US.UTF-8" in the function resolves the issue.
If select upper('тест'::text) works correctly (i.e it converts to uppercase) outside the function, why does upper(param1) appear to use C collation inside the function when user::text is involved as another parameter?
It seems that the user session variable as a parameter is somehow affecting collation resolution for the independent param1 parameter, which seems unexpected.
I will investigate more into this issue.
Regards,
Surya Poondla