Thread: postgreSQL UPPER Method is converting the character "µ" into "M"

postgreSQL UPPER Method is converting the character "µ" into "M"

From
Sai Teja
Date:
Hi team,

I am using UPPER document name for converting the text from lower case into upper case. 
But here for the below example
Command:- select UPPER(testµ)
Input :- testµ
Output:- TESTM
Expected output :-  TESTµ

Here, that character is converting into M which should not be case
The postgreSQL is hosted in Linux Machine which is using Locale_ctype:- en_US_utf.8

Would request you to please suggest any ideas to resolve this issue.

It'll would be very helpful and appreciated.

Thanks,
Sai Teja

Re: postgreSQL UPPER Method is converting the character "µ" into "M"

From
Tom Lane
Date:
Sai Teja <saitejasaichintalapudi@gmail.com> writes:
> I am using UPPER document name for converting the text from lower case into
> upper case.
> But here for the below example
> Command:- select UPPER(testµ)
> Input :- testµ
> Output:- TESTM
> Expected output :-  TESTµ

According to https://en.wikipedia.org/wiki/Greek_alphabet
the upper-case form of µ (mu) is in fact M, so I'm not sure
what problem you have with this.  But in any case, you are
complaining in the wrong place.  This conversion is determined
by the rules of the selected locale.  If you don't like it,
choose another locale, or make your own, or try to persuade
the glibc maintainers that they are wrong.

            regards, tom lane



Re: postgreSQL UPPER Method is converting the character "µ" into "M"

From
Sai Teja
Date:
Hi Tom Lane, Thank you for your reply!

Actually we are migrating the data from DB2 to postgreSQL. So in DB2 the upper method is converting µ as µ only but not as M.

So, while validating the data we have encountered this problem.

We want to have the data in both DB's should be similar (db2 and postgreSQL)

In my local it is windows OS and locale is English_united_states.1252 and in local it is converting as expected ( testµ into TESTµ)

So, even I want to change locale i cannot change now. Since already db is created. BTW this db is hosted in azure.

Would request to please suggest if any alternatives are there to resolve this issue. 

Thanks,
Sai Teja 



On Wed, 6 Sep, 2023, 7:23 pm Tom Lane, <tgl@sss.pgh.pa.us> wrote:
Sai Teja <saitejasaichintalapudi@gmail.com> writes:
> I am using UPPER document name for converting the text from lower case into
> upper case.
> But here for the below example
> Command:- select UPPER(testµ)
> Input :- testµ
> Output:- TESTM
> Expected output :-  TESTµ

According to https://en.wikipedia.org/wiki/Greek_alphabet
the upper-case form of µ (mu) is in fact M, so I'm not sure
what problem you have with this.  But in any case, you are
complaining in the wrong place.  This conversion is determined
by the rules of the selected locale.  If you don't like it,
choose another locale, or make your own, or try to persuade
the glibc maintainers that they are wrong.

                        regards, tom lane

Re: postgreSQL UPPER Method is converting the character "µ" into "M"

From
Laurenz Albe
Date:
On Wed, 2023-09-06 at 19:34 +0530, Sai Teja wrote:
> Actually we are migrating the data from DB2 to postgreSQL. So in DB2 the upper method is converting µ as µ only but
notas M. 

Then you are using a different collation in DB2, or "upper" (which is not standardized,
as far as I know), is defined differently there.

> So, while validating the data we have encountered this problem.
>
> We want to have the data in both DB's should be similar (db2 and postgreSQL)
>
> In my local it is windows OS and locale is English_united_states.1252 and in local it is converting as expected (
testµinto TESTµ) 
>
> So, even I want to change locale i cannot change now. Since already db is created. BTW this db is hosted in azure.
>
> Would request to please suggest if any alternatives are there to resolve this issue. 

Specify a collation that does what you want explicitly:

SELECT upper('testµ' COLLATE "C");

 upper
═══════
 TESTµ
(1 row)

But then don't complain if your "ö" and "é" are not uppercased either.

Yours,
Laurenz Albe



Re: postgreSQL UPPER Method is converting the character "µ" into "M"

From
Erik Wienhold
Date:
On 06/09/2023 15:42 CEST Sai Teja <saitejasaichintalapudi@gmail.com> wrote:

> I am using UPPER document name for converting the text from lower case into
> upper case.
> But here for the below example
> Command:- select UPPER(testµ)
> Input :- testµ
> Output:- TESTM
> Expected output :- TESTµ
>
> Here, that character is converting into M which should not be case
> The postgreSQL is hosted in Linux Machine which is using Locale_ctype:- en_US_utf.8

Use can use collation C or ucs_basic if you want to uppercase ASCII only:

    postgres=# SELECT upper('testµ' COLLATE "C");
     upper
    -------
     TESTµ
    (1 row)

--
Erik



Re: postgreSQL UPPER Method is converting the character "µ" into "M"

From
Helmut Bender
Date:
Am 06.09.23 um 16:04 schrieb Sai Teja:
> In my local it is windows OS and locale is English_united_states.1252 
> and in local it is converting as expected ( testµ into TESTµ)

You wrote PostgreSQL is hosted on an utf-8 system. That's why it's 
working different.

And be careful: the M isn't the uppercase latin m, but in fact the 
uppercase µ (U+039C in unicode)! If you would compare it to latin 
alphabet 'TESTM' it would be different.

-- 
Gruß Helmut



Re: postgreSQL UPPER Method is converting the character "µ" into "M"

From
Francisco Olarte
Date:
On Wed, 6 Sept 2023 at 16:40, Sai Teja <saitejasaichintalapudi@gmail.com> wrote:
> Actually we are migrating the data from DB2 to postgreSQL. So in DB2 the upper method is converting µ as µ only but
notas M. 
> So, while validating the data we have encountered this problem.
>
> We want to have the data in both DB's should be similar (db2 and postgreSQL)
Data seems to be the same, function behaviour is what seems different.

> In my local it is windows OS and locale is English_united_states.1252 and in local it is converting as expected (
testµinto TESTµ) 

Maybe because unicode has GREEK CAPITAL LETTER MU ( 924, which looks
like capital M ), GREEK SMALL LETTER MU ( 956 ) and MICRO SIGN ( 181,
which looks like small mu ) while windows-1252 only has 0xB6 as micro.
OTOH Windows-1253 ( greek ansi ) has all three.

If your small mu are really micro-sign ( which is suspected if youused
1252 ) maybe changing them to that helps ( but I do not have the
resources to test that on hand ).

Francisco Olarte.



Re: postgreSQL UPPER Method is converting the character "µ" into "M"

From
Sai Teja
Date:
Thank you all for your response.
Got one more query,

When I'm trying the below command 
Select UPPER('Mass')
I'm getting MASS as output .
But when I created the column with generated always constraint
For example
Alter table xyz add column xyz varchar(800) generated always as (UPPER(content)) stored

Here content is original string coming from the table
The output is coming as 'µass" when I am selecting the data from the table

Even When I try to use collat C or ucs_basics I'm getting the same when I select the data from the table
But when I select the data explicitly through UPPER netword like select UPPER('Mass') then I'm getting expected output such as MASS

Would request you to please suggest the ideas to resolve this issue.

Thanks,
Sai Teja


On Wed, 6 Sep, 2023, 8:59 pm Francisco Olarte, <folarte@peoplecall.com> wrote:
On Wed, 6 Sept 2023 at 16:40, Sai Teja <saitejasaichintalapudi@gmail.com> wrote:
> Actually we are migrating the data from DB2 to postgreSQL. So in DB2 the upper method is converting µ as µ only but not as M.
> So, while validating the data we have encountered this problem.
>
> We want to have the data in both DB's should be similar (db2 and postgreSQL)
Data seems to be the same, function behaviour is what seems different.

> In my local it is windows OS and locale is English_united_states.1252 and in local it is converting as expected ( testµ into TESTµ)

Maybe because unicode has GREEK CAPITAL LETTER MU ( 924, which looks
like capital M ), GREEK SMALL LETTER MU ( 956 ) and MICRO SIGN ( 181,
which looks like small mu ) while windows-1252 only has 0xB6 as micro.
OTOH Windows-1253 ( greek ansi ) has all three.

If your small mu are really micro-sign ( which is suspected if youused
1252 ) maybe changing them to that helps ( but I do not have the
resources to test that on hand ).

Francisco Olarte.

Re: postgreSQL UPPER Method is converting the character "µ" into "M"

From
Erik Wienhold
Date:
On 06/09/2023 18:04 CEST Sai Teja <saitejasaichintalapudi@gmail.com> wrote:

> When I'm trying the below command
> Select UPPER('Mass')
> I'm getting MASS as output .
> But when I created the column with generated always constraint
> For example
> Alter table xyz add column xyz varchar(800) generated always as (UPPER(content)) stored
>
> Here content is original string coming from the table
> The output is coming as 'µass" when I am selecting the data from the table

Please provide the statements (CREATE TABLE, INSERT, and SELECT) to reproduce
it otherwise it's hard to analyze what's going on.

Homoglyphs are one explanation if you get 'µass' from the generated column as
described.  Another explanation is that you just selected a different column.

--
Erik



Re: postgreSQL UPPER Method is converting the character "µ" into "M"

From
Erik Wienhold
Date:
On 06/09/2023 18:37 CEST Erik Wienhold <ewie@ewie.name> wrote:

> Homoglyphs are one explanation if you get 'µass' from the generated column as
> described.

    postgres=# SELECT upper('𝝻𝚊𝚜𝚜');
     upper
    -------
     𝝻𝚊𝚜𝚜
    (1 row)

The codepoints I picked are:

* MATHEMATICAL SANS-SERIF BOLD SMALL MU
* MATHEMATICAL MONOSPACE SMALL A
* MATHEMATICAL MONOSPACE SMALL S

--
Erik



Re: postgreSQL UPPER Method is converting the character "µ" into "M"

From
Sai Teja
Date:
I added one column with generated always column with UPPER CASE like below:-

Alter table table_name t add column data varchar(8000) generated always as (UPPER(t.content)) stored 

Data column is generated always constraint here 

This column has many sentences for each row in which some of the characters are in Greek language.
Like µ, ë, ä, Ä etc..
So, for the example testµ when I choose 
1. Select UPPER('testµ') 
Output :- TESTM

But as per mail conversation I have used COLLATE ucs_basic like
2. Select UPPER('testµ' collate "ucs_basic") 
Output :- TESTµ (which is correct)


3. SELECT UPPER('Mass' collate "ucs_basic")
Output :- MASS (which is correct)

4. Select data from table (here data is the column which is created with generated always column like mentioned above)

For some of the rows which contains Greek characters I'm getting wrong output.

For ex:- for the word 'MASS' I'm getting 'µASS' when I select the data from the table

Summary:- I'm getting wrong output when I use upper keyword with collation for the table 
But when I explicitly call upper keyword with collation like mentioned in above I'm getting the results as expected.

Even I tried to add collation in the column itself but it didn't worked.

Alter table table_name t add column data varchar(8000) generated always as (UPPER(t.content, collation "ucs_basic")) stored 
Or 
Alter table table_name t add column data varchar(8000) generated always as (UPPER(t.content) collation "ucs_basic") stored 

Both didn't worked. As I got wrong output when I selected the data from the table.

On Wed, 6 Sep, 2023, 10:18 pm Erik Wienhold, <ewie@ewie.name> wrote:
On 06/09/2023 18:37 CEST Erik Wienhold <ewie@ewie.name> wrote:

> Homoglyphs are one explanation if you get 'µass' from the generated column as
> described.

        postgres=# SELECT upper('𝝻𝚊𝚜𝚜');
         upper
        -------
         𝝻𝚊𝚜𝚜
        (1 row)

The codepoints I picked are:

* MATHEMATICAL SANS-SERIF BOLD SMALL MU
* MATHEMATICAL MONOSPACE SMALL A
* MATHEMATICAL MONOSPACE SMALL S

--
Erik

Re: postgreSQL UPPER Method is converting the character "µ" into "M"

From
Jehan-Guillaume de Rorthais
Date:
On Wed, 6 Sep 2023 19:34:40 +0530
Sai Teja <saitejasaichintalapudi@gmail.com> wrote:

> In my local it is windows OS and locale is English_united_states.1252 and
> in local it is converting as expected ( testµ into TESTµ)

I'm not familiar with PostgreSQL under Windows. Maybe collation from locale
"English_united_states.1252" is available in PostgreSQL? Could you try creating
a database/table/column using this collation?

Or just look at available collations using:

    select collname from pg_collation

++



Re: postgreSQL UPPER Method is converting the character "µ" into "M"

From
Joe Conway
Date:
On 9/6/23 11:22, Helmut Bender wrote:
> Am 06.09.23 um 16:04 schrieb Sai Teja:
>> In my local it is windows OS and locale is English_united_states.1252 
>> and in local it is converting as expected ( testµ into TESTµ)
> 
> You wrote PostgreSQL is hosted on an utf-8 system. That's why it's
> working different.
> 
> And be careful: the M isn't the uppercase latin m, but in fact the
> uppercase µ (U+039C in unicode)! If you would compare it to latin
> alphabet 'TESTM' it would be different.


Yep, there are interactions between the encoding used by your terminal, 
the server encoding, and the client encoding.

You have to be particularly careful when cutting and pasting that the 
client encoding and your terminal encoding match or you can get 
surprising results.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: postgreSQL UPPER Method is converting the character "µ" into "M"

From
"Peter J. Holzer"
Date:
On 2023-09-06 17:29:21 +0200, Francisco Olarte wrote:
> On Wed, 6 Sept 2023 at 16:40, Sai Teja <saitejasaichintalapudi@gmail.com> wrote:
> > Actually we are migrating the data from DB2 to postgreSQL. So in DB2
> > the upper method is converting µ as µ only but not as M.
> > So, while validating the data we have encountered this problem.
[...]
> Maybe because unicode has GREEK CAPITAL LETTER MU ( 924, which looks
> like capital M ), GREEK SMALL LETTER MU ( 956 ) and MICRO SIGN ( 181,
> which looks like small mu ) while windows-1252 only has 0xB6 as micro.
> OTOH Windows-1253 ( greek ansi ) has all three.
>
> If your small mu are really micro-sign ( which is suspected if youused
> 1252 ) maybe changing them to that helps ( but I do not have the
> resources to test that on hand ).

Nope, it doesn't:

hjp=> select chr(181) as micro, chr(956) as mu;
╔═══════╤════╗
║ micro │ mu ║
╟───────┼────╢
║ µ     │ μ  ║
╚═══════╧════╝
(1 row)

hjp=> select upper(chr(181)) as micro, upper(chr(956)) as mu;
╔═══════╤════╗
║ micro │ mu ║
╟───────┼────╢
║ Μ     │ Μ  ║
╚═══════╧════╝
(1 row)


(At least not on an Ubuntu 22.04 system using the en_US.UTF-8 locale).

IMHO uppercasing MICRO SIGN doesn't make much sense, but that was the
decision that either the libc maintainers ore the Unicode committee
made.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

Re: postgreSQL UPPER Method is converting the character "µ" into "M"

From
"Peter J. Holzer"
Date:
On 2023-09-06 22:39:42 +0530, Sai Teja wrote:
> This column has many sentences for each row in which some of the
> characters are in Greek language.
> Like µ, ë, ä, Ä etc..

Except for the µ[1], those ̣aren't Greek letters.

        hp

[1] and even that's debatable, since it's a micro sign, not a mu.

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment