Thread: postgreSQL UPPER Method is converting the character "µ" into "M"
Hi team,
I am using UPPER document name for converting the text from lower case into upper case.
But here for the below example
Command:- select UPPER(testµ)
Input :- testµ
Output:- TESTM
Expected output :- TESTµ
Here, that character is converting into M which should not be case
The postgreSQL is hosted in Linux Machine which is using Locale_ctype:- en_US_utf.8
Would request you to please suggest any ideas to resolve this issue.
It'll would be very helpful and appreciated.
Thanks,
Sai Teja
Sai Teja <saitejasaichintalapudi@gmail.com> writes: > I am using UPPER document name for converting the text from lower case into > upper case. > But here for the below example > Command:- select UPPER(testµ) > Input :- testµ > Output:- TESTM > Expected output :- TESTµ According to https://en.wikipedia.org/wiki/Greek_alphabet the upper-case form of µ (mu) is in fact M, so I'm not sure what problem you have with this. But in any case, you are complaining in the wrong place. This conversion is determined by the rules of the selected locale. If you don't like it, choose another locale, or make your own, or try to persuade the glibc maintainers that they are wrong. regards, tom lane
Hi Tom Lane, Thank you for your reply!
Actually we are migrating the data from DB2 to postgreSQL. So in DB2 the upper method is converting µ as µ only but not as M.
So, while validating the data we have encountered this problem.
We want to have the data in both DB's should be similar (db2 and postgreSQL)
In my local it is windows OS and locale is English_united_states.1252 and in local it is converting as expected ( testµ into TESTµ)
So, even I want to change locale i cannot change now. Since already db is created. BTW this db is hosted in azure.
Would request to please suggest if any alternatives are there to resolve this issue.
Thanks,
Sai Teja
On Wed, 6 Sep, 2023, 7:23 pm Tom Lane, <tgl@sss.pgh.pa.us> wrote:
Sai Teja <saitejasaichintalapudi@gmail.com> writes:
> I am using UPPER document name for converting the text from lower case into
> upper case.
> But here for the below example
> Command:- select UPPER(testµ)
> Input :- testµ
> Output:- TESTM
> Expected output :- TESTµ
According to https://en.wikipedia.org/wiki/Greek_alphabet
the upper-case form of µ (mu) is in fact M, so I'm not sure
what problem you have with this. But in any case, you are
complaining in the wrong place. This conversion is determined
by the rules of the selected locale. If you don't like it,
choose another locale, or make your own, or try to persuade
the glibc maintainers that they are wrong.
regards, tom lane
On Wed, 2023-09-06 at 19:34 +0530, Sai Teja wrote: > Actually we are migrating the data from DB2 to postgreSQL. So in DB2 the upper method is converting µ as µ only but notas M. Then you are using a different collation in DB2, or "upper" (which is not standardized, as far as I know), is defined differently there. > So, while validating the data we have encountered this problem. > > We want to have the data in both DB's should be similar (db2 and postgreSQL) > > In my local it is windows OS and locale is English_united_states.1252 and in local it is converting as expected ( testµinto TESTµ) > > So, even I want to change locale i cannot change now. Since already db is created. BTW this db is hosted in azure. > > Would request to please suggest if any alternatives are there to resolve this issue. Specify a collation that does what you want explicitly: SELECT upper('testµ' COLLATE "C"); upper ═══════ TESTµ (1 row) But then don't complain if your "ö" and "é" are not uppercased either. Yours, Laurenz Albe
On 06/09/2023 15:42 CEST Sai Teja <saitejasaichintalapudi@gmail.com> wrote: > I am using UPPER document name for converting the text from lower case into > upper case. > But here for the below example > Command:- select UPPER(testµ) > Input :- testµ > Output:- TESTM > Expected output :- TESTµ > > Here, that character is converting into M which should not be case > The postgreSQL is hosted in Linux Machine which is using Locale_ctype:- en_US_utf.8 Use can use collation C or ucs_basic if you want to uppercase ASCII only: postgres=# SELECT upper('testµ' COLLATE "C"); upper ------- TESTµ (1 row) -- Erik
Am 06.09.23 um 16:04 schrieb Sai Teja: > In my local it is windows OS and locale is English_united_states.1252 > and in local it is converting as expected ( testµ into TESTµ) You wrote PostgreSQL is hosted on an utf-8 system. That's why it's working different. And be careful: the M isn't the uppercase latin m, but in fact the uppercase µ (U+039C in unicode)! If you would compare it to latin alphabet 'TESTM' it would be different. -- Gruß Helmut
On Wed, 6 Sept 2023 at 16:40, Sai Teja <saitejasaichintalapudi@gmail.com> wrote: > Actually we are migrating the data from DB2 to postgreSQL. So in DB2 the upper method is converting µ as µ only but notas M. > So, while validating the data we have encountered this problem. > > We want to have the data in both DB's should be similar (db2 and postgreSQL) Data seems to be the same, function behaviour is what seems different. > In my local it is windows OS and locale is English_united_states.1252 and in local it is converting as expected ( testµinto TESTµ) Maybe because unicode has GREEK CAPITAL LETTER MU ( 924, which looks like capital M ), GREEK SMALL LETTER MU ( 956 ) and MICRO SIGN ( 181, which looks like small mu ) while windows-1252 only has 0xB6 as micro. OTOH Windows-1253 ( greek ansi ) has all three. If your small mu are really micro-sign ( which is suspected if youused 1252 ) maybe changing them to that helps ( but I do not have the resources to test that on hand ). Francisco Olarte.
Thank you all for your response.
Got one more query,
When I'm trying the below command
Select UPPER('Mass')
I'm getting MASS as output .
But when I created the column with generated always constraint
For example
Alter table xyz add column xyz varchar(800) generated always as (UPPER(content)) stored
Here content is original string coming from the table
The output is coming as 'µass" when I am selecting the data from the table
Even When I try to use collat C or ucs_basics I'm getting the same when I select the data from the table
But when I select the data explicitly through UPPER netword like select UPPER('Mass') then I'm getting expected output such as MASS
Would request you to please suggest the ideas to resolve this issue.
Thanks,
Sai Teja
On Wed, 6 Sep, 2023, 8:59 pm Francisco Olarte, <folarte@peoplecall.com> wrote:
On Wed, 6 Sept 2023 at 16:40, Sai Teja <saitejasaichintalapudi@gmail.com> wrote:
> Actually we are migrating the data from DB2 to postgreSQL. So in DB2 the upper method is converting µ as µ only but not as M.
> So, while validating the data we have encountered this problem.
>
> We want to have the data in both DB's should be similar (db2 and postgreSQL)
Data seems to be the same, function behaviour is what seems different.
> In my local it is windows OS and locale is English_united_states.1252 and in local it is converting as expected ( testµ into TESTµ)
Maybe because unicode has GREEK CAPITAL LETTER MU ( 924, which looks
like capital M ), GREEK SMALL LETTER MU ( 956 ) and MICRO SIGN ( 181,
which looks like small mu ) while windows-1252 only has 0xB6 as micro.
OTOH Windows-1253 ( greek ansi ) has all three.
If your small mu are really micro-sign ( which is suspected if youused
1252 ) maybe changing them to that helps ( but I do not have the
resources to test that on hand ).
Francisco Olarte.
On 06/09/2023 18:04 CEST Sai Teja <saitejasaichintalapudi@gmail.com> wrote: > When I'm trying the below command > Select UPPER('Mass') > I'm getting MASS as output . > But when I created the column with generated always constraint > For example > Alter table xyz add column xyz varchar(800) generated always as (UPPER(content)) stored > > Here content is original string coming from the table > The output is coming as 'µass" when I am selecting the data from the table Please provide the statements (CREATE TABLE, INSERT, and SELECT) to reproduce it otherwise it's hard to analyze what's going on. Homoglyphs are one explanation if you get 'µass' from the generated column as described. Another explanation is that you just selected a different column. -- Erik
On 06/09/2023 18:37 CEST Erik Wienhold <ewie@ewie.name> wrote: > Homoglyphs are one explanation if you get 'µass' from the generated column as > described. postgres=# SELECT upper('𝝻𝚊𝚜𝚜'); upper ------- 𝝻𝚊𝚜𝚜 (1 row) The codepoints I picked are: * MATHEMATICAL SANS-SERIF BOLD SMALL MU * MATHEMATICAL MONOSPACE SMALL A * MATHEMATICAL MONOSPACE SMALL S -- Erik
I added one column with generated always column with UPPER CASE like below:-
Alter table table_name t add column data varchar(8000) generated always as (UPPER(t.content)) stored
Data column is generated always constraint here
This column has many sentences for each row in which some of the characters are in Greek language.
Like µ, ë, ä, Ä etc..
So, for the example testµ when I choose
1. Select UPPER('testµ')
Output :- TESTM
But as per mail conversation I have used COLLATE ucs_basic like
2. Select UPPER('testµ' collate "ucs_basic")
Output :- TESTµ (which is correct)
Output :- MASS (which is correct)
4. Select data from table (here data is the column which is created with generated always column like mentioned above)
For some of the rows which contains Greek characters I'm getting wrong output.
For ex:- for the word 'MASS' I'm getting 'µASS' when I select the data from the table
Summary:- I'm getting wrong output when I use upper keyword with collation for the table
But when I explicitly call upper keyword with collation like mentioned in above I'm getting the results as expected.
Even I tried to add collation in the column itself but it didn't worked.
Alter table table_name t add column data varchar(8000) generated always as (UPPER(t.content, collation "ucs_basic")) stored
Or
Alter table table_name t add column data varchar(8000) generated always as (UPPER(t.content) collation "ucs_basic") stored
Both didn't worked. As I got wrong output when I selected the data from the table.
On Wed, 6 Sep, 2023, 10:18 pm Erik Wienhold, <ewie@ewie.name> wrote:
On 06/09/2023 18:37 CEST Erik Wienhold <ewie@ewie.name> wrote:
> Homoglyphs are one explanation if you get 'µass' from the generated column as
> described.
postgres=# SELECT upper('𝝻𝚊𝚜𝚜');
upper
-------
𝝻𝚊𝚜𝚜
(1 row)
The codepoints I picked are:
* MATHEMATICAL SANS-SERIF BOLD SMALL MU
* MATHEMATICAL MONOSPACE SMALL A
* MATHEMATICAL MONOSPACE SMALL S
--
Erik
Re: postgreSQL UPPER Method is converting the character "µ" into "M"
From
Jehan-Guillaume de Rorthais
Date:
On Wed, 6 Sep 2023 19:34:40 +0530 Sai Teja <saitejasaichintalapudi@gmail.com> wrote: > In my local it is windows OS and locale is English_united_states.1252 and > in local it is converting as expected ( testµ into TESTµ) I'm not familiar with PostgreSQL under Windows. Maybe collation from locale "English_united_states.1252" is available in PostgreSQL? Could you try creating a database/table/column using this collation? Or just look at available collations using: select collname from pg_collation ++
On 9/6/23 11:22, Helmut Bender wrote: > Am 06.09.23 um 16:04 schrieb Sai Teja: >> In my local it is windows OS and locale is English_united_states.1252 >> and in local it is converting as expected ( testµ into TESTµ) > > You wrote PostgreSQL is hosted on an utf-8 system. That's why it's > working different. > > And be careful: the M isn't the uppercase latin m, but in fact the > uppercase µ (U+039C in unicode)! If you would compare it to latin > alphabet 'TESTM' it would be different. Yep, there are interactions between the encoding used by your terminal, the server encoding, and the client encoding. You have to be particularly careful when cutting and pasting that the client encoding and your terminal encoding match or you can get surprising results. -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On 2023-09-06 17:29:21 +0200, Francisco Olarte wrote: > On Wed, 6 Sept 2023 at 16:40, Sai Teja <saitejasaichintalapudi@gmail.com> wrote: > > Actually we are migrating the data from DB2 to postgreSQL. So in DB2 > > the upper method is converting µ as µ only but not as M. > > So, while validating the data we have encountered this problem. [...] > Maybe because unicode has GREEK CAPITAL LETTER MU ( 924, which looks > like capital M ), GREEK SMALL LETTER MU ( 956 ) and MICRO SIGN ( 181, > which looks like small mu ) while windows-1252 only has 0xB6 as micro. > OTOH Windows-1253 ( greek ansi ) has all three. > > If your small mu are really micro-sign ( which is suspected if youused > 1252 ) maybe changing them to that helps ( but I do not have the > resources to test that on hand ). Nope, it doesn't: hjp=> select chr(181) as micro, chr(956) as mu; ╔═══════╤════╗ ║ micro │ mu ║ ╟───────┼────╢ ║ µ │ μ ║ ╚═══════╧════╝ (1 row) hjp=> select upper(chr(181)) as micro, upper(chr(956)) as mu; ╔═══════╤════╗ ║ micro │ mu ║ ╟───────┼────╢ ║ Μ │ Μ ║ ╚═══════╧════╝ (1 row) (At least not on an Ubuntu 22.04 system using the en_US.UTF-8 locale). IMHO uppercasing MICRO SIGN doesn't make much sense, but that was the decision that either the libc maintainers ore the Unicode committee made. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
On 2023-09-06 22:39:42 +0530, Sai Teja wrote: > This column has many sentences for each row in which some of the > characters are in Greek language. > Like µ, ë, ä, Ä etc.. Except for the µ[1], those ̣aren't Greek letters. hp [1] and even that's debatable, since it's a micro sign, not a mu. -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"