Thread: Patch to make Turks happy.
Hi,

Yet another problem with Turkish encoding. clean_encoding_name() in src/backend/utils/mb/encnames.c uses tolower() to convert encoding names to lower case. This causes errors if an encoding name contains a capital "I" and the current locale is Turkish, because in that locale tolower('I') yields the dotless "ı" rather than "i". Some examples:

aaa=# \l
        List of databases
   Name    | Owner | Encoding
-----------+-------+----------
 aaa       | pgsql | LATIN5
 bbb       | pgsql | LATIN5
 template0 | pgsql | LATIN5
 template1 | pgsql | LATIN5
(4 rows)

aaa=# CREATE DATABASE ccc ENCODING='LATIN5';
ERROR: LATIN5 is not a valid encoding name
aaa=# \encoding
SQL_ASCII
aaa=# \encoding SQL_ASCII
SQL_ASCII: invalid encoding name or conversion procedure not found
aaa=# \encoding LATIN5
LATIN5: invalid encoding name or conversion procedure not found

The patch is a simple change to use ASCII-only lower-case conversion instead of the locale-dependent tolower().

Best regards,
Nic.

*** ./src/backend/utils/mb/encnames.c.orig   Mon Dec  2 15:58:49 2002
--- ./src/backend/utils/mb/encnames.c        Mon Dec  2 18:13:23 2002
***************
*** 407,413 ****
        for (p = key, np = newkey; *p != '\0'; p++)
        {
                if (isalnum((unsigned char) *p))
!                       *np++ = tolower((unsigned char) *p);
        }
        *np = '\0';
        return newkey;
--- 407,416 ----
        for (p = key, np = newkey; *p != '\0'; p++)
        {
                if (isalnum((unsigned char) *p))
!                       if (*p >= 'A' && *p <= 'Z')
!                               *np++ = *p + 'a' - 'A';
!                       else
!                               *np++ = *p;
        }
        *np = '\0';
        return newkey;
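For illustration, here is a minimal sketch of the ASCII-only folding idea as a standalone function. The name ascii_clean_name() is hypothetical and is not PostgreSQL's API. One deliberate difference from the patch: the patch still filters characters with isalnum(), which is itself locale-dependent, whereas this sketch tests ASCII ranges explicitly, so high-bit bytes such as the Latin-5 dotless "ı" (0xFD) are dropped whatever LC_CTYPE is set to.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for clean_encoding_name(): copy only ASCII
 * letters and digits from `key` into `newkey`, folding A-Z to a-z
 * without consulting the current locale. `newkey` must have room for
 * strlen(key) + 1 bytes. Returns `newkey`.
 */
static char *
ascii_clean_name(const char *key, char *newkey)
{
    const char *p;
    char       *np;

    for (p = key, np = newkey; *p != '\0'; p++)
    {
        unsigned char c = (unsigned char) *p;

        if (c >= 'A' && c <= 'Z')
            *np++ = (char) (c + ('a' - 'A'));   /* ASCII-only fold */
        else if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
            *np++ = (char) c;                   /* already lower case or a digit */
        /* anything else ('_', '-', high-bit bytes) is dropped */
    }
    *np = '\0';
    return newkey;
}
```

With this sketch, "LATIN5" becomes "latin5" and "SQL_ASCII" becomes "sqlascii" regardless of the current locale, so the lookup no longer depends on how the C library lower-cases "I".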
I am not going to apply this patch because I think it will mess up the handling of other locales.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian wrote:
> I am not going to apply this patch because I think it will mess up the
> handling of other locales.

As far as I can tell from the source code, this function only deals with cleaning up encoding names and nothing else. Since all the encoding names are plain ASCII, I think it is safe to use ASCII-only lower-case conversion.

By the way, I noticed only after sending the patch that the compiler complains about an ambiguous `else', so it should be rewritten with explicit braces around the body of the outer `if':

    if (isalnum((unsigned char) *p))
    {
        if (*p >= 'A' && *p <= 'Z')
            *np++ = *p + 'a' - 'A';
        else
            *np++ = *p;
    }

Regards,
Nicolai
Bruce Momjian writes:
> I am not going to apply this patch because I think it will mess up the
> handling of other locales.

This patch looks OK to me. Normally, character set names should use identifier case-folding rules anyway, so this seems to be a step in the right direction. Much better than saying that users of certain locales can't properly use PostgreSQL.

-- 
Peter Eisentraut   peter_e@gmx.net
OK, Peter, that helps. Thanks. I will apply it.

-- 
  Bruce Momjian
OK, patch applied. Peter, should this appear in 7.3.1 too?

-- 
  Bruce Momjian

Index: src/backend/utils/mb/encnames.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/utils/mb/encnames.c,v
retrieving revision 1.10
diff -c -c -r1.10 encnames.c
*** src/backend/utils/mb/encnames.c   4 Sep 2002 20:31:31 -0000   1.10
--- src/backend/utils/mb/encnames.c   5 Dec 2002 23:19:40 -0000
***************
*** 407,413 ****
        for (p = key, np = newkey; *p != '\0'; p++)
        {
                if (isalnum((unsigned char) *p))
!                       *np++ = tolower((unsigned char) *p);
        }
        *np = '\0';
        return newkey;
--- 407,418 ----
        for (p = key, np = newkey; *p != '\0'; p++)
        {
                if (isalnum((unsigned char) *p))
!               {
!                       if (*p >= 'A' && *p <= 'Z')
!                               *np++ = *p + 'a' - 'A';
!                       else
!                               *np++ = *p;
!               }
        }
        *np = '\0';
        return newkey;
Peter, is that patch OK for 7.3.1? I am not sure.

-- 
  Bruce Momjian
Bruce Momjian writes:
> Peter, is that patch OK for 7.3.1? I am not sure.

Definitely. It's a bug fix.

-- 
Peter Eisentraut   peter_e@gmx.net
Thanks. Applied for 7.3.1.

-- 
  Bruce Momjian