Thread: Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32
I do understand the problem, but I don't understand the decision you guys made. The fact that UPPER/LOWER and some other functions do not work on win32 is surely a problem for some languages, but not for others. For example, Japanese (and probably Chinese and Korean) has no concept of upper/lower case. So the fact that UPPER/LOWER does not work with UTF-8 on win32 is not a problem for Japanese (or for some other languages). Just using the C locale with UTF-8 is enough in this case.

In summary, I think you guys are going overboard in disabling the multibyte support functionality on UTF-8/win32 just because some languages do not work.

The same can be said of EUC-JP, EUC-CN, EUC-KR and so on as well.

I strongly object to the policy of unconditionally disabling UTF-8 support on win32.
--
Tatsuo Ishii

From: "Magnus Hagander" <mha@sollentuna.net>
Subject: RE: [pgsql-hackers-win32] UNICODE/UTF-8 on win32
Date: Sat, 1 Jan 2005 14:48:04 +0100
Message-ID: <6BCB9D8A16AC4241919521715F4D8BCE4764A4@algol.sollentuna.se>

> UNICODE/UTF-8 does not work on the win32 server. The reason is that
> strcoll() and friends don't work with it. To support it on win32, it
> needs to be converted to UTF16 and use the wide-character versions of
> the function. Which we do not do.
> (See
> http://archives.postgresql.org/pgsql-hackers-win32/2004-11/msg00036.php
> and
> http://archives.postgresql.org/pgsql-hackers-win32/2004-12/msg00106.php)
>
> I don't *think* we need to disable it on the client. AFAIK, the client
> interfaces don't use any of these functions, and I've seen reports of
> people using that long before we had a native win32 server.
>
> //Magnus
>
> >-----Original Message-----
> >From: Tatsuo Ishii [mailto:t-ishii@sra.co.jp]
> >Sent: den 1 januari 2005 01:10
> >To: tgl@sss.pgh.pa.us
> >Cc: Magnus Hagander; pgsql-hackers-win32@postgresql.org
> >Subject: Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32
> >
> >Sorry, but I don't subscribe to pgsql-hackers-win32 list. What's the
> >problem here?
> >--
> >Tatsuo Ishii
> >
> >> "Magnus Hagander" <mha@sollentuna.net> writes:
> >> > We know it's broken and won't be fixed for 8.0.
> >>
> >> > If we just #ifndef WIN32 the definitions in utils/mb/encnames.c it won't
> >> > be possible to select that encoding, right? Will that have any other
> >> > unwanted effects (such as breaking client encodings)? If not, I suggest
> >> > this is done.
> >>
> >> I believe the subscripts in those arrays have to match the encoding
> >> enum type, so you can't just ifdef out individual entries.
> >>
> >> > (Or perhaps something can be done in pg_valid_server_encoding?)
> >>
> >> Making the valid_server_encoding function reject it might work.
> >> Tatsuo-san would know for sure.
> >>
> >> Should we also reject it as a client encoding, or does that work OK?
> >>
> >> regards, tom lane
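[Editor's sketch] Tom Lane's point in the quoted text above can be illustrated roughly as follows. This is a minimal sketch, not PostgreSQL source: the enum, the name table, and the functions demo_enc_name() and demo_valid_server_encoding() are all hypothetical, and the platform is passed in as a flag (rather than tested with #ifdef WIN32) only so the example behaves the same everywhere. It shows why an entry cannot simply be ifdef'ed out of a table indexed by the enum, and how a validity check can reject one encoding instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding ids for illustration; not PostgreSQL's real enum. */
typedef enum
{
    DEMO_SQL_ASCII = 0,
    DEMO_EUC_JP,
    DEMO_UTF8,
    DEMO_LATIN1,
    DEMO_ENC_COUNT
} demo_enc;

/*
 * The table is indexed by demo_enc, so entry i must describe encoding i.
 * Removing the "UTF8" row with an #ifdef would shift "LATIN1" into slot 2
 * and every later lookup would return the wrong name.
 */
static const char *const demo_enc_names[DEMO_ENC_COUNT] = {
    "SQL_ASCII",
    "EUC_JP",
    "UTF8",
    "LATIN1"
};

/* Look up an encoding name by id; returns NULL for an out-of-range id. */
const char *
demo_enc_name(int enc)
{
    if (enc < 0 || enc >= DEMO_ENC_COUNT)
        return NULL;
    return demo_enc_names[enc];
}

/*
 * Keep the table intact and reject the encoding in the validity check.
 * is_win32 stands in for a compile-time platform test (an assumption made
 * for this sketch).
 */
bool
demo_valid_server_encoding(int enc, bool is_win32)
{
    if (enc < 0 || enc >= DEMO_ENC_COUNT)
        return false;
    if (is_win32 && enc == DEMO_UTF8)
        return false;
    return true;
}
```

With this layout the client-side name lookup keeps working on every platform, and only the server-side check differs, which matches the question of whether the client encoding needs to be rejected too.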
Magnus, where are we on this? Seems we should allow unicode encoding and just not unicode locale in pginstaller. Also, Unicode is changing to UTF-8 in 8.1.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Tatsuo Ishii wrote:
> [...] Just using C locale with UTF-8 is enough in this case.
> [...]
> I strongly object the policy to try to unconditionally disable UTF-8
> support on win32.

I have just applied a patch to CVS HEAD and 8.0.X that disables locale-aware handling of upper/lower/initcap when the locale is C or POSIX.

With these changes, it seems safe to allow pginstaller to use UTF8 encoding if the locale is C/POSIX. If we don't do that, I am concerned that Asian users will either make a hacked installer or be required to run initdb manually by following complex instructions.

As a compromise, we could throw a warning if the combination is selected.

--
  Bruce Momjian
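[Editor's sketch] To illustrate why case mapping without locale awareness is harmless for UTF-8 data, here is a minimal sketch. It is not the patch mentioned above; it assumes that "not locale-aware" amounts to mapping only ASCII letters, and demo_ascii_upper() is a hypothetical name. Every byte of a UTF-8 multibyte sequence has its high bit set, so an ASCII-only mapping never touches one.

```c
#include <stddef.h>

/*
 * Uppercase only the ASCII letters a-z, in place.
 * Bytes >= 0x80 (all bytes of UTF-8 multibyte sequences) pass through
 * unchanged, so multibyte characters are never corrupted.
 */
void
demo_ascii_upper(char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char c = (unsigned char) s[i];

        if (c >= 'a' && c <= 'z')
            s[i] = (char) (c - 'a' + 'A');
    }
}
```

For text in languages without upper/lower case, such as the Japanese case raised earlier in the thread, this behaves as a no-op on the non-ASCII part of the string, which is the behavior the C locale argument relies on.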
Where are we on this? As far as I can tell, we never disabled UTF8 on Win32 in our code. The only thing we did do was to disable UTF8 in pginstaller. See this FAQ item:

http://pginstaller.projects.postgresql.org/faq/FAQ_windows.html#2.6

Is the current setup OK? Should we allow UTF8 on Win32 for languages that can use C locale, like Asian languages?

--
  Bruce Momjian