Thread: libpq Unicode support?
Wondering if the libpq lib supports Unicode? Ale.
Ale Raza <araza@esri.com> writes:
> Wondering if the libpq lib supports Unicode?

What sort of "support" have you got in mind? It passes UTF-8 data through just fine.

regards, tom lane
Tom, thanks for the reply. I want to pass UTF-16 data. Is there any special build of libpq for UTF-16? I did not build libpq locally.

Ale

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, April 22, 2005 11:10 AM
To: Ale Raza
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] libpq Unicode support?

Ale Raza <araza@esri.com> writes:
> Wondering if the libpq lib supports Unicode?

What sort of "support" have you got in mind? It passes UTF-8 data through just fine.

regards, tom lane
Ale Raza <araza@esri.com> writes:
> Tom, thanks for the reply. I want to pass UTF-16 data. Is there any special
> build of libpq for UTF-16? I did not build libpq locally.

Nope, you're out of luck on UTF-16.

regards, tom lane
Ale Raza wrote:
> Tom, thanks for the reply. I want to pass UTF-16 data. Is there any special
> build of libpq for UTF-16? I did not build libpq locally.

We do not support UTF-16 at this time. Hopefully we will in 8.1.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> We do not support UTF-16 at this time. Hopefully we will in 8.1.

Oh? Who's working on it, or even interested? Was there discussion of adding it to TODO?

I think it would be an extremely nontrivial change, which is why I am not pleased with making casual promises that it will appear soon (or indeed at all).

regards, tom lane
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > We do not support UTF-16 at this time. Hopefully we will in 8.1.
>
> Oh? Who's working on it, or even interested? Was there discussion
> of adding it to TODO?
>
> I think it would be an extremely nontrivial change, which is why
> I am not pleased with making casual promises that it will appear
> soon (or indeed at all).

TODO has:

    o Add support for Unicode

      To fix this, the data needs to be converted to/from UTF16/UTF8
      so the Win32 wcscoll() can be used, and perhaps other functions
      like towupper(). However, UTF8 already works with normal
      locales but provides no ordering or character set classes.

--
Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> Oh? Who's working on it, or even interested? Was there discussion
>> of adding it to TODO?
>
> TODO has:
>     o Add support for Unicode
>       To fix this, the data needs to be converted to/from UTF16/UTF8
>       so the Win32 wcscoll() can be used, and perhaps other functions
>       like towupper(). However, UTF8 already works with normal
>       locales but provides no ordering or character set classes.

That's completely unrelated --- it's talking about making correct use of Windows' locale support in one small bit inside the server.

To make libpq UTF-16 capable, we'd have to change its API for all strings; either make the strings counted rather than null-terminated, or make the string elements wchar instead of char. After that we'd have to hack the FE/BE protocol too (or more likely, require libpq to translate UTF-16 to UTF-8 before sending to the server). I don't foresee anyone doing any of this, at least not in the near term.

Putting a UTF-16 to UTF-8 translation in front of libpq seems a lot more practical.

regards, tom lane
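As a rough illustration of the point above about null-terminated strings (a sketch in Python, not libpq code): UTF-16 text contains zero bytes, so a C function that reads a `char *` up to the first NUL would stop almost immediately, while the UTF-8 form of the same text has no embedded NULs. The helper `c_string_view` is hypothetical and only mimics how such a C function would read the buffer; the sample query text is likewise made up.

```python
# Sketch: why UTF-16 does not fit a NUL-terminated char* API, while UTF-8 does.
# No libpq calls are made; c_string_view is a hypothetical stand-in for
# "what a C function reading a char* up to the first NUL would see".

def c_string_view(buf: bytes) -> bytes:
    """Return the bytes before the first NUL, or the whole buffer if none."""
    end = buf.find(b"\x00")
    return buf if end == -1 else buf[:end]

query = "SELECT 'é'"                    # made-up sample text

utf16 = query.encode("utf-16-le")       # two bytes per character here
utf8 = query.encode("utf-8")            # ASCII stays one byte each

truncated = c_string_view(utf16)        # stops after the first byte
intact = c_string_view(utf8)            # sees the whole string

print("UTF-16:", len(utf16), "bytes, C view:", truncated)
print("UTF-8: ", len(utf8), "bytes, C view:", intact)
```

This is why the practical route suggested above is to convert to UTF-8 in the application before the string reaches libpq, rather than changing every string parameter in the library.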
Are we not going to lose some characters if we are putting a UTF-16 to UTF-8 translation in front of libpq?

Ale.

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, April 22, 2005 12:14 PM
To: Bruce Momjian
Cc: Ale Raza; pgsql-general@postgresql.org
Subject: Re: [GENERAL] libpq Unicode support?

[...]

Putting a UTF-16 to UTF-8 translation in front of libpq seems a lot more practical.

regards, tom lane
Why would you? UTF-16 and UTF-8 are just different representations for the same domain of characters.

On Fri, 22 Apr 2005, Ale Raza wrote:
> Are we not going to lose some characters if we are putting a UTF-16 to UTF-8
> translation in front of libpq?
>
> Ale.
Ale Raza wrote:
> Are we not going to lose some characters if we are putting a UTF-16
> to UTF-8 translation in front of libpq?

No, they are just different encodings of the same character set.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
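A quick way to convince yourself of this is a round trip. The sketch below (Python, with hypothetical helper names `utf16_to_utf8` and `utf8_to_utf16`) converts UTF-16 bytes to UTF-8 and back and checks that nothing changes, including for a character outside the Basic Multilingual Plane, which UTF-16 stores as a surrogate pair. It assumes well-formed input; an unpaired surrogate is not a character and would be rejected by the decoder.

```python
# Sketch: UTF-16 -> UTF-8 -> UTF-16 is lossless for well-formed text.
# Helper names are hypothetical; little-endian UTF-16 without a BOM is assumed.

def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode little-endian UTF-16 bytes as UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")

def utf8_to_utf16(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as little-endian UTF-16."""
    return data.decode("utf-8").encode("utf-16-le")

samples = [
    "plain ASCII",
    "€ 100",          # euro sign, inside the BMP
    "日本語",          # CJK text
    "\U0001F600",     # outside the BMP: a surrogate pair in UTF-16
]

results = []
for text in samples:
    original = text.encode("utf-16-le")
    as_utf8 = utf16_to_utf8(original)
    round_tripped = utf8_to_utf16(as_utf8)
    results.append(round_tripped == original)
    print(repr(text), len(original), "bytes UTF-16,", len(as_utf8), "bytes UTF-8")

print("all round trips identical:", all(results))
```

Only the byte counts differ between the two encodings; the sequence of characters is the same in both directions.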
Tom Lane wrote:
> That's completely unrelated --- it's talking about making correct use of
> Windows' locale support in one small bit inside the server.
>
> To make libpq UTF-16 capable, we'd have to change its API for all
> strings; either make the strings counted rather than null-terminated,
> or make the string elements wchar instead of char. After that we'd
> have to hack the FE/BE protocol too (or more likely, require libpq
> to translate UTF-16 to UTF-8 before sending to the server). I don't
> foresee anyone doing any of this, at least not in the near term.
>
> Putting a UTF-16 to UTF-8 translation in front of libpq seems a lot
> more practical.

So the Win32 fix and the libpq translation are two different issues. Hmm. Agreed we don't want to support both UTF8 and UTF16 in the backend.

--
Bruce Momjian
Tom Lane wrote:
> To make libpq UTF-16 capable, we'd have to change its API for all
> strings; either make the strings counted rather than null-terminated,
> or make the string elements wchar instead of char. After that we'd
> have to hack the FE/BE protocol too (or more likely, require libpq
> to translate UTF-16 to UTF-8 before sending to the server). I don't
> foresee anyone doing any of this, at least not in the near term.

Is there any *real* loss of functionality in not supporting UTF-16? If so, *should* it be supported in, say, 9.0? If not, should there be a FAQ item saying why not?

Thanks for a great database,
Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Karsten Hilbert wrote:
> Is there any *real* loss of functionality in not supporting
> UTF-16? If so, *should* it be supported in, say, 9.0? If not,
> should there be a FAQ item saying why not?

Is there a reason you have to use UTF16? Can't you convert to UTF8 on input? (I have no idea myself.) Do other databases support both UTF8 and UTF16?

--
Bruce Momjian
On Fri, Apr 22, 2005 at 05:28:28PM -0400, Bruce Momjian wrote:
> > UTF-16? If so, *should* it be supported in, say, 9.0? If not,
> > should there be a FAQ item saying why not?
>
> Is there a reason you have to use UTF16?

No. I don't currently use either one (that is, I am using a "unicode" database with appropriate "set client_encoding"s, which works as expected). I am just wondering whether we should add a FAQ item on why UTF16 doesn't need to be supported.

> Can't you convert to UTF8 on input?

I likely could if I had to.

Karsten
Karsten Hilbert wrote:
> On Fri, Apr 22, 2005 at 05:28:28PM -0400, Bruce Momjian wrote:
> > Is there a reason you have to use UTF16?
>
> No. I don't currently use either one (that is, I am using a
> "unicode" database with appropriate "set client_encoding"s,
> which works as expected). I am just wondering whether we should
> add a FAQ item on why UTF16 doesn't need to be supported.

Well, we need to support UTF16 on Win32 only because the Win32 libc libraries don't support UTF8, but other than that UTF16 isn't much of an issue for our users.

--
Bruce Momjian
Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
> Tom Lane wrote:
>> To make libpq UTF-16 capable, we'd have to change its API for all
>> strings; either make the strings counted rather than null-terminated,
>> or make the string elements wchar instead of char. After that we'd
>> have to hack the FE/BE protocol too (or more likely, require libpq
>> to translate UTF-16 to UTF-8 before sending to the server). I don't
>> foresee anyone doing any of this, at least not in the near term.
>
> Is there any *real* loss of functionality in not supporting
> UTF-16?

Functionality, no: UTF-16 and UTF-8 are functionally equivalent by definition. I think the reason that it's started to come up lately is that Windows supports UTF-16 better than UTF-8 (whereas the reverse is true on most Unixish platforms).

If libpq were the only available API then I'd be more concerned about making it handle this somehow. But if you're working in, say, Java then this issue is all taken care of for you anyway. There are enough other Unix-centricities in libpq that this hardly seems the worst. Possibly someone will be motivated to start a project to design a Windows client library from scratch ...

regards, tom lane
> Do other databases support both UTF8 and UTF16?

Oracle supports UTF-8, UTF-16 and some other special UTF encodings. I think some of them predate the ratification of UTF-8, hence they are only partially compatible. It's an install-time option for an Oracle database. ASCII databases can be upgraded to UTF-8, but not vice versa, and it affects all schemas in the database.

I had an Oracle system that was non-Unicode, and somebody wanted to support the euro currency symbol. We tried it; it inserted fine, but came back in a select as another character. The only options were custom escaping all over the place or migrating Oracle. Given the amount of regression testing that would be needed for all the apps on the Oracle system (200 users, 16 processor box, billions of dollars worth of transactions) it was not worth the effort. People had to type 'EUR' instead of €.
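The message does not say which character sets were involved, so the following is only one plausible way a euro sign can "insert fine" and come back as a different character. It assumes, purely for illustration, that the writer encodes with ISO-8859-15 and the reader decodes with ISO-8859-1 (Latin-1), which has no euro sign at all.

```python
# Sketch of one possible mechanism, under assumed encodings (not taken from
# the message): the same byte means different characters in two 8-bit sets.

euro = "€"

# Latin-1 cannot represent the euro sign at all.
try:
    euro.encode("iso-8859-1")
    latin1_can_encode = True
except UnicodeEncodeError:
    latin1_can_encode = False

stored = euro.encode("iso-8859-15")       # a single byte, 0xA4
read_back = stored.decode("iso-8859-1")   # the same byte is the currency sign

print("Latin-1 can encode the euro sign:", latin1_can_encode)
print("stored byte:", stored.hex())
print("read back as:", read_back)

# A Unicode encoding carries the character through unchanged.
assert euro.encode("utf-8").decode("utf-8") == euro
```

With a Unicode database encoding and a matching client encoding, this class of mismatch does not arise, which is the usual argument for UTF-8 over mixing 8-bit character sets.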