Thread: libpq Unicode support?

libpq Unicode support?

From
Ale Raza
Date:
Wondering if libpq supports Unicode?

Ale.


Re: libpq Unicode support?

From
Tom Lane
Date:
Ale Raza <araza@esri.com> writes:
> Wondering if libpq supports Unicode?

What sort of "support" have you got in mind?  It passes UTF-8 data
through just fine.

            regards, tom lane

Re: libpq Unicode support?

From
Ale Raza
Date:
Tom, thanks for the reply. I want to pass UTF-16 data. Is there any special
build of libpq for UTF-16? I did not build libpq locally.

Ale

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, April 22, 2005 11:10 AM
To: Ale Raza
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] libpq Unicode support?

Re: libpq Unicode support?

From
Tom Lane
Date:
Ale Raza <araza@esri.com> writes:
> Tom, thanks for the reply. I want to pass UTF-16 data. Is there any special
> build of libpq for UTF-16? I did not build libpq locally.

Nope, you're out of luck on UTF-16.

            regards, tom lane

Re: libpq Unicode support?

From
Bruce Momjian
Date:
Ale Raza wrote:
> Tom, thanks for the reply. I want to pass UTF-16 data. Is there any special
> build of libpq for UTF-16? I did not build libpq locally.

We do not support UTF-16 at this time.  Hopefully we will in 8.1.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: libpq Unicode support?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> We do not support UTF-16 at this time.  Hopefully we will in 8.1.

Oh?  Who's working on it, or even interested?  Was there discussion
of adding it to TODO?

I think it would be an extremely nontrivial change, which is why
I am not pleased with making casual promises that it will appear
soon (or indeed at all).

            regards, tom lane

Re: libpq Unicode support?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > We do not support UTF-16 at this time.  Hopefully we will in 8.1.
>
> Oh?  Who's working on it, or even interested?  Was there discussion
> of adding it to TODO?
>
> I think it would be an extremely nontrivial change, which is why
> I am not pleased with making casual promises that it will appear
> soon (or indeed at all).

TODO has:

        o Add support for Unicode

          To fix this, the data needs to be converted to/from UTF16/UTF8
          so the Win32 wcscoll() can be used, and perhaps other functions
          like towupper().  However, UTF8 already works with normal
          locales but provides no ordering or character set classes.

Re: libpq Unicode support?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> Oh?  Who's working on it, or even interested?  Was there discussion
>> of adding it to TODO?

> TODO has:

>         o Add support for Unicode

>           To fix this, the data needs to be converted to/from UTF16/UTF8
>           so the Win32 wcscoll() can be used, and perhaps other functions
>           like towupper().  However, UTF8 already works with normal
>           locales but provides no ordering or character set classes.

That's completely unrelated --- it's talking about making correct use of
Windows' locale support in one small bit inside the server.

To make libpq UTF-16 capable, we'd have to change its API for all
strings; either make the strings counted rather than null-terminated,
or make the string elements wchar instead of char.  After that we'd
have to hack the FE/BE protocol too (or more likely, require libpq
to translate UTF-16 to UTF-8 before sending to the server).  I don't
foresee anyone doing any of this, at least not in the near term.

Putting a UTF-16 to UTF-8 translation in front of libpq seems a lot
more practical.

            regards, tom lane

Re: libpq Unicode support?

From
Ale Raza
Date:
Are we not going to lose some characters if we are putting a UTF-16 to UTF-8
translation in front of libpq?

Ale.

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, April 22, 2005 12:14 PM
To: Bruce Momjian
Cc: Ale Raza; pgsql-general@postgresql.org
Subject: Re: [GENERAL] libpq Unicode support?

Re: libpq Unicode support?

From
Ben
Date:
Why would you? UTF-16 and UTF-8 are just different representations for the
same domain of characters.

On Fri, 22 Apr 2005, Ale Raza wrote:

> Are we not going to lose some characters if we are putting a UTF-16 to UTF-8
> translation in front of libpq?
>
> Ale.

Re: libpq Unicode support?

From
Peter Eisentraut
Date:
Ale Raza wrote:
> Are we not going to lose some characters if we are putting a UTF-16
> to UTF-8 translation in front of libpq?

No, they are just different encodings of the same character set.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: libpq Unicode support?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> Oh?  Who's working on it, or even interested?  Was there discussion
> >> of adding it to TODO?
>
> > TODO has:
>
> >         o Add support for Unicode
>
> >           To fix this, the data needs to be converted to/from UTF16/UTF8
> >           so the Win32 wcscoll() can be used, and perhaps other functions
> >           like towupper().  However, UTF8 already works with normal
> >           locales but provides no ordering or character set classes.
>
> That's completely unrelated --- it's talking about making correct use of
> Windows' locale support in one small bit inside the server.
>
> To make libpq UTF-16 capable, we'd have to change its API for all
> strings; either make the strings counted rather than null-terminated,
> or make the string elements wchar instead of char.  After that we'd
> have to hack the FE/BE protocol too (or more likely, require libpq
> to translate UTF-16 to UTF-8 before sending to the server).  I don't
> foresee anyone doing any of this, at least not in the near term.
>
> Putting a UTF-16 to UTF-8 translation in front of libpq seems a lot
> more practical.

So the Win32 fix and the libpq translation are two different issues.
Hmm.

Agreed we don't want to support both UTF8 and UTF16 in the backend.

Re: libpq Unicode support?

From
Karsten Hilbert
Date:
Tom Lane wrote:
> To make libpq UTF-16 capable, we'd have to change its API for all
> strings; either make the strings counted rather than null-terminated,
> or make the string elements wchar instead of char.  After that we'd
> have to hack the FE/BE protocol too (or more likely, require libpq
> to translate UTF-16 to UTF-8 before sending to the server).  I don't
> foresee anyone doing any of this, at least not in the near term.

Is there any *real* loss of functionality in not supporting
UTF-16? If so, *should* it be supported in, say, 9.0? If not,
should there be a FAQ item saying why not?

Thanks for a great database,
Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: libpq Unicode support?

From
Bruce Momjian
Date:
Karsten Hilbert wrote:
> Tom Lane wrote:
> > To make libpq UTF-16 capable, we'd have to change its API for all
> > strings; either make the strings counted rather than null-terminated,
> > or make the string elements wchar instead of char.  After that we'd
> > have to hack the FE/BE protocol too (or more likely, require libpq
> > to translate UTF-16 to UTF-8 before sending to the server).  I don't
> > foresee anyone doing any of this, at least not in the near term.
> Is there any *real* loss of functionality in not supporting
> UTF-16? If so, *should* it be supported in, say, 9.0? If not,
> should there be a FAQ item saying why not?

Is there a reason you have to use UTF16?  Can't you convert to UTF8 on
input?  (I have no idea myself.)  Do other databases support both UTF8
and UTF16?

Re: libpq Unicode support?

From
Karsten Hilbert
Date:
On Fri, Apr 22, 2005 at 05:28:28PM -0400, Bruce Momjian wrote:

> > UTF-16? If so, *should* it be supported in, say, 9.0? If not,
> > should there be a FAQ item saying why not?
>
> Is there a reason you have to use UTF16?
No. I don't currently use either one (that is, I am using a
"unicode" database with appropriate "set client_encoding"s,
which works as expected). I am just wondering whether we should
add a FAQ item explaining why UTF16 doesn't need to be supported.

> Can't you convert to UTF8 on input?
I likely could if I had to.

Karsten

Re: libpq Unicode support?

From
Bruce Momjian
Date:
Karsten Hilbert wrote:
> On Fri, Apr 22, 2005 at 05:28:28PM -0400, Bruce Momjian wrote:
>
> > > UTF-16? If so, *should* it be supported in, say, 9.0? If not,
> > > should there be a FAQ item saying why not?
> >
> > Is there a reason you have to use UTF16?
> No. I don't currently use either one (that is, I am using a
> "unicode" database with appropriate "set client_encoding"s,
> which works as expected). I am just wondering whether we should
> add a FAQ item explaining why UTF16 doesn't need to be supported.

Well, we need to support UTF16 on Win32 only because the Win32 libc
libraries don't support UTF8, but other than that UTF16 isn't much of
an issue for our users.

Re: libpq Unicode support?

From
Tom Lane
Date:
Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
> Tom Lane wrote:
>> To make libpq UTF-16 capable, we'd have to change its API for all
>> strings; either make the strings counted rather than null-terminated,
>> or make the string elements wchar instead of char.  After that we'd
>> have to hack the FE/BE protocol too (or more likely, require libpq
>> to translate UTF-16 to UTF-8 before sending to the server).  I don't
>> foresee anyone doing any of this, at least not in the near term.

> Is there any *real* loss of functionality in not supporting
> UTF-16?

Functionality, no: UTF-16 and UTF-8 are functionally equivalent by definition.

I think the reason that it's started to come up lately is that Windows
supports UTF-16 better than UTF-8 (whereas the reverse is true on most
Unixish platforms).

If libpq were the only available API then I'd be more concerned about
making it handle this somehow.  But if you're working in, say, Java
then this issue is all taken care of for you anyway.  There are enough
other Unix-centricities in libpq that this hardly seems the worst.

Possibly someone will be motivated to start a project to design a
Windows client library from scratch ...

            regards, tom lane

Re: libpq Unicode support?

From
David Roussel
Date:
>  Do other databases support both UTF8 and UTF16?
>

Oracle supports UTF-8, UTF-16 and some other special UTF encodings.  I
think some of them predate the ratification of UTF-8, hence they are
only partially compatible.

It's an install-time option for an Oracle database.  ASCII databases
can be upgraded to UTF-8, but not vice versa, and it affects all
schemas in the database.

I had an Oracle system that was non-Unicode, and somebody wanted to
support the euro currency symbol. We tried it: it inserted fine, but
came back in a SELECT as another character.  The only options were custom
escaping all over the place or migrating the Oracle database.  Given the
amount of regression testing that would be needed for all the apps on the
Oracle system (200 users, a 16-processor box, billions of dollars' worth of
transactions), it was not worth the effort.  People had to type 'EUR'
instead of €.