Thread: UNICODE/UTF-8 on win32

UNICODE/UTF-8 on win32

From
"Magnus Hagander"
Date:
We know it's broken and won't be fixed for 8.0.

If we just #ifndef WIN32 the definitions in utils/mb/encnames.c it won't
be possible to select that encoding, right? Will that have any other
unwanted effects (such as breaking client encodings)? If not, I suggest
this is done.

(Or perhaps something can be done in pg_valid_server_encoding?)

//Magnus


Re: UNICODE/UTF-8 on win32

From
Tom Lane
Date:
"Magnus Hagander" <mha@sollentuna.net> writes:
> We know it's broken and won't be fixed for 8.0.

> If we just #ifndef WIN32 the definitions in utils/mb/encnames.c it won't
> be possible to select that encoding, right? Will that have any other
> unwanted effects (such as breaking client encodings)? If not, I suggest
> this is done.

I believe the subscripts in those arrays have to match the encoding
enum type, so you can't just ifdef out individual entries.
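
The alignment constraint described above can be illustrated with a minimal sketch. This is not the actual content of utils/mb/encnames.c: the enum `demo_enc`, the table `demo_enc_names`, and the function `demo_enc_name` are hypothetical names chosen for illustration. The sketch only assumes a table that is indexed directly by the enum value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding IDs; the real enum is defined by PostgreSQL. */
typedef enum
{
    DEMO_SQL_ASCII = 0,
    DEMO_UTF8,
    DEMO_LATIN1,
    DEMO_ENC_COUNT
} demo_enc;

/* Table indexed directly by demo_enc: entry i must describe encoding i. */
static const char *const demo_enc_names[] = {
    "SQL_ASCII",
    "UTF8",     /* Wrapping only this entry in #ifndef WIN32 would move
                 * "LATIN1" to index 1, and DEMO_LATIN1 (2) would then
                 * index past the end of the table. */
    "LATIN1"
};

/* Look up a name by enum value; correct only while the table stays aligned. */
static const char *
demo_enc_name(demo_enc enc)
{
    assert((size_t) enc < sizeof(demo_enc_names) / sizeof(demo_enc_names[0]));
    return demo_enc_names[enc];
}
```

With the table intact, `demo_enc_name(DEMO_LATIN1)` returns "LATIN1". Dropping the middle entry on one platform would break that lookup without any compiler diagnostic, which is why a run-time check is the safer place for a platform restriction.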

> (Or perhaps something can be done in pg_valid_server_encoding?)

Making the valid_server_encoding function reject it might work.
Tatsuo-san would know for sure.
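
A rough sketch of that idea follows. It does not reproduce the real pg_valid_server_encoding or its signature; the names `sk_enc`, `sk_valid_client_encoding`, and `sk_valid_server_encoding` are hypothetical. It only shows how a server-side check could reject one encoding on WIN32 while the client-side check keeps accepting it.

```c
#include <stdbool.h>

/* Hypothetical encoding IDs, for illustration only. */
typedef enum
{
    SK_SQL_ASCII = 0,
    SK_UTF8,
    SK_LATIN1,
    SK_ENC_COUNT
} sk_enc;

/* Client side: every known encoding stays selectable on all platforms. */
static bool
sk_valid_client_encoding(int enc)
{
    return enc >= 0 && enc < SK_ENC_COUNT;
}

/* Server side: same range check, plus a platform-specific rejection. */
static bool
sk_valid_server_encoding(int enc)
{
    if (enc < 0 || enc >= SK_ENC_COUNT)
        return false;
#ifdef WIN32
    /* Assumption for this sketch: UTF8 is the only encoding to refuse. */
    if (enc == SK_UTF8)
        return false;
#endif
    return true;
}
```

Because the restriction lives in the check rather than in the tables, the enum and any arrays indexed by it stay identical on every platform.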

Should we also reject it as a client encoding, or does that work OK?

            regards, tom lane

Re: UNICODE/UTF-8 on win32

From
Roland Volkmann
Date:
Hello,

Tom Lane wrote on 31.12.2004 20:21:
> "Magnus Hagander" <mha@sollentuna.net> writes:
> > We know it's broken and won't be fixed for 8.0.
>
> > If we just #ifndef WIN32 the definitions in utils/mb/encnames.c it won't
> > be possible to select that encoding, right? Will that have any other
> > unwanted effects (such as breaking client encodings)? If not, I suggest
> > this is done.
>
> I believe the subscripts in those arrays have to match the encoding
> enum type, so you can't just ifdef out individual entries.
>
> > (Or perhaps something can be done in pg_valid_server_encoding?)
>
> Making the valid_server_encoding function reject it might work.
> Tatsuo-san would know for sure.
>
> Should we also reject it as a client encoding, or does that work OK?

Whatever you decide to do, please don't reject UTF-8 as a valid
client encoding. This would break existing applications in our company,
and, I'm sure, not only there.


With best regards,

Roland.


Re: UNICODE/UTF-8 on win32

From
Tatsuo Ishii
Date:
Sorry, but I don't subscribe to pgsql-hackers-win32 list. What's the
problem here?
--
Tatsuo Ishii


Re: UNICODE/UTF-8 on win32

From
"Magnus Hagander"
Date:
UNICODE/UTF-8 does not work on the win32 server. The reason is that
strcoll() and friends don't work with it. To support it on win32, the
data needs to be converted to UTF16 and passed to the wide-character
versions of the functions, which we do not do.
(See
http://archives.postgresql.org/pgsql-hackers-win32/2004-11/msg00036.php
and
http://archives.postgresql.org/pgsql-hackers-win32/2004-12/msg00106.php)
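
As a rough illustration of the approach described above, here is a minimal, portable sketch that decodes UTF-8 into wide characters and compares with wcscoll(). The helper names `sk_utf8_to_wide` and `sk_utf8_coll` are hypothetical and nothing here is taken from the PostgreSQL sources. The decoder assumes well-formed input and does no validation; on Windows a real implementation would more likely call MultiByteToWideChar() with CP_UTF8 instead of decoding by hand.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Decode a NUL-terminated, well-formed UTF-8 string into wide characters.
 * Where wchar_t is 16 bits (as on Windows), code points above U+FFFF are
 * written as UTF-16 surrogate pairs. Returns a malloc'd, NUL-terminated
 * buffer, or NULL if allocation fails.
 */
static wchar_t *
sk_utf8_to_wide(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;
    size_t      len = 0;
    size_t      n = 0;
    wchar_t    *out;

    while (p[len] != 0)
        len++;
    /* Never more wide units than input bytes, plus the terminator. */
    out = malloc((len + 1) * sizeof(wchar_t));
    if (out == NULL)
        return NULL;

    while (*p)
    {
        unsigned long cp;
        int         extra;

        /* The lead byte tells how many continuation bytes follow. */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else                          { cp = *p & 0x07; extra = 3; }
        p++;
        while (extra-- > 0 && *p)
            cp = (cp << 6) | (*p++ & 0x3F);

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF)
        {
            cp -= 0x10000;
            out[n++] = (wchar_t) (0xD800 + (cp >> 10));
            out[n++] = (wchar_t) (0xDC00 + (cp & 0x3FF));
        }
        else
            out[n++] = (wchar_t) cp;
    }
    out[n] = L'\0';
    return out;
}

/*
 * Compare two UTF-8 strings with wcscoll() under the current LC_COLLATE.
 * This sketch returns 0 on allocation failure; real code would report it.
 */
static int
sk_utf8_coll(const char *a, const char *b)
{
    wchar_t    *wa = sk_utf8_to_wide(a);
    wchar_t    *wb = sk_utf8_to_wide(b);
    int         result = 0;

    if (wa != NULL && wb != NULL)
        result = wcscoll(wa, wb);
    free(wa);
    free(wb);
    return result;
}
```

The ordering that `sk_utf8_coll` produces depends on the locale selected with setlocale(); converting on every comparison also costs an allocation per string, which a real implementation would probably want to avoid.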


I don't *think* we need to disable it on the client. AFAIK, the client
interfaces don't use any of these functions, and I've seen reports of
people using that long before we had a native win32 server.


//Magnus



Re: UNICODE/UTF-8 on win32

From
Bruce Momjian
Date:
TODO updated:

        o Disallow encodings like UTF8 which PostgreSQL supports
          but the operating system does not (already disallowed by
          pginstaller)

          To fix UTF8, the data needs to be converted to UTF16 and then
          the Win32 strcoll() can be used.


---------------------------------------------------------------------------

Magnus Hagander wrote:
> UNICODE/UTF-8 does not work on the win32 server. The reason is that
> strcoll() and friends don't work with it. To support it on win32, the
> data needs to be converted to UTF16 and passed to the wide-character
> versions of the functions, which we do not do.
> (See
> http://archives.postgresql.org/pgsql-hackers-win32/2004-11/msg00036.php
> and
> http://archives.postgresql.org/pgsql-hackers-win32/2004-12/msg00106.php)
>
>
> I don't *think* we need to disable it on the client. AFAIK, the client
> interfaces don't use any of these functions, and I've seen reports of
> people using that long before we had a native win32 server.
>
>
> //Magnus

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: UNICODE/UTF-8 on win32

From
"Magnus Hagander"
Date:
>TODO updated:
>
>        o Disallow encodings like UTF8 which PostgreSQL supports
>          but the operating system does not (already disallowed by
>          pginstaller)
>
>          To fix UTF8, the data needs to be converted to UTF16 and then
>          the Win32 strcoll() can be used.

Not quite. We'd use the wcscoll() function. strcoll() does not work with
what Windows calls "wide characters", which is UTF16, only with
"multibyte characters". The whole point of the fix is to be able to use
wcscoll() instead.

Also, AFAIK, not only strcoll() but also whatever is used to
implement UPPER() and LOWER() needs to be fixed. Possibly more?

//Magnus

Re: UNICODE/UTF-8 on win32

From
Bruce Momjian
Date:
Magnus Hagander wrote:
>
> >TODO updated:
> >
> >        o Disallow encodings like UTF8 which PostgreSQL supports
> >          but the operating system does not (already disallowed by
> >          pginstaller)
> >
> >          To fix UTF8, the data needs to be converted to UTF16 and then
> >          the Win32 strcoll() can be used.
>
> Not quite. We'd use the wcscoll() function. strcoll() does not work with
> what Windows calls "wide characters", which is UTF16, only with
> "multibyte characters". The whole point of the fix is to be able to use
> wcscoll() instead.

OK, updated.

> Also, AFAIK, not only strcoll() but also whatever is used to
> implement UPPER() and LOWER() needs to be fixed. Possibly more?

OK.  I think you mean towupper().
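
For reference, a minimal sketch of what a towupper()-based UPPER() could look like once the data is in wide-character form. The function name `sk_wide_upper` is hypothetical and this is not PostgreSQL code. It maps one wchar_t at a time, so it cannot handle UTF-16 surrogate pairs or case mappings that change the string length, and the result depends on the current LC_CTYPE locale.

```c
#include <wchar.h>
#include <wctype.h>

/* Upper-case a NUL-terminated wide string in place, one unit at a time. */
static void
sk_wide_upper(wchar_t *s)
{
    for (; *s != L'\0'; s++)
        *s = (wchar_t) towupper((wint_t) *s);
}
```

A LOWER() counterpart would follow the same pattern with towlower().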
