Re: [GENERAL] trouble with to_char('L') - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [GENERAL] trouble with to_char('L')
Date
Msg-id 201004201310.o3KDAIR27248@momjian.us
In response to Re: [GENERAL] trouble with to_char('L')  (Magnus Hagander <magnus@hagander.net>)
List pgsql-hackers
Magnus Hagander wrote:
> > One new idea would be to set LC_CTYPE to UTF16/widechars unconditionally
> > on Win32 and then just convert that always to the server encoding with
> > win32_wchar_to_db_encoding(), instead of using the encoding from
> > LC_MONETARY to set LC_CTYPE and having to do double-conversion.
> 
> So, hugely late, reviving this thread.
> 
> Ideally, we should definitely consider doing that. Internally, Windows
> will do it in UTF16 anyway. So we're basically doing
> UTF16->db->UTF16->UTF8->db or something like that with this patch.
> 
> But I'm unsure how that would work. We're talking about the output of
> localeconv(), right? I don't see a version of localeconv() that does
> wide chars anywhere. (You can't just set LC_CTYPE and use the regular
> function - Windows has a separate set of functions for dealing with
> UTF16).

I thought there was an LC_CTYPE for UTF16 that we could use without a
wide version of that function.  If not, forget that idea.

> Looking at the patch, you're passing "item" to db_encoding_strdup()
> but it doesn't seem to be used anywhere. Leftover from previous
> experiments, or forgot to use it? Perhaps you intended for it to be in
> the error messages?

It originally was in the error message but can be removed.  I have now
removed 'item' from my version of the patch.

> Also, won't this need special-casing for UTF8? Per comment in
> mbutils.c, wcstombs() doesn't work for UTF8 encodings - you need to
> use MultiByteToWideChar().

Well, we don't support UTF8 for any of the non-encoding locales, e.g.
monetary, numeric, so I never considered that we would support it.  If
we did support it, we would have to _pick_ a locale that is <= 2 bytes
per character and use that, and then convert to UTF8, but what locale
would we pick?  They could use an LC_CTYPE that is <= 2 bytes and a
numeric that is UTF8, but I never suspected we would want to support
that, and we would need some logic to detect that case.

> I also note that we have char2wchar() already - we should perhaps just
> call that? Or will that use the wrong locale?

I see char2wchar() calling GetDatabaseEncoding() right away, which does
use the cached value for the server encoding, so I don't think it will
work.  We can't use our existing routines to convert _from_ the current
encoding to wide characters (because our numeric encoding might not
match the server encoding).  However, we can use existing code that
converts from wide to the server encoding, perhaps replacing
win32_wchar_to_db_encoding().
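
To illustrate the first half of that conversion (locale encoding to wide
characters, without consulting the server encoding), here is a minimal
sketch in portable C.  The function name locale_str_to_wchar() and its
parameters are hypothetical and are not part of the patch.  It assumes
LC_CTYPE can be switched temporarily to a locale whose encoding matches
the string being converted, and it uses mbstowcs(), which per the
mbutils.c comment mentioned above is not suitable for UTF8 on Windows,
where MultiByteToWideChar() would be needed instead.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not from the patch): convert a multibyte string
 * encoded according to "ctype_locale" into wide characters.
 *
 * Returns the number of wide characters stored in dst, not counting the
 * terminator, or (size_t) -1 on failure.  LC_CTYPE is restored to its
 * previous value whenever it was successfully changed.
 */
static size_t
locale_str_to_wchar(const char *src, const char *ctype_locale,
                    wchar_t *dst, size_t dstlen)
{
    const char *cur;
    char       *saved;
    size_t      n;

    assert(src != NULL && ctype_locale != NULL && dst != NULL && dstlen > 0);

    /* setlocale() may reuse its result buffer, so keep a private copy. */
    cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL)
        return (size_t) -1;
    saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return (size_t) -1;
    strcpy(saved, cur);

    /* Switch LC_CTYPE so mbstowcs() decodes src in the right encoding. */
    if (setlocale(LC_CTYPE, ctype_locale) == NULL)
    {
        free(saved);
        return (size_t) -1;
    }

    /* Leave room for the terminator, which we always add ourselves. */
    n = mbstowcs(dst, src, dstlen - 1);
    if (n != (size_t) -1)
        dst[n] = L'\0';

    /* Put LC_CTYPE back the way we found it. */
    setlocale(LC_CTYPE, saved);
    free(saved);

    return n;
}
```

A caller would pass a string returned by localeconv() together with the
locale name in effect for LC_MONETARY or LC_NUMERIC, and then hand the
wide-character result to whatever routine converts wide characters to
the server encoding.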

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

