On Tue, Aug 13, 2024 at 6:23 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Patch 3 makes sense too, some comments on the details:
> The #ifdefs and the LCONV_MEMBER stuff makes it a bit hard to follow
> what happens in each implementation strategy. I wonder if it would be
> more clear to duplicate more code.
I tried to make it easier to follow.
> There's a comment at the top of pg_locale.c ("!!! NOW HEAR THIS !!!")
> that needs to be removed or adjusted now.
Yeah. We can remove that PSA if we also fix up the equivalent code
for LC_TIME. First attempt at that attached.
> > * The POSIX standard explicitly says that it is undefined what happens if
> > * LC_MONETARY or LC_NUMERIC imply an encoding (codeset) different from
> > * that implied by LC_CTYPE. In practice, all Unix-ish platforms seem to
> > * believe that localeconv() should return strings that are encoded in the
> > * codeset implied by the LC_MONETARY or LC_NUMERIC locale name. Hence,
> > * once we have successfully collected the localeconv() results, we will
> > * convert them from that codeset to the desired server encoding.
>
> The patch loses this comment, leaving just a much shorter comment in the
> WIN32 implementation. But it still seems like a relevant comment for the
> !WIN32 implementation too.
New version makes it much clearer, and also is much more careful about
what exactly happens if you have mismatched encodings.
(Over in CF #3772 I was exploring the idea of banning the use of
locales that are not compatible with the database encoding. As far as
I can guess, that idea must have come from the time when Windows
didn't have native UTF-8 support. Now it does. There I was mostly
interested in killing all the whcar_t conversion code, but maybe we
could also delete a few lines of transcoding around here too?)