Re: Patch: add conversion from pg_wchar to multibyte - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Patch: add conversion from pg_wchar to multibyte
Msg-id CAPpHfdvjejw0d5XyHoLXhvBpNiYiK_YbTN9395KGRjOMpqANPg@mail.gmail.com
In response to Re: Patch: add conversion from pg_wchar to multibyte  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Patch: add conversion from pg_wchar to multibyte
List pgsql-hackers
On Tue, Jul 3, 2012 at 12:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jul 2, 2012 at 4:33 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>> On Mon, Jul 2, 2012 at 8:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>
>>> On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov <aekorotkov@gmail.com>
>>> wrote:
>>> >> MULE also looks problematic.  The code that you've written isn't
>>> >> symmetric with the opposite conversion, unlike what you did in all
>>> >> other cases, and I don't understand why.  I'm also somewhat baffled by
>>> >> the reverse conversion: it treats a multi-byte sequence beginning with
>>> >> a byte for which IS_LCPRV1(x) returns true as invalid if there are
>>> >> less than 3 bytes available, but it only reads two; similarly, for
>>> >> IS_LCPRV2(x), it demands 4 bytes but converts only 3.
>>> >
>>> > Should we save existing pg_wchar representation for MULE encoding?
>>> > Probably,
>>> > we can modify it like in 0.1 version of patch in order to make it more
>>> > transparent.
>>>
>>> Changing the encoding would break pg_upgrade, so -1 from me on that.
>>
>>
>> I didn't realize that we store pg_wchar on disk somewhere. I thought it is
>> only in-memory representation. Where do we store pg_wchar on disk?
>
> OK, now I'm confused.  I was thinking (incorrectly) that you were
> talking about changing the multibyte encoding, which of course is
> saved on disk all over the place.  Changing the wchar encoding is a
> different kettle of fish, and I have no idea what that would or would
> not break.  But I don't see why we'd want to do such a thing.  We just
> need to make the MB->WCHAR and WCHAR->MB transformations mirror images
> of each other; why is that hard?

I provided such transformations in versions 0.3 and 0.4 of the patch, based on the explanation from Tatsuo Ishii. The problem is that both conversions are nontrivial, and it's not evident that they are mirror images of each other: seeing that they are requires some additional assumptions about the encodings, which are not evident from the transformations themselves. I thought you mentioned that problem two messages back.
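
As an illustration of what "mirror images" means here, the following is a minimal sketch using a toy two-byte encoding. It is not the MULE code from the patch and does not use any PostgreSQL API: `toy_wchar`, `toy_mb2wchar`, `toy_wchar2mb` and `toy_roundtrip_ok` are hypothetical names, and the byte layout is an assumption made only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t toy_wchar;     /* hypothetical stand-in for pg_wchar */

/*
 * Toy encoding (NOT MULE): a byte below 0x80 stands alone; a lead byte
 * 0x80-0xFF is followed by exactly one trailing byte.
 */

/* Decode len bytes into to[]; returns the number of wide chars, or -1 if
 * the input ends in the middle of a two-byte sequence. */
static int
toy_mb2wchar(const unsigned char *from, size_t len, toy_wchar *to)
{
    int n = 0;

    while (len > 0)
    {
        if (*from < 0x80)
        {
            to[n++] = *from++;
            len -= 1;
        }
        else
        {
            if (len < 2)
                return -1;      /* demand exactly as many bytes as we read */
            to[n++] = ((toy_wchar) from[0] << 8) | from[1];
            from += 2;
            len -= 2;
        }
    }
    return n;
}

/* Encode n wide chars into to[]; returns the number of bytes written. */
static size_t
toy_wchar2mb(const toy_wchar *from, int n, unsigned char *to)
{
    size_t out = 0;

    for (int i = 0; i < n; i++)
    {
        if (from[i] < 0x80)
            to[out++] = (unsigned char) from[i];
        else
        {
            to[out++] = (unsigned char) (from[i] >> 8);
            to[out++] = (unsigned char) (from[i] & 0xFF);
        }
    }
    return out;
}

/* Returns 1 if decoding and re-encoding reproduces the input bytes. */
static int
toy_roundtrip_ok(const unsigned char *mb, size_t len)
{
    toy_wchar     w[64];
    unsigned char back[128];    /* at most two bytes per wide char */
    int           n;

    assert(len <= 64);          /* sketch only: fixed-size buffers */
    n = toy_mb2wchar(mb, len, w);
    if (n < 0)
        return 0;
    return toy_wchar2mb(w, n, back) == len && memcmp(mb, back, len) == 0;
}
```

In this toy case the byte-to-wide-to-byte round trip holds for every complete input, but the opposite direction does not hold for arbitrary wide values: 0x0150 encodes to the bytes 0x01 0x50, which decode as two separate characters. The two functions are mirrors only under the assumption that every wide value above 0x7F has a high byte of 0x80 or more, and that assumption is not visible from the conversion code alone.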

------
With best regards,
Alexander Korotkov.
