Patch: add conversion from pg_wchar to multibyte - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Patch: add conversion from pg_wchar to multibyte
Date
Msg-id CAPpHfdshcHe1ZPQhyd2xhAKnNu0VpdMPuGFtvribqJcnH0K2Ew@mail.gmail.com
Responses Re: Patch: add conversion from pg_wchar to multibyte  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
Hackers,

The attached patch adds conversion from a pg_wchar string to a multibyte string.
This functionality is needed for my patch adding index support for regular expression search: http://archives.postgresql.org/pgsql-hackers/2011-11/msg01297.php
While analyzing the existing conversion from multibyte to pg_wchar, I found the following types of conversion:
1) Trivial conversion for single-byte encodings. It just zero-extends each byte to a pg_wchar.
2) Conversion from UTF-8 to Unicode code points.
3) Conversions from the euc* encodings. They write the bytes of a character into the pg_wchar in inverse order, starting from the lowest byte (this explanation assumes a little-endian system).
4) Conversion from the mule encoding. This conversion is unclear to me and also seems to be lossy.
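
To make cases 1 and 3 concrete, here is a minimal sketch in C. It is not code from the patch or from the PostgreSQL sources: the type wchar_sketch and both function names are hypothetical, and the real euc* routines handle details (such as the SS2/SS3 prefixes) that are left out here. It only shows the general idea of widening a byte and of packing a character's bytes into one integer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pg_wchar: an unsigned 32-bit integer. */
typedef uint32_t wchar_sketch;

/* Case 1: single-byte encoding. The byte is zero-extended. */
static wchar_sketch
sketch_single_byte_to_wchar(unsigned char b)
{
    return (wchar_sketch) b;
}

/*
 * Case 3 (simplified): pack the n bytes (1..4) of one character into a
 * single integer, with the first byte ending up most significant.
 * On a little-endian machine the bytes of the result therefore sit in
 * memory in the reverse of their input order.
 */
static wchar_sketch
sketch_pack_bytes(const unsigned char *mb, size_t n)
{
    wchar_sketch w = 0;
    size_t       i;

    for (i = 0; i < n; i++)
        w = (w << 8) | mb[i];
    return w;
}
```

For example, under these assumptions the two-byte sequence 0xA4 0xA2 packs to the value 0xA4A2.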

It was easy to write the inverse conversions for 1-3. I've changed conversion 4 to behave like 3. I'm not sure my change is OK, because I didn't understand the original conversion.
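
As a rough illustration of what the inverse conversions for 1-3 can look like, here is a minimal sketch in C. It is not the code from the attached patch: wchar_sketch and all three function names are hypothetical, each function handles a single character, and the euc-style case assumes the character's first byte was stored as the most significant non-zero byte. Each function returns the number of bytes written to out, which must have room for 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pg_wchar: an unsigned 32-bit integer. */
typedef uint32_t wchar_sketch;

/* Inverse of case 1: keep only the low byte. */
static size_t
sketch_wchar_to_single_byte(wchar_sketch w, unsigned char *out)
{
    out[0] = (unsigned char) (w & 0xFF);
    return 1;
}

/*
 * Inverse of case 2: encode a Unicode code point as UTF-8.
 * Returns 0 if the value is above U+10FFFF. Surrogates are not
 * rejected in this sketch.
 */
static size_t
sketch_wchar_to_utf8(wchar_sketch cp, unsigned char *out)
{
    if (cp <= 0x7F)
    {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp <= 0x7FF)
    {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF)
    {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF)
    {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/*
 * Inverse of case 3 (simplified): skip leading zero bytes, then emit
 * the remaining bytes from most significant to least significant.
 * A value of zero is emitted as a single zero byte.
 */
static size_t
sketch_unpack_bytes(wchar_sketch w, unsigned char *out)
{
    size_t n = 0;
    int    shift;

    for (shift = 24; shift >= 0; shift -= 8)
    {
        unsigned char b = (unsigned char) ((w >> shift) & 0xFF);

        if (n > 0 || b != 0 || shift == 0)
            out[n++] = b;
    }
    return n;
}
```

Because the unpacking skips leading zero bytes, a character whose first byte is zero could not be restored this way; that is one reason to treat this as a sketch under the stated assumptions rather than a general solution.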

------
With best regards,
Alexander Korotkov.