Re: Radix tree for character conversion - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Radix tree for character conversion
Date
Msg-id 20161021.173321.105120238.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: Radix tree for character conversion  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Radix tree for character conversion  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
Hello, this is a new version of radix charconv.

At Sat, 8 Oct 2016 00:37:28 +0300, Heikki Linnakangas <hlinnaka@iki.fi> wrote in
<6d85d710-9554-a928-29ff-b2d3b80b01c9@iki.fi>
> What I don't want is that the current *.map files are turned into the
> authoritative source files, that we modify by hand. There are no
> comments in them, for starters, which makes hand-editing
> cumbersome. It seems that we have edited some of them by hand already,
> but we should rectify that.

Agreed. So, I identified the source file of each character for the
EUC_JP and SJIS conversions to clarify what has been done to them.

The SJIS conversion is made from CP932.TXT plus 8 additional
conversions for UTF8->SJIS and none for SJIS->UTF8.

EUC_JP is made from CP932.TXT and JIS0212.TXT; JIS0201.TXT and
JIS0208.TXT are not needed. On top of those, it adds 83 or 86
conversion entries, depending on the direction.

http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
http://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0212.TXT
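
For reference, both files use the plain-text layout common to the
unicode.org mapping tables: a local code in hex, a Unicode code point
in hex, and a '#' comment. Below is a minimal Python sketch of reading
that layout. The function name parse_mapping and the sample lines are
made up for illustration; this is not the code in the UCS_to_*.pl
scripts.

```python
import re

# Matches lines such as "0x8140<TAB>0x3000<TAB>#IDEOGRAPHIC SPACE".
_LINE = re.compile(r"^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)\s*(?:#\s*(.*))?$")


def parse_mapping(lines):
    """Return a list of (local_code, ucs, comment) tuples.

    Comment lines, blank lines, and lines without a Unicode column
    (for example lead-byte or undefined markers) are skipped.
    """
    entries = []
    for line in lines:
        m = _LINE.match(line.strip())
        if m is None:
            continue
        code = int(m.group(1), 16)
        ucs = int(m.group(2), 16)
        entries.append((code, ucs, m.group(3) or ""))
    return entries


# Hypothetical sample input in the same layout as the mapping files.
sample = [
    "# comment line",
    "0x41\t0x0041\t#LATIN CAPITAL LETTER A",
    "0x81\t      \t#DBCS LEAD BYTE",
    "0x8140\t0x3000\t#IDEOGRAPHIC SPACE",
]
entries = parse_mapping(sample)
```

Keeping the comment column around is what would make it possible to
annotate each generated entry with where it came from.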

Now the generator scripts no longer use *.map as a source; instead
they generate the old-style map files as well as the radix tree files.
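
To make the two output forms concrete, here is a rough Python sketch.
A flat map is a sorted list of (code, value) pairs searched by binary
search, while a radix tree is walked one input byte at a time. The
names build_flat, build_radix and the nested-dict layout are
assumptions for illustration only; the generated files are C tables
and their layout is not shown here.

```python
from bisect import bisect_left


def build_flat(pairs):
    """Sorted list of (utf8_bytes_as_int, local_code) pairs."""
    return sorted((int.from_bytes(u, "big"), c) for u, c in pairs)


def lookup_flat(flat, utf8):
    """Binary search over the whole key."""
    key = int.from_bytes(utf8, "big")
    i = bisect_left(flat, (key,))
    if i < len(flat) and flat[i][0] == key:
        return flat[i][1]
    return None


def build_radix(pairs):
    """Nested dicts keyed by one byte per level; leaves hold the code."""
    root = {}
    for utf8, code in pairs:
        node = root
        for b in utf8[:-1]:
            node = node.setdefault(b, {})
        node[utf8[-1]] = code
    return root


def lookup_radix(root, utf8):
    """Walk the tree one byte at a time."""
    node = root
    for b in utf8:
        if not isinstance(node, dict) or b not in node:
            return None
        node = node[b]
    # Ending on an inner node means the input was an incomplete sequence.
    return None if isinstance(node, dict) else node


# A few UTF8->SJIS pairs, chosen only as sample data.
pairs = [
    ("A".encode("utf-8"), 0x41),
    ("\u3000".encode("utf-8"), 0x8140),
    ("\u3042".encode("utf-8"), 0x82A0),
]
flat = build_flat(pairs)
root = build_radix(pairs)
```

Both lookups return the same result for the same input; the
difference is that the tree needs one table access per input byte
instead of a search over every entry.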

For convenience, UCS_to_(SJIS|EUC_JP).pl takes the parameters --flat
and -v. The former generates the old-style flat map as well as the
radix map file, and adding -v puts a source description on each line
of the flat map file.

While working on this, I noticed that the EUC_JP map lacks some
conversions, but that is a separate issue.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
