On Wed, Feb 15, 2017 at 9:27 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Feb 15, 2017 at 7:58 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> On 02/09/2017 09:33 AM, Michael Paquier wrote:
>>> Now regarding the shape of the implementation for SCRAM, we need one
>>> thing: a set of routines in src/common/ to build decompositions for a
>>> given UTF-8 string with conversion UTF8 string <=> pg_wchar array, the
>>> decomposition and the reordering. The extension attached roughly
>>> implements that. What we can actually do as well is have in contrib/ a
>>> module that does NFK[C|D] using the base APIs in src/common/. Using
>>> arrays of pg_wchar (integers) to manipulate the characters, we can
>>> validate and have a set of regression tests that do *not* have to
>>> print non-ASCII characters.
>>
>>
>> A contrib module or built-in extra functions to deal with Unicode characters
>> might be handy for a lot of things. But I'd leave that out for now, to keep
>> this patch minimal.
>
> No problem from me. I'll get something for SASLprep in the shape of
> something like the above. It should not take me long.
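
(As a reference point, the conversion from a UTF-8 string to an array of
codepoints mentioned above boils down to something like the sketch below.
The function name and signature are only illustrative, not what the patch
actually exposes in src/common/:)

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t pg_wchar;

/*
 * Decode a NUL-terminated UTF-8 string into an array of codepoints.
 * Returns the number of codepoints, or -1 on invalid input.  The caller
 * must free(*out).  (Overlong sequences and surrogates are not rejected
 * here; a real implementation should do so.)
 */
static int
utf8_to_codepoints(const char *in, pg_wchar **out)
{
	size_t		len = strlen(in);
	pg_wchar   *buf = malloc((len + 1) * sizeof(pg_wchar));
	int			n = 0;
	size_t		i = 0;

	if (buf == NULL)
		return -1;

	while (i < len)
	{
		unsigned char b = (unsigned char) in[i];
		pg_wchar	cp;
		int			extra;

		if (b < 0x80)
		{
			cp = b;
			extra = 0;
		}
		else if ((b & 0xE0) == 0xC0)
		{
			cp = b & 0x1F;
			extra = 1;
		}
		else if ((b & 0xF0) == 0xE0)
		{
			cp = b & 0x0F;
			extra = 2;
		}
		else if ((b & 0xF8) == 0xF0)
		{
			cp = b & 0x07;
			extra = 3;
		}
		else
			goto fail;			/* invalid leading byte */

		if (i + extra >= len)
			goto fail;			/* truncated sequence */

		for (int j = 1; j <= extra; j++)
		{
			unsigned char c = (unsigned char) in[i + j];

			if ((c & 0xC0) != 0x80)
				goto fail;		/* invalid continuation byte */
			cp = (cp << 6) | (c & 0x3F);
		}

		buf[n++] = cp;
		i += extra + 1;
	}

	*out = buf;
	return n;

fail:
	free(buf);
	return -1;
}
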
OK, attached is a patch that implements SASLprep; it needs to be applied
on top of the other ones. When working on the table reduction, the
worst-case size was 2.4MB. After removing all the characters that have a
combining class of 0 and no decomposition, I was able to get that down
to 570kB. After splitting the decompositions by size into their own
tables, it got down to 120kB, which is even nicer.
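
For what it's worth, splitting by decomposition length means each table
can use fixed-size entries with no pointer or padding overhead. A rough
illustration (the struct and table names are made up; the real tables
are generated by the perl script from UnicodeData.txt):

#include <stdint.h>

/* One table per decomposition length, so every entry is fixed-size. */
typedef struct
{
	uint32_t	codepoint;		/* character being decomposed */
	uint8_t		comb_class;		/* canonical combining class */
	uint32_t	decomp[2];		/* decomposition, exactly 2 codepoints */
} unicode_decomp_2;

/* Entries sorted by codepoint, looked up with a binary search. */
static const unicode_decomp_2 decomp_table_2[] = {
	{0x00C0, 0, {0x0041, 0x0300}},	/* LATIN CAPITAL LETTER A WITH GRAVE */
	{0x00C1, 0, {0x0041, 0x0301}},	/* LATIN CAPITAL LETTER A WITH ACUTE */
	/* ... generated entries ... */
};

/* decomp_table_3, decomp_table_4, etc. follow the same pattern. */
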
One thing that I had forgotten previously is the handling of the
decomposition of Hangul characters (the Korean syllables), which is
algorithmic, so no table is needed for them. The algorithm is described
here for the curious => http://unicode.org/reports/tr15/tr15-18.html#Hangul.
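
In short, a precomposed syllable decomposes into its leading consonant,
vowel, and optional trailing consonant by plain arithmetic, something
like this (a sketch following the constants in TR15; the function name
and signature are not necessarily what the patch uses):

#include <stdint.h>

/* Constants from Unicode TR15, Hangul section */
#define SBASE	0xAC00
#define LBASE	0x1100
#define VBASE	0x1161
#define TBASE	0x11A7
#define LCOUNT	19
#define VCOUNT	21
#define TCOUNT	28
#define NCOUNT	(VCOUNT * TCOUNT)	/* 588 */
#define SCOUNT	(LCOUNT * NCOUNT)	/* 11172 */

/*
 * Decompose a precomposed Hangul syllable into 2 or 3 jamo codepoints.
 * Returns the number of codepoints written to result[], or 0 if the
 * input is not a precomposed Hangul syllable.
 */
static int
hangul_decompose(uint32_t s, uint32_t result[3])
{
	uint32_t	sindex,
				l,
				v,
				t;

	if (s < SBASE || s >= SBASE + SCOUNT)
		return 0;				/* not a precomposed Hangul syllable */

	sindex = s - SBASE;
	l = LBASE + sindex / NCOUNT;
	v = VBASE + (sindex % NCOUNT) / TCOUNT;
	t = TBASE + sindex % TCOUNT;

	result[0] = l;
	result[1] = v;
	if (t == TBASE)
		return 2;				/* LV syllable, no trailing consonant */
	result[2] = t;
	return 3;
}
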
The patch includes the conversion tables, which is why it is large, and
the perl script that I used to generate them. It has also been pushed to
my github branch. The basics are here, I think, but this portion really
needs a careful review. I have done some basic tests and things are
mostly working, but I have been able to break things pretty easily with
some exotic characters. The conversion tables look correct; I have
tested them using my module that implements NFKC
(https://github.com/michaelpq/pg_plugins/tree/master/pg_sasl_prepare),
but much refinement still needs to be done.
--
Michael