Re: Implementation of SASLprep for SCRAM-SHA-256 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Implementation of SASLprep for SCRAM-SHA-256
Date
Msg-id bcdd548d-04ce-69a2-1328-29627104d212@iki.fi
In response to Re: Implementation of SASLprep for SCRAM-SHA-256  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On 04/05/2017 07:23 AM, Michael Paquier wrote:
> On Wed, Apr 5, 2017 at 7:05 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I will continue tomorrow, but I wanted to report on what I've done so far.
>> Attached is a new patch version, quite heavily modified. Notable changes so
>> far:
>
> Great, thanks!
>
>> * Use Unicode codepoints, rather than UTF-8 bytes packed in 32-bit ints.
>> IMHO this makes the tables easier to read (to a human), and they are also
>> packed slightly more tightly (see next two points), as you can fit more
>> codepoints in a 16-bit integer.
>
> Using codepoints directly is not very consistent with the existing
> backend, but for the sake of packing things more tightly, OK.

Oh, I see, we already have similar functions in wchar.c:
unicode_to_utf8() and utf8_to_unicode(). We should probably move those
to src/common, rather than reinvent the wheel.
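
For readers following along, here is a minimal sketch of the codepoint <-> UTF-8
conversion that functions of this kind perform. It is written in Python for
brevity and is not the wchar.c code; the names codepoint_to_utf8 and
utf8_to_codepoint are made up for illustration, and the decoder assumes
well-formed input (it does not validate continuation bytes).

```python
def codepoint_to_utf8(cp: int) -> bytes:
    """Encode one Unicode codepoint as 1 to 4 UTF-8 bytes."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %r" % cp)
    if cp <= 0x7F:
        return bytes([cp])
    if cp <= 0x7FF:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_to_codepoint(buf: bytes) -> int:
    """Decode the first UTF-8 sequence in buf (assumed well-formed)."""
    if not buf:
        raise ValueError("empty input")
    b0 = buf[0]
    if b0 < 0x80:
        return b0
    if b0 & 0xE0 == 0xC0:
        extra, cp = 1, b0 & 0x1F
    elif b0 & 0xF0 == 0xE0:
        extra, cp = 2, b0 & 0x0F
    elif b0 & 0xF8 == 0xF0:
        extra, cp = 3, b0 & 0x07
    else:
        raise ValueError("invalid UTF-8 lead byte: 0x%02X" % b0)
    if len(buf) < 1 + extra:
        raise ValueError("truncated UTF-8 sequence")
    # Each continuation byte contributes its low 6 bits.
    for b in buf[1:1 + extra]:
        cp = (cp << 6) | (b & 0x3F)
    return cp
```

This also illustrates the packing argument: U+20AC fits in 16 bits as a
codepoint (0x20AC), while its UTF-8 bytes E2 82 AC packed into an integer
(0xE282AC) need 24 bits.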

> pg_utf8_islegal() and pg_utf_mblen() should also be moved into their
> own file I think, and wchar.c can use that.

Yeah..

>> * The list of characters excluded from recomposition is currently hard-coded
>> in utf_norm_generate.pl. However, that list is available in machine-readable
>> format, in file CompositionExclusions.txt. Since we're reading most of the
>> data from UnicodeData.txt, it would be good to read the exclusion table from a
>> file, too.
>
> Ouch. Those are present here...
> http://www.unicode.org/reports/tr41/tr41-19.html#Exclusions
> Definitely it makes more sense to read them from a file.

Did that.
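
As an illustration of what reading that file involves, here is a rough sketch
in Python (the patch itself does this in a Perl script, which is not
reproduced here). It assumes the usual UCD data-file layout: a hex codepoint
or an XXXX..YYYY range at the start of the line, with everything after '#'
being a comment. The function name parse_exclusions and the SAMPLE text are
made up for this example.

```python
import re

# A hex codepoint, optionally followed by "..<hex codepoint>" for a range.
_ENTRY = re.compile(r"([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?")


def parse_exclusions(lines):
    """Return the set of codepoints listed in CompositionExclusions-style lines."""
    excluded = set()
    for line in lines:
        # Drop the comment part; blank and comment-only lines carry no data.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        m = _ENTRY.fullmatch(data)
        if m is None:
            raise ValueError("unrecognized line: %r" % line)
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        excluded.update(range(first, last + 1))
    return excluded


# Small hand-written sample in the style of the real file.
SAMPLE = """\
# Script specifics
0958    #  DEVANAGARI LETTER QA
0959    #  DEVANAGARI LETTER KHHA

# 0340..0341  [2] COMBINING GRAVE TONE MARK..COMBINING ACUTE TONE MARK
FB1D    #  HEBREW LETTER YOD WITH HIRIQ
"""

exclusions = parse_exclusions(SAMPLE.splitlines())
```

Lines that are commented out in the file are skipped, so only the three
uncommented codepoints end up in the set for the sample above.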

>> * SASLPrep specifies normalization form KC, but it also specifies that some
>> characters are mapped to space or nothing. Should do those mappings, too.
>
> Ah, right. Those ones are here:
> https://tools.ietf.org/html/rfc3454#appendix-B.1

Yep.


Attached is a new version. Notable changes since yesterday:

* Implemented the rest of SASLprep: mapping some characters to
spaces, leaving out others, and checking for prohibited characters and
bidirectional strings.

* Moved things around. There's now a separate directory, 
src/common/unicode, which contains the perl scripts and the test code. 
Those are not needed to build from source, as the pre-generated tables 
are put in src/include/common. Similar to the scripts in 
src/backend/utils/mb/Unicode, really.

* Renamed many things from utf_* to unicode_*, since they don't deal 
with utf-8 input anymore.
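
To make the SASLprep steps in the first item concrete, here is a rough sketch
in Python. It is not the code in the attached patch: it leans on Python's
standard stringprep module (RFC 3454 tables, Unicode 3.2) and follows my
reading of RFC 4013. The function name saslprep_sketch is made up, and the
check for unassigned codepoints (table A.1) is deliberately left out.

```python
import stringprep
from unicodedata import ucd_3_2_0 as ucd  # stringprep is defined on Unicode 3.2

# Prohibited-output tables listed by RFC 4013, section 2.3.
_PROHIBITED = (
    stringprep.in_table_c12,      # non-ASCII space characters
    stringprep.in_table_c21_c22,  # ASCII and non-ASCII control characters
    stringprep.in_table_c3,       # private use
    stringprep.in_table_c4,       # non-character codepoints
    stringprep.in_table_c5,       # surrogate codes
    stringprep.in_table_c6,       # inappropriate for plain text
    stringprep.in_table_c7,       # inappropriate for canonical representation
    stringprep.in_table_c8,       # change display properties / deprecated
    stringprep.in_table_c9,       # tagging characters
)


def saslprep_sketch(s: str) -> str:
    # 1. Map: non-ASCII spaces become U+0020, "mapped to nothing" chars vanish.
    mapped = []
    for ch in s:
        if stringprep.in_table_c12(ch):
            mapped.append(" ")
        elif not stringprep.in_table_b1(ch):
            mapped.append(ch)

    # 2. Normalize: form KC.
    out = ucd.normalize("NFKC", "".join(mapped))

    # 3. Prohibit: reject any character from the prohibited-output tables.
    for ch in out:
        if any(check(ch) for check in _PROHIBITED):
            raise ValueError("prohibited character U+%04X" % ord(ch))

    # 4. Bidi: if any RandALCat character is present, no LCat character may
    #    appear, and the first and last characters must both be RandALCat.
    if any(stringprep.in_table_d1(ch) for ch in out):
        if any(stringprep.in_table_d2(ch) for ch in out):
            raise ValueError("mixed right-to-left and left-to-right text")
        if not (stringprep.in_table_d1(out[0])
                and stringprep.in_table_d1(out[-1])):
            raise ValueError("RandALCat string must start and end with RandALCat")
    return out
```

With this sketch, the examples from RFC 4013 behave as the RFC describes:
"I<U+00AD>X" comes out as "IX", U+2168 comes out as "IX", and U+0007 is
rejected as a prohibited character.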


This is starting to shape up, but there is still some cleanup work to do.
I will continue tomorrow.

- Heikki

