Re: [HACKERS] SCRAM authentication, take three - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] SCRAM authentication, take three
Date
Msg-id CAB7nPqT6+4NU8aV+tpq=3OMkhgJkfLqdn5A=K0cmayjqu4qOaQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] SCRAM authentication, take three  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [HACKERS] SCRAM authentication, take three  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Thu, Mar 2, 2017 at 9:13 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> I'm studying the normalization of Unicode so I apologize possible
> stupidity in advance.
>
> I looked into this and have some comments. Sorry for the random
> order.

Thanks, this needs a lot of background and I am glad that somebody is
taking the time to look at what I am doing here.

> ====
> Composition version should be written some where.

Sure.

> ====
> Perhaps one of the most important things is that the code exactly
> reflects the TR. pg_utf8_check_string returns true for ASCII
> strings but the TR says that
>
> | Text containing only ASCII characters (U+0000 to U+007F) is left
> | unaffected by all of the normalization forms. This is
> | particularly important for programming languages
>
> And running SASLprepare for apparent ASCII string (which is the
> most case) is a waste of CPU cycles.

Yeah, that's true. We could just for example check in
pg_utf8_check_string() if the length gathered matches strlen(source)
as only ASCII are 1-byte long.

> ====
> From the point of view of reflectivity (please someone teach me an
> appropreate wording for this..), basically the code had better to
> be a copy of the reference code as long as no performance
> degradation occurs. Hangul part of get_decomposed_size(and
> decompose_code, recompose_code) uses different naming from the
> TR. hindex should be sindex and t should be tindex. Magic numbers
> should have names in the TR.
>
> * A bit later, I noticed that these are copies of charlint. If so
>   I want a description about that.)

Yeah, their stuff works quite nicely.

> ====
> The following comment is equivalent to "reordering in canonical
> order". But the definition of "decomposition" includes this
> step. (I'm not sure if it needs rewriting, since I acutually
> could understand that.)
>
>> /*
>>  * Now that the decomposition is done, apply the combining class
>>  * for each multibyte character.
>>  */

I have reworked a bit this one:
    /*
-    * Now that the decomposition is done, apply the combining class
-    * for each multibyte character.
+    * Now end the decomposition by applying the combining class for
+    * each multibyte character.
     */

> ====
>>    * make the allocation of the recomposed string based on that assumption.
>>    */
>>   recomp_chars = (pg_wchar *) malloc(decomp_size * sizeof(int));
>>   lastClass = -1;     /* this eliminates a special check */
>
> utf_sasl_prepare uses variable names with two naming
> conventions. Is there any reason for the two?

OK, did some adjustments here.

> ====
>> starterCh = recomp_chars[0] = decomp_chars[0];
>
> starterCh reads as "starter channel" why not "starter_char"?

Starter character of the current set, which is a character with a
combining class of 0.

> ====
>>   else if (start >= SBASE && start < (SBASE + SCOUNT) &&
>>            ((start - SBASE) % TCOUNT) == 0 &&
>>            code >= TBASE && code <= (TBASE + TCOUNT))
>
> "code <= (TBASE + TCOUNT)" somewhat smells. Then I found the
> original code for the current implementation in charlint and it
> seems correct to me. Some description about the difference
> between them is needed.

Right. I have updated all those things to use constants instead of
hardcoded values.

> ====
> In use_sasl_prepare, the recompose part is the copy of charlint
> but it seems a bit inefficient. It calls recompose_code
> unconditionally but it is required only for the case of
> "lastClass < chClass".
>
> Something like this. (This still calls recompose_code for the
> case that ch is the same position with starterChar so there still
> be room for more improvement.)

Agreed.

> ====
> If I read the TR correctly, "6 Composition Exclusion Table" says
> that there some characters not to be composed. But I don't find
> the corresponding code. (or comments)

Ah, right! There are 100 characters that enter in this category, and
all of them have a combining class of 0, so it is as simple as
removing them from the tables generated.

I am attaching 0009 and 0010 that address those problems (pushed on
github as well) that can be applied on top of the latest set.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] Print correct startup cost for the group aggregate.
Next
From: Kuntal Ghosh
Date:
Subject: Re: [HACKERS] Performance degradation in TPC-H Q18