Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256
Msg-id 7560f076-a3c4-bcf7-09f7-bf7f10be78dd@iki.fi
In response to Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On 04/06/2017 07:59 PM, Heikki Linnakangas wrote:
> Another thing I'd like some more eyes on, is how this will work with
> encodings other than UTF-8. We will now try to normalize the password as
> if it was in UTF-8, even if it isn't. That's OK as long as we're
> consistent about it, but there is one worrisome scenario: what if the
> user's password consists mostly of characters, that when interpreted as
> UTF-8, are in the list of ignored characters. IOW, is it realistic that
> a user might have a password in a non-UTF-8 encoding, that gets silently
> mangled into something much shorter? I think that's highly unlikely, but
> can anyone come up with a plausible example of that?

I did some testing to see what the byte sequences of the Unicode characters
that SASLprep ignores mean in other encodings. I created a text file
containing every ignored character, in UTF-8, and ran "iconv -f <other
encoding> -t UTF-8//TRANSLIT" on the file, once for each supported server
encoding. The idea is to take each of the ignored byte sequences and
pretend that it is in some other encoding. If converting it to UTF-8
results in a legitimate character, then that byte sequence means something
in that encoding, and could be misinterpreted if it's used in a password.

Here are some characters that could plausibly be misinterpreted this way,
and then ignored by SASLprep:

-------
EUC-JP and EUC-JISX0213:

U+00AD (C2 AD): 足 (meaning "foot", per Unihan database)
U+FE00-FE0F (EF B8 8X): 鏝 (meaning "trowel", per Unihan database)

EUC-CN:

U+00AD (C2 AD): 颅 (meaning "skull", per Unihan database)
U+FE00-FE0F (EF B8 8X): 锔 (meaning "curium", per Unihan database)
U+FEFF (EF BB BF): 锘 (meaning "nobelium", per Wikipedia)

EUC-KR:

U+FE00-FE0F (EF B8 8X): 截 (meanings "cut off, stop, obstruct,
intersect", per Unihan database)
U+FEFF (EF BB BF): 癤 (meanings "pimple, sore, boil", per Unihan database)

EUC-TW:

U+FE00-FE0F (EF B8 8X): 踫 (meanings "collide, bump into", per Unihan database)
U+FEFF (EF BB BF): 踢 (meaning "kick", per Unihan database)

CP866:

U+1806 (E1 A0 86): саЖ
U+180B (E1 A0 8B): саЛ
U+180C (E1 A0 8C): саМ
U+180D (E1 A0 8D): саН
U+200B (E2 80 8B): тАЛ
U+200C (E2 80 8C): тАМ
U+200D (E2 80 8D): тАН
-------

The CP866 cases seem most likely to cause confusion. Those are all 
common words in Russian. I don't know how common those Chinese/Japanese 
characters are.
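The CP866 rows are easy to double-check, because every byte of these UTF-8 sequences lands on a Cyrillic letter in CP866. A minimal sketch using Python's stdlib `cp866` codec verifies both directions: the UTF-8 bytes read as CP866 give the listed text, and typing that text on a CP866 client produces exactly the UTF-8 bytes of the ignored code point. `CP866_ROWS` and `check_cp866_rows` are hypothetical names.

```python
# CP866 rows from the list above: code point -> text reported by iconv.
CP866_ROWS = {
    0x1806: "саЖ", 0x180B: "саЛ", 0x180C: "саМ", 0x180D: "саН",
    0x200B: "тАЛ", 0x200C: "тАМ", 0x200D: "тАН",
}


def check_cp866_rows() -> list:
    """Verify each row in both directions and return printable lines."""
    lines = []
    for cp, cyrillic in CP866_ROWS.items():
        utf8 = chr(cp).encode("utf-8")
        # UTF-8 bytes of the ignored code point, read back as CP866...
        assert utf8.decode("cp866") == cyrillic
        # ...and the Cyrillic text typed in CP866 gives the very same bytes.
        assert cyrillic.encode("cp866") == utf8
        lines.append(f"U+{cp:04X} ({utf8.hex(' ').upper()}) <-> {cyrillic}")
    return lines


if __name__ == "__main__":
    print("\n".join(check_cp866_rows()))
```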

Overall, I think this is OK. Even though some characters can be
misinterpreted like this, all of the following have to be true for it
to be a problem:

1. The client is using one of those encodings.
2. The password string as a whole has to be valid UTF-8.
3. Dropping those characters/words from the password would leave a
significantly weaker password, i.e. it was not very long to begin with,
or it consisted almost entirely of those characters/words.
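To make the three conditions concrete, here is a sketch of a hypothetical `partial_saslprep` helper. It is not PostgreSQL's implementation. It performs only the UTF-8 validity check, the RFC 3454 table B.1 "map to nothing" step and NFKC normalization. It assumes, as condition 2 implies, that a password which is not valid UTF-8 is used unchanged. Space mapping, prohibited-output and bidi checks are left out.

```python
import unicodedata

# Code points SASLprep maps to nothing (RFC 3454, table B.1).
MAPPED_TO_NOTHING = frozenset(
    [0x00AD, 0x034F, 0x1806, 0x180B, 0x180C, 0x180D,
     0x200B, 0x200C, 0x200D, 0x2060, *range(0xFE00, 0xFE10), 0xFEFF]
)


def partial_saslprep(password: bytes) -> bytes:
    """Hypothetical, partial SASLprep: UTF-8 check, B.1 mapping, NFKC only.

    Assumption: a password that is not valid UTF-8 is used as-is.
    Space mapping, prohibited-output and bidi checks are omitted.
    """
    try:
        text = password.decode("utf-8")
    except UnicodeDecodeError:
        return password  # condition 2 fails: nothing is stripped
    kept = "".join(ch for ch in text if ord(ch) not in MAPPED_TO_NOTHING)
    return unicodedata.normalize("NFKC", kept).encode("utf-8")


if __name__ == "__main__":
    # CP866 client (condition 1), password of problematic "words" plus digits:
    # the bytes happen to be valid UTF-8 (U+1806, U+200B, "2017").
    risky = "саЖтАЛ2017".encode("cp866")
    print(partial_saslprep(risky))  # b'2017'
    # An ordinary Russian word in CP866 is not valid UTF-8, so it survives.
    normal = "пароль".encode("cp866")
    print(partial_saslprep(normal) == normal)  # True
```

The first case meets all three conditions and shrinks to its four digits; the second fails condition 2 at the very first byte (0xAF is a UTF-8 continuation byte), so nothing is dropped.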

Thoughts? Attached are the full results of running iconv with each
encoding, from which I picked the above cases.

- Heikki


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)

