sulfinu@gmail.com wrote:
> On Tuesday 11 December 2007, Oliver Jowett wrote:
>> A general String certainly might contain non-ASCII, but all Strings
>> returned by UnixCrypt.crypt() contain only ASCII. See the implementation
>> of UnixCrypt.crypt(), and in particular the UnixCrypt.cov_2char array.
>>
>> So I still do not understand why the change quoted above is necessary.
>> Can you explain why it is necessary?
> As I already stated, UnixCrypt looks like a lost cause to me. It truncates a
> lot of the information in the password (try to imagine what happens to a
> Unicode password at line 620), so its result is a little more relevant than
> nothing.
That's not what I'm complaining about. Why do you need to change the
encoding of the RESULT of UnixCrypt?
>> The server only ever sends a subset of ASCII as crypt salt characters
>> (specifically, the 62 characters a-z A-Z 0-9), so US-ASCII is just fine
>> for decoding. See postmaster.c, RandomSalt() / RemapChar().
> I didn't know that, I'll take your word for it. Is it officially specified
> anywhere? Can it be relied upon in the future?
Yes. My crypt(3) manpage here says:
* salt is a two-character string chosen from the set [a-zA-Z0-9./]
* The returned value points to the encrypted password, a series of 13
  printable ASCII characters (the first two characters represent the
  salt itself).
* CONFORMING TO: SVr4, 4.3BSD, POSIX.1-2001
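To illustrate why the choice among ASCII-compatible encodings makes no difference for these strings, here is a minimal sketch. The class and method names are made up for this example and are not driver code; it only assumes the salt/output alphabet quoted from the manpage above. It checks that a string drawn from that alphabet produces the same bytes, and decodes to the same String, under US-ASCII, UTF-8 and ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the JDBC driver.
final class AsciiDecodeCheck {
    // crypt(3) salt alphabet as quoted from the manpage: [a-zA-Z0-9./]
    static final String CRYPT_ALPHABET =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./";

    // True if s encodes to the same bytes, and those bytes decode back to s,
    // under US-ASCII and each of the other ASCII-compatible charsets.
    static boolean sameInAllCharsets(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        Charset[] others = { StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1 };
        for (Charset cs : others) {
            if (!Arrays.equals(ascii, s.getBytes(cs))) {
                return false;
            }
            if (!s.equals(new String(ascii, cs))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Every character in the crypt alphabet is plain ASCII.
        System.out.println(sameInAllCharsets(CRYPT_ALPHABET)); // prints "true"
        // A string containing a non-ASCII character does not survive.
        System.out.println(sameInAllCharsets("p\u00e4ss"));    // prints "false"
    }
}
```

So for the crypt salt and the crypt() result, decoding with US-ASCII gives exactly the same String as any ASCII-extending encoding would; the encoding only matters for input that can contain non-ASCII characters, such as the password itself.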
> I used the same Encoding.AUTHENTICATION_PHASE_ENCODING in all places for
> consistency, and it does no harm, since I make sure it is an ASCII
> extension.
Ok, so this is an unnecessary change that we can discard?
I actually think this patch as a whole is a lost cause (unfortunately,
more because you seem to be unwilling to take feedback on it than any
inherent problem). Would a patch that did only UTF-8 encoding, as
suggested by Kris, handle the cases you need?
-O