Re: Hostnames, IDNs, Punycode and Unicode Case Folding - Mailing list pgsql-general

From Mike Cardwell
Subject Re: Hostnames, IDNs, Punycode and Unicode Case Folding
Msg-id 20141230004819.GC24297@glue.grepular.com
In response to Re: Hostnames, IDNs, Punycode and Unicode Case Folding  (Andrew Sullivan <ajs@crankycanuck.ca>)
List pgsql-general
* on the Mon, Dec 29, 2014 at 07:22:21PM -0500, Andrew Sullivan wrote:

>> can't just encode it with punycode and then store the ascii result. For example,
>> these two are the same hostnames thanks to unicode case folding [1]:
>>
>>   tesst.ëxämplé.com
>>   teßt.ëxämplé.com
>
> Well, in IDNA2003 they're the same.  In IDNA2008 (RFC 5890 and suite),
> they're not the same.  In UTS46, they're kind of the same, because
> pre-lookup processing maps one of them to the other (it depends which
> mode you're in which way the mapping goes, which is just fantastic
> because you can't tell at the server which mode the client is in.
> IDNA is an unholy mess); but the lookup is still done using the
> IDNA2008 rules, approximately.

Heh. And I just thought I was finally starting to get to grips with this stuff.
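
To illustrate the IDNA2003 behaviour described above, here is a minimal sketch using Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490 with Nameprep). It is only an illustration of that one mode. It says nothing about how IDNA2008 or UTS46 treat these names, and the variable names are made up for the example.

```python
import unicodedata

# Python's standard "idna" codec follows IDNA2003 (RFC 3490 + Nameprep).
# Normalise the literals to NFC first so the example does not depend on
# how this text happened to be stored.
name_a = unicodedata.normalize("NFC", "tesst.ëxämplé.com")
name_b = unicodedata.normalize("NFC", "teßt.ëxämplé.com")

ascii_a = name_a.encode("idna")
ascii_b = name_b.encode("idna")

# Under IDNA2003, Nameprep maps U+00DF (ß) to "ss", so both names
# end up with the same ASCII form.
print(ascii_a == ascii_b)
print(ascii_a.split(b".")[0])
```

Under IDNA2008 rules the two names would not compare equal, so a check like this should not be read as a general test of hostname equivalence.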

>> They both encode in punycode to the same thing:
>>
>>   xn--tesst.xmpl.com-cib7f2a
>
> Under no circumstances should they encode to that.

Eurgh, you're right. The library I'm using does actually do it right; I just
forgot to split on the dot and encode each label separately when writing the
examples for this email. Sorry for confusing matters.
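
For anyone following along, the mistake and the fix can be sketched roughly as follows in Python. This uses the standard "punycode" codec directly and deliberately skips Nameprep and any other IDNA mapping, so it shows only the "split on the dot, encode each label" step. The function names encode_label and encode_hostname are hypothetical, not from any library mentioned in this thread.

```python
import unicodedata

ACE_PREFIX = "xn--"

def encode_label(label: str) -> str:
    """Punycode-encode one label; pure-ASCII labels pass through unchanged."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def encode_hostname(hostname: str) -> str:
    """Split on dots and encode each label separately (no Nameprep here)."""
    return ".".join(encode_label(label) for label in hostname.split("."))

host = unicodedata.normalize("NFC", "tesst.ëxämplé.com")

# The mistake: running Punycode over the whole hostname, dots included.
wrong = ACE_PREFIX + host.encode("punycode").decode("ascii")

# The fix: one Punycode run per label.
right = encode_hostname(host)

print(wrong)
print(right)
```

The first result keeps the dots inside a single encoded string, which is the shape of the bad example quoted above. The second has three labels, of which only the middle one carries the xn-- prefix.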

[snip lots of useful and interesting information]

> You seem to want a bunch of label constraints, not all of which are
> related to IDNA. I think it would be better to break these up into a
> small number of functions.  As it happens, I have a colleague at Dyn
> who I think has some need of some of this too, and so it might be
> worth spinning up a small project to try to get generic functions:
> to_idna2003, to_idna2008, check_ldh, split_labels, and so on.  If this
> seems possibly interesting for collaboration, let me know & I'll try
> to put together the relevant people.

Those functions would be very useful to me. I know a bit of C, but probably not
enough to produce an acceptable patch. If there are people who would also find
these functions useful, and people motivated to implement them, that would
be great...
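
As a rough idea of what two of those helpers might look like, here is a sketch in Python rather than C. The names split_labels and check_ldh come from the suggestion above, but the behaviour shown is my own assumption: LDH is taken to mean ASCII letters, digits and hyphen, 1 to 63 characters, with no leading or trailing hyphen.

```python
import re

# Assumed LDH rule: letters, digits, hyphen; 1-63 characters;
# must not start or end with a hyphen.
_LDH_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def split_labels(hostname: str) -> list[str]:
    """Split a hostname into labels, ignoring one trailing root dot."""
    if hostname.endswith("."):
        hostname = hostname[:-1]
    return hostname.split(".")

def check_ldh(label: str) -> bool:
    """Return True if the label satisfies the assumed LDH rule."""
    return _LDH_RE.fullmatch(label) is not None

for label in split_labels("www.ex-ample.com."):
    print(label, check_ldh(label))
```

Keeping the LDH check separate from any IDNA conversion matches the suggestion to break the constraints into a small number of functions.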

--
Mike Cardwell  https://grepular.com https://emailprivacytester.com
OpenPGP Key    35BC AF1D 3AA2 1F84 3DC3   B0CF 70A5 F512 0018 461F
XMPP OTR Key   8924 B06A 7917 AAF3 DBB1   BF1B 295C 3C78 3EF1 46B4
