Re: [PATCH] Expand character set for ltree labels - Mailing list pgsql-hackers

From Garen Torikian
Subject Re: [PATCH] Expand character set for ltree labels
Date
Msg-id CAGXsc+-jhKJvSaqTWYa_PkrmA0ANWPfpte41ijwYpUjx7GNrHQ@mail.gmail.com
In response to Re: [PATCH] Expand character set for ltree labels  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
No, not quite.

Valid Punycode output characters are `[A-Za-z0-9-]`. This proposal adds `-` to the allowed ltree label characters, as well as `#` and `;` for HTML entities.

I double-checked the valid Punycode characters against the spec, and the set above is indeed correct: https://datatracker.ietf.org/doc/html/draft-ietf-idn-punycode-02#section-5

While it would be nice for ltree labels to support *any* printable character, they can't, because symbols like `!` and `%` already have special meaning in queries. This proposal leaves those as is and does not depend on any existing special character.
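
To make the proposed set concrete, here is a rough sketch in Python (not the patch itself, which is in C). It assumes the existing ltree label set is `A-Za-z0-9_` and that the proposal adds `-`, `#`, and `;`. The names `PROPOSED_LABEL`, `is_valid_label`, and `to_punycode_label` are made up for illustration, and Python's built-in `punycode` codec stands in for whatever encoder an application would actually use.

```python
import re

# Assumption: existing ltree labels allow A-Za-z0-9 and underscore;
# the proposal adds '-', '#', and ';'. This regex is illustrative only.
PROPOSED_LABEL = re.compile(r"[A-Za-z0-9_#;-]+")


def is_valid_label(label: str) -> bool:
    """Return True if every character of a non-empty label is in the proposed set."""
    return PROPOSED_LABEL.fullmatch(label) is not None


def to_punycode_label(text: str) -> str:
    """Encode arbitrary text with Python's built-in 'punycode' codec."""
    return text.encode("punycode").decode("ascii")


if __name__ == "__main__":
    encoded = to_punycode_label("münchen")
    print(encoded, is_valid_label(encoded))
    # Characters with special meaning in queries stay rejected.
    print(is_valid_label("a!b"), is_valid_label("50%"))
```

Symbols such as `!` and `%` stay outside the set, so a label containing them is still rejected.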

On Tue, Oct 4, 2022 at 6:32 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Tue, Oct 04, 2022 at 12:54:46PM -0400, Garen Torikian wrote:
> The punycode range of characters is the exact same set as the existing
> ltree range, with the addition of a hyphen (-). Within this system, any
> human language can be encoded using just A-Za-z0-9-.

IIUC ASCII characters like '!' and '<' are valid Punycode characters, but
even with your proposal, those wouldn't be allowed.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
