Re: [PATCH] Expand character set for ltree labels - Mailing list pgsql-hackers

From	Garen Torikian
Subject	Re: [PATCH] Expand character set for ltree labels
Date	October 5, 2022 19:34:49
Msg-id	CAGXsc+8ki-dAhX+it1xyyCk4zcMUX79ujVs-+xrrrHjzB5VKCA@mail.gmail.com
In response to	Re: [PATCH] Expand character set for ltree labels (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: [PATCH] Expand character set for ltree labels
List	pgsql-hackers

Hi Tom,

> Perhaps the docs are a bit unclear about that, but it's not
> restricted to ASCII alphanumerics. AFAICS the code will accept
> whatever iswalpha() and iswdigit() will accept in the database's
> default locale.

Sorry, but I don't think that is correct. Here is the single check that defines what constitutes a valid character: https://github.com/postgres/postgres/blob/c3315a7da57be720222b119385ed0f7ad7c15268/contrib/ltree/ltree.h#L129

As you can see, there are no `is_*` calls at all. Where in this contrib package do you see `iswalpha`? Perhaps I missed it.
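
To make the two readings concrete, here is a rough Python sketch. It is not the ltree code: both function names are hypothetical, and Python's `str.isalpha()` is Unicode-based rather than locale-based, so it only approximates what `iswalpha()` would accept in a UTF-8 locale.

```python
import string

# Assumption: the strict reading allows only ASCII letters, digits, and "_".
ASCII_LABEL_CHARS = set(string.ascii_letters + string.digits + "_")


def ascii_only_ok(label: str) -> bool:
    """Hypothetical strict check: A-Z, a-z, 0-9 and _ only."""
    return bool(label) and all(ch in ASCII_LABEL_CHARS for ch in label)


def wide_char_ok(label: str) -> bool:
    """Hypothetical permissive check: any alphabetic or decimal-digit
    character, plus _ (a stand-in for iswalpha()/iswdigit())."""
    return bool(label) and all(
        ch.isalpha() or ch.isdecimal() or ch == "_" for ch in label
    )


if __name__ == "__main__":
    # Compare how each reading treats a few sample labels.
    for label in ["foo_bar", "naïve", "日本語", "foo-bar", "a#b;"]:
        print(f"{label!r}: ascii_only={ascii_only_ok(label)} "
              f"wide_char={wide_char_ok(label)}")
```

Under either reading, labels containing `-`, `#`, or `;` are rejected; the two only differ on non-ASCII letters and digits.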

> That seems really pretty random.

Ok. I am trying to avoid a situation where users may wish to use delimiters other than `-`, due to its common presence in words (e.g., compound ones).

On Wed, Oct 5, 2022 at 2:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Garen Torikian <gjtorikian@gmail.com> writes:
> I am submitting a patch to expand the label requirements for ltree.

> The current format is restricted to alphanumeric characters, plus _.
> Unfortunately, for non-English labels, this set is insufficient.

Hm? Perhaps the docs are a bit unclear about that, but it's not
restricted to ASCII alphanumerics. AFAICS the code will accept
whatever iswalpha() and iswdigit() will accept in the database's
default locale. There's certainly work that could/should be done
to allow use of not-so-default locales, but that's not specific
to ltree. I'm not sure that doing an application-side encoding
is attractive compared to just using that ability directly.

If you do want to do application-side encoding, I'm unsure why
punycode would be the choice anyway, as opposed to something
that can fit in the existing restrictions.

> On top of this, I added support for two more characters: # and ;, which are
> used for HTML entities.

That seems really pretty random.

regards, tom lane
