Re: ts_locale.c: why no t_isalnum() test? - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: ts_locale.c: why no t_isalnum() test?
Date
Msg-id CADkLM=fgm4_A7b9_pXE=QPCB+JpxD4sTRue4SXKk9TvkB0LWig@mail.gmail.com
In response to ts_locale.c: why no t_isalnum() test?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: ts_locale.c: why no t_isalnum() test?
List pgsql-hackers
On Wed, Oct 5, 2022 at 3:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I happened to wonder why various places are testing things like

#define ISWORDCHR(c)    (t_isalpha(c) || t_isdigit(c))

rather than using an isalnum-equivalent test.  The direct answer
is that ts_locale.c/.h provides no such test function, apparently
because there aren't many potential callers in the core code.
However, both pg_trgm and ltree could benefit from adding one.

There's no semantic hazard here: the documentation I consulted
is all pretty explicit that is[w]alnum is true exactly when
either is[w]alpha or is[w]digit is.  For example, POSIX saith

    The iswalpha() and iswalpha_l() functions shall test whether wc is a
    wide-character code representing a character of class alpha in the
    current locale, or in the locale represented by locale, respectively;
    see XBD Locale.

    The iswdigit() and iswdigit_l() functions shall test whether wc is a
    wide-character code representing a character of class digit in the
    current locale, or in the locale represented by locale, respectively;
    see XBD Locale.

    The iswalnum() and iswalnum_l() functions shall test whether wc is a
    wide-character code representing a character of class alpha or digit
    in the current locale, or in the locale represented by locale,
    respectively; see XBD Locale.
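
As a quick check of that equivalence, here is a minimal sketch in standard C. It is not taken from the patch; the helper name count_alnum_mismatches() is hypothetical and exists only for this illustration. It counts the code points below a given limit for which iswalnum() disagrees with iswalpha() || iswdigit().

```c
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not part of PostgreSQL): count the wide-character
 * codes in [0, limit) for which iswalnum() disagrees with the combined
 * test iswalpha() || iswdigit() in the current locale.
 */
static int
count_alnum_mismatches(wint_t limit)
{
    int     mismatches = 0;
    wint_t  wc;

    for (wc = 0; wc < limit; wc++)
    {
        /* Normalize to 0/1, since the isw*() functions return any nonzero value. */
        int     alnum = (iswalnum(wc) != 0);
        int     alpha_or_digit = (iswalpha(wc) != 0) || (iswdigit(wc) != 0);

        if (alnum != alpha_or_digit)
            mismatches++;
    }
    return mismatches;
}
```

If the class definitions quoted above hold, calling count_alnum_mismatches(128) from a small test program should report no mismatches.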

While I didn't try to actually measure it, these functions don't
look remarkably cheap.  Doing char2wchar() twice when we only need
to do it once seems silly, and the libc functions themselves are
probably none too cheap for multibyte characters either.
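
To make the cost argument concrete, the following sketch contrasts the two shapes using only standard C. It is an illustration under stated assumptions, not the committed code: first_char_to_wchar() is a hypothetical stand-in for a char2wchar()-style conversion built on mbrtowc(), and the is_word_char_* names are invented here.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical stand-in for a char2wchar()-style conversion: decode the
 * first multibyte character of s (at most len bytes) into *wc using the
 * current locale.  Returns 1 on success, 0 on invalid, incomplete, or
 * empty input.
 */
static int
first_char_to_wchar(const char *s, size_t len, wchar_t *wc)
{
    mbstate_t   state;
    size_t      n;

    memset(&state, 0, sizeof(state));
    n = mbrtowc(wc, s, len, &state);
    return n != (size_t) -1 && n != (size_t) -2 && n != 0;
}

/* Two-test shape: the character may be converted twice. */
static int
is_word_char_two_pass(const char *s, size_t len)
{
    wchar_t     wc;

    if (first_char_to_wchar(s, len, &wc) && iswalpha((wint_t) wc))
        return 1;
    if (first_char_to_wchar(s, len, &wc) && iswdigit((wint_t) wc))
        return 1;
    return 0;
}

/* Single-test shape: one conversion, one classification call. */
static int
is_word_char_one_pass(const char *s, size_t len)
{
    wchar_t     wc;

    return first_char_to_wchar(s, len, &wc) && iswalnum((wint_t) wc) != 0;
}
```

Both functions should classify a character identically; the second simply skips the repeated conversion whenever the character is not alphabetic.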

Hence, I propose the attached.  I got rid of some places that were
unnecessarily checking pg_mblen before applying t_iseq(), too.

                        regards, tom lane


I see this is already committed, but I'm curious: why do t_isalpha and t_isdigit have the pair of /* TODO */ comments? The unfinished business isn't explained anywhere in the file.


 
