Re: pg_trgm - Mailing list pgsql-hackers

From	Greg Stark
Subject	Re: pg_trgm
Date	May 29, 2010 11:09:44
Msg-id	AANLkTinC6LcLpF16rREFJpOa1q1CzAN8tJAooqXwmalR@mail.gmail.com
In response to	Re: pg_trgm (Tatsuo Ishii <ishii@postgresql.org>)
List	pgsql-hackers

On Sat, May 29, 2010 at 9:13 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> ! #define iswordchr(c)  (lc_ctype_is_c()? \
> !         ((*(c) & 0x80)? !t_isspace(c) : (t_isalpha(c) || t_isdigit(c))): \
>

Surely isspace(c) will always be false for non-ASCII characters in C locale?

Now it might be sensible to just treat any non-ASCII character as a
word-character in addition to alpha and digits, so what might make
sense is
  (t_isalpha(c) || t_isdigit(c)) || (lc_ctype_is_c() && (*(c) & 0x80))

Though I wonder whether it wouldn't be generally more useful to users
to provide the non-space version as an option. I could see that being
useful for people in other circumstances aside from working around
this locale problem.
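
To make the two ideas concrete, here is a rough standalone sketch, not
a patch. Everything in it is a stand-in I made up for illustration:
is_word_char(), is_word_char_mode() and the word_mode enum are
hypothetical names, the c_locale flag plays the role of
lc_ctype_is_c(), and plain isalpha()/isdigit()/isspace() on a single
byte replace t_isalpha()/t_isdigit()/t_isspace(), so it ignores
multibyte handling entirely.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the iswordchr() test suggested above.
 * c points at the first byte of a character; c_locale stands in for
 * lc_ctype_is_c(). Only the first byte is examined.
 */
static bool
is_word_char(const unsigned char *c, bool c_locale)
{
    /* alpha or digit, or any high-bit byte when running in C locale */
    return (isalpha(*c) || isdigit(*c)) || (c_locale && (*c & 0x80));
}

/* Hypothetical option selecting which notion of "word character" to use. */
typedef enum
{
    WORD_ALNUM,     /* the alpha/digit rule above */
    WORD_NONSPACE   /* anything that is not whitespace */
} word_mode;

static bool
is_word_char_mode(const unsigned char *c, bool c_locale, word_mode mode)
{
    if (mode == WORD_NONSPACE)
        return !isspace(*c);
    return is_word_char(c, c_locale);
}
```

With this, a byte such as 0xE3 counts as a word character when
c_locale is true, while '-' is a word character only in WORD_NONSPACE
mode. How the real option would be exposed to users is left open here.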

--
greg
