> > I have tested with a locale-enabled environment and found a bug. Included
> > is the new version of the patches.
> Your patch causes a crash in tsearch2's installcheck with 'initdb -E UTF8 --locale
> C'. A simple way to reproduce:
> # select to_tsquery('default', '''New York''');
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
It seems to be a bug in the original tsearch2. Here is the patch.
------------------------------------------------------------------
*** wordparser/parser.c~ 2007-01-07 09:54:39.000000000 +0900
--- wordparser/parser.c 2007-01-11 10:33:41.000000000 +0900
***************
*** 51,57 ****
      if (prs->charmaxlen > 1)
      {
          prs->usewide = true;
!         prs->wstr = (wchar_t *) palloc(sizeof(wchar_t) * prs->lenstr);
          prs->lenwstr = char2wchar(prs->wstr, prs->str, prs->lenstr);
      }
      else
--- 51,57 ----
      if (prs->charmaxlen > 1)
      {
          prs->usewide = true;
!         prs->wstr = (wchar_t *) palloc(sizeof(wchar_t) * (prs->lenstr+1));
          prs->lenwstr = char2wchar(prs->wstr, prs->str, prs->lenstr);
      }
      else
------------------------------------------------------------------
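To illustrate why the extra element matters, here is a minimal standalone
sketch. It assumes the conversion behaves like the standard mbstowcs(), which
may not match what char2wchar() does on every platform. The helper name
to_wide() and the use of malloc() instead of palloc() are only for
illustration and are not part of tsearch2. When every input byte is a
one-byte character, the result has as many wide characters as the input has
bytes, so a buffer of exactly that size leaves no room for the terminating
L'\0'.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper (not tsearch2 code): convert a NUL-terminated
 * multibyte string of len bytes into a freshly allocated wide string.
 * Returns NULL on allocation or conversion failure.
 */
static wchar_t *
to_wide(const char *str, size_t len, size_t *wlen)
{
    /* Worst case: one wide character per byte, plus the terminator. */
    wchar_t *w = malloc(sizeof(wchar_t) * (len + 1));
    size_t   n;

    if (w == NULL)
        return NULL;

    /* Allow len + 1 output elements so the terminator always fits. */
    n = mbstowcs(w, str, len + 1);
    if (n == (size_t) -1)
    {
        free(w);
        return NULL;
    }

    w[n] = L'\0';   /* n <= len, so this stays inside the buffer */
    *wlen = n;
    return w;
}
```

With the input 'New York' (8 bytes) the sketch allocates 9 elements. With
only 8, the terminator would have to be written one element past the end of
the buffer.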
> >> ! static int p_isalnum(TParser *prs) {
> ...
> >> ! if (lc_ctype_is_c())
> >> ! {
> >> ! if (c > 0x7f)
> >> ! return 1;
>
> I have some doubts that any character greater than 0x7f is an alpha symbol.
> Is it a simple assumption or a workaround?
Yeah, it's a workaround. Since tsearch2 has no character classes other than
alpha/numeric/latin, Asian characters have to fall into
one of them.
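
For reference, the idea can be sketched outside the parser as below. This is
only an illustration under assumptions: sketch_isalnum() is a made-up name,
and the ctype_is_c flag stands in for the lc_ctype_is_c() call in the patch.
It is not the actual p_isalnum() code.

```c
#include <ctype.h>
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical sketch (not the patch itself): classify a character as
 * alphanumeric.  ctype_is_c stands in for lc_ctype_is_c().
 */
static int
sketch_isalnum(wint_t c, bool ctype_is_c)
{
    if (ctype_is_c)
    {
        /*
         * Workaround: with LC_CTYPE = C the ctype functions know nothing
         * about non-ASCII characters, so treat all of them as
         * alphanumeric rather than as delimiters.
         */
        if (c > 0x7f)
            return 1;
        return isalnum((unsigned char) c) ? 1 : 0;
    }

    /* Otherwise let the locale's wide-character classification decide. */
    return iswalnum(c) ? 1 : 0;
}
```

The cost is that non-ASCII punctuation is also classified as alphanumeric
under the C locale.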
--
Tatsuo Ishii
SRA OSS, Inc. Japan