Re: to_tsvector in 8.2.3 - Mailing list pgsql-general

From Teodor Sigaev
Subject Re: to_tsvector in 8.2.3
Date
Msg-id 460175E3.40601@sigaev.ru
In response to Re: to_tsvector in 8.2.3  (Magnus Hagander <magnus@hagander.net>)
Responses Re: to_tsvector in 8.2.3  (Magnus Hagander <magnus@hagander.net>)
List pgsql-general
> postgres=# select to_tsvector('test text');
>   to_tsvector
> ---------------
>  'test text':1
> (1 row)
OK, that's related to this commit:
http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/tsearch2/wordparser/parser.c.diff?r1=1.11;r2=1.12;f=h
Thomas pointed out that the character may be a non-breaking space (0xa0), and
that commit assumes that, with the C locale and a multibyte encoding, any
character > 0x7f is alpha. To check this theory, please apply the attached patch.

If so, I'm confused: we cannot assume that 0xa0 is a space symbol in any
multibyte encoding, even on Windows.
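
As a rough illustration of what the attached patch changes, here is a
standalone sketch of the alpha check. The function name sketch_isalpha is
hypothetical and this is not the tsearch2 parser code itself; it only mirrors
the patched branch, assuming the C locale with a multibyte encoding: any
character above 0x7f counts as alpha, except 0xa0.

```c
#include <ctype.h>

/*
 * Hypothetical stand-in for the patched check (assumption: C locale,
 * multibyte encoding, c is an already-decoded character code).
 * Returns 1 if c is treated as alphabetic, 0 otherwise.
 */
int
sketch_isalpha(unsigned int c)
{
    if (c > 0x7f)
    {
        /* Test exception from the patch: 0xa0 is not treated as alpha. */
        if (c == 0xa0)
            return 0;
        /* Every other character above 0x7f is assumed to be alpha. */
        return 1;
    }

    /* ASCII range: defer to the C library, normalised to 0 or 1. */
    return isalpha(0xff & c) != 0;
}
```

If the theory holds, a 0xa0 between 'test' and 'text' would then act as a word
boundary instead of being glued into one lexeme.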



--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
*** ./contrib/tsearch2/wordparser/parser.c.orig    Wed Mar 21 20:41:23 2007
--- ./contrib/tsearch2/wordparser/parser.c    Wed Mar 21 21:10:39 2007
***************
*** 124,130 ****
--- 124,134 ----
               * with C-locale is an alpha character
               */
              if ( c > 0x7f )
+             {
+                 if ( c == 0xa0 )
+                     return 0;
                  return 1;
+             }

              return isalnum(0xff & c);
          }
***************
*** 157,163 ****
--- 161,171 ----
               * with C-locale is an alpha character
               */
              if ( c > 0x7f )
+             {
+                 if ( c == 0xa0 )
+                     return 0;
                  return 1;
+             }

              return isalpha(0xff & c);
          }
