Re: to_tsvector in 8.2.3 - Mailing list pgsql-general

From Teodor Sigaev
Subject Re: to_tsvector in 8.2.3
Date
Msg-id 46014E9B.1080301@sigaev.ru
In response to Re: to_tsvector in 8.2.3  (Thomas Pundt <mlists@rp-online.de>)
List pgsql-general
8.2 has a fully rewritten text parser based on the POSIX is* functions (isalpha(), isspace(), etc.).
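Because the POSIX is* functions consult the LC_CTYPE locale, whether a byte such as 0xA0 counts as part of a word or as a separator now depends on the locale rather than on fixed byte ranges in a lexer. The following is a minimal C sketch of that idea, not the actual 8.2 parser code. The helper names `use_ctype_locale`, `byte_is_word_char` and `byte_is_space` are hypothetical.

```c
#include <ctype.h>
#include <locale.h>
#include <stdbool.h>

/* Hypothetical helper: set LC_CTYPE, which is what the is* functions
   consult. Returns false if the named locale is not installed. */
static bool use_ctype_locale(const char *name)
{
    return setlocale(LC_CTYPE, name) != NULL;
}

/* Hypothetical word/space tests in the style of an is*-based parser.
   The cast to unsigned char matters: where char is signed, a high-bit
   byte such as 0xA0 is negative, and passing a negative value other
   than EOF to isalnum()/isspace() is undefined behaviour. */
static bool byte_is_word_char(char c)
{
    return isalnum((unsigned char) c) != 0;
}

static bool byte_is_space(char c)
{
    return isspace((unsigned char) c) != 0;
}
```

In the "C" locale the C standard restricts isalnum() to ASCII letters and digits and isspace() to the six standard white-space characters, so 0xA0 is neither. Under a Latin-1 or other 8-bit locale, what 0xA0 counts as comes from the platform's locale tables and may differ from one system to another.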

Thomas Pundt wrote:
> On Wednesday 21 March 2007 14:25, Teodor Sigaev wrote:
> | I can't reproduce your problem, but I don't have a Windows box. Can anybody
> | reproduce it?
>
> Just a wild guess: I once had a similar phenomenon and tracked it down
> to a non-breaking space character (0xA0). Since then I've been patching the
> tsearch2 lexer:
>
> --- postgresql-8.1.5/contrib/tsearch2/wordparser/parser.l
> +++ postgresql-8.1.4/contrib/tsearch2/wordparser/parser.l
> @@ -78,8 +78,8 @@
>  /* cyrillic koi8 char */
>  CYRALNUM       [0-9\200-\377]
>  CYRALPHA       [\200-\377]
> -ALPHA          [a-zA-Z\200-\377]
> -ALNUM          [0-9a-zA-Z\200-\377]
> +ALPHA          [a-zA-Z\200-\237\241-\377]
> +ALNUM          [0-9a-zA-Z\200-\237\241-\377]
>
>
>  HOSTNAME       ([-_[:alnum:]]+\.)+[[:alpha:]]+
> @@ -307,7 +307,7 @@
>         return UWORD;
>  }
>
> -[ \r\n\t]+ {
> +[ \240\r\n\t]+ {
>         token = tsearch2_yytext;
>         tokenlen = tsearch2_yyleng;
>         return SPACE;
>
>
> Ciao,
> Thomas
>
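The quoted patch removes byte 0xA0 (octal \240) from the ALPHA and ALNUM classes and adds it to the whitespace rule. In the unpatched classes the range \200-\377 includes 0xA0, so a non-breaking space glues two words into one token. The following is a hypothetical C translation of just those character classes, not the flex-generated lexer. It assumes single-byte input in an 8-bit encoding where 0xA0 is a non-breaking space (e.g. ISO-8859-1). `count_words` is a simplified, hypothetical stand-in for tokenizing.

```c
#include <stdbool.h>

/* Octal ranges used in parser.l:
     \200-\237 = 0x80-0x9F, \240 = 0xA0, \241-\377 = 0xA1-0xFF */

/* Original ALNUM [0-9a-zA-Z\200-\377]: every high byte, 0xA0 included. */
static bool orig_alnum(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') || c >= 0x80;
}

/* Patched ALPHA [a-zA-Z\200-\237\241-\377]: every high byte except 0xA0. */
static bool patched_alpha(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= 0x80 && c <= 0x9F) || c >= 0xA1;
}

/* Patched ALNUM [0-9a-zA-Z\200-\237\241-\377]: ALPHA plus ASCII digits. */
static bool patched_alnum(unsigned char c)
{
    return (c >= '0' && c <= '9') || patched_alpha(c);
}

/* Bytes matched by the patched whitespace rule [ \240\r\n\t]+. */
static bool patched_space(unsigned char c)
{
    return c == ' ' || c == 0xA0 || c == '\r' || c == '\n' || c == '\t';
}

/* Count maximal runs of "word" bytes; any other byte ends a word.
   A simplification: the real lexer also recognizes hosts, URLs,
   numbers and so on. */
static int count_words(const char *s, bool (*is_word_byte)(unsigned char))
{
    int words = 0;
    bool in_word = false;
    for (; *s != '\0'; s++) {
        if (is_word_byte((unsigned char) *s)) {
            if (!in_word)
                words++;
            in_word = true;
        } else {
            in_word = false;
        }
    }
    return words;
}
```

With these definitions, `count_words("foo\240bar", orig_alnum)` sees a single word, while `count_words("foo\240bar", patched_alnum)` sees two. The octal escape `\240` is used in the string literal because a hex escape such as `\xA0b` would greedily swallow the following `b`.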

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
