Thread: Text Search Configuration Problem

Text Search Configuration Problem

From
Kevin Reynolds
Date:
I'm using PostgreSQL version 8.3.1 on CentOS 5 and am following the steps in section 12.7 of the documentation for creating a custom text search configuration.
 
When I get to the step that says:
 
CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = ispell,
    DictFile = english,
    AffFile = english,
    StopWords = english
);
 
I get the following error:
 
ERROR:  invalid byte sequence for encoding "UTF8": 0xe0c020
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
 
I'm using the english ispell files from http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
 
Does anyone know how to solve this?



Re: Text Search Configuration Problem

From
Tom Lane
Date:
Kevin Reynolds <kreynolds98092@yahoo.com> writes:
>   I get the following error:

>   ERROR:  invalid byte sequence for encoding "UTF8": 0xe0c020
> HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

>   I'm using the english ispell files from http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

Are you sure those are in UTF8 encoding?
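One way to answer that question is to try decoding each dictionary file as UTF-8 and report where decoding first fails. The following is only a minimal sketch: the helper names `first_invalid_utf8_offset` and `check_file` are made up for illustration and are not part of PostgreSQL or of the ispell distribution.

```python
import sys
from pathlib import Path
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that could not be decoded.
        return exc.start
    return None


def check_file(path: str) -> Optional[int]:
    """Hypothetical helper: run the UTF-8 check on the contents of a file."""
    return first_invalid_utf8_offset(Path(path).read_bytes())


if __name__ == "__main__":
    # Usage (file names are whatever you downloaded): python check_utf8.py FILE...
    for name in sys.argv[1:]:
        offset = check_file(name)
        if offset is None:
            print(f"{name}: valid UTF-8")
        else:
            print(f"{name}: invalid UTF-8 starting at byte offset {offset}")
```

A file that is pure ASCII also passes this check, since ASCII is a subset of UTF-8; only bytes such as the 0xe0 0xc0 pair from the error message are reported.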

            regards, tom lane

Re: Text Search Configuration Problem

From
Oleg Bartunov
Date:
Kevin,

It looks like you use UTF-8, so the problem is in the .aff file, which contains
Cyrillic comments :) I converted the files to UTF-8 encoding using iconv.
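For anyone who wants to do the same conversion on their own copy of the files, here is a rough Python sketch of what an iconv run does. The source encoding is an assumption: the thread does not say which legacy encoding the Cyrillic comments use, so KOI8-R is only a guess used as the default, and the function names are hypothetical.

```python
from pathlib import Path


def transcode_to_utf8(raw: bytes, source_encoding: str = "koi8_r") -> bytes:
    """Re-encode bytes from a legacy encoding to UTF-8.

    The default of KOI8-R is an assumption; pass the real encoding if known.
    """
    return raw.decode(source_encoding).encode("utf-8")


def convert_file(src: str, dst: str, source_encoding: str = "koi8_r") -> None:
    """Hypothetical helper: write a UTF-8 copy of src to dst, leaving src untouched."""
    Path(dst).write_bytes(transcode_to_utf8(Path(src).read_bytes(), source_encoding))
```

ASCII content, which is the bulk of an English dictionary and affix file, comes through unchanged; only the non-ASCII comment bytes are rewritten. With the same assumed encoding, the iconv equivalent would be along the lines of `iconv -f KOI8-R -t UTF-8 input > output`.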


Oleg

On Thu, 3 Apr 2008, Kevin Reynolds wrote:

> I'm using PostgreSQL version 8.3.1 on CentOS 5 and am following the steps in section 12.7 of the documentation for creating a custom text search configuration.
>
>  When I get to the step that says:
>
>  CREATE TEXT SEARCH DICTIONARY english_ispell (
>    TEMPLATE = ispell,
>    DictFile = english,
>    AffFile = english,
>    StopWords = english
> );
>
>  I get the following error:
>
>  ERROR:  invalid byte sequence for encoding "UTF8": 0xe0c020
> HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
>
>  I'm using the english ispell files from http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
>
>  Does anyone know how to solve this?
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83