Re: tsearch2 in postgresql 8.3.1 - invalid byte sequence for encoding "UTF8": 0xc3 - Mailing list pgsql-general

From Richard Huxton
Subject Re: tsearch2 in postgresql 8.3.1 - invalid byte sequence for encoding "UTF8": 0xc3
Msg-id 47E1964E.8060403@archonet.com
In response to tsearch2 in postgresql 8.3.1 - invalid byte sequence for encoding "UTF8": 0xc3  ("patrick" <patrick@11h11.com>)
Responses Re: tsearch2 in postgresql 8.3.1 - invalid byte sequence for encoding "UTF8": 0xc3  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Missed the mailing list on the last reply.

Richard Huxton wrote:
> patrick wrote:
>> hi richard,
>>
>> thanks for your help! i found something... but first let me answer
>> your question:
>>
>>> UPDATE product SET search_vector = to_tsvector(name);
>>> UPDATE product SET search_vector = setweight(to_tsvector(name), 'A');
>>> UPDATE product SET search_vector = setweight(to_tsvector(name), 'A')
>>> || to_tsvector(description);
>>
>> those queries are not working, same message:
>> ERROR: invalid byte sequence for encoding "UTF8": 0xc3
>
> Hmm. OK. Can reproduce that here, but only...
>
>> what i found is in postgresql.conf if i change:
>> default_text_search_config from pg_catalog.french to
>> pg_catalog.english then the query is working fine.
>
> with a "french" configuration. English, Italian, German etc. all seem
> to work here with 8.3.1 on Windows.
>
> However, "french" works fine with 8.3.0 compiled from source on Linux.
>
> Comparing the two french.stop lists of stopwords (look in
> .../share/tsearch_data) they are identical.
>
> That leaves the snowball stemming library itself. There seem to be two
> source files for these in src/backend/snowball/libstemmer, one for
> ISO8859-1 and one for UTF-8. These files seem identical between 8.3.0
> and 8.3.1 (assuming I'm using anoncvs.postgresql.org properly).
>
> Possibly a build problem on Windows? I'll test against 8.3.1 on Linux if
> I get a chance.

No changes (from diff -r) between the source on 8.3.0 and 8.3.1 for the
backend/snowball directories. Looks like someone with a Windows build
environment would be useful.
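
For anyone trying to narrow this down, here is a rough sketch of a
per-configuration test in SQL. It assumes the product table and its name
column from the queries quoted above; the LIMIT is arbitrary. Passing the
configuration name as the first argument to to_tsvector, or using SET for
the session, avoids editing postgresql.conf between runs:

```sql
-- Confirm the database encoding and the configuration currently in effect.
SHOW server_encoding;
SHOW default_text_search_config;

-- Name the configuration explicitly instead of relying on the default.
-- Per the reports above, 'french' is the one that fails on the Windows build.
SELECT to_tsvector('pg_catalog.english', name) FROM product LIMIT 10;
SELECT to_tsvector('pg_catalog.french', name) FROM product LIMIT 10;

-- Alternatively, switch the default for the current session only.
SET default_text_search_config = 'pg_catalog.french';
SELECT to_tsvector(name) FROM product LIMIT 10;
```

If only the 'french' statements raise the error while the others succeed
on the same rows, that points at the stemmer for that configuration rather
than at the stored data.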

--
   Richard Huxton
   Archonet Ltd
