Re: 0xc3 error Text Search Windows French - Mailing list pgsql-general
From | Andrew |
---|---|
Subject | Re: 0xc3 error Text Search Windows French |
Date | |
Msg-id | 4862920C.4010801@pacific.net.au |
In response to | 0xc3 error Text Search Windows French (Andrew <archa@pacific.net.au>) |
List | pgsql-general |
One additional aspect. I just ran the CREATE TEXT SEARCH DICTIONARY command with the OO dictionaries but without the StopWords declaration, and it worked fine: select ts_lexize('public.fr_ispell', 'catalogue'); executes with no problems. However, after creating a configuration based on a copy of the pg_catalog.french configuration, calls to ts_debug against my custom French config result in the 0xc3 error. So it looks like the problem is restricted to the parsing of the stop file.

I also ran through the other stemmers supplied out of the box, which I have not touched in any way, and the error also occurs with the portuguese configuration.

Cheers

Andy

Andrew wrote:
> I have a feeling that an issue I'm running into is related to this:
> http://archives.postgresql.org/pgsql-bugs/2008-06/msg00113.php
>
> On Windows XP running PgAdmin III 1.8.4 against either PostgreSQL
> 8.3.0 or 8.3.3 DB, when attempting to do a:
>
> select * from ts_debug('french', 'catalogue');
>
> getting the following error:
>
> ERROR: invalid byte sequence for encoding "UTF8": 0xc3
> HINT: This error can also happen if the byte sequence does not match
> the encoding expected by the server, which is controlled by
> "client_encoding".
> CONTEXT: SQL function "ts_debug" statement 1
>
> I have replaced the french.stop file with the one from the snowball
> web site (http://snowball.tartarus.org/algorithms/french/stemmer.html)
> to see if that would make any difference. But the same issue. I have
> also attempted to load the French Hunspell dictionary from the Open
> Office web site
> (http://wiki.services.openoffice.org/wiki/Dictionaries), using the
> following command:
>
> CREATE TEXT SEARCH DICTIONARY public.fr_ispell (
>     TEMPLATE = pg_catalog.ispell,
>     DictFile = fr_FR,
>     AffFile = fr_FR,
>     StopWords = french
> );
>
> But getting the same error.
> I have successfully loaded the English
> and Arabic dictionaries and an Arabic stop file I sourced from
> elsewhere, and they work fine with the various text search function
> calls, so it appears to be specifically related to a French character
> occurring in the stop file and the dictionaries. To use the French OO
> dictionaries, I had to convert them from an ISO-8859-15 character set
> encoding to UTF-8. As it still had the same result as with the
> packaged stop file when converting on Windows, I downloaded them and
> converted the encoding on a Linux machine before copying them across
> to windows to see if that would help, but it didn't.
>
> However, if I run the ts_debug('french', 'catalogue'); against a Linux
> version of PostgreSQL 8.3.1, it works fine. I have not tried version
> 8.3.1 on Windows. While there are a lot more combinations to exhaust
> before I can make a categorical statement, at this stage it appears to
> be pointing towards an issue with the UTF-8 parser of PostgreSQL on
> Windows.
>
> Is this an outstanding defect, or is there something that I'm doing
> wrong in my environment? I have attempted to find anything related on
> the Internet, but other than the introductory reference, I have not
> found anything, which for what I would imagine to be, of the size of
> the French user base surprises me. Hence, I'm thinking that perhaps
> it may be something in my environment causing the issue. If others
> could also reproduce the error on their XP machines, that would
> indicate that the issue was not something specific just to me.
>
> At this stage, it is not that important to me, as I'm just playing
> around with text search for my own curiosity and French was just a
> language I have randomly picked, along with Arabic (for which I'm
> lacking a snowball stemmer). I don't actually read, much less speak
> those languages. However, it would still be nice to have them working.
>
> An additional related topic.
> OO have for some languages, thesaurus
> files which are not in the same format as supported by Pg Full Text
> Search. Are there any plans to support the OO thesaurus file
> formats? They also have hyphenation files. Are there any plans to
> extend the current dictionary files to include hyphenation rules as
> captured in the OO hyphenation files? I'm not sure how, if at all
> hyphenation rules would improve on indexing and searches, but I
> thought as the files exist, I would pose the question.
>
> Thanks,
>
> Andy
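
For anyone wanting to try this on their own machine, the sequence described in the reply at the top of this message can be sketched roughly as follows. This is only a sketch under some assumptions: the configuration name public.fr_custom is made up for illustration (the message does not give the name that was actually used), and the fr_FR dictionary and affix files are assumed to be already converted to UTF-8 and placed in the server's tsearch_data directory.

```sql
-- Check which encodings are in play; the HINT in the error mentions client_encoding.
SHOW server_encoding;
SHOW client_encoding;

-- Ispell dictionary from the OO files, this time WITHOUT a StopWords option.
CREATE TEXT SEARCH DICTIONARY public.fr_ispell (
    TEMPLATE = pg_catalog.ispell,
    DictFile = fr_FR,
    AffFile  = fr_FR
);

-- Reported to work once the stop file is left out.
SELECT ts_lexize('public.fr_ispell', 'catalogue');

-- Hypothetical name: a configuration copied from the built-in French one.
-- The copy keeps the mappings of pg_catalog.french, so it still goes
-- through the stock stemmer dictionary and whatever stop file that uses.
CREATE TEXT SEARCH CONFIGURATION public.fr_custom (COPY = pg_catalog.french);

-- Reported to fail with the 0xc3 error on Windows.
SELECT * FROM ts_debug('public.fr_custom', 'catalogue');

-- List the options of every dictionary to see which ones name a stop file.
SELECT dictname, dictinitoption FROM pg_ts_dict ORDER BY dictname;
```

If the last query shows a StopWords entry for the dictionaries behind the failing configurations and none for the ones that work, that would be consistent with the stop file being the trigger, though it would not prove it.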