Re: [OpenFTS-general] AW: tsearch2, ispell, utf-8 and - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: [OpenFTS-general] AW: tsearch2, ispell, utf-8 and |
Date | |
Msg-id | Pine.GSO.4.58.0407221356240.29036@ra.sai.msu.su |
In response to | Re: [OpenFTS-general] AW: tsearch2, ispell, utf-8 and german special characters ("Markus Wollny" <Markus.Wollny@computec.de>) |
List | pgsql-general |
Markus,

I was not quite correct - different dictionaries handle stop words in
different ways! Stemmers check the stop word list before normalization,
while ispell checks it after normalization. So, in your case, you need
'eint' listed in the stop word list.

Oleg

On Wed, 21 Jul 2004, Markus Wollny wrote:

> Hi!
>
> ts2test=# select * from ts_debug('Jeden Tag wird man ein bisschen weiser');
>     ts_name     | tok_type | description |  token   |  dict_name  |  tsvector
> ----------------+----------+-------------+----------+-------------+------------
>  default_german | lword    | Latin word  | Jeden    | {de_ispell} |
>  default_german | lword    | Latin word  | Tag      | {de_ispell} | 'tag'
>  default_german | lword    | Latin word  | wird     | {de_ispell} |
>  default_german | lword    | Latin word  | man      | {de_ispell} |
>  default_german | lword    | Latin word  | ein      | {de_ispell} | 'eint'
>  default_german | lword    | Latin word  | bisschen | {de_ispell} | 'bisschen'
>  default_german | lword    | Latin word  | weiser   | {de_ispell} | 'weise'
> (7 rows)
>
> cat german.stop|grep ^ein$
> ein
>
> 'jeden', 'man', 'wird' and 'ein' are all in german.stop; the first
> three words are correctly recognized as stopwords, whereas the last
> one is converted to 'eint', although 'ein' is a stopword, too. I still
> don't understand what exactly is happening and whether I should be
> concerned by that sort of "wrong guess" - so 'ein' is just converted
> to 'eint' every time, no matter if it's in the stopwords file or not,
> but on the other hand, as this applies to to_tsvector(), to_tsquery()
> and lexize(), this behaviour would be consistent throughout tsearch2 -
> thus making any search containing 'ein' a little bit fuzzier, but
> nonetheless still usable. It's still some sort of cosmetic bug,
> though, but I guess that's probably due to German being somewhat less
> IT-friendly than English.
>
> Kind regards
>
> Markus
>
> -----Original Message-----
> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
> Sent: Wed 7/21/2004 22:24
> To: Markus Wollny
> Cc: pgsql-general@postgresql.org; openfts-general@lists.sourceforge.net
> Subject: Re: AW: [OpenFTS-general] AW: [GENERAL] tsearch2, ispell, utf-8 and german special characters
>
> On Wed, 21 Jul 2004, Markus Wollny wrote:
>
> > Hi!
> >
> > > -----Original Message-----
> > > From: openfts-general-admin@lists.sourceforge.net
> > > [mailto:openfts-general-admin@lists.sourceforge.net] On
> > > behalf of Markus Wollny
> > > Sent: Wednesday, 21 July 2004 17:04
> > > To: Oleg Bartunov
> > > Cc: pgsql-general@postgresql.org;
> > > openfts-general@lists.sourceforge.net
> > > Subject: [OpenFTS-general] AW: [GENERAL] tsearch2, ispell,
> > > utf-8 and german special characters
> > >
> > > The issue with the unrecognized stop-word 'ein' which is
> > > converted by to_tsvector to 'eint' remains however. Now
> > > here's as much detail as I can provide:
> > >
> > > Ispell is Version 3.1.20 10/10/95, patch 1.
> >
> > I've just upgraded Ispell to the latest version (International
> > Ispell Version 3.2.06 08/01/01), but that didn't help; by now I
> > think it might be something to do with a German language
> > peculiarity or with something in the German dictionary. In
> > german.med, there is an entry
>
> ispell itself is not used in tsearch2, only the dict and aff files!
>
> >
> > eint/EGPVWX
> >
> > So the ts_vector output is just a bit like a wrong guess. Doesn't
> > it evaluate the stopword list first before doing the lookup in the
> > Ispell dictionary?
>
> Yes. There is a very useful function for debugging that I always
> recommend - ts_debug. See my notes
> (http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes)
> for examples.
>
> >
> > Kind regards
> >
> > Markus Wollny
>
> Regards,
> Oleg

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
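
The fix suggested at the top of this message (listing 'eint' in the
stop word list, because the ispell dictionary compares against the stop
list only after normalization) could be applied with a small shell
helper along these lines. This is only a sketch: the function name
add_stopword is made up for illustration, and the location of
german.stop depends on how the de_ispell dictionary was configured,
which the thread does not state.

```shell
# add_stopword FILE WORD (hypothetical helper, not part of tsearch2):
# append WORD to the stop word FILE unless a line already equals it.
add_stopword() {
    file=$1
    word=$2
    # -x: match whole lines only, -F: literal string, -q: no output.
    if ! grep -qxF -- "$word" "$file"; then
        printf '%s\n' "$word" >> "$file"
    fi
}

# Example call (the path is a placeholder, not taken from the thread):
# add_stopword /path/to/german.stop eint
```

Running it twice leaves a single 'eint' line, so it is safe to repeat.
A new database session is probably needed before the changed stop word
file is picked up; re-running ts_debug on the sample sentence should
then show whether 'ein' still produces a lexeme.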