Re: TSearch2 / German compound words / UTF-8 - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: TSearch2 / German compound words / UTF-8 |
Date | |
Msg-id | Pine.GSO.4.63.0601271959350.27734@ra.sai.msu.su |
In response to | Re: TSearch2 / German compound words / UTF-8 (Alexander Presber <aljoscha@weisshuhn.de>) |
List | pgsql-general |
Alexander,

could you try tsearch2 from CVS HEAD? tsearch2 in 8.1.X doesn't support UTF-8 and works for some people only by accident :)

Oleg

On Fri, 27 Jan 2006, Alexander Presber wrote:

>>> Tsearch/ispell is not able to break this word into parts, because of the
>>> "s" in "Produktion/s/intervall". Misspelling the word as
>>> "Produktionintervall" fixes it:
>> These should be affixes marked as 'affix in middle of compound word';
>> the flag is '~'. For an example, look at the norsk dictionary:
>>
>> flag ~\\:
>>     [^S] > S #~ advarsel > advarsels-
>>
>> BTW, we developed and debugged compound word support on the norsk
>> (Norwegian) dictionary, so look there for an example. We don't know
>> Norwegian ourselves; Norwegians helped us :)
>
> Hello everyone!
>
> I cannot get this to work, either with a German version or with the
> Norwegian example supplied on the tsearch website.
> Just like Hannes, I only get compound word support without an inserted
> 's', in both German and Norwegian: "Vertragstrafe" works, but
> "Vertragsstrafe", which is the correct form, does not.
>
> So I tried it the other way around. My dictionary consists of two words:
>
> ---
> vertrag/zs
> strafe/z
> ---
>
> My affix file just switches on compounds and allows s-insertion, as
> described in the Norwegian tutorial:
>
> ---
> compoundwords controlled z
> suffixes
> flag s:
>     [^S] > S # does not end in "s": append "s" and use in compound check ("Recht" > "Rechts-")
> ---
>
> ts_debug yields:
>
> tstest=# SELECT tsearch2.ts_debug('vertragstrafe strafevertrag vertragsstrafe');
>                                        ts_debug
> -------------------------------------------------------------------------------------
>  (german,lword,"Latin word",vertragstrafe,"{ispell_de,simple}","'strafe' 'vertrag'")
>  (german,lword,"Latin word",strafevertrag,"{ispell_de,simple}","'strafe' 'vertrag'")
>  (german,lword,"Latin word",vertragsstrafe,"{ispell_de,simple}",'vertragsstrafe')
> (3 rows)
>
> I would say the ispell compound support does not honor the s flag in
> compounds. Could this feature have been lost in a regression? It must
> have worked for Norwegian once. (Take the "overtrekksgrilldresser"
> example from the tsearch2:compounds tutorial, which I cannot reproduce.)
>
> Any hints?
>
> Alexander

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
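For illustration only, here is a minimal Python sketch of the result Alexander expects from the two-word test dictionary: "vertragstrafe" and "vertragsstrafe" should both split into 'vertrag' + 'strafe', with an optional linking "s" allowed between parts. This is a toy model, not how tsearch2's ispell dictionary works; tsearch2 drives compounding through affix flags such as '~' and "compoundwords controlled". The names `LEXICON`, `LINKERS` and `split_compound` are hypothetical, and treating "s" as the only linking element is an assumption.

```python
from functools import lru_cache

# Hypothetical toy lexicon mirroring the two-word test dictionary in the thread.
LEXICON = frozenset({"vertrag", "strafe"})
# Assumed linking elements allowed between compound parts: none, or German "s".
LINKERS = ("", "s")


def split_compound(word, lexicon=LEXICON, linkers=LINKERS):
    """Return one split of `word` into lexicon words, or None if none exists.

    A non-final part may be followed by one linking element, so
    "vertragsstrafe" can split as "vertrag" + "s" + "strafe".
    A trailing linker with nothing after it is not accepted.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start):
        # Try every lexicon word that begins at position `start`.
        for end in range(start + 1, len(word) + 1):
            part = word[start:end]
            if part not in lexicon:
                continue
            if end == len(word):
                return (part,)
            # Optionally skip one linking element before the next part.
            for linker in linkers:
                if word.startswith(linker, end):
                    rest = solve(end + len(linker))
                    if rest is not None:
                        return (part,) + rest
        return None

    parts = solve(0)
    return list(parts) if parts is not None else None
```

With this model, `split_compound("Vertragsstrafe")` returns `['vertrag', 'strafe']`, which is what ts_debug printed for "vertragstrafe" but not for "vertragsstrafe" in the output quoted above.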