Re: TSearch2 / German compound words / UTF-8 - Mailing list pgsql-general

From Alexander Presber
Subject Re: TSearch2 / German compound words / UTF-8
Msg-id 6AC64576-AEB6-47C0-AA8C-0242F9296BEA@weisshuhn.de
In response to TSearch2 / German compound words / UTF-8  (Hannes Dorbath <light@theendofthetunnel.de>)
Responses Re: TSearch2 / German compound words / UTF-8
Re: TSearch2 / German compound words / UTF-8
List pgsql-general
>> Tsearch/ispell is not able to break this word into parts, because
>> of the "s" in "Produktion/s/intervall". Misspelling the word as
>> "Produktionintervall" fixes it:
> It should be an affix marked as 'affix in middle of compound word';
> the flag is '~'. For an example, look in the norsk dictionary:
>
> flag ~\\:
>     [^S]           >        S              #~ advarsel > advarsels-
>
> BTW, we developed and debugged compound word support on the norsk
> (Norwegian) dictionary, so look there for examples. But we don't
> know Norwegian; Norwegians helped us :)
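The quoted suffix rule `[^S] > S` reads: if the word does not end in "S", append "S" (advarsel > advarsels). A minimal Python sketch of applying one such rule, assuming upper-case conditions as in ispell affix files and ignoring the strip part of the syntax; `apply_suffix_rule` is a hypothetical helper, not part of tsearch2:

```python
import re
from typing import Optional


def apply_suffix_rule(word: str, condition: str, append: str) -> Optional[str]:
    """Apply a simple ispell-style suffix rule of the form `condition > append`.

    `condition` is matched against the end of the word, e.g. "[^S]" means
    "the last letter is not S". The strip part of ispell's "> -X,Y" syntax
    is deliberately not modelled in this sketch.
    """
    upper = word.upper()  # conditions in affix files are written in upper case
    if re.search(f"(?:{condition})$", upper) is None:
        return None  # condition does not hold: the rule does not apply
    return word + append.lower()


print(apply_suffix_rule("advarsel", "[^S]", "S"))  # advarsels
print(apply_suffix_rule("Recht", "[^S]", "S"))     # Rechts
print(apply_suffix_rule("Haus", "[^S]", "S"))      # None
```

Whether the derived form is then allowed only in the middle of a compound is what the '~' marking controls; the sketch covers just the string transformation.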

Hello everyone!

I cannot get this to work, neither with a German dictionary nor with the
Norwegian example supplied on the tsearch website.
That is, just like Hannes, I get compound word support only without the
inserted 's', in both German and Norwegian:
"Vertragstrafe" works, but "Vertragsstrafe", which is the correct
form, does not.

So I tried it the other way around: My dictionary consists of two words:

---
vertrag/zs
strafe/z
---

My affix file just switches on compounds and allows for s-insertion
as described in the Norwegian tutorial:

---
compoundwords controlled z
suffixes
flag s:
   [^S] > S              # does not end in "s": append "s" and use in compound check ("Recht" > "Rechts-")
---
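To make the expected behaviour concrete, here is a toy Python model of compound splitting over the two-word dictionary above. It assumes that flag `z` marks a word as allowed in compounds and that flag `s` lets a non-final part take its linking-"s" form via the `[^S] > S` rule. `DICTIONARY`, `compound_forms` and `split_compound` are hypothetical names; this is not how tsearch2 implements compounds internally:

```python
from typing import Dict, List, Optional, Set

# Toy dictionary mirroring the test above: word -> set of ispell flags.
# Assumption: "z" = may take part in compounds, "s" = has a linking-"s" form.
DICTIONARY: Dict[str, Set[str]] = {
    "vertrag": {"z", "s"},
    "strafe": {"z"},
}


def compound_forms(word: str, flags: Set[str]) -> List[str]:
    """Surface forms a word may take in a non-final compound position."""
    forms = [word]
    if "s" in flags and not word.endswith("s"):
        forms.append(word + "s")  # the [^S] > S rule
    return forms


def split_compound(token: str, in_compound: bool = False) -> Optional[List[str]]:
    """Return the dictionary stems making up `token`, or None if it cannot be split."""
    flags = DICTIONARY.get(token)
    if flags is not None and (not in_compound or "z" in flags):
        return [token]
    for stem, stem_flags in DICTIONARY.items():
        if "z" not in stem_flags:
            continue  # word is not allowed inside compounds
        for form in compound_forms(stem, stem_flags):
            if token.startswith(form) and len(token) > len(form):
                rest = split_compound(token[len(form):], in_compound=True)
                if rest is not None:
                    return [stem] + rest
    return None


print(split_compound("vertragsstrafe"))  # ['vertrag', 'strafe']
print(split_compound("strafevertrag"))   # ['strafe', 'vertrag']
```

In this model "vertragsstrafe" splits just like "vertragstrafe", which is the result one would expect from ts_debug for all three words.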

ts_debug yields:

tstest=# SELECT tsearch2.ts_debug('vertragstrafe strafevertrag vertragsstrafe');
                                          ts_debug
-------------------------------------------------------------------------------------
 (german,lword,"Latin word",vertragstrafe,"{ispell_de,simple}","'strafe' 'vertrag'")
 (german,lword,"Latin word",strafevertrag,"{ispell_de,simple}","'strafe' 'vertrag'")
 (german,lword,"Latin word",vertragsstrafe,"{ispell_de,simple}",'vertragsstrafe')
(3 rows)

It seems to me that the ispell compound support does not honor the s flag
in compounds.
Could this feature have been lost in a regression? It must have
worked for Norwegian at some point. (Take the "overtrekksgrilldresser"
example from the tsearch2 compounds tutorial, which I cannot reproduce.)

Any hints?

Alexander
