Re: hunspell and tsearch2 ? - Mailing list pgsql-hackers

From Dirk Lutzebäck
Subject Re: hunspell and tsearch2 ?
Date
Msg-id 5040B70C.70805@thinkproject.com
In response to Re: hunspell and tsearch2 ?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
<div class="moz-cite-prefix">Hi Robert,<br /><br /> There is a note in this section of the PostgreSQL documentation:<br /><br
/><blockquote>12.6.5. Ispell Dictionary<br /></blockquote><blockquote><b>Note:</b> <span class="APPLICATION">
MySpell</span> does not support compound words. <span class="APPLICATION">Hunspell</span> has sophisticated support for
compound words. At present, <span class="PRODUCTNAME">PostgreSQL</span> implements only the basic compound word
operations of Hunspell.<br /></blockquote> Regards<br /> Dirk<br /><br /><br /> On 08/30/2012 05:39 PM, Robert Haas
wrote:<br/></div><blockquote cite="mid:CA+Tgmob3Mr3PznHK0E15yYKX5PB2xmqJcCHN=ffV62akME_qnQ@mail.gmail.com"
type="cite"><pre wrap="">On Mon, Aug 27, 2012 at 8:31 AM, Dirk Lutzebäck
<a class="moz-txt-link-rfc2396E"
href="mailto:dirk.lutzebaeck@thinkproject.com">&lt;dirk.lutzebaeck@thinkproject.com&gt;</a> wrote:
</pre><blockquote type="cite"><pre wrap="">We have issues with compound words in tsearch2 using the German (ispell)
dictionary. This has been discussed before, but there is no real solution
using the recommended German dictionary at
<a class="moz-txt-link-freetext"
href="http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2">http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2</a>
(converting the old OpenOffice dictionary file to an ispell format suitable for tsearch):

# select ts_lexize('german_ispell', 'vollklimatisiert');
     ts_lexize
--------------------
 {vollklimatisiert}
(1 row)

This should return at least
{vollklimatisiert, voll, klimatisiert}


The issue with compound words in ispell has been addressed in Hunspell, but
that support has not been fully integrated into tsearch2 (according to the
documentation).
</pre></blockquote><pre wrap="">
Just out of curiosity, which part of the documentation are you looking
at?  The only mention of hunspell I see in the documentation is a
mention that we apparently support their dictionary-file format.

</pre><blockquote type="cite"><pre wrap="">Are there any plans to fully integrate Hunspell into tsearch2? What is
needed to do this? What functionality is missing? Maybe we
can help...
</pre></blockquote><pre wrap="">
</pre></blockquote><br /><br /><div class="moz-signature">-- <br /><p> Mit freundlichen Grüßen / Best regards,
<p><b>thinkproject! International GmbH & Co. KG</b><p> Dirk Lutzebäck<br /> Geschäftsführer / Managing Director,
CTO<p> Tel +49 30 921 017 90<br /> Fax +49 30 921 017 50<br /><a class="moz-txt-link-abbreviated"
href="mailto:dirk.lutzebaeck@thinkproject.com">dirk.lutzebaeck@thinkproject.com</a><br/><p> Rechtliche Informationen
zum Absender (Impressum): <a href="http://www.thinkproject.com/de/info">www.thinkproject.com/de/info</a><p> Legal
information (imprint): <a href="http://www.thinkproject.com/en/info">www.thinkproject.com/en/info</a></div>
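For reference, `german_ispell` in the `ts_lexize` call above is a dictionary built on PostgreSQL's `ispell` template. The following is a minimal sketch of how such a dictionary might be defined, modeled on the `english_ispell` example in the PostgreSQL documentation. The file base name `german` is an assumption, and the converted files are assumed to be installed as `german.dict`, `german.affix` and `german.stop` under `$SHAREDIR/tsearch_data`.

```sql
-- Sketch only: the file base name 'german' is an assumption.
-- The ispell template reads german.dict, german.affix and german.stop
-- from $SHAREDIR/tsearch_data.
CREATE TEXT SEARCH DICTIONARY german_ispell (
    TEMPLATE  = ispell,
    DictFile  = german,
    AffFile   = german,
    StopWords = german
);

-- Inspect how the dictionary handles a compound word.
SELECT ts_lexize('german_ispell', 'vollklimatisiert');
```

According to the documentation for the Ispell template, compound splitting applies only to dictionary words that carry the flag declared by a `compoundwords controlled` statement in the affix file (for example `compoundwords controlled z`). Whether `vollklimatisiert` is split therefore depends on how the converted affix file sets that flag.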
