Re: Full text: Ispell dictionary - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: Full text: Ispell dictionary |
Date | |
Msg-id | CAF4Au4wytyVOvOwHH_Aft+HRXutcBShHoKFkJmOVaJdAsruJ9A@mail.gmail.com |
In response to | Re: Full text: Ispell dictionary (Tim van der Linden <tim@shisaa.jp>) |
Responses | Re: Full text: Ispell dictionary |
List | pgsql-general |
btw, take a look at contrib/dict_xsyn, it's more powerful than the synonym dictionary.

On Sat, May 3, 2014 at 2:26 AM, Tim van der Linden <tim@shisaa.jp> wrote:
> Hi Oleg
>
> Haha, understood!
>
> Thanks for helping me on this one.
>
> Cheers
> Tim
>
>
> On May 3, 2014 7:24:08 AM GMT+09:00, Oleg Bartunov <obartunov@gmail.com>
> wrote:
>>
>> Tim,
>>
>> you did answer yourself - don't use ispell :)
>>
>> On Sat, May 3, 2014 at 1:45 AM, Tim van der Linden <tim@shisaa.jp> wrote:
>>>
>>> On Fri, 2 May 2014 21:12:56 +0400
>>> Oleg Bartunov <obartunov@gmail.com> wrote:
>>>
>>> Hi Oleg
>>>
>>> Thanks for the response!
>>>
>>>> Yes, it's normal for ispell dictionary, think about morphological
>>>> dictionary.
>>>
>>>
>>> Hmm, I see, that makes sense. I thought the morphological aspect of
>>> Ispell only dealt with splitting up compound words, but it also deals with
>>> reducing the word to a more "stem"-like form, correct?
>>>
>>> As a last question on this, is there a way to prevent this dictionary
>>> from emitting multiple lexemes?
>>>
>>>
>>> The reason I am asking is because in my (fairly new) understanding of
>>> PostgreSQL's full text search it is always best to have as few lexemes as
>>> possible saved in the vector. This is to get smaller indexes and faster
>>> matching afterwards. Also, if you run a tsquery afterwards too, you can
>>> still employ the power of these multiple lexemes to find a match.
>>>
>>> Or...probably answering my own question...if I do not desire this
>>> behavior I should maybe not use Ispell and simply use another dictionary :)
>>>
>>> Thanks again.
>>>
>>> Cheers,
>>> Tim
>>>
>>>> On Fri, May 2, 2014 at 11:54 AM, Tim van der Linden <tim@shisaa.jp>
>>>> wrote:
>>>>>
>>>>> Good morning/afternoon all
>>>>>
>>>>> I am currently writing a few articles about PostgreSQL's full text
>>>>> capabilities and have a question about the Ispell dictionary which I
>>>>> cannot seem to find an answer to. It is probably a very simple issue, so
>>>>> forgive my ignorance.
>>>>>
>>>>> In one article I am explaining dictionaries and I have set up a
>>>>> sample configuration which maps most token categories to only use an
>>>>> Ispell dictionary (timusan_ispell) which has a default configuration:
>>>>>
>>>>> CREATE TEXT SEARCH DICTIONARY timusan_ispell (
>>>>> TEMPLATE = ispell,
>>>>> DictFile = en_us,
>>>>> AffFile = en_us,
>>>>> StopWords = english
>>>>> );
>>>>>
>>>>> When I run a simple query like "SELECT
>>>>> to_tsvector('timusan-ispell','smiling')" I get back the following tsvector:
>>>>>
>>>>> 'smile':1 'smiling':1
>>>>>
>>>>> As you can see I get two lexemes with the same position.
>>>>> The question here is: why does this happen?
>>>>>
>>>>> Is it normal behavior for the Ispell dictionary to emit multiple
>>>>> lexemes for a single token? And if so, is this efficient? I
>>>>> mean, why could it not simply save one lexeme 'smile' which (same as
>>>>> the snowball dictionary) would match 'smiling' as well if later matched with
>>>>> the accompanying tsquery?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Cheers,
>>>>> Tim
>>>>>
>>>>>
>>>>> --
>>>>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>>>>> To make changes to your subscription:
>>>>> http://www.postgresql.org/mailpref/pgsql-general
>>>
>>>
>>>
>>> --
>>> Tim van der Linden <tim@shisaa.jp>
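For readers following the thread, here is a minimal sketch of how the dict_xsyn suggestion might be tried out alongside Tim's Ispell dictionary. It assumes dict_xsyn is installed as an extension and that a rules file exists under `$SHAREDIR/tsearch_data/`; the names `my_rules`, `my_xsyn` and `my_config` are hypothetical and only `timusan_ispell` comes from the thread. `ts_lexize` shows exactly which lexemes a single dictionary emits for one token, which is a direct way to see the "multiple lexemes" behaviour discussed above. Check the dict_xsyn documentation for your PostgreSQL version before relying on the option names.

```sql
-- Assumption: a hypothetical rules file $SHAREDIR/tsearch_data/my_rules.rules
-- exists. Each line lists a word followed by its synonyms, e.g.:
--   smiling smile
CREATE EXTENSION dict_xsyn;

-- Hypothetical dictionary built on the xsyn_template template.
-- RULES is the base name of the rules file (without the .rules suffix).
-- KEEPORIG = false: emit only the synonyms, not the original word.
CREATE TEXT SEARCH DICTIONARY my_xsyn (
    TEMPLATE = xsyn_template,
    RULES = my_rules,
    KEEPORIG = false
);

-- What does each dictionary emit for the same token?
SELECT ts_lexize('my_xsyn', 'smiling');
SELECT ts_lexize('timusan_ispell', 'smiling');  -- dictionary from the thread

-- Hypothetical configuration: consult xsyn first, then the Snowball stemmer.
CREATE TEXT SEARCH CONFIGURATION my_config (COPY = english);
ALTER TEXT SEARCH CONFIGURATION my_config
    ALTER MAPPING FOR asciiword, word WITH my_xsyn, english_stem;

SELECT to_tsvector('my_config', 'smiling');
```

Dictionary order in `ALTER MAPPING` matters: a token is handed to the next dictionary only when the previous one does not recognize it (returns NULL), so words missing from the rules file fall through to `english_stem`, which emits a single stemmed lexeme.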