Re: Flexible configuration for full-text search - Mailing list pgsql-hackers

From	Aleksandr Parfenov
Subject	Re: Flexible configuration for full-text search
Date	August 29, 2018 11:38:31
Msg-id	20180829153831.6b66d264@asp437-ThinkPad-L380
In response to	Re: Flexible configuration for full-text search (Aleksandr Parfenov <a.parfenov@postgrespro.ru>)
Responses	Re: Flexible configuration for full-text search
List	pgsql-hackers

On Tue, 28 Aug 2018 12:40:32 +0700
Aleksandr Parfenov <a.parfenov@postgrespro.ru> wrote:

>On Fri, 24 Aug 2018 18:50:38 +0300
>Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
>>Agreed, backward compatibility is important here.  Probably we should
>>leave old dictionaries for that.  But I just meant that if we
>>introduce new (better) way of stop words handling and encourage users
>>to use it, then it would look strange if default configurations work
>>the old way...  
>
>I agree with Alexander. The only drawback I see is that after addition
>of new dictionaries, there will be 3 dictionaries for each language:
>old one, stop-word filter for the language, and stemmer dictionary.

While working on the new version of the patch, I found an issue in the
proposed syntax. At the beginning of the conversation, there was a
suggestion to split stop word filtering from word normalization. At this
stage of development, we can use a separate dictionary for stop word
detection, but if we drop the word, the word counter is not incremented
and the stop word is processed as an unknown word.
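
To make the counter problem concrete, here is a minimal Python sketch. It
is a toy model and not the patch or PostgreSQL code: STOP_WORDS,
build_vector and the count_stop_words flag are hypothetical names, and no
stemming is done. It only shows how positions shift when a stop word is
dropped without advancing the counter.

```python
STOP_WORDS = {"in", "the", "of"}  # hypothetical stop list, for illustration only


def build_vector(tokens, count_stop_words):
    """Return {lexeme: [positions]} with 1-based positions.

    count_stop_words=True: a stop word takes a position but adds no lexeme.
    count_stop_words=False: a stop word is dropped and the counter stays put.
    """
    vector = {}
    pos = 0
    for token in tokens:
        word = token.lower()
        if word in STOP_WORDS:
            if count_stop_words:
                pos += 1  # position is consumed, nothing is stored
            continue
        pos += 1
        vector.setdefault(word, []).append(pos)
    return vector


tokens = "in the list of stop words".split()
print(build_vector(tokens, count_stop_words=True))
# {'list': [3], 'stop': [5], 'words': [6]}
print(build_vector(tokens, count_stop_words=False))
# {'list': [1], 'stop': [2], 'words': [3]}
```

In the second call the gaps left by the stop words disappear, so any
distance-sensitive matching would see the remaining words as adjacent.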

Currently, I see two solutions:

1) Keep the old way of stop word filtering. The drawback of this
approach is that word normalization and stop word detection logic are
mixed inside a single dictionary. This can be solved by using the
'simple' dictionary in accept=false mode as a stop word filter.

2) Add a STOPWORD action alongside KEEP and DROP (which is not
implemented in the previous patch, but I think it is good to have both
of them), meaning "increase the word counter but don't add the lexeme
to the vector".
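
Both options can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not the proposed implementation:
stop_filter imitates a 'simple' dictionary with accept=false (an empty
list for a stop word, None for "not recognized, ask the next dictionary"),
toy_stemmer is a made-up normalizer, and the DROP semantics (discard
without advancing the counter) are my assumption based on the description
above.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep"          # add the lexeme and advance the counter
    DROP = "drop"          # assumed: discard the token, counter unchanged
    STOPWORD = "stopword"  # advance the counter, add nothing


STOP_WORDS = {"the", "of"}  # hypothetical stop list


def stop_filter(word):
    """Option 1: [] marks a stop word, None passes the word on."""
    return [] if word in STOP_WORDS else None


def toy_stemmer(word):
    """Hypothetical normalizer: strips one trailing 's'."""
    return [word[:-1] if word.endswith("s") else word]


def lexize_chain(word, dictionaries):
    """Return the result of the first dictionary that recognizes the word."""
    for dictionary in dictionaries:
        result = dictionary(word)
        if result is not None:
            return result
    return None


def classify(word):
    """Option 2: map each word to an explicit action."""
    return Action.STOPWORD if word in STOP_WORDS else Action.KEEP


def index(tokens):
    vector, pos = {}, 0
    for word in tokens:
        action = classify(word)
        if action is Action.DROP:
            continue
        pos += 1  # both KEEP and STOPWORD consume a position
        if action is Action.KEEP:
            for lexeme in toy_stemmer(word):
                vector.setdefault(lexeme, []).append(pos)
    return vector


print(lexize_chain("the", [stop_filter, toy_stemmer]))    # []
print(lexize_chain("words", [stop_filter, toy_stemmer]))  # ['word']
print(index(["list", "of", "words"]))  # {'list': [1], 'word': [3]}
```

With option 1 the filter and the normalizer are separate dictionaries but
the stop word signal is still the dictionary's return value; with option 2
the signal is an explicit action in the configuration.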

Any suggestions on the issue?

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
