Re: Flexible configuration for full-text search - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Flexible configuration for full-text search
Date
Msg-id 26006.1536679481@sss.pgh.pa.us
In response to Re: Flexible configuration for full-text search  (Aleksandr Parfenov <a.parfenov@postgrespro.ru>)
Responses Re: Flexible configuration for full-text search  (Dmitry Dolgov <9erthalion6@gmail.com>)
List pgsql-hackers
Aleksandr Parfenov <a.parfenov@postgrespro.ru> writes:
> As I wrote a few weeks ago, there is an issue with stopword processing in
> the proposed syntax for full-text configurations. I want to separate word
> normalization and stopword detection into two separate dictionaries. The
> problem is how to configure the stopword detection dictionary.

> The cause of the problem is that stopwords are counted, but no lexemes
> are produced for them. However, do we have to count stopwords during word
> counting, or can we ignore them like unknown words? The problem I see is
> backward compatibility, since we would have to regenerate all queries and
> vectors. But is that a real problem, or can we change the behavior in this
> way?

I think there should be a pretty high bar for forcing people to regenerate
all that data when they haven't made any change of their own choice.

Also, I'm not very clear on exactly what you're proposing here, but it
sounds like it'd have the effect of changing whether stopwords count in
phrase distances ('a <N> b').  I think that's right out --- whether or not
you feel the current distance behavior is ideal, asking people to *both*
rebuild all their derived data *and* change their applications will cause
a revolt.  It's not sufficiently obviously broken that we can change it.
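
To make the phrase-distance concern concrete, here is a toy Python model of
how word positions, and therefore the N in 'a <N> b', shift if stopwords stop
consuming a position. This is only a sketch under stated assumptions: the
stopword list, the whitespace tokenizer, and the function names are all
hypothetical, and none of it is PostgreSQL's actual parser or dictionary code.

```python
# Toy model only: NOT PostgreSQL's parser, dictionaries, or tsvector code.
STOPWORDS = {"a", "the", "on", "it"}  # hypothetical stopword list


def positions(text, count_stopwords=True):
    """Map each non-stopword to its list of 1-based word positions."""
    result = {}
    pos = 0
    for word in text.lower().split():
        is_stop = word in STOPWORDS
        if is_stop and not count_stopwords:
            continue  # in this mode a stopword consumes no position
        pos += 1
        if not is_stop:
            result.setdefault(word, []).append(pos)
    return result


def phrase_distance(text, first, second, count_stopwords=True):
    """Smallest N > 0 for which 'first <N> second' would match the text."""
    pos = positions(text, count_stopwords)
    gaps = [b - a
            for a in pos.get(first, [])
            for b in pos.get(second, [])
            if b > a]
    return min(gaps) if gaps else None


text = "cat on the mat"
current = phrase_distance(text, "cat", "mat", count_stopwords=True)
changed = phrase_distance(text, "cat", "mat", count_stopwords=False)
print(current, changed)  # prints "3 1"
```

In this model a stored query written as 'cat <3> mat' matches the text only
while stopwords keep their positions; once they are ignored the distance
becomes 1, so both the stored vectors and the application's queries would
have to change together.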

            regards, tom lane

