Thread: Fwd: Re: [GENERAL] Combine multiple text search configuration

Fwd: Re: [GENERAL] Combine multiple text search configuration

From: Johannes Graën
Date: 09 November 2017, 11:11:07

On 2017-11-07 08:27, hmidi slim wrote:
> Hi,
> Thanks for your suggestion, but what about using this query:
> (to_tsvector('english', document) || to_tsvector('french', document)) @@
> (to_tsquery('english', query) || to_tsquery('french', query))
> I think performance will degrade and that this is not a good solution
> for large amounts of data. Is that right?

You have more lexemes when you combine two languages, but not twice as
many, as there will be some overlap. That means your index will also be
bigger than a single-language index. Anyhow, I would expect this
variant to perform better than querying two separate columns
simultaneously. Maybe one of the FTS developers could comment on this?
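
For illustration, here is a minimal sketch of the combined variant with a
single expression index. The table name docs, the index name, and the
search term 'house' are assumptions made up for this example; only the
column name document and the to_tsvector/to_tsquery expression come from
the query quoted above. I have not benchmarked it.

```sql
-- Hypothetical table; names are for illustration only.
CREATE TABLE docs (
    id       serial PRIMARY KEY,
    document text NOT NULL
);

-- One GIN expression index over the lexemes of both configurations.
CREATE INDEX docs_fts_en_fr_idx ON docs USING gin (
    (to_tsvector('english', document) || to_tsvector('french', document))
);

-- The WHERE expression has to match the indexed expression
-- for the planner to consider the index.
SELECT id
FROM docs
WHERE (to_tsvector('english', document) || to_tsvector('french', document))
      @@ (to_tsquery('english', 'house') || to_tsquery('french', 'house'));
```

Note that || on two tsquery values ORs them, so a row matches if either
the English or the French form of the query matches.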

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Combine multiple text search configuration

From: Aleksandr Parfenov
Date: 09 November 2017, 12:28:43

On Thu, 9 Nov 2017 09:11:07 +0100
Johannes Graën <johannes@selfnet.de> wrote:

> On 2017-11-07 08:27, hmidi slim wrote:
> > Hi,
> > Thanks for your suggestion, but what about using this query:
> > (to_tsvector('english', document) || to_tsvector('french',
> > document)) @@ (to_tsquery('english', query) || to_tsquery('french',
> > query))
> > I think performance will degrade and that this is not a good
> > solution for large amounts of data. Is that right?
>
> You have more lexemes when you combine two languages, but not twice as
> many, as there will be some overlap. That means your index will also be
> bigger than a single-language index. Anyhow, I would expect this
> variant to perform better than querying two separate columns
> simultaneously. Maybe one of the FTS developers could comment on this?

Hi,

You are right in your assumption about index size. However, the
difference between a shared index and two single indices depends on the
dictionaries, because some of them don't return lexemes for unknown
words.

Unfortunately, there is no alternative way in PostgreSQL 10 or earlier
to do multilingual text processing.

I'm working on a patch for flexible full-text search configuration, and
one of the problems I want to solve is multilingual search without
separate indices for each language. The patch allows combining the
output of more than one dictionary using a UNION operator.

The current version of the patch is a demonstration of the new features
and syntax for FTS configuration. The syntax itself is still at the
discussion stage. If you are interested, you can check it out on the
pgsql-hackers mailing list [1]. Any feedback on the patch in terms of
internals, syntax, behavior or the idea itself is welcome.

[1]
https://www.postgresql.org/message-id/flat/20171019172409.731f52a7@asp437-24-g082ur/

--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
