Re: [HACKERS] [PROPOSAL] Text search configuration extension - Mailing list pgsql-hackers

From	Arthur Zakirov
Subject	Re: [HACKERS] [PROPOSAL] Text search configuration extension
Date	August 21, 2017 15:59:29
Msg-id	20170821125929.GA766@zakirov.localdomain
In response to	[HACKERS] [PROPOSAL] Text search configuration extension (Aleksandr Parfenov <a.parfenov@postgrespro.ru>)
List	pgsql-hackers

Hello,

On Fri, Aug 18, 2017 at 03:30:38PM +0300, Aleksandr Parfenov wrote:
> Hello hackers!
> 
> I'm working on a new approach in text search configuration and want to
> share my thought with community in order to get some feedback and maybe
> some new ideas.
> 

There are several cases where the new syntax could be useful:

https://www.postgresql.org/message-id/4733B65A.9030707@students.mimuw.edu.pl
First check whether a lexeme is a stop word, and only then normalize it.

https://www.postgresql.org/message-id/c6851b7e-da25-3d8e-a5df-022c395a11b4%40postgrespro.ru
Support the union of the outputs of several dictionaries.

https://www.postgresql.org/message-id/46D57E6F.8020009%40enterprisedb.com
Support a chain of dictionaries using the MAP BY operator.

The basic idea of the approach is to give the user more control over text search configurations without writing
additional dictionaries or modifying existing ones.

> ALTER TEXT SEARCH CONFIGURATION en_de_search ADD MAPPING FOR asciiword,
> word WITH
> CASE
>    WHEN english_hunspell IS NOT NULL THEN english_hunspell
>    WHEN german_hunspell IS NOT NULL THEN german_hunspell
>    ELSE
>      -- stem dictionaries can't be used for language detection
>      english_stem UNION german_stem
> END;

For example, the configuration above produces the following result:

=# select d @@ q, d, q from to_tsvector('german_hunspell', 'Dieser Hund wollte ihn jedoch nicht nach Hause begleiten') d, to_tsquery('en_de_search', 'hause') q;
 ?column? |                      d                       |    q     
----------+----------------------------------------------+----------
 t        | 'begleiten':9 'hausen':8 'hund':2 'jedoch':5 | 'hausen'
(1 row)

This configuration is useful when the language of a query is unknown.
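
To make the selection logic of the CASE mapping easier to follow, here is a rough toy model of it in Python. It is only a sketch under stated assumptions, not how PostgreSQL or the proposed patch implements anything: the dictionaries are hypothetical stand-ins built from tiny lookup tables, the two "stemmers" are naive suffix strippers, and the function name en_de_search merely mirrors the configuration name used above. The one behaviour it relies on is that a Hunspell-style dictionary returns nothing for a word it does not know, while a stemmer always returns something.

```python
from typing import Callable, Dict, List, Optional

# A dictionary maps a token to its lexemes, or returns None when it does
# not recognize the token (the analogue of NULL in the CASE expression).
Dictionary = Callable[[str], Optional[List[str]]]


def make_lookup(table: Dict[str, List[str]]) -> Dictionary:
    """Build a toy dictionary that only knows the words in `table`."""
    def lookup(token: str) -> Optional[List[str]]:
        return table.get(token.lower())
    return lookup


# Hypothetical stand-ins for the Hunspell dictionaries (made-up word lists).
english_hunspell = make_lookup({"houses": ["house"]})
german_hunspell = make_lookup({"hause": ["hausen"], "hund": ["hund"]})


def english_stem(token: str) -> List[str]:
    """Naive stand-in for a stemmer: always returns a lexeme."""
    t = token.lower()
    return [t[:-1]] if t.endswith("s") else [t]


def german_stem(token: str) -> List[str]:
    """Naive stand-in for a stemmer: always returns a lexeme."""
    t = token.lower()
    return [t[:-2]] if t.endswith("en") else [t]


def en_de_search(token: str) -> List[str]:
    """Mimic the CASE mapping: first recognizing dictionary wins."""
    for dictionary in (english_hunspell, german_hunspell):
        lexemes = dictionary(token)
        if lexemes is not None:  # WHEN ... IS NOT NULL THEN ...
            return lexemes
    # ELSE branch: union of both stemmers, order kept, duplicates dropped.
    merged: List[str] = []
    for lexeme in english_stem(token) + german_stem(token):
        if lexeme not in merged:
            merged.append(lexeme)
    return merged


if __name__ == "__main__":
    for word in ("hause", "Houses", "katzen"):
        print(word, "->", en_de_search(word))
```

In this model a word known to either Hunspell stand-in is normalized by that dictionary alone, and only a word unknown to both falls through to the union of the two stemmers, which is why the stemmers cannot be used to detect the language.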

Best regards,

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
