Re: Flexible configuration for full-text search - Mailing list pgsql-hackers

From	Aleksandr Parfenov
Subject	Re: Flexible configuration for full-text search
Date	April 6, 2018 10:51:38
Msg-id	20180406105138.72ed468c@asp437-manjaro
In response to	Re: Flexible configuration for full-text search (Teodor Sigaev <teodor@sigaev.ru>)
Responses	Re: Flexible configuration for full-text search Re: Flexible configuration for full-text search
List	pgsql-hackers

On Thu, 5 Apr 2018 17:26:10 +0300
Teodor Sigaev <teodor@sigaev.ru> wrote:
> Some notices:
> 
> 0) patch conflicts with last changes in gram.y, conflicts are trivial.

Yes, the commits that added the MERGE command changed gram.y, which
caused some conflicts.

> 2) pg_ts_config_map.h, "jsonb       mapdicts" isn't decorated with
> #ifdef CATALOG_VARLEN like other varlena columns in catalog. If it's
> right, pls, explain and add comment.

Since it is the only varlena column, it is safe to use it directly. I
have added a comment about this.

> 3) I see changes in pg_catalog, including drop column, change its
> type, change index, change function etc. Did you pay attention to
> pg_upgrade? I don't see it in patch.

The full-text search configuration is migrated via FTS commands such
as CREATE TEXT SEARCH CONFIGURATION. pg_upgrade uses pg_dump to dump
this part of the catalog, where dictionary_mapping_to_text is used to
get a textual representation of the FTS configuration. Correct me if
I'm wrong.

> 4) Seems, it could work:
> ALTER TEXT SEARCH CONFIGURATION russian
>    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>                                            word, hword, hword_part
>          WITH english_stem union (russian_stem, simple);
>                  ^^^^^^^^^^^^^^^^^^^^^ simple way
> instead of WITH english_stem union (case russian_stem when match then
> keep else simple end);

I have added this ability, since it needed only a small fix in the
grammar. I have also added tests for this kind of configuration. The
tests are a bit synthetic because I used a synonym dictionary as the
one that doesn't accept some input.
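
To make the equivalence of the two forms concrete, here is a minimal
Python model of it. It is only a sketch, not the patch's
implementation: all function names and the toy dictionaries are
hypothetical, and it assumes that "a, b" means "use the first
dictionary that matches" and that "union" merges the lexemes produced
by both operands.

```python
# Toy model only: a "dictionary" is a function that returns a list of
# lexemes, or None when it does not recognize the token (no match).

def toy_english_stem(token):
    # Hypothetical stand-in: accepts ASCII words, strips a trailing "s".
    word = token.lower()
    if not word.isascii():
        return None
    return [word[:-1] if word.endswith("s") else word]

def toy_russian_stem(token):
    # Hypothetical stand-in: accepts only lowercase Cyrillic words.
    word = token.lower()
    if all("\u0430" <= ch <= "\u044f" for ch in word):
        return [word]
    return None

def toy_simple(token):
    # Hypothetical stand-in: accepts everything, lowercases it.
    return [token.lower()]

def fallback(*dicts):
    # Assumed meaning of "a, b": output of the first dictionary that matches.
    def run(token):
        for d in dicts:
            result = d(token)
            if result is not None:
                return result
        return None
    return run

def case_keep_else(condition, otherwise):
    # Models: CASE condition WHEN MATCH THEN KEEP ELSE otherwise END
    def run(token):
        result = condition(token)
        return result if result is not None else otherwise(token)
    return run

def union(left, right):
    # Assumed meaning of "union": merge both outputs, dropping duplicates.
    def run(token):
        l, r = left(token), right(token)
        if l is None and r is None:
            return None
        merged = []
        for lexeme in (l or []) + (r or []):
            if lexeme not in merged:
                merged.append(lexeme)
        return merged
    return run

# english_stem union (russian_stem, simple)
short_form = union(toy_english_stem, fallback(toy_russian_stem, toy_simple))
# english_stem union (case russian_stem when match then keep else simple end)
long_form = union(toy_english_stem,
                  case_keep_else(toy_russian_stem, toy_simple))
```

Under these assumptions both forms give the same lexemes for any
token, which is why the shorter spelling can be accepted by the
grammar.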

> 4) Initial approach suggested to distinguish three states of
> dictionary result: null (unknown word), stopword and usual word. Now
> only two, we lost possibility to catch stopwords. One way to use
> stopwords is: let us have two identical fts configurations, except one
> skips stopwords and another doesn't. Second configuration is used for
> indexing, and first one for search by default. But if we can't find
> anything ('to be or to be' - phrase contains stopwords only) then we
> can use second configuration. For now, we need to keep two variants of
> each dictionary - with and without stopwords. But if it's possible to
> distinguish stop and nonstop words in configuration then we don't
> need to have duplicated dictionaries.

With the proposed configuration syntax, it is possible to create a
special dictionary used only for stopword checking and to consult it
at decision-making time.

For example, we can create a dictionary english_stopword which returns
the word itself if it is a stopword and NULL otherwise. With such a
dictionary we can create a configuration:

ALTER TEXT SEARCH CONFIGURATION test_cfg
    ALTER MAPPING FOR asciiword, word
    WITH CASE english_stopword WHEN NO MATCH THEN english_hunspell END;

In the described example, english_hunspell can be implemented without
any stopword processing at all, so stopword processing and the
processing of other words are split into separate dictionaries.

The key point of the patch is to process stopwords the same way as
others at the level of the PostgreSQL internals and give users an
instrument to process them in a special way via configurations.

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment

0001-flexible-fts-configuration-v11.patch
