Re: Flexible configuration for full-text search - Mailing list pgsql-hackers
From | Aleksandr Parfenov |
---|---|
Subject | Re: Flexible configuration for full-text search |
Date | |
Msg-id | 20180406105138.72ed468c@asp437-manjaro |
In response to | Re: Flexible configuration for full-text search (Teodor Sigaev <teodor@sigaev.ru>) |
Responses |
Re: Flexible configuration for full-text search
Re: Flexible configuration for full-text search |
List | pgsql-hackers |
On Thu, 5 Apr 2018 17:26:10 +0300
Teodor Sigaev <teodor@sigaev.ru> wrote:

> Some notices:
>
> 0) patch conflicts with last changes in gram.y, conflicts are trivial.

Yes, the commits for the MERGE command changed gram.y, which caused some
conflicts.

> 2) pg_ts_config_map.h, "jsonb mapdicts" isn't decorated with
> #ifdef CATALOG_VARLEN like other varlena columns in catalog. If it's
> right, pls, explain and add comment.

Since there is only one varlena column, it is safe to use it directly.
I have added a comment explaining this.

> 3) I see changes in pg_catalog, including drop column, change its
> type, change index, change function etc. Did you pay attention to
> pg_upgrade? I don't see it in patch.

The full-text search configuration is migrated via FTS commands such as
CREATE TEXT SEARCH CONFIGURATION. pg_upgrade uses pg_dump to dump this
part of the catalog, and dictionary_mapping_to_text is used there to get
a textual representation of the FTS configuration. Correct me if I'm
wrong.

> 4) Seems, it could work:
> ALTER TEXT SEARCH CONFIGURATION russian
>     ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>     word, hword, hword_part
>     WITH english_stem union (russian_stem, simple);
>                             ^^^^^^^^^^^^^^^^^^^^^ simple way
> instead of WITH english_stem union (case russian_stem when match then
> keep else simple end);

I have added this ability, since it needed only a small fix in the
grammar. I have also added tests for this kind of configuration. The
test is a bit synthetic because I used a synonym dictionary as the one
that doesn't accept some input.

> 4) Initial approach suggested to distinguish three state of
> dictionary result: null (unknown word), stopword and usual word. Now
> only two, we lost possibility to catch stopwords. One of way to use
> stopwords is: let we have two identical fts configurations, except one
> skips stopwords and another doesn't. Second configuration is used for
> indexing, and first one for search by default. But if we can't find
> anything ('to be or to be' - phrase contains stopwords only) then we
> can use second configuration. For now, we need to keep two variant of
> each dictionary - with and without stopwords. But if it's possible to
> distinguish stop and nonstop words in configuration then we don't
> need to have duplicated dictionaries.

With the proposed way of configuring, it is possible to create a special
dictionary used only for stopword checking and consult it at
decision-making time. For example, we can create a dictionary
english_stopword which returns the word itself if it is a stopword and
NULL otherwise. With such a dictionary we can create a configuration:

ALTER TEXT SEARCH CONFIGURATION test_cfg
    ALTER MAPPING FOR asciiword, word
    WITH CASE english_stopword WHEN NO MATCH THEN english_hunspell END;

In this example, english_hunspell can be implemented without any
stopword processing at all, so stopword processing and the processing
of other words can be split into separate dictionaries.

The key point of the patch is to process stopwords the same way as other
words at the level of the PostgreSQL internals, and to give users an
instrument to process them in a special way via configurations.

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company