Re: new function for tsquery creartion - Mailing list pgsql-hackers
From | Aleksandr Parfenov |
---|---|
Subject | Re: new function for tsquery creartion |
Msg-id | 20180403171320.400cd24a@asp437-24-g082ur |
In response to | Re: new function for tsquery creartion (Dmitry Ivanov <d.ivanov@postgrespro.ru>) |
Responses | Re: new function for tsquery creartion |
List | pgsql-hackers |
On Tue, 03 Apr 2018 14:28:37 +0300
Dmitry Ivanov <d.ivanov@postgrespro.ru> wrote:

> I'm sorry, I totally forgot to fix a few more things, the patch is
> attached below.

The patch looks good to me except for two things.

I'm not sure about the different results for these queries:

    SELECT websearch_to_tsquery('simple', 'cat or ');
     websearch_to_tsquery
    ----------------------
     'cat'
    (1 row)

    SELECT websearch_to_tsquery('simple', 'cat or');
     websearch_to_tsquery
    ----------------------
     'cat' & 'or'
    (1 row)

But I don't have a strong opinion about these queries, since the input in both of them is malformed in terms of operator usage.

I found odd behavior of the query creation function in this case:

    SELECT websearch_to_tsquery('english', '"pg_class pg"');
        websearch_to_tsquery
    -----------------------------
     ( 'pg' & 'class' ) <-> 'pg'
    (1 row)

This query means that the lexemes 'pg' and 'class' should be at the same distance from the last 'pg'. In other words, they should have the same position. But the default parser interprets pg_class as two separate words during text parsing/tsvector creation, so 'pg' and 'class' get different positions. The bug wasn't introduced by the patch and is also present in current master.

During the discussion of the patch with Dmitry, he noticed that the to_tsquery() function shares the same odd behavior:

    select to_tsquery('english', ' pg_class <-> pg');
             to_tsquery
    -----------------------------
     ( 'pg' & 'class' ) <-> 'pg'
    (1 row)

This oddity is caused by the implementation of makepol. In makepol, each token (parsed by the query parser) is sent to the FTS parser, and if the token is split further, the parts are joined with the operator selected by to_tsquery and friends. So that operator doesn't change at runtime.

I see two different ways to solve it:

1) Use the same operator inside the parentheses. This would mean parsing the token as several parts of one word.
2) Remove the parentheses. This would mean parsing the token as several separate words.
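To make the position argument and the makepol behavior above more concrete, here is a toy Python model. It is only a sketch under stated assumptions: split_words, push_token, quoted_phrase, positions and phrase_match are hypothetical helpers, not PostgreSQL code; the FTS parser is approximated by splitting on non-alphanumeric characters (no stemming or stop words); and phrase matching is simplified to "every lexeme in a slot must share one position", following the reading of ( 'pg' & 'class' ) <-> 'pg' given above.

```python
import re
from collections import defaultdict

def split_words(text):
    # Toy stand-in for the default FTS parser (assumption): split on any
    # non-alphanumeric character, so "pg_class" becomes ["pg", "class"].
    return [w for w in re.split(r"[^0-9a-z]+", text.lower()) if w]

def push_token(token, split_op):
    # Hypothetical stand-in for makepol/pushval_morph: a query-parser token
    # that the FTS parser splits further is joined with split_op, in parens.
    words = [f"'{w}'" for w in split_words(token)]
    if len(words) == 1:
        return words[0]
    return "( " + f" {split_op} ".join(words) + " )"

def quoted_phrase(tokens, split_op):
    # A quoted websearch phrase: tokens are joined with the phrase operator.
    return " <-> ".join(push_token(t, split_op) for t in tokens)

def positions(text):
    # Toy tsvector: lexeme -> set of 1-based word positions.
    pos = defaultdict(set)
    for i, w in enumerate(split_words(text), start=1):
        pos[w].add(i)
    return pos

def phrase_match(vec, slots):
    # slots[i] lists lexemes that must ALL sit at position start + i
    # (simplified phrase semantics, not PostgreSQL's evaluator).
    starts = set().union(*vec.values())
    return any(
        all(start + i in vec.get(lex, set())
            for i, slot in enumerate(slots) for lex in slot)
        for start in starts)

print(quoted_phrase(["pg_class", "pg"], "&"))    # ( 'pg' & 'class' ) <-> 'pg'
print(quoted_phrase(["pg_class", "pg"], "<->"))  # ( 'pg' <-> 'class' ) <-> 'pg'

vec = positions("pg_class pg")                         # pg: {1, 3}, class: {2}
print(phrase_match(vec, [{"pg", "class"}, {"pg"}]))    # False
print(phrase_match(vec, [{"pg"}, {"class"}, {"pg"}]))  # True
```

With split_op fixed to '&', the quoted phrase reproduces the odd query, and under this simplified model it can never match text produced from the same string, because 'pg' and 'class' never share a position. Passing '<->' roughly corresponds to updating the operator before the split parts are pushed, as suggested for the quoted case.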
I prefer the second way, since in some languages words can be separated by a special symbol, or not separated by any symbol at all, and have to be extracted by a special FTS parser. It still allows such words to be parsed as one word by using a special parser (as is done for hyphenated words).

However, in the websearch_to_tsquery example, I think the same operator should be used for the whole quoted part of the query. For example, we can update the operator in makepol before passing it to pushval (pushval_morph).

It looks like there should be two separate patches: one for websearch_to_tsquery and another one fixing the odd behavior of query construction. But since the first one may depend on the bugfix to handle the case with quotes, I will mark it as Waiting on Author.

--
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company