Re: phrase search - Mailing list pgsql-hackers

From Teodor Sigaev
Subject Re: phrase search
Msg-id 488629FB.2030501@sigaev.ru
In response to Re: phrase search  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-hackers
>> 1. What is the meaning of such a query operator?
>>
>> foo #5 bar -> true if the document has word "foo" followed by "bar" at
>> 5th position.
>>
>> foo #<5 bar -> true if the document has word "foo" followed by "bar" within
>> 5 positions
>>
>> foo #>5 bar -> true if document has word "foo" followed by "bar" after 5
>> positions

Sounds good, but it may be overkill.
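
To make the three proposed operators concrete, here is a minimal sketch in
Python, not PostgreSQL code. It assumes each word's occurrences are available
as a list of integer positions, as a tsvector stores them. The function name
`distance_match`, the `COMPARATORS` table, and the strict reading of "#<" and
"#>" (fewer than / more than N positions after "foo") are assumptions made for
illustration; the thread does not settle those details.

```python
import operator

# Hypothetical mapping from the proposed operator spelling to a comparison
# on the distance (position of the right word minus position of the left).
COMPARATORS = {"#": operator.eq, "#<": operator.lt, "#>": operator.gt}


def distance_match(left_positions, right_positions, op, n):
    """Return True if some occurrence of the right word follows some
    occurrence of the left word at a distance satisfying `op n`."""
    compare = COMPARATORS[op]
    for lp in left_positions:
        for rp in right_positions:
            distance = rp - lp
            # "followed by" means the right word must come later.
            if distance > 0 and compare(distance, n):
                return True
    return False


# Positions as they might appear in a tsvector: 'bar':3,7 'foo':2
foo, bar = [2], [3, 7]
print(distance_match(foo, bar, "#", 5))   # is some bar exactly 5 after foo?
print(distance_match(foo, bar, "#<", 5))  # is some bar fewer than 5 after foo?
print(distance_match(foo, bar, "#>", 5))  # is some bar more than 5 after foo?
```

The nested loop is quadratic in the number of occurrences; since position
lists are sorted, a real implementation could merge them in a single pass.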

>> etc .....
>>
>> 2. How to implement such query operators?
>>
>> Should we modify QueryItem to include additional distance information or
>> is there any other way to accomplish it?
>>
>> Is the following list sufficient to accomplish this?
>> a. Modify to_tsquery
>> b. Modify TS_execute in tsvector_op.c to check new operator
Exactly

>>
>> Is there anything needed in rewrite subsystem?
Yes, of course: the rewrite system should support that operation.

>>
>> 3. Are these valid uses of the operators and if yes what would they
>> mean?
>>
>> foo #5 (bar & cup)
It must be supported, because lexize might return a subquery. For example,
Russian ispell can return several lexemes: "adfg" can become
'adf | adfs | ad'. Norwegian and German are more complicated:
"abc" -> '(ab & c) | (a & bc) | abc'
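
The point about lexize can be illustrated with a small Python sketch. It
models a query as nested tuples and replaces each word with whatever a
stand-in dictionary returns, which may itself be a subquery. The tuple
representation, the `expand` function, and `TOY_DICTIONARY` are hypothetical;
real normalization happens inside to_tsquery.

```python
# Toy stand-in for a dictionary's lexize step (hypothetical data, taken from
# the examples in this message): a word maps to a lexeme or to a subquery.
TOY_DICTIONARY = {
    "adfg": ("|", "adf", ("|", "adfs", "ad")),
    "abc": ("|", ("&", "ab", "c"), ("|", ("&", "a", "bc"), "abc")),
}


def expand(node):
    """Replace every word in a query tree with its dictionary expansion.

    A node is either a string (a word) or a tuple (operator, left, right).
    Words the toy dictionary does not know are kept unchanged.
    """
    if isinstance(node, str):
        return TOY_DICTIONARY.get(node, node)
    op, left, right = node
    return (op, expand(left), expand(right))


# The user typed a plain two-word distance query ...
query = ("#5", "foo", "abc")
# ... but after expansion the right operand is a boolean subquery.
print(expand(query))
```

So even if users only ever write the operator between two words, the
executor still sees a distance operator whose operand is an '&' / '|' tree.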


>> 4. If the operator only applies to two query items can we create an
>> index such that (foo, bar)-> documents[min distance, max distance]
>> How difficult it is to implement an index like this?
No, the index should execute the query 'foo & bar' and set the recheck flag to
true, so that 'foo #<5 bar' is then executed on the original tsvector from the
table.
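
The index-plus-recheck approach can be sketched in Python as follows. A
position-free inverted index answers the weaker query 'foo & bar', and the
positional condition is then rechecked against each candidate's stored
positions. The data layout and the names `candidates` and `search_within` are
hypothetical, and '#<5' is read here as "bar fewer than 5 positions after
foo", which is an assumption.

```python
# Hypothetical table: document id -> {lexeme: [positions]}, like a tsvector.
DOCUMENTS = {
    1: {"foo": [1], "bar": [3]},
    2: {"foo": [1], "bar": [20]},
    3: {"foo": [4]},
}

# Inverted index without positions: lexeme -> set of document ids.
INDEX = {}
for doc_id, vector in DOCUMENTS.items():
    for lexeme in vector:
        INDEX.setdefault(lexeme, set()).add(doc_id)


def candidates(left, right):
    """Index step: evaluate 'left & right', ignoring positions."""
    return INDEX.get(left, set()) & INDEX.get(right, set())


def search_within(left, right, n):
    """Recheck step: keep candidates where right follows left by < n."""
    hits = []
    for doc_id in sorted(candidates(left, right)):
        vector = DOCUMENTS[doc_id]
        if any(0 < rp - lp < n
               for lp in vector[left] for rp in vector[right]):
            hits.append(doc_id)
    return hits


print(sorted(candidates("foo", "bar")))  # index result, may hold false positives
print(search_within("foo", "bar", 5))    # result after the recheck
```

The index never needs to know about distances: it only narrows the set of
rows, and the recheck on the stored tsvector removes the false positives.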

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 

