>> 1. What is the meaning of such a query operator?
>>
>> foo #5 bar -> true if the document has word "foo" followed by "bar" at
>> 5th position.
>>
>> foo #<5 bar -> true if document has word "foo" followed by "bar" with in
>> 5 positions
>>
>> foo #>5 bar -> true if document has word "foo" followed by "bar" after 5
>> positions
Sounds good, but, may be it's an overkill.
>> etc .....
>>
>> 2. How to implement such query operators?
>>
>> Should we modify QueryItem to include additional distance information or
>> is there any other way to accomplish it?
>>
>> Is the following list sufficient to accomplish this?
>> a. Modify to_tsquery
>> b. Modify TS_execute in tsvector_op.c to check new operator
Exactly
>>
>> Is there anything needed in rewrite subsystem?
Yes, of course - rewrite system should support that operation.
>>
>> 3. Are these valid uses of the operators and if yes what would they
>> mean?
>>
>> foo #5 (bar & cup)
It must support! Because of lexize might return subtsquery. For example,
russian ispell can return several lexemes: "adfg" can become a 'adf | adfs |
ad', norwegian and german languages are more complicated: "abc" -> " (ab & c) |
(a & bc) | abc"
>> 4. If the operator only applies to two query items can we create an
>> index such that (foo, bar)-> documents[min distance, max distance]
>> How difficult it is to implement an index like this?
No, index should execute query 'foo & bar' and mark recheck flag to true to
execute 'foo #<5 bar' on original tsvector from table.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/