Hi Hackers,
Although PostgreSQL is capable of performing some FTS (full text search)
queries, there is still room for improvement. Phrase search support could
become a great addition to the existing set of features.
Introduction
============
It is no secret that one can make Google search for an exact phrase (instead
of an unordered lexeme set) simply by enclosing it within double quotes. This
is a really nice feature which helps to save the time that would otherwise be
spent on annoying result filtering.
One weak spot of the current FTS implementation is that there is no way to
specify the desired lexeme order (it would not make any difference anyway).
The search engine looks for each lexeme individually, which means that an
end user has to discard by hand the documents that do not contain the search
phrase. This problem is solved by the patch below (should apply cleanly to
61ce1e8f1).
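For reference, the manual filtering described above can be approximated on an
unpatched server roughly as follows. This is only a sketch: the docs table and
its body column are hypothetical, and the ILIKE re-check compares raw text, so
it ignores stemming and stop words.

```sql
-- Hypothetical table, for illustration only.
create temp table docs (body text);
insert into docs values ('fatal error'), ('error is not fatal');

-- The @@ condition accepts both rows; the ILIKE re-check is the
-- manual "discard" step that keeps only the exact phrase.
select body
from docs
where to_tsvector('english', body) @@ plainto_tsquery('english', 'fatal error')
  and body ilike '%fatal error%';
```

Only the 'fatal error' row passes both conditions. The substring re-check is
the kind of extra step that a phrase operator would make unnecessary.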
Problem description
===================
The problem comes from the lack of a lexeme ordering operator. Consider the
following example:
select q @@ plainto_tsquery('fatal error') from
unnest(array[to_tsvector('fatal error'), to_tsvector('error is not fatal')])
as q;
?column?
----------
t
t
(2 rows)
Clearly the latter match is not the best result in case we wanted to find
exactly the "fatal error" phrase. That's when the need for a lexeme ordering
operator arises:
select q @@ to_tsquery('fatal ? error') from
unnest(array[to_tsvector('fatal error'), to_tsvector('error is not fatal')])
as q;
?column?
----------
t
f
(2 rows)
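To see why the first query cannot tell the two documents apart, it may help to
look at what plainto_tsquery produces on an unpatched server. The 'english'
configuration is spelled out here as an assumption; the examples above rely on
the default one.

```sql
-- Stock PostgreSQL: the words are stemmed and joined with AND (&),
-- so their original order is not part of the query.
select plainto_tsquery('english', 'fatal error');

-- Equivalent hand-written form, for comparison.
select to_tsquery('english', 'fatal & error');
```

Both statements should yield 'fatal' & 'error', which is satisfied by any
document containing the two lexemes, in any order.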
Implementation
==============
The ? (FOLLOWED BY) binary operator takes the form "?" or "?[N]", where
0 <= N < ~16K. If N is provided, the distance between the left and right
operands must be no greater than N. For example:
select to_tsvector('postgres has taken severe damage') @@
to_tsquery('postgres ? (severe ? damage)');
?column?
----------
f
(1 row)
select to_tsvector('postgres has taken severe damage') @@
to_tsquery('postgres ?[4] (severe ? damage)');
?column?
----------
t
(1 row)
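The distances involved can be checked against the positions that to_tsvector
already records. A small sketch for an unpatched server, again assuming the
'english' configuration:

```sql
-- Each lexeme is stored together with its word position in the document.
-- Stop words such as "has" are dropped but still consume a position,
-- so "postgres" is word 1, "severe" is word 4 and "damage" is word 5.
select to_tsvector('english', 'postgres has taken severe damage');
```

Since "postgres" sits several positions away from "severe damage", the plain ?
form is not expected to match, while ?[4] leaves enough room. This is
consistent with the two results above.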
The new function phraseto_tsquery([ regconfig, ] text) takes advantage of the
"?[N]" operator in order to facilitate phrase search:
select to_tsvector('postgres has taken severe damage') @@
phraseto_tsquery('severely damaged');
?column?
----------
t
(1 row)
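The two-argument form from the signature above would presumably be used like
this. The sketch requires a patched server; the 'english' configuration and the
reordered document are assumptions chosen for illustration, and results are
not shown because they have not been verified.

```sql
-- Inspect the query built from the phrase (patched server only).
select phraseto_tsquery('english', 'severely damaged');

-- A document with the same words in reverse order; a phrase query is
-- expected to reject it, unlike plainto_tsquery.
select to_tsvector('english', 'damage was severe')
       @@ phraseto_tsquery('english', 'severely damaged');
```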
This patch was originally developed by Teodor Sigaev and Oleg Bartunov in
2009, so all credit goes to them. Any feedback is welcome.
--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company