Partial match in GIN - Mailing list pgsql-patches

From Teodor Sigaev
Subject Partial match in GIN
Msg-id 47F68D87.7070009@sigaev.ru
Responses Re: Partial match in GIN  (Heikki Linnakangas <heikki@enterprisedb.com>)
Re: Partial match in GIN  (Gregory Stark <stark@enterprisedb.com>)
Re: Partial match in GIN (next vesrion)  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-patches
We (Oleg and I) would like to present a patch that implements partial match for
GIN indexes, along with two extensions that use this new feature. We hope that
after a short review they will be committed to CVS.

This work was sponsored by EnterpriseDB.

http://www.sigaev.ru/misc/partial_match_gin-0.7.gz
Implements partial match for GIN. It extends the interface of the support
functions but keeps backward compatibility. The basic idea is to find the first
entry in the index that is greater than or equal to the search value and scan
sequentially until the support function says to stop. For each matched entry,
all corresponding ItemPointers are collected in a TIDBitmap structure so that
ItemPointers from different entries can be merged efficiently. The patch
introduces the following changes to the interface:
  - The compare function gets a third (optional) argument of boolean type that
    selects the kind of comparison: partial or exact match. If the argument is
    'false', the function should compare as usual; otherwise the function's
    result is treated as:
        = 0  - match
        < 0  - doesn't match, but continue the scan
        > 0  - stop the scan
  - The extractQuery function gets a fourth (optional) argument of bool** type.
    The function is responsible for allocating that array, with the same size
    as the returned array of search entries. If extractQuery wants to request
    partial match for some entry, it should set the corresponding element of
    the bool array to true.

If the functions described above do not take the extra arguments, GIN will not
be able to use partial match.
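
To make the scan rule concrete, here is a minimal model of it in Python rather
than in the C of the patch itself. It is only a sketch under assumptions: the
names prefix_compare and partial_match_scan, the argument order, and the use of
a sorted list and a set in place of the GIN entry tree and TIDBitmap are all
hypothetical and are not taken from the patch.

```python
from bisect import bisect_left


def prefix_compare(query_key, index_key, partial):
    """Hypothetical model of a compare function with the extra boolean argument."""
    if not partial:
        # Exact mode: ordinary three-way comparison.
        return (query_key > index_key) - (query_key < index_key)
    if index_key.startswith(query_key):
        return 0    # match
    if index_key < query_key:
        return -1   # doesn't match, but continue the scan
    return 1        # past every possible match: stop the scan


def partial_match_scan(entries, posting_lists, query_key, compare):
    """Scan sorted `entries` starting from the first one >= query_key.

    `posting_lists` maps each entry to its item pointers; the returned set
    stands in for the TIDBitmap that merges pointers from different entries.
    """
    matched = set()
    start = bisect_left(entries, query_key)  # first entry >= query_key
    for key in entries[start:]:
        result = compare(query_key, key, True)
        if result > 0:
            break
        if result == 0:
            matched.update(posting_lists[key])
    return matched
```

With entries 'moon', 'star', 'stars', 'starship', 'sun' and the query key
'star', this model starts at 'star', collects the pointers of the three entries
that begin with 'star', and stops at 'sun' without visiting the rest of the
index.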

http://www.sigaev.ru/misc/tsearch_prefix-0.6.gz
Implements prefix search. This was one of the most wanted features of text
search. A lexeme to be matched as a prefix should be labeled with an asterisk:

select count(*) from apod where fti @@ 'star:*';
or even
select count(*) from apod where fti @@ to_tsquery('star:*');

A dictionary may return a normalized lexeme with the TSL_PREFIX flag set to
mark it for prefix matching.

There is one unresolved issue here: tsquery now has a new flag to label prefix
search, and the cstring representation is backward compatible, but the external
binary representation currently is not, because an extra byte is used to store
this flag. On the other hand, there are 4 unused bits in the external binary
representation (in the byte that stores the weights of a lexeme), so it's
possible to use one of them to store this flag. What are your opinions?
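
The second option can be illustrated with a small bit-packing sketch, written
in Python rather than C for brevity. The layout is an assumption made only for
illustration: it supposes the weights occupy the low four bits of the byte and
picks the next bit for the prefix flag. The real bit positions, and the names
WEIGHT_MASK, PREFIX_BIT, pack_operand and unpack_operand, are hypothetical.

```python
WEIGHT_MASK = 0x0F  # assumed: low four bits hold the lexeme weights
PREFIX_BIT = 0x10   # assumed: one of the four unused bits carries the flag


def pack_operand(weight, prefix):
    """Combine the weights and the prefix flag into a single byte value."""
    if weight & ~WEIGHT_MASK:
        raise ValueError("weight must fit in four bits")
    return weight | (PREFIX_BIT if prefix else 0)


def unpack_operand(byte):
    """Split a byte value back into (weight, prefix flag)."""
    return byte & WEIGHT_MASK, bool(byte & PREFIX_BIT)
```

Under this assumed layout, a byte written without the flag decodes to the same
weights with the prefix flag off, which is what would keep the old binary
representation readable.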

http://www.sigaev.ru/misc/wildspeed-0.10.tgz
docs: http://mira.sai.msu.su/~megera/pgsql/pgdoc/wildspeed.html
       http://www.sai.msu.su/~megera/wiki/wildspeed
In short, it's a contrib module that speeds up the LIKE operation with any kind
of pattern, such as 'foo%bar', '%foo%' or even '%foo%bar'. This module is based
on the partial match patch for GIN.

NOTICE 1: the current index support for LIKE assumes that only BTree can speed
up LIKE, and with this module it fails with the error 'unexpected opfamily' in
prefix_quals(). For this reason, the partial match patch adds a small check
before calling expand_indexqual_opclause().

NOTICE 2: it seems to me that a similar technique could be implemented for
ordinary BTree to eliminate the hack around LIKE support.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

