Partial match in GIN - Mailing list pgsql-patches
From | Teodor Sigaev |
---|---|
Subject | Partial match in GIN |
Date | |
Msg-id | 47F68D87.7070009@sigaev.ru |
Responses |
Re: Partial match in GIN
Re: Partial match in GIN Re: Partial match in GIN (next vesrion) |
List | pgsql-patches |
We (Oleg and I) would like to present a patch that implements partial match for GIN indexes, along with two extensions that use this new feature. We hope that after a short review they will be committed to CVS.

This work was sponsored by EnterpriseDB.

http://www.sigaev.ru/misc/partial_match_gin-0.7.gz
Implements partial match for GIN. It extends the interface of the support functions but keeps backward compatibility. The basic idea is to find the first value in the index that is greater than or equal to the search key and then scan sequentially until the support function says to stop. For each matched entry, all corresponding ItemPointers are collected in a TIDBitmap structure so that ItemPointers from different entries can be merged efficiently.

The patch introduces the following changes in the interface:
- The compare function has a third (optional) argument of boolean type that indicates the kind of comparison: partial or exact match. If the argument is 'false', the function should compare as usual; otherwise the function's result is treated as:
    = 0 - match
    < 0 - doesn't match, but continue scan
    > 0 - stop scan
- The extractQuery function has a fourth (optional) argument of bool** type. The function is responsible for allocating memory for that array, with the same size as the returned array of search entries. If extractQuery wishes to request partial match for some entry, it should set the corresponding element of the bool array to true.

If the functions described above don't have the extra arguments, GIN will not be able to use partial match.

http://www.sigaev.ru/misc/tsearch_prefix-0.6.gz
Implements prefix search. This was one of the most wanted features of text search. A lexeme to be partially matched should be labeled with an asterisk:

select count(*) from apod where fti @@ 'star:*';

or even

select count(*) from apod where fti @@ to_tsquery('star:*');

A dictionary may set a normalized lexeme with a flag (TSL_PREFIX) to indicate its prefix path.

There is one unresolved issue here: tsquery now has a new flag to label prefix search. The cstring representation is backward compatible, but the external binary representation currently is not.
Currently an extra byte is used to store this flag. On the other hand, there are 4 unused bits in the external binary representation (in the byte that stores the weights of a lexeme), so it's possible to use one of them to store this flag. What are your opinions?

http://www.sigaev.ru/misc/wildspeed-0.10.tgz
docs: http://mira.sai.msu.su/~megera/pgsql/pgdoc/wildspeed.html
      http://www.sai.msu.su/~megera/wiki/wildspeed
In short, it's a contrib module that speeds up the LIKE operation with any kind of expression, such as 'foo%bar', '%foo%', or even '%foo%bar'. This module is based on the GIN partial match patch.

NOTICE 1: the current index support for LIKE assumes that only BTree can speed up LIKE, and it gets confused by this module, failing with the error 'unexpected opfamily' in prefix_quals(). For this reason, the partial match patch adds a small check before calling expand_indexqual_opclause().

NOTICE 2: it seems to me that a similar technique could be implemented for ordinary BTree to eliminate the hack around LIKE support.

--
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/
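
For illustration only, the scan protocol described in the message (find the first entry greater than or equal to the key, then walk forward until the compare function says stop) could be sketched in plain C as follows. This is not code from the patch: prefix_compare() and partial_scan() are hypothetical names, the "index" is just a sorted array of strings, and matching positions are collected in a plain array where GIN would collect ItemPointers in a TIDBitmap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a compare support function with the optional
 * third argument. It does not use the PostgreSQL fmgr calling convention.
 * In partial mode the result follows the convention from the message:
 *   = 0  match
 *   < 0  no match, but continue the scan
 *   > 0  stop the scan
 */
static int
prefix_compare(const char *query, const char *key, bool partial)
{
    if (!partial)
        return strcmp(query, key);      /* exact mode: compare as usual */

    int cmp = strncmp(query, key, strlen(query));

    if (cmp == 0)
        return 0;                       /* key starts with query */
    /* key sorts before the prefix range: continue; after it: stop */
    return (cmp > 0) ? -1 : 1;
}

/*
 * Assumes keys[] is sorted in strcmp() order. Stores the positions of
 * matching keys in matches[] and returns how many were stored.
 */
static size_t
partial_scan(const char *const *keys, size_t nkeys, const char *query,
             size_t *matches, size_t max_matches)
{
    size_t lo = 0, hi = nkeys, n = 0;

    /* binary search for the first key >= query */
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (strcmp(keys[mid], query) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* sequential scan until the compare function says stop */
    for (size_t i = lo; i < nkeys && n < max_matches; i++)
    {
        int r = prefix_compare(query, keys[i], true);

        if (r > 0)
            break;
        if (r == 0)
            matches[n++] = i;
    }
    return n;
}
```

With the sorted keys "sky", "star", "stars", "start", "sun" and the query "star", the scan starts at "star", collects the three entries that begin with the prefix, and stops at "sun".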