B-Tree emulation for GIN - Mailing list pgsql-hackers

From Teodor Sigaev
Subject B-Tree emulation for GIN
Date
Msg-id 490B3752.3010800@sigaev.ru
Whole thread Raw
Responses Re: B-Tree emulation for GIN
List pgsql-hackers
We'd like to improve usefulness  of a new multicolumn feature of GIN index (
already in CVS HEAD ), which is currently a bit limited because of lack of GIN
support for most data types. For example, it'd be very useful to combine native
GIN-supported data types, like text search, hstore, arrays with other ones, for
example, date, integers, etc. Right now, in the best case, we'll have BitmapAnd
  scan over *two* indexes  - GIN and btree.

We suggest to emulate btree index by GIN to make possible to use only one index
  scan. For that reason we developed btree_gin contrib module, as we do
btree_gist for GIN index. Current implementation provides support for int4 and
text data types, but it's straightforward and easy to add support for other data
types.

However, GIN interface should be extended for effective realization of such
emulation. We've already changed GIN interface, but it was done during 8.4
development cycle, so the new changes can be done safely.

New interface (only changed functions is shown):
Datum *extractQuery(Datum query, int32 *nkeys,
         StrategyNumber n, bool *pmatch[],
    void ** extra_data[])
bool consistent(bool check[],
    StrategyNumber n, Datum query, bool *recheck,
    void *extra_data[])
int comparePartial(Datum partial_key, Datum key, void *extra_data)

extractQuery() might allocate an array (with nkeys elements) of any data,
GIN provides that array as an 5-th argument for consistent() and corresponding
element for comparePartial().

Patch also includes changes in pg_trgm to cache the number of query's trigrams
in extra_data instead of fcinfo->flinfo->fn_extra usage in consistent(), to
improve situation when the same index can be used for several conditions, like (
foo.bar % 'book' and foo.bar 'look' )

TODO/issues
- add support of other data types to btree_gin
- current bulk insert algorithm (used in ginbuild and fast_insert_gin patch)
doesn't optimized for ordered or near ordered table, so time of btree_gin's
index creation is much worse than that for btree.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Changing the result of ExecutorRun
Next
From: "Hitoshi Harada"
Date:
Subject: Re: Window Functions: patch for CommitFest:Nov.