pg_trgm partial-match - Mailing list pgsql-hackers

From Fujii Masao
Subject pg_trgm partial-match
Msg-id CAHGQGwFJshvV2nGME19wdTW9teFw_w7h2ns4E+YYsjkB9WdWDQ@mail.gmail.com
Responses Re: pg_trgm partial-match  (Tomas Vondra <tv@fuzzy.cz>)
Re: pg_trgm partial-match  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
Hi,

I'd like to propose extending pg_trgm so that it can compare a partial-match
query key against a GIN index. In other words, I'm planning to implement the
'comparePartial' GIN method for pg_trgm.

Currently, when the query key is shorter than three characters, we cannot use
a GIN index (+ pg_trgm) efficiently, because pg_trgm doesn't support a
partial-match method. In this case, a seq scan or a full index scan is
executed, and the response time is very slow. I'd like to alleviate this
problem.
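
To illustrate the idea, here is a minimal Python model, not pg_trgm code. It
assumes unpadded trigram extraction (ignoring pg_trgm's blank padding and case
folding) and models the index as a sorted list of trigram strings. The names
trigrams, compare_partial and partial_scan are hypothetical. compare_partial
follows the return convention GIN documents for comparePartial: zero for a
match, less than zero for "no match, keep scanning", greater than zero for
"stop, no more matches are possible".

```python
from bisect import bisect_left


def trigrams(s):
    """Return the set of unpadded 3-character substrings of s (simplified model)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def compare_partial(partial_key, index_key):
    """Model of the comparePartial return convention.

    0  -> index_key matches (it starts with partial_key)
    <0 -> no match, but the scan should continue
    >0 -> no match, and no later key can match, so the scan should stop
    """
    if index_key.startswith(partial_key):
        return 0
    if index_key < partial_key:
        return -1
    return 1


def partial_scan(sorted_keys, partial_key):
    """Collect every index key that has partial_key as a prefix."""
    # Start at the first key >= partial_key, as an ordered index scan would.
    start = bisect_left(sorted_keys, partial_key)
    matches = []
    for key in sorted_keys[start:]:
        result = compare_partial(partial_key, key)
        if result > 0:
            break  # keys are sorted, so nothing further can match
        if result == 0:
            matches.append(key)
    return matches


# Hypothetical index content built from two sample words.
index_keys = sorted(trigrams("postgres") | trigrams("position"))
short_key_trigrams = trigrams("os")          # empty: nothing to look up exactly
prefix_matches = partial_scan(index_keys, "os")
```

In this model a two-character key such as "os" produces no trigram at all, so
an exact-key lookup has nothing to search for, while the prefix scan still
touches only the contiguous range of keys beginning with "os".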

Note that we cannot do a partial match if KEEPONLYALNUM is disabled,
i.e., if the query key contains multibyte characters. In this case, the byte
length of the trigram string might be larger than three, and its CRC is used
as the trigram key instead of the trigram string itself. Because a CRC is
used, we cannot do a partial match. The attached patch extends pg_trgm so that
it compares a partial-match query key only when KEEPONLYALNUM is
enabled.
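
A small Python sketch of why the CRC gets in the way. This is an assumption-
laden model, not pg_trgm's actual code: it uses zlib.crc32 and keeps the first
three bytes of the big-endian checksum, which may differ from the CRC variant
and byte selection pg_trgm really uses. trigram_key and prefix_preserved are
hypothetical names.

```python
import zlib


def trigram_key(trigram):
    """Return the 3-byte key for a 3-character trigram (simplified model).

    If the trigram encodes to exactly 3 bytes, the bytes themselves are the key.
    Otherwise (multibyte characters) a CRC of the bytes stands in for them.
    """
    raw = trigram.encode("utf-8")
    if len(raw) == 3:
        return raw
    # Assumption: zlib's CRC-32, truncated to 3 bytes, as a stand-in hash.
    return zlib.crc32(raw).to_bytes(4, "big")[:3]


def prefix_preserved(prefix, trigram):
    """True if the stored key still begins with the bytes of the prefix."""
    return trigram_key(trigram).startswith(prefix.encode("utf-8"))


ascii_ok = prefix_preserved("ab", "abc")          # key is b"abc", prefix survives
multibyte_ok = prefix_preserved("日本", "日本語")   # key is a hash, prefix is lost
```

With single-byte characters the key is the trigram itself, so a prefix of the
trigram is a prefix of the key. Once the key is a hash, the ordering and prefix
structure of the original string are gone, so there is nothing for a prefix
comparison to work with.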

The attached patch is still WIP. What I should do next is:

* bump the pg_trgm version from 1.0 to 1.1, i.e., create pg_trgm--1.1.sql, etc.
* write the regression tests

Comments? Review? Objection?

Regards,

--
Fujii Masao

